Steven's Knowledge
Guiding Principles

Prevent Recurrence

Fix the system that produced the problem, not just the problem itself

The Pattern

A bug appears in production. Someone fixes it. Two weeks later, a similar bug appears. Someone fixes that one too. This cycle repeats until it becomes "normal" — the team accepts a baseline level of recurring problems as the cost of doing business.

It doesn't have to be this way. Every recurring problem is a signal that something in the system is broken. Fixing the symptom is necessary. Fixing the cause is what separates improving teams from stagnant ones.

The Five Whys (and When to Stop)

When a problem recurs, ask "why" until you reach a systemic cause:

  1. Why did the deployment fail? A config value was wrong.
  2. Why was the config value wrong? It was manually edited and someone made a typo.
  3. Why did the typo reach deployment? We don't have a config validation step.
  4. Why don't we have config validation? Nobody built it because it wasn't prioritized.
  5. Why wasn't it prioritized? We only address config issues reactively.

The fix at level 1 is correcting the wrong value. The fix at level 4 or 5 is building config validation and adding it to the deployment pipeline. Only the second fix prevents the entire class of problems from recurring. Stop asking "why" once the answer names something in the system that the team can change.
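As a rough illustration of what a config validation step could look like, here is a minimal Python sketch. The key names in `REQUIRED_KEYS` and the `validate_config` function are hypothetical and chosen only for this example; a real pipeline would check whatever its own config actually requires.

```python
from typing import Any

# Hypothetical schema for illustration: required key -> expected type.
REQUIRED_KEYS: dict[str, type] = {
    "database_url": str,
    "max_connections": int,
    "debug": bool,
}


def validate_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the config passed."""
    problems = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in config:
            problems.append(f"missing required key: {key}")
        # Exact type match, so that True is not accepted where an int is expected.
        elif type(config[key]) is not expected_type:
            problems.append(
                f"wrong type for {key}: expected {expected_type.__name__}, "
                f"got {type(config[key]).__name__}"
            )
    # A key that is not in the schema is often a misspelled required key.
    for key in config:
        if key not in REQUIRED_KEYS:
            problems.append(f"unknown key (possible typo): {key}")
    return problems
```

A deployment script could call `validate_config` before rollout and abort if the returned list is non-empty, so a misspelled key is reported as both a missing key and an unknown key before it reaches production.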

How to Build a Prevention Culture

After every incident, ask two questions

  • What do we do right now to fix this? — The immediate response.
  • What do we change so this can never happen again? — The systemic fix.

Both are required. The first without the second guarantees recurrence. The second without the first is irresponsible.

Categorize your fixes

Fix Type  | Example                                   | Recurrence Risk
----------|-------------------------------------------|----------------
Patch     | Fix the broken config value               | High: the same class of error will happen again
Guard     | Add a validation check before deployment  | Medium: catches this error type, but relies on the guard working
Eliminate | Auto-generate config from source of truth | Low: the manual step that caused the error no longer exists

Always aim for the highest level of fix that's practical. Patches are for emergencies. Guards are for common errors. Elimination is for problems you never want to see again.
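To make the "Eliminate" level concrete, here is a small Python sketch in which the deployed config is generated from one source of truth rather than edited by hand. The names `SERVICE_DEFAULTS`, `ENVIRONMENTS`, and `render_config`, along with the host names and values, are assumptions invented for this example.

```python
import json

# Hypothetical single source of truth; values are illustrative only.
SERVICE_DEFAULTS = {"max_connections": 20, "debug": False}
ENVIRONMENTS = {
    "staging": {"database_host": "db.staging.internal", "debug": True},
    "production": {"database_host": "db.prod.internal"},
}


def render_config(environment: str) -> str:
    """Build the config for one environment as a JSON string."""
    if environment not in ENVIRONMENTS:
        # Fail loudly instead of silently producing a config for a misspelled name.
        raise ValueError(f"unknown environment: {environment}")
    # Environment-specific values override the shared defaults.
    merged = {**SERVICE_DEFAULTS, **ENVIRONMENTS[environment]}
    return json.dumps(merged, indent=2, sort_keys=True)
```

Because nobody edits the rendered file directly, the manual step where the typo happened no longer exists. Mistakes can still be made in the source definitions, but they are made once, in one reviewed place.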

Track recurrence explicitly

  • When a bug is filed, check: "Have we seen this before?" If yes, escalate it to a systemic fix.
  • Maintain a "recurring issue" tag or label. When the same underlying issue has been tagged three times, its systemic fix becomes a priority.
  • In retrospectives, specifically ask: "What problems did we see this sprint that we've seen before?"
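The three-occurrence rule above can be checked mechanically. The sketch below assumes bug reports are available as `(report_id, label)` pairs, where the label names the recurring issue or is `None`. That data shape and the function name are hypothetical, not any particular tracker's API.

```python
from collections import Counter
from typing import Iterable, Optional

RECURRENCE_THRESHOLD = 3  # the "three times" rule from the list above


def issues_needing_systemic_fix(
    reports: Iterable[tuple[str, Optional[str]]],
    threshold: int = RECURRENCE_THRESHOLD,
) -> list[str]:
    """Return recurring-issue labels seen at least `threshold` times, sorted."""
    counts = Counter(label for _, label in reports if label is not None)
    return sorted(label for label, seen in counts.items() if seen >= threshold)
```

Running this over the tracker's export before each retrospective gives a concrete answer to "what have we seen before?" instead of relying on memory.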

Common Recurring Problems and Systemic Fixes

Recurring Problem                   | Systemic Fix
------------------------------------|-------------
Same type of bug in production      | Add automated tests or linting rules targeting that bug class
Miscommunication about requirements | Introduce a requirements checklist or sign-off step
Knowledge lost when people leave    | Build documentation into the Definition of Done
Same questions from new hires       | Improve onboarding docs based on actual questions asked
Deployment failures                 | Automate deployment steps that are error-prone when done manually
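As one example of a linting rule aimed at a whole bug class, the sketch below uses Python's standard `ast` module to flag `.now()` calls made without arguments. The bug class (timestamps created without a timezone) and the function name are assumptions picked for illustration; the same approach applies to whatever pattern keeps causing trouble on a given team.

```python
import ast


def find_naive_now_calls(source: str) -> list[int]:
    """Return line numbers of `.now()` calls that pass no arguments."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "now"
            and not node.args
            and not node.keywords
        ):
            flagged.append(node.lineno)
    return sorted(flagged)
```

A check like this can run in CI and fail the build when the list is non-empty. It matches on the method name alone, so it may also flag unrelated `.now()` methods; a production rule would need to be more precise.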

The Mindset

Tolerating recurring problems is tolerating waste. Every time the team fixes the same issue twice, it spends time that could have been invested in building something new. Prevention isn't extra work — it's an investment that pays back every time the problem doesn't happen.

The goal is simple: no problem should surprise the team twice.
