Prevent Recurrence
Solve the system that produced the problem, not just the problem itself
The Pattern
A bug appears in production. Someone fixes it. Two weeks later, a similar bug appears. Someone fixes that one too. This cycle repeats until it becomes "normal" — the team accepts a baseline level of recurring problems as the cost of doing business.
It doesn't have to be this way. Every recurring problem is a signal that something in the system is broken. Fixing the symptom is necessary. Fixing the cause is what separates improving teams from stagnant ones.
The Five Whys (and When to Stop)
When a problem recurs, ask "why" until you reach a systemic cause:
- Why did the deployment fail? A config value was wrong.
- Why was the config value wrong? It was manually edited and someone made a typo.
- Why wasn't the typo caught before deployment? We don't have a config validation step.
- Why don't we have config validation? Nobody built it, because it wasn't prioritized.
- Why wasn't it prioritized? We only address config issues reactively.
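The fix at level 1 is correcting the config value. The fix at the deeper levels is building config validation and adding it to the deployment pipeline. The second fix prevents the entire class of problems from recurring. Stop asking "why" once you reach a cause the team can actually change; answers beyond that point tend to be too abstract to act on.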
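As an illustration of what such a validation step might look like, here is a minimal sketch in Python. The schema (`service_name`, `port`, `replicas`) and the function name `validate_config` are hypothetical, chosen only for this example; a real pipeline would check whatever keys its own config actually requires.

```python
from typing import Any

# Hypothetical schema for illustration: required keys and their expected types.
REQUIRED_KEYS: dict[str, type] = {
    "service_name": str,
    "port": int,
    "replicas": int,
}


def validate_config(config: dict[str, Any]) -> list[str]:
    """Return a list of readable errors; an empty list means the config passed."""
    errors: list[str] = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in config:
            errors.append(f"missing required key: {key}")
        elif type(config[key]) is not expected_type:
            errors.append(
                f"{key} should be {expected_type.__name__}, "
                f"got {type(config[key]).__name__}"
            )
    # Unknown keys are often misspelled versions of required ones.
    for key in sorted(set(config) - set(REQUIRED_KEYS)):
        errors.append(f"unknown key (possible typo): {key}")
    return errors
```

A pipeline step could call `validate_config` on the parsed config and fail the deployment whenever the returned list is non-empty, so a typo such as `prot` instead of `port` is reported before anything ships.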
How to Build a Prevention Culture
After every incident, ask two questions
- What do we do right now to fix this? — The immediate response.
- What do we change so this can never happen again? — The systemic fix.
Both are required. The first without the second guarantees recurrence. The second without the first is irresponsible.
Categorize your fixes
| Fix Type | Example | Recurrence Risk |
|---|---|---|
| Patch | Fix the broken config value | High — same class of error will happen again |
| Guard | Add a validation check before deployment | Medium — catches this error type, but relies on the guard working |
| Eliminate | Auto-generate config from source of truth | Low — the manual step that caused the error no longer exists |
Always aim for the highest level of fix that's practical. Patches are for emergencies. Guards are for common errors. Elimination is for problems you never want to see again.
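To make the "Eliminate" row concrete, the sketch below generates each environment's config from one source of truth instead of relying on hand edits. The base values, the environment names, and the `render_config` function are all assumptions made for illustration; the point is only that nobody types the final file by hand.

```python
import json

# Hypothetical single source of truth: shared defaults plus per-environment overrides.
BASE = {"service_name": "api", "port": 8080, "replicas": 2}
OVERRIDES = {
    "staging": {"replicas": 1},
    "production": {"replicas": 6},
}


def render_config(environment: str) -> str:
    """Build the config for one environment as a JSON string."""
    if environment not in OVERRIDES:
        # An unknown environment name fails loudly instead of producing a guess.
        raise ValueError(f"unknown environment: {environment}")
    merged = {**BASE, **OVERRIDES[environment]}
    return json.dumps(merged, indent=2, sort_keys=True)
```

Because every deployed file comes out of `render_config`, a change is made once in `BASE` or `OVERRIDES` and reviewed there, which removes the manual editing step where the typo originally crept in.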
Track recurrence explicitly
- When a bug is filed, check: "Have we seen this before?" If yes, escalate it to a systemic fix.
- Maintain a "recurring issues" tag or label. Once the same problem has been tagged three times, its systemic fix becomes a priority.
- In retrospectives, specifically ask: "What problems did we see this sprint that we've seen before?"
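The tagging rule above can be checked automatically. The following is a rough sketch that assumes issues are exported as dictionaries with a `labels` list and that recurring problems are labeled `recurring:<problem-name>`; both the data shape and the label convention are hypothetical, so adapt them to whatever your tracker provides.

```python
from collections import Counter

RECURRENCE_THRESHOLD = 3      # the "three times" rule from the list above
LABEL_PREFIX = "recurring:"   # hypothetical label convention


def problems_to_prioritize(issues: list[dict]) -> list[str]:
    """Return the names of problems tagged as recurring on at least three issues."""
    counts = Counter()
    for issue in issues:
        # A set ensures a label duplicated on one issue is counted only once.
        for label in set(issue.get("labels", [])):
            if label.startswith(LABEL_PREFIX):
                counts[label[len(LABEL_PREFIX):]] += 1
    return sorted(
        name for name, seen in counts.items() if seen >= RECURRENCE_THRESHOLD
    )
```

Running this over the issues closed in a sprint gives the retrospective a concrete list to discuss rather than relying on memory.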
Common Recurring Problems and Systemic Fixes
| Recurring Problem | Systemic Fix |
|---|---|
| Same type of bug in production | Add automated tests or linting rules targeting that bug class |
| Miscommunication about requirements | Introduce a requirements checklist or sign-off step |
| Knowledge lost when people leave | Build documentation into the Definition of Done |
| Same questions from new hires | Improve onboarding docs based on actual questions asked |
| Deployment failures | Automate deployment steps that are error-prone when done manually |
The Mindset
Tolerating recurring problems is tolerating waste. Every time the team fixes the same issue twice, it spends time that could have been invested in building something new. Prevention isn't extra work — it's an investment that pays back every time the problem doesn't happen.
The goal is simple: no problem should surprise the team twice.