Self-Reflection & Verification
Making models check their own work — patterns that catch mistakes before users do
The most reliable way to improve model output is often not to generate better answers but to catch bad ones. Self-reflection is the family of techniques in which the model (or a second model) reviews, critiques, and revises an output before it is finalized. It sounds like a hack, but it works because verifying an answer is usually easier than generating one.
Why Verification Is Easier Than Generation
A model asked "write a function that sorts by frequency" might produce buggy code. But the same model asked "does this function correctly sort by frequency?" will often spot the bug. Several things drive this asymmetry:
- Checking is constrained. The verifier has a concrete artifact to evaluate, not an open-ended generation task.
- The failure modes are different. Generation can go wrong in open-ended ways; verification mostly needs to check a finite set of properties.
- For structured tasks, models are often better critics than creators: they tend to evaluate against stated criteria more reliably than they satisfy those criteria from scratch.
This is the foundation of every self-reflection pattern: separate generation from verification.
Self-Critique
The simplest pattern: generate an answer, then ask the same model to critique it.
- Generate a response to the original task.
- Prompt: "Review your answer. What errors or weaknesses does it have?"
- If issues are found, revise.
What works:
- Catches obvious errors — off-by-one, missing edge cases, logical inconsistencies.
- Cheap to implement: one call to generate and one to critique, plus a third only when you revise.
- The critique itself is useful signal, even if you don't auto-revise.
What doesn't work:
- The model brings the same blind spots to both roles. If it doesn't know something is wrong as the generator, it won't catch it as the critic.
- Sycophantic self-review — the model says "this looks great!" when it isn't. Use a direct prompt: "List specific errors" rather than "is this correct?"
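The generate, critique, revise steps above can be sketched as follows. This is a minimal illustration rather than a reference implementation: `LLM` stands in for whatever model client you use (prompt in, text out), and the `NO ISSUES` sentinel and the prompt wording are assumptions made for the example.

```python
from typing import Callable

# Stand-in for a model client: takes a prompt, returns text (assumption).
LLM = Callable[[str], str]

NO_ISSUES = "NO ISSUES"  # hypothetical sentinel the critic is asked to emit


def self_critique(task: str, llm: LLM) -> dict:
    """Generate, critique with a direct prompt, and revise only if needed."""
    draft = llm(f"Task: {task}")

    # Ask for specific errors instead of "is this correct?" to limit
    # sycophantic "looks great" replies.
    critique = llm(
        f"Task: {task}\n\nAnswer:\n{draft}\n\n"
        "List specific errors or weaknesses in this answer, one per line. "
        f"If you find none, reply with exactly: {NO_ISSUES}"
    )

    if critique.strip().upper() == NO_ISSUES:
        return {"answer": draft, "critique": None, "revised": False}

    revised = llm(
        f"Task: {task}\n\nAnswer:\n{draft}\n\nIssues found:\n{critique}\n\n"
        "Rewrite the answer so that every listed issue is fixed."
    )
    # The critique is returned too, since it is useful signal on its own.
    return {"answer": revised, "critique": critique, "revised": True}
```

Returning the critique alongside the answer lets a caller log it or show it to a human even when auto-revision is switched off.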
Self-Verification
More structured than self-critique. Instead of open-ended review, check specific properties:
- Unit-test style checks: "Does this SQL query return results for edge case X?"
- Constraint checking: "Does this plan satisfy all the requirements listed above?"
- Consistency checking: "Does this summary contradict any facts in the source document?"
Self-verification works best when you can decompose "is this correct?" into concrete, checkable questions. The more specific the check, the more reliable the result.
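One way to decompose "is this correct?" into concrete checks is to ask one yes/no question per property and treat anything other than a clear YES as a failure. The sketch below assumes a hypothetical `llm` callable (prompt in, text out) and an answer format invented for this example.

```python
from typing import Callable

LLM = Callable[[str], str]  # stand-in for a model client (assumption)


def verify_properties(artifact: str, checks: list[str], llm: LLM) -> dict[str, bool]:
    """Ask one yes/no question per property; anything but a clear YES fails."""
    results = {}
    for check in checks:
        reply = llm(
            f"Artifact:\n{artifact}\n\n"
            f"Question: {check}\n"
            "Answer with YES or NO on the first line, then one sentence of evidence."
        )
        lines = reply.strip().splitlines()
        first_line = lines[0].strip().upper() if lines else ""
        # Empty or ambiguous replies count as failures, not passes.
        results[check] = first_line.startswith("YES")
    return results
```

Failing closed on unclear replies is a deliberate choice here: a verifier that passes ambiguous output tends to hide exactly the cases you want to see.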
Iterative Refinement
Run the generate-critique-revise loop multiple times:
- Generate v1.
- Critique v1 → list of issues.
- Revise, producing v2.
- Critique v2 → list of remaining issues.
- Revise, producing v3.
- Stop when no issues found or max iterations reached.
Practical limits:
- Diminishing returns after 2-3 iterations for most tasks.
- Each iteration costs tokens and latency.
- The model can start "thrashing" — fixing one thing breaks another. Set a max iteration count.
- Track whether each iteration actually improves things. If v3 isn't better than v2, stop.
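The loop and its stopping rules can be sketched like this. The `critique` and `revise` callables are hypothetical wrappers around model calls, and using the number of reported issues as the measure of improvement is a crude assumption; a real system would likely use a better quality signal.

```python
from typing import Callable


def refine(
    draft: str,
    critique: Callable[[str], list[str]],      # text -> list of issues
    revise: Callable[[str, list[str]], str],   # (text, issues) -> new text
    max_iters: int = 3,
) -> tuple[str, list[str]]:
    """Critique and revise until clean, stalled, or out of iterations."""
    best, best_issues = draft, critique(draft)
    for _ in range(max_iters):
        if not best_issues:
            break  # nothing left to fix
        candidate = revise(best, best_issues)
        candidate_issues = critique(candidate)
        if len(candidate_issues) >= len(best_issues):
            break  # no improvement (possible thrashing): keep the previous version
        best, best_issues = candidate, candidate_issues
    return best, best_issues
```

Keeping the best version seen so far, instead of always returning the latest one, means a bad final revision cannot make the result worse than an earlier draft.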
The Reflexion Pattern
Reflexion (Shinn et al., 2023) formalizes self-reflection into a persistent loop:
- Act — attempt the task.
- Evaluate — check the result against success criteria (tests pass? output matches spec?).
- Reflect — generate a natural language reflection on what went wrong.
- Store the reflection in memory.
- Retry with the reflection available as context.
The key insight is storing reflections as memory. The model doesn't just retry — it retries with knowledge of its past mistakes. This is especially powerful for agent loops where the same failure can recur.
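A simplified sketch of the act, evaluate, reflect, store, retry loop as described here. It is not the paper's implementation: the three callables and their signatures are hypothetical, and memory is just a list of strings passed back into each attempt.

```python
from typing import Callable, Optional


def reflexion_loop(
    task: str,
    act: Callable[[str, list[str]], str],       # (task, reflections) -> attempt
    evaluate: Callable[[str], Optional[str]],   # attempt -> error text, or None on success
    reflect: Callable[[str, str, str], str],    # (task, attempt, error) -> lesson learned
    max_trials: int = 3,
) -> tuple[Optional[str], list[str]]:
    """Retry a task, carrying forward a written reflection on each failure."""
    memory: list[str] = []
    for _ in range(max_trials):
        attempt = act(task, memory)      # past reflections are available as context
        error = evaluate(attempt)
        if error is None:
            return attempt, memory       # success
        memory.append(reflect(task, attempt, error))
    return None, memory                  # gave up; reflections are still returned
```

In practice `evaluate` would be something concrete such as a test run, and `memory` could be persisted across tasks so that a recurring failure is recognised earlier.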
LLM-as-Judge for Self-Check
Use a separate model call (or a different model entirely) as the verifier:
- Same model, separate call: the reviewer sees only the task and the answer, not the reasoning that produced it, so it is less anchored to the original mistake. Better than inline self-review.
- Different model — e.g., use a fast model to generate, a reasoning model to verify. The verifier's strengths complement the generator's.
- Specialized judge prompt — give the judge a rubric, not just "is this good?" Good rubrics check specific dimensions: accuracy, completeness, format, safety.
Building effective judge prompts:
- Define the evaluation dimensions explicitly.
- Use a scoring scale (1-5 works well) with anchored descriptions for each level.
- Ask for dimension-by-dimension scores before an overall score.
- Include examples of good and bad outputs in the judge prompt.
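The points above could be turned into a small prompt builder and score parser along these lines. The rubric contents, the JSON reply format, and the function names are all assumptions made for illustration; only the standard-library `json` module is used.

```python
import json

# Hypothetical rubric: dimension -> anchored descriptions for selected levels.
RUBRIC = {
    "accuracy": {1: "major factual or logical errors", 3: "minor errors", 5: "no errors found"},
    "completeness": {1: "most requirements missing", 3: "some gaps", 5: "all requirements covered"},
}


def build_judge_prompt(task: str, output: str, rubric: dict, examples: str = "") -> str:
    """Render a rubric into a prompt that asks for per-dimension scores first."""
    lines = ["You are grading a response. Score each dimension from 1 to 5.", ""]
    for dim, anchors in rubric.items():
        lines.append(f"{dim}:")
        lines += [f"  {level} = {text}" for level, text in sorted(anchors.items())]
    if examples:
        lines += ["", "Graded examples:", examples]
    keys = ", ".join(f'"{d}"' for d in rubric)
    lines += [
        "",
        f"Task:\n{task}",
        f"Response:\n{output}",
        "",
        f'Reply with one JSON object with keys {keys}, then "overall" last.',
    ]
    return "\n".join(lines)


def parse_scores(reply: str, rubric: dict) -> dict[str, int]:
    """Parse the judge's JSON reply; raise ValueError on missing or out-of-range scores."""
    data = json.loads(reply)
    scores = {}
    for key in [*rubric, "overall"]:
        if key not in data:
            raise ValueError(f"missing score: {key}")
        value = int(data[key])
        if not 1 <= value <= 5:
            raise ValueError(f"{key} out of range: {value}")
        scores[key] = value
    return scores
```

Rejecting malformed replies outright, instead of guessing at a score, keeps judge failures visible; the caller can retry the judge call or route the item to a human.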
Verification as a Separate Step
The highest-impact pattern in production systems is making verification an explicit, separate step in your pipeline — not something the model does inline:
- Code generation → run tests, linter, type checker.
- SQL generation → execute against a sandboxed database.
- Factual claims → check against a knowledge base or search results.
- Structured output → validate against a JSON schema.
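External verification is generally more reliable than self-verification, because a test run or schema check does not share the model's blind spots. It is only as good as its coverage, though: passing tests prove nothing about cases the tests don't exercise. Use self-reflection as a complement, not a replacement, for real validation.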
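As one concrete instance of an external check, a generated SQL query can be executed against a throwaway in-memory SQLite database before it goes anywhere near real data. This is a sketch under assumptions: the function name, the fixture schema, and the "expected columns" criterion are invented for the example, and a production sandbox would need to mirror the real database engine and schema.

```python
import sqlite3


def check_sql(query: str, schema_sql: str, expected_columns: list[str]) -> tuple[bool, str]:
    """Run a generated query against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")  # discarded when the connection closes
    try:
        conn.executescript(schema_sql)  # schema plus any fixture rows
        cursor = conn.execute(query)    # accepts a single statement only
        columns = [d[0] for d in cursor.description or []]
        cursor.fetchall()               # force full execution of the query
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return False, f"query failed: {exc}"
    finally:
        conn.close()
    if columns != expected_columns:
        return False, f"expected columns {expected_columns}, got {columns}"
    return True, "ok"
```

A usage example with a hypothetical `users` table:

```python
schema = "CREATE TABLE users (id INTEGER, name TEXT); INSERT INTO users VALUES (1, 'a');"
ok, message = check_sql("SELECT id, name FROM users", schema, ["id", "name"])
```

The failure message is concrete evidence, so it can be fed back to the generator as a critique in place of the model's own guess about what went wrong.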
When to Use Self-Reflection
Use it when:
- The task is high-stakes and errors are costly.
- External verification isn't available or is expensive.
- The task has clear correctness criteria the model can check against.
- You're building an agent loop where mistakes compound.
Skip it when:
- The task is low-stakes — the cost of review exceeds the cost of occasional errors.
- You have reliable external validators — just use those.
- Latency is critical: each critique or revision pass adds another full model call, so reflection can double or triple your response time.
- The model lacks the knowledge to verify — it'll just rubber-stamp its own mistakes.