Steven's Knowledge

Self-Reflection & Verification

Making models check their own work — patterns that catch mistakes before users do

One of the most reliable ways to improve model output is not to generate better answers but to catch bad ones. Self-reflection is the family of techniques in which the model (or a second model) reviews, critiques, and revises an output before it is finalized. It sounds like a hack, but it works because verifying an answer is often easier than generating one.

Why Verification Is Easier Than Generation

A model asked "write a function that sorts by frequency" might produce buggy code. But the same model asked "does this function correctly sort by frequency?" will often spot the bug. Several things drive this asymmetry:

  • Checking is constrained. The verifier has a concrete artifact to evaluate, not an open-ended generation task.
  • The failure modes are different. Generation can go wrong in infinite ways; verification mostly needs to check a finite set of properties.
  • Models are better critics than creators for structured tasks — they can evaluate against criteria more reliably than they can satisfy those criteria from scratch.

This is the foundation of every self-reflection pattern: separate generation from verification.
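
To make the "finite set of properties" point concrete, here is a minimal Python sketch built on the sort-by-frequency example. It uses a programmatic checker rather than a model, and it assumes "sort by frequency" means most frequent first; both function names are illustrative. The checker never re-derives the algorithm. It only tests two properties of the result.

```python
from collections import Counter

def buggy_sort_by_frequency(items):
    # Plausible generated code with a bug: orders least frequent first.
    counts = Counter(items)
    return sorted(items, key=lambda x: counts[x])

def is_sorted_by_frequency(original, result):
    """Verify two concrete properties of the result."""
    counts = Counter(original)
    # Property 1: the result is a permutation of the input.
    same_elements = Counter(result) == counts
    # Property 2: frequencies never increase from left to right.
    freqs = [counts[x] for x in result]
    non_increasing = all(a >= b for a, b in zip(freqs, freqs[1:]))
    return same_elements and non_increasing

data = ["a", "b", "b", "c", "c", "c"]
print(is_sorted_by_frequency(data, buggy_sort_by_frequency(data)))  # False
```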

Self-Critique

The simplest pattern: generate an answer, then ask the same model to critique it.

  1. Generate a response to the original task.
  2. Prompt: "Review your answer. What errors or weaknesses does it have?"
  3. If issues are found, revise.
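
The three steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: `llm` is a hypothetical callable that takes a prompt and returns a completion, and the "NO ISSUES" sentinel is an assumed convention for detecting a clean review.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out

def self_critique(task: str, llm: LLM) -> dict:
    # Step 1: generate a draft.
    draft = llm(task)
    # Step 2: ask for specific errors, not a yes/no verdict.
    critique = llm(
        f"Task:\n{task}\n\nAnswer:\n{draft}\n\n"
        "List specific errors or weaknesses in this answer. "
        "Reply with exactly NO ISSUES if you find none."
    )
    if critique.strip().upper() == "NO ISSUES":
        return {"answer": draft, "critique": None, "revised": False}
    # Step 3: revise only when the critique found something.
    revised = llm(
        f"Task:\n{task}\n\nAnswer:\n{draft}\n\nCritique:\n{critique}\n\n"
        "Rewrite the answer so that it fixes every issue in the critique."
    )
    return {"answer": revised, "critique": critique, "revised": True}
```

Returning the critique alongside the answer keeps it available as a signal even when you choose not to auto-revise.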

What works:

  • Catches obvious errors — off-by-one, missing edge cases, logical inconsistencies.
  • Cheap to implement: two calls (generate, critique), plus a third only when a revision is needed.
  • The critique itself is useful signal, even if you don't auto-revise.

What doesn't work:

  • The critic shares the generator's blind spots, because it is the same model. If it doesn't know something is wrong, it won't catch it.
  • Sycophantic self-review — the model says "this looks great!" when it isn't. Use a direct prompt: "List specific errors" rather than "is this correct?"

Self-Verification

More structured than self-critique. Instead of open-ended review, check specific properties:

  • Unit-test style checks: "Does this SQL query return results for edge case X?"
  • Constraint checking: "Does this plan satisfy all the requirements listed above?"
  • Consistency checking: "Does this summary contradict any facts in the source document?"

Self-verification works best when you can decompose "is this correct?" into concrete, checkable questions. The more specific the check, the more reliable the result.
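
One way to decompose "is this correct?" into checkable questions is to ask them one at a time. The sketch below assumes a hypothetical `llm` callable (prompt in, completion out) and a YES/NO reply convention; anything that is not a clear YES is treated as a failed check.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out

def verify_properties(artifact: str, checks: list[str], llm: LLM) -> dict[str, bool]:
    """Ask one narrow question per call and record a pass/fail per check."""
    results = {}
    for check in checks:
        reply = llm(
            f"Artifact:\n{artifact}\n\nQuestion: {check}\n"
            "Answer with a single word, YES or NO."
        )
        # Ambiguous replies count as failures, so they get looked at.
        results[check] = reply.strip().upper().startswith("YES")
    return results

def failed_checks(results: dict[str, bool]) -> list[str]:
    return [check for check, passed in results.items() if not passed]
```

Keeping each question in its own call costs more tokens, but it makes every verdict traceable to one specific property.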

Iterative Refinement

Run the generate-critique-revise loop multiple times:

  1. Generate v1.
  2. Critique v1 → list of issues.
  3. Revise, producing v2.
  4. Critique v2 → list of remaining issues.
  5. Revise, producing v3.
  6. Stop when no issues found or max iterations reached.

Practical limits:

  • Diminishing returns after 2-3 iterations for most tasks.
  • Each iteration costs tokens and latency.
  • The model can start "thrashing" — fixing one thing breaks another. Set a max iteration count.
  • Track whether each iteration actually improves things. If v3 isn't better than v2, stop.
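
A bounded version of this loop might look like the sketch below. The `critique` and `revise` callables are hypothetical stand-ins for model calls, and using the number of reported issues as the measure of improvement is a simplifying assumption; a real system may need a better quality signal.

```python
from typing import Callable

def refine(
    draft: str,
    critique: Callable[[str], list[str]],      # returns a list of issues
    revise: Callable[[str, list[str]], str],   # returns a new version
    max_iters: int = 3,
) -> tuple[str, list[str]]:
    best, best_issues = draft, critique(draft)
    for _ in range(max_iters):
        if not best_issues:
            break  # nothing left to fix
        candidate = revise(best, best_issues)
        issues = critique(candidate)
        if len(issues) >= len(best_issues):
            break  # no measurable improvement: keep the earlier version
        best, best_issues = candidate, issues
    return best, best_issues
```

Keeping the best version seen so far, rather than the latest one, guards against thrashing: a revision that introduces new problems is discarded.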

The Reflexion Pattern

Reflexion (Shinn et al., 2023) formalizes self-reflection into a persistent loop:

  1. Act — attempt the task.
  2. Evaluate — check the result against success criteria (tests pass? output matches spec?).
  3. Reflect — generate a natural language reflection on what went wrong.
  4. Store the reflection in memory.
  5. Retry with the reflection available as context.

The key insight is storing reflections as memory. The model doesn't just retry — it retries with knowledge of its past mistakes. This is especially powerful for agent loops where the same failure can recur.
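
The five steps can be sketched as a small loop. This loosely follows the description above and is not the paper's implementation; `act`, `evaluate`, and `reflect` are hypothetical callables you would back with model calls, test runners, or other checks.

```python
from typing import Callable

def reflexion_loop(
    task: str,
    act: Callable[[str, list[str]], str],          # attempt, given past reflections
    evaluate: Callable[[str], tuple[bool, str]],   # (success, feedback)
    reflect: Callable[[str, str, str], str],       # what went wrong, in words
    max_trials: int = 3,
) -> tuple[str, bool, list[str]]:
    memory: list[str] = []  # reflections persist across trials
    attempt = ""
    for _ in range(max_trials):
        attempt = act(task, memory)
        succeeded, feedback = evaluate(attempt)
        if succeeded:
            return attempt, True, memory
        # Store the lesson so the next trial can see it.
        memory.append(reflect(task, attempt, feedback))
    return attempt, False, memory
```

The difference from a plain retry loop is the `memory` argument passed to `act`: each new attempt sees every earlier reflection.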

LLM-as-Judge for Self-Check

Use a separate model call (or a different model entirely) as the verifier:

  • Same model, separate call — different context means different failure modes. Better than inline self-review.
  • Different model — e.g., use a fast model to generate, a reasoning model to verify. The verifier's strengths complement the generator's.
  • Specialized judge prompt — give the judge a rubric, not just "is this good?" Good rubrics check specific dimensions: accuracy, completeness, format, safety.

Building effective judge prompts:

  1. Define the evaluation dimensions explicitly.
  2. Use a scoring scale (1-5 works well) with anchored descriptions for each level.
  3. Ask for dimension-by-dimension scores before an overall score.
  4. Include examples of good and bad outputs in the judge prompt.
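
As a rough illustration of these four steps, the sketch below builds a rubric-based judge prompt and parses the scores from the reply. The rubric contents, the `dimension: score` reply format, and both function names are assumptions made for this example. For brevity the rubric anchors only levels 1, 3, and 5; a fuller rubric would describe all five.

```python
import re

# Hypothetical rubric: each dimension maps score levels to anchor text.
RUBRIC = {
    "accuracy": {1: "major factual errors", 3: "minor errors", 5: "no errors found"},
    "completeness": {1: "misses most requirements", 3: "misses some", 5: "covers all"},
}

def build_judge_prompt(task, output, rubric, examples=()):
    lines = ["Grade the response below. Score each dimension from 1 to 5."]
    for dim, anchors in rubric.items():
        lines.append(f"\n{dim}:")
        lines += [f"  {level} = {text}" for level, text in sorted(anchors.items())]
    for label, example in examples:  # e.g. ("good", "..."), ("bad", "...")
        lines.append(f"\nExample of a {label} response:\n{example}")
    lines.append(f"\nTask:\n{task}\n\nResponse:\n{output}")
    # Per-dimension scores come first, the overall score last.
    lines.append("\nReply with one 'dimension: score' line per dimension, "
                 "then a final 'overall: score' line.")
    return "\n".join(lines)

def parse_scores(reply, rubric):
    """Extract integer scores; dimensions missing from the reply are omitted."""
    scores = {}
    for name in [*rubric, "overall"]:
        match = re.search(rf"^\s*{re.escape(name)}\s*:\s*([1-5])\b", reply,
                          re.IGNORECASE | re.MULTILINE)
        if match:
            scores[name] = int(match.group(1))
    return scores
```

A missing key in the parsed result is itself useful: it tells you the judge did not follow the format, and the call can be retried or flagged.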

Verification as a Separate Step

The highest-impact pattern in production systems is making verification an explicit, separate step in your pipeline — not something the model does inline:

  • Code generation → run tests, linter, type checker.
  • SQL generation → execute against a sandboxed database.
  • Factual claims → check against a knowledge base or search results.
  • Structured output → validate against a JSON schema.

External verification is generally more reliable than self-verification, because a test run or a schema check does not share the model's blind spots. Use self-reflection as a complement, not a replacement, for real validation.
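
As one example of an external check, the sketch below runs a generated SQL query against a throwaway in-memory SQLite database. The function name and schema are hypothetical, and an in-memory SQLite database is only a stand-in: it will not catch dialect-specific problems in a query meant for another engine.

```python
import sqlite3

def check_sql(query: str, schema_sql: str) -> tuple[bool, str]:
    """Execute the query against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")  # discarded when closed
    try:
        conn.executescript(schema_sql)  # create tables and seed rows
        rows = conn.execute(query).fetchall()
        return True, f"{len(rows)} row(s)"
    except (sqlite3.Error, sqlite3.Warning) as exc:
        # The error text can be fed back to the model as critique.
        return False, str(exc)
    finally:
        conn.close()

schema = """
CREATE TABLE users (id INTEGER, name TEXT);
INSERT INTO users VALUES (1, 'a'), (2, 'b');
"""
print(check_sql("SELECT name FROM users WHERE id = 1", schema))
print(check_sql("SELECT nmae FROM users", schema))
```

The second call fails because the column name is misspelled, a class of error the database reports directly without any model judgment.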

When to Use Self-Reflection

Use it when:

  • The task is high-stakes and errors are costly.
  • External verification isn't available or is expensive.
  • The task has clear correctness criteria the model can check against.
  • You're building an agent loop where mistakes compound.

Skip it when:

  • The task is low-stakes — the cost of review exceeds the cost of occasional errors.
  • You have reliable external validators — just use those.
  • Latency is critical: each reflection pass adds at least one more model call, so response time can double or triple.
  • The model lacks the knowledge to verify — it'll just rubber-stamp its own mistakes.
