Chain-of-Thought Prompting
Zero-shot, few-shot, and structured reasoning techniques that make models think before they answer
Telling a model to "think step by step" before answering was one of the most impactful discoveries in prompt engineering. It works because autoregressive models commit to each token as they generate it — if the first token of the answer is wrong, the rest follows that mistake. Chain of thought gives the model scratch space to reason before committing to a final answer.
Zero-Shot Chain of Thought
The simplest version: append "Let's think step by step" (or similar) to your prompt. No examples needed.
- Works surprisingly well on math, logic, and multi-step reasoning tasks.
- Costs almost nothing to implement — just a suffix.
- The quality of the reasoning varies. The model sometimes generates plausible-looking but wrong chains.
When to use it: quick wins on tasks where you notice the model rushing to an answer. Don't overthink the exact phrasing — "Think through this carefully" works about as well as the canonical phrase.
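A minimal sketch of the suffix approach in Python. The helper name `zero_shot_cot` and the `Q:`/`A:` layout are assumptions made for illustration, not part of any library; the returned string would be sent to whatever model client you use.

```python
DEFAULT_TRIGGER = "Let's think step by step."


def zero_shot_cot(question: str, trigger: str = DEFAULT_TRIGGER) -> str:
    """Append a reasoning trigger so the model writes its steps before answering."""
    # The trigger opens the answer slot, so the first generated tokens are reasoning.
    return f"Q: {question.strip()}\nA: {trigger}"


prompt = zero_shot_cot(
    "A train travels 120 km in 1.5 hours. What is its average speed?"
)
print(prompt)
```

Swapping the trigger is a one-argument change, which makes it cheap to compare phrasings on your own task.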
Few-Shot Chain of Thought
Provide 2-5 examples that demonstrate the reasoning pattern you want, complete with worked-out intermediate steps.
- Pick examples that cover the range of difficulty in your task.
- Write out the reasoning steps yourself — don't let the model generate them.
- Keep the chains concise. Verbose chains teach the model to be verbose, not to be correct.
Few-shot CoT usually outperforms zero-shot when you have good examples. The examples act as both format instructions and implicit rubrics.
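One way to assemble such a prompt, sketched in Python. The `WorkedExample` class, the `few_shot_cot` function, and the `Q:`/`Reasoning:`/`Answer:` labels are hypothetical choices for this illustration; the two worked examples are hand-written, as the advice above recommends.

```python
from dataclasses import dataclass


@dataclass
class WorkedExample:
    question: str
    reasoning: str  # hand-written, concise intermediate steps
    answer: str


def few_shot_cot(examples: list[WorkedExample], question: str) -> str:
    """Join worked examples, then leave the reasoning slot open for the new question."""
    blocks = [
        f"Q: {ex.question}\nReasoning: {ex.reasoning}\nAnswer: {ex.answer}"
        for ex in examples
    ]
    # Ending on "Reasoning:" nudges the model to reason first, then answer.
    blocks.append(f"Q: {question}\nReasoning:")
    return "\n\n".join(blocks)


examples = [
    WorkedExample(
        "A shop sells pens at 3 for $2. How much do 12 pens cost?",
        "12 pens is 4 groups of 3. Each group costs $2, so 4 * 2 = 8.",
        "$8",
    ),
    WorkedExample(
        "Tom has 5 apples, buys 2 bags of 6, and eats 3. How many are left?",
        "2 bags of 6 is 12. 5 + 12 = 17. 17 - 3 = 14.",
        "14",
    ),
]

print(few_shot_cot(examples, "A box holds 8 eggs. How many boxes for 50 eggs?"))
```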
Tree of Thought
Tree of Thought (ToT) generalizes CoT from a single chain to a branching search. The model generates multiple candidate next-steps, evaluates each, and prunes bad branches before continuing.
- Best for problems with many dead ends: puzzle solving, constrained planning, game playing.
- Expensive — you're running the model many times per problem.
- Implementation typically involves a controller loop that manages the tree, not just a prompt trick.
In practice, ToT is useful in research and high-value batch settings. For real-time production, the latency is usually unacceptable.
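The controller loop can be sketched as a small beam search. This is one possible shape under stated assumptions, not a reference implementation: `tree_of_thought`, `propose`, `score`, and `is_goal` are hypothetical names, and in a real system `propose` and `score` would each be model calls. Here they are plain functions on a toy problem (reach 10 from 1 using "+3" and "*2") so the loop runs without a model.

```python
from typing import Callable, Optional, TypeVar

S = TypeVar("S")


def tree_of_thought(
    root: S,
    propose: Callable[[S], list[S]],   # generate candidate next steps
    score: Callable[[S], float],       # evaluate a partial solution
    is_goal: Callable[[S], bool],
    beam_width: int = 2,
    max_depth: int = 5,
) -> Optional[S]:
    """Expand every state in the frontier, keep the best few, and repeat."""
    if is_goal(root):
        return root
    frontier = [root]
    for _ in range(max_depth):
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            return None
        for candidate in candidates:
            if is_goal(candidate):
                return candidate
        # Prune: only the highest-scoring branches survive to the next level.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return None


TARGET = 10


def propose(state):
    value, path = state
    return [(value + 3, path + ("+3",)), (value * 2, path + ("*2",))]


def score(state):
    return -abs(TARGET - state[0])  # closer to the target is better


def is_goal(state):
    return state[0] == TARGET


result = tree_of_thought((1, ()), propose, score, is_goal)
print(result)  # (10, ('+3', '+3', '+3'))
```

With two model calls per node, the call count grows with `beam_width`, branching factor, and depth, which is where the cost and latency come from.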
Graph of Thought
Graph of Thought (GoT) extends the tree into an arbitrary graph: reasoning steps can merge, reference earlier steps, and loop back to refine a previous step. The idea is that many real problems don't decompose into neat trees because sub-problems interact.
- Still mostly a research technique.
- The coordination overhead is significant.
- If you find yourself wanting GoT, consider whether a reasoning model with extended thinking would solve the problem more simply.
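To make the difference from a tree concrete, here is a minimal data-structure sketch in which a reasoning step may have more than one parent. `Thought` and `ThoughtGraph` are hypothetical names for this illustration, not an API from any GoT library, and the sketch only stores the structure; it does not call a model. Because a node can only point at nodes that already exist, this simplified version stays acyclic.

```python
from dataclasses import dataclass, field


@dataclass
class Thought:
    text: str
    parents: list[int] = field(default_factory=list)


class ThoughtGraph:
    def __init__(self) -> None:
        self.nodes: list[Thought] = []

    def add(self, text: str, parents: tuple[int, ...] = ()) -> int:
        """Add a thought that builds on zero or more earlier thoughts."""
        for p in parents:
            if not 0 <= p < len(self.nodes):
                raise ValueError(f"unknown parent id: {p}")
        self.nodes.append(Thought(text, list(parents)))
        return len(self.nodes) - 1

    def ancestors(self, node_id: int) -> set[int]:
        """Collect every earlier thought that this one depends on."""
        seen: set[int] = set()
        stack = list(self.nodes[node_id].parents)
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(self.nodes[n].parents)
        return seen


g = ThoughtGraph()
root = g.add("Sort a long list")
left = g.add("Sort the first half", (root,))
right = g.add("Sort the second half", (root,))
# A merge step has two parents, which a tree cannot express.
merged = g.add("Merge the two sorted halves", (left, right))
print(sorted(g.ancestors(merged)))  # [0, 1, 2]
```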
Self-Consistency
Instead of generating one chain, generate multiple chains (with temperature > 0) and take the majority answer. The insight: even if individual chains are unreliable, the correct answer tends to appear more often.
- Simple to implement — sample N times, extract the final answer from each, vote.
- Effective: it often improves accuracy on math and logic benchmarks over single-chain CoT. Gains of roughly 5-15 percentage points are commonly cited, but the size depends on the model and the task.
- Cost scales linearly with the number of samples. N=5 to N=10 is the practical sweet spot.
Self-consistency is one of the best "throw more compute at inference" techniques. Use it when accuracy matters more than latency.
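The sample-extract-vote loop can be sketched as follows. The `Final answer:` marker and the function names are assumptions for this illustration, and `sample_chain` stands in for a model call made with temperature > 0; the demo feeds canned strings so it runs without a model.

```python
import re
from collections import Counter
from typing import Callable, Optional

# Assumes the prompt asked the model to end with a "Final answer: ..." line.
ANSWER_RE = re.compile(r"Final answer:\s*(.+)", re.IGNORECASE)


def extract_answer(chain: str) -> Optional[str]:
    """Return the last 'Final answer:' value in the chain, or None if absent."""
    matches = ANSWER_RE.findall(chain)
    return matches[-1].strip() if matches else None


def self_consistency(sample_chain: Callable[[], str], n: int = 5) -> Optional[str]:
    """Sample n chains and return the most common extracted answer."""
    answers = [extract_answer(sample_chain()) for _ in range(n)]
    votes = Counter(a for a in answers if a is not None)
    if not votes:
        return None  # no chain produced a parseable answer
    return votes.most_common(1)[0][0]


canned = iter([
    "17 + 25 = 42. Final answer: 42",
    "17 + 25 = 41. Final answer: 41",
    "20 + 25 = 45, minus 3 is 42. Final answer: 42",
    "I am not sure.",
    "Final answer: 42",
])
print(self_consistency(lambda: next(canned), n=5))  # 42
```

Chains without a parseable answer are dropped rather than counted, and ties go to whichever answer appeared first. Answers are compared as raw strings, so in practice you may want to normalize them (for example "42" versus "42.0") before voting.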
When Structured Reasoning Helps
Chain of thought shines when:
- The task requires multiple steps that build on each other.
- The task has a verifiable answer — math, code, factual questions with ground truth.
- The model needs to keep several constraints in mind at once.
- You want interpretable outputs — the chain is debuggable.
When It Hurts
Don't apply CoT blindly. It can actually degrade performance when:
- The task is simple — classification, sentiment, entity extraction. The model already knows the answer; forcing it to "think" adds noise.
- The task is creative or subjective — writing, brainstorming. Step-by-step reasoning produces stilted, over-structured output.
- The chain becomes confabulation — the model invents plausible reasoning that leads to a confident wrong answer. This is especially dangerous because the chain looks convincing.
- Latency matters — every reasoning token is a token you're paying for and waiting on.
Practical Tips
- Put the chain before the answer, not after. If the model generates the answer first, the chain becomes post-hoc rationalization.
- Parse the final answer separately from the chain. Don't trust that the model will format the answer consistently inside a chain.
- Consider hiding the chain from users. Show the final answer; log the chain for debugging.
- Monitor chain quality over time. Chains that are getting longer without getting more accurate signal a problem.
- Combine CoT with tool use. The best chains interleave reasoning with concrete actions: "I need to check X → [tool call] → the result is Y → therefore Z."
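Several of these tips (chain before answer, separate parsing, hiding the chain) can be combined in a few lines. This is a sketch under assumptions: the `Final answer:` marker, the instruction wording, the logger name, and the fallback message are all placeholders chosen for illustration.

```python
import logging
from typing import Optional

logger = logging.getLogger("cot")  # hypothetical logger name

ANSWER_TAG = "Final answer:"
INSTRUCTION = (
    "Reason step by step, then end with a line starting with 'Final answer:'."
)


def build_prompt(question: str) -> str:
    """Ask for the chain first and the answer last."""
    return f"{question}\n\n{INSTRUCTION}"


def split_chain(completion: str) -> tuple[str, Optional[str]]:
    """Split a completion into (chain, answer); answer is None if the tag is missing."""
    chain, sep, answer = completion.rpartition(ANSWER_TAG)
    if not sep:
        return completion.strip(), None
    return chain.strip(), answer.strip()


def answer_for_user(completion: str) -> str:
    """Log the chain for debugging and return only the final answer."""
    chain, answer = split_chain(completion)
    logger.debug("reasoning chain: %s", chain)
    if answer is None:
        return "Sorry, I could not produce an answer."  # placeholder fallback
    return answer


print(answer_for_user("3 boxes of 4 is 3 * 4 = 12.\nFinal answer: 12"))  # 12
```

Returning `None` from `split_chain` when the marker is missing makes format failures explicit, so they can be counted and monitored instead of silently passed to users.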