Coverage and Gates
What coverage actually measures, where it lies, and which gates are worth blocking a merge on
Coverage is the most-quoted and least-understood number in testing. A team with 90% coverage and a team with 40% can both have a fragile suite, a comprehensive suite, or an actively misleading one. The number alone tells you nothing.
What coverage is useful for is asking the right question: "what code is provably not exercised by any test?" The answer to that question is genuinely actionable. The temptation is to flip it around — "this code is covered, therefore it works" — and that's the lie.
Gates are how coverage stops being decoration and starts changing behavior. The trick is picking gates that fail when something real has gone wrong, and not when someone deleted a dead branch.
What Coverage Measures
The standard kinds, ordered by how much they tell you:
| Kind | What it tracks | Catches |
|---|---|---|
| Line / statement | Each line executed at least once | Truly untested files; useful as a floor |
| Branch | Each if/else/switch arm taken | Conditions where only the happy path is tested |
| Function | Each function called at least once | Dead code, exported but unused |
| Path | Each combination of branches through a function | Combinatorial; rarely useful in practice |
| Condition | Each boolean sub-expression evaluated both ways | Missing cases in a && b style logic |
| Mutation | Tests detect deliberate small changes to code | Tests that exist but assert nothing |
Line coverage is what most tools report by default and what most numbers refer to. It is the weakest of the useful measures.
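To make the gap between line and branch coverage concrete, here is a minimal sketch. The function name applyDiscount and the 10% rate are assumptions for illustration only. A single call with isMember set to true executes every line, so a line-coverage report would typically show 100%, yet the implicit "not a member" arm never runs.

```javascript
// Hypothetical example: the function name and discount rate are illustrative.
function applyDiscount(price, isMember) {
  let total = price;
  if (isMember) {
    total = price * 0.9; // members get 10% off
  }
  return total;
}

// This one call executes every line above (full line coverage),
// but only the "true" arm of the if. The implicit else is never taken,
// so the condition is covered on 1 of its 2 branches.
const memberTotal = applyDiscount(100, true);

// The call a branch-coverage report would prompt you to add:
const guestTotal = applyDiscount(100, false);
```

Both calls are needed before branch coverage is satisfied, and neither one proves anything until something asserts on memberTotal and guestTotal.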
What no coverage measure captures:
- Whether the assertion actually checks the right thing.
- Whether the test would still pass if the implementation were wrong.
- Whether the test is a regression test for a real production bug.
- Whether the test exercises a path that matters to users.
This is why "100% coverage" can coexist with serious bugs: every line ran during the test, but nothing actually asserted that the line did the right thing.
Where Coverage Lies
Real failure modes that high coverage masks:
Tests with no assertions

    test('processes order', () => {
      service.processOrder(order);
      // no expect()
    });

Counts as 100% line coverage. Asserts nothing. The test fails only if processOrder throws; it passes whenever processOrder silently does the wrong thing.
Detection: mutation testing. Tools such as Stryker, mutmut, and PIT (Pitest) apply one small change to the source at a time and re-run the suite. If no test fails, the changed code was "covered" but not actually tested.
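The idea can be sketched by hand without any tool. Everything below is hypothetical: the order-total function, the single hand-written mutant, and the two "suites", which are plain functions that return true when they pass. Real mutation tools generate many mutants automatically and run your actual test runner.

```javascript
// Hypothetical code under test.
const original = (order) => order.price * order.quantity;
// A "mutant": the same function with one operator changed (* became +).
const mutant = (order) => order.price + order.quantity;

// Weak suite: calls the code but checks nothing, like the test with no expect().
function weakSuite(orderTotal) {
  try {
    orderTotal({ price: 5, quantity: 3 });
    return true; // passes unless the call throws
  } catch {
    return false;
  }
}

// Strong suite: checks the actual result.
function strongSuite(orderTotal) {
  return orderTotal({ price: 5, quantity: 3 }) === 15;
}

// A mutant is "killed" when a suite that passes on the original fails on the mutant.
function kills(suite, originalImpl, mutantImpl) {
  return suite(originalImpl) && !suite(mutantImpl);
}

const weakKills = kills(weakSuite, original, mutant);     // false: the mutant survives
const strongKills = kills(strongSuite, original, mutant); // true: the mutant is killed
```

Both suites give the original function identical line coverage. Only the mutant tells them apart.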
Tests that exercise the wrong assertion

    test('user is created with correct role', () => {
      const user = createUser({ name: 'a' });
      expect(user.name).toBe('a'); // tests the wrong thing
    });

100% coverage on createUser. The role assertion was never written. The test name lies.
Detection: code review, mutation testing.
Tests that assert against mock returns

    test('saves user', async () => {
      db.save.mockResolvedValue({ id: 1 });
      const result = await service.create({ name: 'a' });
      expect(result.id).toBe(1); // asserts the mock, not the code
    });

100% coverage. Asserts that the mock returned what the mock was told to return. Real bug: service.create could ignore its input and the test wouldn't notice.
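One way out is to assert on what the code handed to its dependency, not on the canned value that came back. The sketch below uses a hand-rolled fake so it runs without a test framework. createUserService, createFakeDb, and the record shape are all assumptions, not the original service. In Jest, a roughly equivalent check would be expect(db.save).toHaveBeenCalledWith({ name: 'a' }).

```javascript
// Hypothetical service: the shapes of createUserService and db.save are assumptions.
function createUserService(db) {
  return {
    async create(input) {
      return db.save({ name: input.name });
    },
  };
}

// Hand-rolled fake that records what it was asked to save.
function createFakeDb() {
  const saved = [];
  return {
    saved,
    async save(record) {
      saved.push(record);
      return { id: saved.length, ...record };
    },
  };
}

// Returns true when the service passed its input through to the database.
async function checkSavesUser() {
  const db = createFakeDb();
  const service = createUserService(db);

  const result = await service.create({ name: 'a' });

  // Check what the code did with its input, not a value the fake was told to return.
  return db.saved.length === 1 && db.saved[0].name === 'a' && result.name === 'a';
}
```

If create ignored its input, db.saved[0].name would not be 'a' and the check would fail, which is exactly what the mock-only test could not detect.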
Coverage of generated / vendored code
A repo with a large generated client (gRPC stubs, OpenAPI clients) shows artificially high coverage because the generator emits trivially-executed code. Strip generated code from the report or the number becomes vanity.
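For a Jest project, the exclusion might look like the sketch below. collectCoverageFrom and coveragePathIgnorePatterns are standard Jest options, but every path and file suffix shown is a hypothetical example. Adjust them to wherever your generator and vendored code actually live.

```javascript
// jest.config.js (sketch; every path below is a hypothetical example)
module.exports = {
  collectCoverage: true,
  // Globs of files to measure; a leading "!" excludes a pattern.
  collectCoverageFrom: [
    'src/**/*.js',
    '!src/**/*.generated.js', // generated clients
  ],
  // Regex patterns; any file whose path matches is left out of the report.
  coveragePathIgnorePatterns: [
    '/node_modules/',
    '/vendor/',        // vendored dependencies
    '/src/generated/', // gRPC stubs, OpenAPI clients
  ],
};
```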
What Coverage Is Genuinely Good For
Despite the limitations, three uses are real:
- Finding untested files. A new module with 0% coverage is a clear signal. A team can act on "here's a file no test touches."
- Spotting branches no one tested. Branch coverage on critical logic (auth, money, state machines) tells you which else arms are silent.
- Catching regression in coverage. A PR that drops coverage by 5% is a question worth asking: which tests went away, or which new code came in without tests?
The principle: coverage tells you what to look at, not whether the code works.
Which Gates Earn Their Block
A coverage gate that blocks merges should have a clear, low-false-positive rule. Common gates, ranked by usefulness:
Diff coverage (recommended)
"New and changed lines in this PR must have ≥ N% coverage." Tools: diff_cover (fed by coverage.py or other coverage XML reports), Codecov patch coverage.
- Why it works: developers can act on it (write tests for what they changed). It doesn't punish legacy code, and it creates no incentive to game total coverage by deleting or excluding poorly covered code.
- Typical threshold: 70–80% for new lines.
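The arithmetic behind a diff-coverage gate is small enough to sketch. The input shapes here are assumptions: real tools derive the changed lines from the diff and the covered lines from the coverage report, and they filter out non-executable lines (comments, blanks) first.

```javascript
// changedLines: { [file]: number[] }  lines added or modified in the PR
// coveredLines: { [file]: number[] }  lines executed by at least one test
// Both shapes are hypothetical, chosen to keep the sketch self-contained.
function diffCoverage(changedLines, coveredLines) {
  let total = 0;
  let covered = 0;
  const uncovered = [];
  for (const [file, lines] of Object.entries(changedLines)) {
    const hit = new Set(coveredLines[file] || []);
    for (const line of lines) {
      total += 1;
      if (hit.has(line)) covered += 1;
      else uncovered.push(`${file}:${line}`);
    }
  }
  // A PR that changes no measurable lines passes trivially.
  const percent = total === 0 ? 100 : (covered * 100) / total;
  return { percent, uncovered };
}

function passesDiffGate(changedLines, coveredLines, threshold = 70) {
  return diffCoverage(changedLines, coveredLines).percent >= threshold;
}

const report = diffCoverage(
  { 'src/billing/invoice.js': [10, 11, 12, 13], 'src/util/date.js': [4] },
  { 'src/billing/invoice.js': [10, 11, 12], 'src/util/date.js': [4] }
);
// report.percent is 80; report.uncovered is ['src/billing/invoice.js:13']
```

Returning the uncovered locations alongside the percentage is what makes the gate actionable: the developer sees which changed lines need a test, not just a failing number.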
Coverage regression on critical files
"Files in src/auth/ or src/billing/ may not drop more than N% between PR and base." Targets the directories where regression is most expensive.
- Works when the team has a coherent list of critical paths.
- Fails when the list isn't maintained.
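A rough sketch of that check, assuming per-file coverage percentages are already available for the base branch and the PR head. The input shape, the file names, and the 2-point default are hypothetical; the two directory prefixes come from the rule above.

```javascript
// The maintained list of critical paths. This is the part that rots if nobody owns it.
const CRITICAL_PREFIXES = ['src/auth/', 'src/billing/'];

// base, head: { [file]: coveragePercent } (hypothetical shape)
// maxDrop is measured in percentage points.
function criticalRegressions(base, head, maxDrop = 2) {
  const findings = [];
  for (const [file, basePct] of Object.entries(base)) {
    if (!CRITICAL_PREFIXES.some((prefix) => file.startsWith(prefix))) continue;
    if (!(file in head)) continue; // file removed in the PR: nothing to compare
    const drop = basePct - head[file];
    if (drop > maxDrop) findings.push({ file, basePct, headPct: head[file], drop });
  }
  return findings;
}

const findings = criticalRegressions(
  { 'src/auth/session.js': 92, 'src/billing/tax.js': 85, 'src/ui/banner.js': 60 },
  { 'src/auth/session.js': 91, 'src/billing/tax.js': 78, 'src/ui/banner.js': 30 }
);
// Only src/billing/tax.js is reported: a 7-point drop in a critical directory.
// src/ui/banner.js dropped further but is not on the critical list.
```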
Absolute floor per file
"Every file must have ≥ N% coverage to merge." Punishes new untested files.
- Works as a soft signal.
- Becomes painful for files where the coverage tool reports incorrectly (heavily generated, mostly types, etc.).
Total project coverage (use cautiously)
"Total project coverage must be ≥ N%." The number everyone quotes.
- Easy to compute, easy to game.
- Gameable by deleting or excluding low-coverage code, or by writing tests that hit a lot of lines but assert little.
- Use as a report, not a gate, unless the team is mature enough to defend the number.
Mutation score (advanced)
"Mutation score (proportion of mutants killed by the suite) must be ≥ N%." Far more meaningful than line coverage; far more expensive to compute.
- Run on a schedule (nightly or weekly), not per-PR.
- Surface mutants that survive as a backlog.
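Turning a mutation run into a score and a backlog is a few lines. The result shape below is an assumption made for this sketch. Real tools emit richer reports with more statuses (timeouts, mutants with no covering test) that this version ignores.

```javascript
// results: [{ file, line, mutation, status }] where status is 'killed' or 'survived'.
// The shape is hypothetical, chosen to keep the sketch self-contained.
function summarizeMutants(results) {
  const killed = results.filter((m) => m.status === 'killed').length;
  const survivors = results.filter((m) => m.status === 'survived');
  const score = results.length === 0 ? 100 : (killed * 100) / results.length;
  return {
    score,
    // One backlog entry per surviving mutant: where it is and what was changed.
    backlog: survivors.map((m) => `${m.file}:${m.line} ${m.mutation}`),
  };
}

const summary = summarizeMutants([
  { file: 'src/billing/tax.js', line: 12, mutation: '>= to >', status: 'killed' },
  { file: 'src/billing/tax.js', line: 20, mutation: '+ to -', status: 'survived' },
  { file: 'src/auth/session.js', line: 7, mutation: 'true to false', status: 'killed' },
  { file: 'src/auth/session.js', line: 31, mutation: '&& to ||', status: 'killed' },
]);
// summary.score is 75; summary.backlog is ['src/billing/tax.js:20 + to -']
```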
Anti-Patterns
Demanding 100% coverage. Forces tests on trivial code (getters, constructors, error paths the team will never see). Trains developers to write tests that exist for the metric, not for the bug.
Coverage gate set above the current number. Coverage is 67%; gate is at 80%. Every PR is blocked. Either the gate gets lowered (no signal) or developers add coverage-padding tests (negative signal).
Single number across mixed code. UI rendering code, business logic, and infrastructure adapters all have different reasonable coverage levels. A single floor punishes the wrong things.
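In a Jest project, one way to avoid a single floor is the coverageThreshold option, which accepts path or glob keys alongside global. The directory names and numbers below are hypothetical, not recommendations.

```javascript
// jest.config.js (sketch; directory names and numbers are hypothetical)
module.exports = {
  coverageThreshold: {
    // Applies to the files not matched by a more specific key below.
    global: { lines: 60, branches: 50 },
    // Business logic: held to a higher bar.
    './src/domain/': { lines: 85, branches: 80 },
    // UI rendering: a lower floor is often reasonable.
    './src/components/': { lines: 50 },
  },
};
```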
No exclusion list. Generated code, vendored dependencies, test helpers, simple DTOs — all included. Number is meaningless.
Coverage report behind a login. If a developer can't see "what changed in coverage in my PR" from the PR page itself, the gate is friction without information.
Gating on line coverage but not branch coverage. A file whose tests run every line but only take the happy path still passes the gate. Branch coverage catches the missing else.
Practical Setup
A reasonable default for most teams:
- Track total coverage in a dashboard, not as a gate. Watch the trend.
- Gate on diff coverage at 70% for new/changed lines.
- Exclude generated code, types-only files, test helpers.
- Surface branch coverage alongside line coverage in PR comments.
- Run mutation testing nightly on critical packages; track mutation score over time.
- When a critical file drops in coverage, post on the PR. Don't block — explain.
This setup catches the most common regressions without turning coverage into a hazing ritual.
Pre-merge Checklist
Before declaring the coverage strategy done:
- Does the team know what coverage can't tell them?
- Is the gate something a developer can act on within the PR?
- Are generated and vendored files excluded from the report?
- Is branch coverage visible, not just line?
- When coverage drops, is the cause investigated, or is the threshold quietly lowered?
- Is there a path from "this file has low coverage" to "here's a test we wrote"?
If the answer to the last one is "we just check the box," the gate is theater.