Pipeline Shape
Pre-commit, PR, merge queue, nightly, release — deciding what runs when, and why each stage has its own job
A test pipeline isn't one thing. It's a sequence of stages, each with a different time budget, blocking policy, and tolerance for failure. Treating the pipeline as one big "run tests on push" job is how teams end up with a 40-minute PR check that catches the same flake on the fifth commit of the day.
The shape question is: at which point does each kind of test earn its keep?
The Stages
A mature pipeline usually has five layers. Not every team needs all of them, but the order is the same:
| Stage | Trigger | Budget | Blocks? | Goal |
|---|---|---|---|---|
| Pre-commit | Editor save / git commit | < 5s | Local only | Catch the obvious before the dev even pushes |
| PR check | Push to PR branch | < 10m | Yes | Prove this diff is mergeable |
| Merge queue | Just before merging to main | < 15m | Yes | Prove this diff plus current main is mergeable |
| Post-merge / main | After merge to main | < 30m | No (alerts) | Detect what slipped past PR checks |
| Nightly / release | Scheduled or pre-release | Hours OK | No (alerts) | Long, expensive, low-signal-per-minute coverage |
The principle: the closer to the keystroke, the smaller and faster. The closer to production, the broader and slower.
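One way to keep these budgets honest is to write the table down as data and compare it against measured durations. The sketch below is only an illustration: the `Stage` class, the stage names, and the `over_budget` helper are hypothetical, and treating pre-commit as non-blocking for merges is an interpretation of "Local only".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    name: str
    budget_seconds: Optional[int]  # None means no hard budget ("hours OK")
    blocks_merge: bool


# Budgets and blocking policy copied from the table above.
STAGES = [
    Stage("pre-commit", 5, False),        # local only, so it never gates a merge
    Stage("pr-check", 10 * 60, True),
    Stage("merge-queue", 15 * 60, True),
    Stage("post-merge", 30 * 60, False),  # alerts instead of blocking
    Stage("nightly", None, False),
]


def over_budget(measured_seconds: dict[str, float]) -> list[str]:
    """Return the names of stages whose measured duration exceeds their budget."""
    return [
        stage.name
        for stage in STAGES
        if stage.budget_seconds is not None
        and measured_seconds.get(stage.name, 0) > stage.budget_seconds
    ]


print(over_budget({"pre-commit": 3, "pr-check": 720, "nightly": 9000}))
# ['pr-check']
```

A report like this could run in the post-merge stage, so budget creep shows up as an alert rather than as a slow drift nobody notices.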
Pre-commit
Runs on the developer's machine, before code leaves it. It is the right place for anything you can prove without a network round-trip:
- Formatters and linters (auto-fix where possible).
- Type checks (incremental).
- Affected unit tests for the changed files.
- Secret scanners.
What does not belong in pre-commit:
- The full unit suite. A 90-second pre-commit hook just trains people to use `--no-verify`.
- Anything requiring a database, container, or network.
- E2E. Ever.
A pre-commit hook that takes more than ~5 seconds is a hook that gets bypassed. If you can't keep it under that, move it to a pre-push hook instead — same idea, less blocking.
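As a rough sketch of a hook that stays inside that budget, the script below only looks at staged files and warns when it runs long. The tool choices (`ruff`, `pytest`) and the `src/.../foo.py` to `tests/test_foo.py` layout are assumptions made for illustration; substitute whatever linter and test layout the repo actually uses.

```python
import subprocess
import time
from pathlib import PurePosixPath

BUDGET_SECONDS = 5.0  # the ~5 second ceiling discussed above


def staged_files() -> list[str]:
    """List files staged for commit (added, copied, or modified)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def plan_checks(files: list[str]) -> list[list[str]]:
    """Return the commands to run, scoped to the staged Python files."""
    sources = [f for f in files if PurePosixPath(f).suffix == ".py"]
    if not sources:
        return []
    # Hypothetical layout: src/pkg/foo.py is covered by tests/test_foo.py.
    tests = sorted(
        {f"tests/test_{PurePosixPath(f).stem}.py"
         for f in sources if f.startswith("src/")}
    )
    commands = [["ruff", "check", "--fix", *sources]]
    if tests:
        commands.append(["pytest", "-q", *tests])
    return commands


def main() -> int:
    start = time.monotonic()
    for command in plan_checks(staged_files()):
        if subprocess.run(command).returncode != 0:
            return 1  # fail fast: the first failing check aborts the commit
    elapsed = time.monotonic() - start
    if elapsed > BUDGET_SECONDS:
        print(f"pre-commit took {elapsed:.1f}s; consider moving checks to pre-push")
    return 0
```

A hook file would end with `raise SystemExit(main())`. Keeping `plan_checks` free of side effects means the selection logic itself can be unit-tested without a git repository.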
PR Check
This is the gate. It runs on every push to a PR branch and is the primary signal a reviewer trusts. Its job is: "Is this diff, as written, safe to merge?"
A typical PR check on a healthy codebase:
- Lint + type check (full repo, not just diff).
- Unit suite (full, parallelized).
- Integration suite (full or affected-only depending on size).
- Build / bundle.
- A handful of critical-path E2E (login, checkout, the top 3 user journeys — not the whole catalog).
- Security scanning on dependencies and the diff itself.
What this stage gets wrong, in order of frequency:
- Too long. Above 10–15 minutes, developers context-switch and the feedback loop is broken. The fix is parallelization and affected-test selection, not deleting tests.
- Too narrow. Only running tests for changed files misses cross-file regressions. Affected-test selection should pick a superset of the tests that map directly to the changed files: everything that depends on those files, transitively.
- Flaky. A 1% flake rate per test means a 100-test suite passes only ~37% of the time (0.99^100, assuming tests flake independently). See Flake Management.
- Unreproducible. "Works on my machine" failures kill trust. Pin Node/Python versions, OS image, browser versions.
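The flake arithmetic above is worth running for your own suite size. This is a minimal sketch that assumes flakes are independent across tests, which real suites only approximate; the function names are illustrative.

```python
def suite_pass_probability(per_test_flake_rate: float, n_tests: int) -> float:
    """Probability that every test passes, assuming independent flakes."""
    return (1.0 - per_test_flake_rate) ** n_tests


def max_flake_rate(target_pass_probability: float, n_tests: int) -> float:
    """Largest per-test flake rate that still meets a suite-level target."""
    return 1.0 - target_pass_probability ** (1.0 / n_tests)


print(f"{suite_pass_probability(0.01, 100):.3f}")  # 0.366
print(f"{max_flake_rate(0.95, 100):.5f}")          # 0.00051
```

Read the second number as a budget: for a 100-test suite to go green 95% of the time on a correct diff, each test can flake only about 0.05% of the time.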
Merge Queue
The problem a merge queue solves: two PRs that each pass on their own, but conflict semantically when both land. PR A and PR B don't touch the same files, but A renames a function B calls. Both green; main breaks.
A merge queue serializes merges. Each PR is rebased onto current main, run through the full check again, and merged only if green. If you skip this step, the post-merge stage has to catch what the queue would have caught, at the cost of breaking everyone behind you.
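The rename scenario can be reproduced in a toy model. In the sketch below, a repo is reduced to the set of functions it defines and the set it calls, and "green" means every called function exists. The `Change`, `Repo`, and `merge_queue` names are hypothetical stand-ins, not any real queue's API.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """A PR, reduced to the function names it removes, adds, and calls."""
    name: str
    removes: set[str] = field(default_factory=set)
    adds: set[str] = field(default_factory=set)
    calls: set[str] = field(default_factory=set)


@dataclass
class Repo:
    defined: set[str]
    called: set[str]

    def apply(self, change: Change) -> "Repo":
        return Repo(
            defined=(self.defined - change.removes) | change.adds,
            called=self.called | change.calls,
        )

    def is_green(self) -> bool:
        # Stand-in for the full check: every called function must exist.
        return self.called <= self.defined


def merge_queue(main: Repo, queue: list[Change]) -> tuple[Repo, list[str]]:
    """Process PRs one at a time, testing each against the current main."""
    rejected = []
    for change in queue:
        candidate = main.apply(change)  # "rebase onto current main"
        if candidate.is_green():
            main = candidate
        else:
            rejected.append(change.name)
    return main, rejected


base = Repo(defined={"fetch"}, called=set())
pr_a = Change("A", removes={"fetch"}, adds={"fetch_user"})  # the rename
pr_b = Change("B", calls={"fetch"})                         # new caller of the old name

# Each PR is green against the main it branched from...
assert base.apply(pr_a).is_green() and base.apply(pr_b).is_green()
# ...but the queue rejects B once A has landed.
main, rejected = merge_queue(base, [pr_a, pr_b])
print(rejected)  # ['B']
```

Without the queue, both PRs would merge on the strength of their individual green checks and main would end up calling a function that no longer exists.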
When you need one:
- More than a few merges per day.
- More than one team committing to the same repo.
- Long PR checks where rebases happen frequently.
When you don't:
- Small team, low merge volume, fast CI: a clean rebase before merge is enough.
Post-merge
Runs on every commit to main. It is not a gate — main has already been written to. Its jobs:
- Catch what the PR check sampled around. Full test suite, including the bits skipped for speed in PR checks.
- Update artifacts. Coverage reports, dependency graphs, deployment images.
- Trigger downstream. Staging deploys, contract test publishing, doc rebuilds.
A red post-merge build is an incident, not a blocker — main is already broken. Treat it like one: notify, triage, revert or fix-forward fast. The longer main stays red, the more PRs queue up behind broken state.
Nightly and Release
For things that earn their slot only over hours of compute, not on every commit:
- Long-running E2E suites. Full catalog, not just smoke.
- Cross-browser / cross-device matrices.
- Performance regression suites. Benchmarks against a stable baseline.
- Soak tests. Long-duration runs that find leaks and unbounded growth.
- Mutation testing. Generates many small variants of code and runs the suite — too expensive for PR, useful weekly.
- Dependency audit deltas. New CVEs against the locked dependency set.
These never block a PR. They produce alerts that get triaged like any other production signal: who owns it, what's the SLA to fix.
Affected-Test Selection
The lever that keeps PR checks fast as the suite grows: don't run tests that can't possibly be affected by this diff.
Approaches, in increasing order of accuracy:
- File path heuristics. Map `src/foo/*.ts` → `tests/foo/*.test.ts`. Fast, naive, misses cross-module impact.
- Static dependency-graph analysis. Build a dependency graph; run tests reachable from changed files. Nx, Bazel, and Turborepo do this at project or target granularity. Misses dynamic dispatch and reflection.
- Coverage-based. Per-test coverage from the last clean run; re-run any test whose covered files changed. Most accurate, requires tooling investment.
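The graph-based approach can be sketched as a reverse-reachability search: invert the import graph, then walk outward from the changed files and keep whatever tests you reach. The graph format, the file names, and the `tests/` prefix convention below are assumptions for illustration; real tools derive the graph from the build system or from import parsing.

```python
from collections import deque


def affected_tests(
    imports: dict[str, set[str]],
    changed: set[str],
    is_test=lambda path: path.startswith("tests/"),
) -> set[str]:
    """Return test files that transitively import any changed file."""
    # Invert the graph: for each file, which files import it?
    importers: dict[str, set[str]] = {}
    for source, deps in imports.items():
        for dep in deps:
            importers.setdefault(dep, set()).add(source)

    # Breadth-first walk from the changed files through their importers.
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in importers.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return {path for path in seen if is_test(path)}


graph = {
    "tests/test_cart.py": {"src/cart.py"},
    "tests/test_price.py": {"src/price.py"},
    "tests/test_user.py": {"src/user.py"},
    "src/cart.py": {"src/price.py"},
}
print(sorted(affected_tests(graph, {"src/price.py"})))
# ['tests/test_cart.py', 'tests/test_price.py']
```

A path heuristic would have selected only `tests/test_price.py` here; the graph walk also picks up `tests/test_cart.py` because `src/cart.py` imports the changed file.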
Whichever you use, always run the full suite on a known cadence (post-merge or nightly). Affected selection is an optimization for the PR loop; it does not replace eventual full execution.
A common failure mode: the affected-test config drifts from reality (a test that should run for change X doesn't, because the graph missed a transitive dep). Symptom: regressions land that the PR check claimed were safe. Fix: full-suite cadence + alerting on coverage gaps.
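One possible shape for that alert: after each full-suite run on main, compare the failing tests against what the selector chose for the PRs merged since the last green full run. The function name and the data layout below are hypothetical, and a reported gap could still be a flake or an environment problem rather than a selection miss, so treat the output as a triage list.

```python
def selection_gaps(
    full_run_failures: set[str],
    selected_per_pr: dict[str, set[str]],
) -> set[str]:
    """Tests that fail on main but were never selected by any merged PR."""
    selected = set().union(*selected_per_pr.values())
    return full_run_failures - selected


# Hypothetical PR labels and test names.
gaps = selection_gaps(
    full_run_failures={"test_invoice_total", "test_login"},
    selected_per_pr={
        "PR-1": {"test_login"},
        "PR-2": {"test_search"},
    },
)
print(sorted(gaps))  # ['test_invoice_total']
```

Each gap is a concrete prompt to re-audit the selection rules: some merged change affected that test, and the graph or mapping did not know it.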
Anti-Patterns
One big "test" job. Everything in a single 30-minute step. No parallelism, no early exit on lint failures, and one flaky E2E forces a re-run of the whole thing.
No pre-commit, fat PR check. All friction lives in CI. Developers find out their code doesn't compile 8 minutes after they push.
Block on everything, alert on nothing. Every flake blocks merges. Real signal (post-merge regression, nightly perf delta) goes to a channel no one reads.
Skip on touch. "Don't run this test if these files didn't change" rules that are wrong, never re-audited, and silently grow until the suite covers nothing.
Re-run until green. CI policy that auto-retries failures without recording them. The flake rate goes invisible while flakes multiply.
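If retries are allowed at all, the minimum is to record them. The sketch below wraps a test in a bounded retry and keeps the attempt count, so a pass that needed a retry stays visible as a flake. `RetryReport`, `run_with_retries`, and `flake_rate` are illustrative names, not the API of any particular test runner.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RetryReport:
    passed: bool
    attempts: int

    @property
    def flaky(self) -> bool:
        # Passed, but only after at least one failed attempt.
        return self.passed and self.attempts > 1


def run_with_retries(test: Callable[[], bool], max_attempts: int = 3) -> RetryReport:
    """Run a test up to max_attempts times, recording how many it took."""
    for attempt in range(1, max_attempts + 1):
        if test():
            return RetryReport(passed=True, attempts=attempt)
    return RetryReport(passed=False, attempts=max_attempts)


def flake_rate(reports: dict[str, RetryReport]) -> float:
    """Share of tests that needed a retry to pass."""
    if not reports:
        return 0.0
    return sum(report.flaky for report in reports.values()) / len(reports)


outcomes = iter([False, True])  # fails once, then passes
reports = {
    "test_checkout": run_with_retries(lambda: next(outcomes)),
    "test_login": run_with_retries(lambda: True),
}
print([name for name, report in reports.items() if report.flaky])  # ['test_checkout']
print(flake_rate(reports))  # 0.5
```

The build can still go green, but the report gives the flake rate somewhere to live: publish it per run and alert when it trends up.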
Pre-merge Checklist
Before declaring the pipeline shape done:
- Is the PR check under 15 minutes for the median PR?
- Can a developer reproduce a failed PR check locally with one command?
- When the PR check fails, does the failure message tell the developer what to do next?
- Is there a known cadence at which the full suite runs against main, regardless of affected-test selection?
- When main goes red, who's paged, and is the SLA written down?
- Do nightly/release stages produce alerts owned by someone, or do they fail silently?
If any answer is "we should probably set that up," the pipeline shape isn't done.