Testable Code
Writing code that is easy to test, and tests that are easy to maintain
Testability is a design property. Code is not made testable by adding tests; tests are made possible by code that has been written with tests in mind.
The good news is that testable code and well-designed code overlap almost completely. The properties that make a function easy to test — clear inputs, clear outputs, no hidden dependencies — are the same properties that make it easy to read, easy to change, and safe to refactor.
Why Tests Matter for Code Craft
Tests serve more than verification:
- They define the contract. A reader who wants to know what a function does can read the tests.
- They are a safety net. Refactoring without tests is a guess; with them, it is a discipline.
- They drive design. Code that is hard to test is usually hard to use; the difficulty surfaces early.
- They catch regressions. A bug pinned down by a test stays fixed; without that test, the same bug can escape a second time.
Code without tests is code whose behavior is whatever happens to run today.
What Makes Code Testable
Pure functions are trivially testable
A function whose output depends only on its inputs, and which has no side effects, is the easiest possible thing to test:
add(2, 3) === 5 // done
Aim for a functional core: domain logic written as pure functions, with the impure parts (I/O, time, randomness, persistence) at the edges.
Inject dependencies
A class that constructs its dependencies internally is welded to them; a class that accepts them is composable.
// Hard to test: the database is hard-wired
class OrderService {
  constructor() {
    this.db = new PostgresDatabase(/* env config */);
  }
}
// Easy to test: pass anything that satisfies the interface
class OrderService {
  constructor(private db: Database) {}
}
// In production
new OrderService(realDb)
// In tests
new OrderService(fakeDb)
The pattern is Dependency Injection; the technique is to make dependencies parameters of the constructor (or function), not internals it creates.
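To make the injected version concrete, here is a minimal sketch. The `Database` interface with a single `insert` method, the `placeOrder` method, and the `RecordingDatabase` double are all assumptions made for illustration; the text does not define them. The field is assigned explicitly, which is equivalent to the `private db` constructor shorthand shown above.

```typescript
// Hypothetical interface: the only thing OrderService needs from a database.
interface Database {
  insert(table: string, row: Record<string, unknown>): void;
}

class OrderService {
  private readonly db: Database;

  // The dependency arrives as a parameter instead of being constructed here.
  constructor(db: Database) {
    this.db = db;
  }

  // Hypothetical operation, used only to give the service some behavior.
  placeOrder(itemId: string, quantity: number): void {
    if (quantity <= 0) {
      throw new Error("quantity must be positive");
    }
    this.db.insert("orders", { itemId, quantity });
  }
}

// Test double: satisfies the same interface and remembers what was inserted.
class RecordingDatabase implements Database {
  readonly rows: Array<{ table: string; row: Record<string, unknown> }> = [];

  insert(table: string, row: Record<string, unknown>): void {
    this.rows.push({ table, row });
  }
}

const fakeDb = new RecordingDatabase();
const service = new OrderService(fakeDb);
service.placeOrder("sku-1", 2);
// fakeDb.rows now holds the single inserted order, with no real database involved.
```

Because `OrderService` only knows about the interface, the production wiring and the test wiring differ in one line: which object is passed to the constructor.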
Push side effects to the edges
Functions that read from databases, write to disk, or call external services are harder to test than functions that operate on values. Restructure so that:
- Domain logic takes inputs and returns outputs.
- A thin orchestration layer reads inputs, calls the domain logic, writes outputs.
The orchestration layer gets integration tests; the domain logic gets fast, deterministic unit tests. Most of the codebase should live in the domain logic.
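The split might look like the following sketch. The order-total domain, the `OrderReader` and `TotalWriter` interfaces, and all function names are hypothetical, chosen only to show where the boundary between the two layers falls.

```typescript
interface LineItem {
  unitPriceCents: number;
  quantity: number;
}

// Domain logic: a pure function from values to a value. No I/O, no clock.
function orderTotalCents(items: LineItem[], discountPercent: number): number {
  const subtotal = items.reduce(
    (sum, item) => sum + item.unitPriceCents * item.quantity,
    0,
  );
  return Math.round(subtotal * (1 - discountPercent / 100));
}

// Hypothetical ports for the impure parts; real implementations would do I/O.
interface OrderReader {
  loadItems(orderId: string): Promise<LineItem[]>;
}

interface TotalWriter {
  saveTotal(orderId: string, totalCents: number): Promise<void>;
}

// Thin orchestration layer: read inputs, call the domain logic, write outputs.
async function recalculateTotal(
  orderId: string,
  discountPercent: number,
  reader: OrderReader,
  writer: TotalWriter,
): Promise<number> {
  const items = await reader.loadItems(orderId);
  const total = orderTotalCents(items, discountPercent);
  await writer.saveTotal(orderId, total);
  return total;
}
```

`orderTotalCents` can be unit-tested with plain values in microseconds, while `recalculateTotal` contains no decisions of its own and is a natural candidate for a small number of integration tests.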
Avoid hidden inputs
A function whose behavior depends on the current time, a random number, an environment variable, or a global cache has hidden inputs. Tests cannot control them, so the tests are flaky or brittle.
Make the inputs explicit:
// Hidden time dependence: tests are flaky around midnight
function isExpired(token) {
  return token.expiresAt < new Date();
}
// Time as a parameter: tests pass any moment they want
function isExpired(token, now) {
  return token.expiresAt < now;
}
The same idea applies to randomness (inject a random source), the file system (inject a file-system interface), and external services (inject a client).
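For randomness, the same move could look like this sketch. The `RandomSource` type and the `pickOne` function are hypothetical names; the only assumption about the source is that it behaves like `Math.random` and returns a number in [0, 1).

```typescript
// A random source is any function returning a number in [0, 1), like Math.random.
type RandomSource = () => number;

// Picks one element; the caller decides where the randomness comes from.
function pickOne<T>(items: readonly T[], random: RandomSource): T {
  if (items.length === 0) {
    throw new Error("cannot pick from an empty list");
  }
  const index = Math.floor(random() * items.length);
  return items[index] as T;
}

const colors = ["red", "green", "blue"] as const;

// Production passes the real source...
const surprise = pickOne(colors, Math.random);

// ...and tests pass a fixed one, so the result is fully determined.
const first = pickOne(colors, () => 0); // index 0, "red"
const last = pickOne(colors, () => 0.999); // index 2, "blue"
```

The production call site is barely longer than before, and the test no longer has to guess which element was chosen.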
FIRST — Properties of Good Tests
Robert C. Martin's mnemonic from Clean Code:
Fast
A test suite that runs in seconds is run constantly; one that takes ten minutes is run rarely. Slow tests change the development loop from "edit, run, fix" to "edit, push, wait, fix."
The bulk of the suite should be fast unit tests. Slow tests (integration, end-to-end) earn their place when they catch what unit tests cannot, and are kept out of the inner loop.
Independent
Tests should not depend on each other. Test A passing should not be a precondition for test B; test C failing should not destabilize test D. Independence requires:
- No shared mutable state between tests.
- Each test sets up its own fixtures.
- Test order does not matter.
Frameworks help — beforeEach blocks, isolated databases per test, in-memory fakes — but the discipline is the developer's.
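A framework-free sketch of what independence means in practice. `ShoppingCart`, the `check` helper, and the two test functions are hypothetical; in a real suite the framework's own test and setup hooks would replace the hand-rolled runner at the bottom.

```typescript
class ShoppingCart {
  private readonly items: string[] = [];

  add(item: string): void {
    this.items.push(item);
  }

  count(): number {
    return this.items.length;
  }
}

// Minimal assertion helper so the sketch needs no test framework.
function check(condition: boolean, message: string): void {
  if (!condition) {
    throw new Error(message);
  }
}

// Each test builds its own cart: no shared mutable state, no reliance on order.
function testNewCartIsEmpty(): void {
  const cart = new ShoppingCart();
  check(cart.count() === 0, "a new cart should be empty");
}

function testAddingAnItemIncreasesCount(): void {
  const cart = new ShoppingCart();
  cart.add("book");
  check(cart.count() === 1, "cart should hold one item");
}

// Running the tests in either order gives the same outcome.
const cartTests = [testNewCartIsEmpty, testAddingAnItemIncreasesCount];
for (const run of cartTests) run();
for (const run of [...cartTests].reverse()) run();
```

Had both tests shared one module-level cart, `testNewCartIsEmpty` would pass or fail depending on whether it ran first, which is exactly the order dependence to avoid.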
Repeatable
A test should produce the same result every time, on every machine, with no environmental dependencies. Repeatability requires:
- No dependence on wall-clock time.
- No dependence on unseeded random sources.
- No dependence on the network, the file system, or other shared resources unless the test owns them.
Flaky tests are worse than no tests: they erode trust in the suite, and developers learn to ignore failures.
Self-Validating
A test passes or fails — automatically, without human inspection. A test that prints output for the developer to read is not a test; it is a debugging aid. The pass/fail outcome must be in the assertion.
Timely
Tests are written close to the code they cover, ideally before. Test-driven development (TDD) is the strongest form: write the test, watch it fail, write the minimum code to make it pass, refactor. Even without strict TDD, writing tests during the same session as the code prevents the gap that "I'll add tests later" never closes.
What to Test
The behavior, not the implementation
Tests that assert on internal calls or private methods break every time the implementation changes, even when the behavior is unchanged. A test should ask "given this input, did the system produce the right output or perform the right observable action?" — not "did it call this internal method?"
The result is tests that survive refactoring, which is precisely when tests are most useful.
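As an illustration, consider this hypothetical `Inventory` class. The checks at the bottom go through the public methods only, so they would keep passing if the internal `Map` were swapped for an array or a database table.

```typescript
class Inventory {
  // Internal representation: free to change without touching any test.
  private readonly stock = new Map<string, number>();

  receive(sku: string, quantity: number): void {
    this.stock.set(sku, this.available(sku) + quantity);
  }

  available(sku: string): number {
    return this.stock.get(sku) ?? 0;
  }
}

// Behavioral check: given these inputs, is the observable result right?
const inventory = new Inventory();
inventory.receive("sku-1", 3);
inventory.receive("sku-1", 2);
const onHand = inventory.available("sku-1"); // 5
const neverReceived = inventory.available("sku-2"); // 0
```

A test that instead asserted "the map's `set` method was called twice" would pin down how the class works today and fail on a harmless refactor.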
Branches, edges, and invariants
Cover:
- The happy path.
- Error paths and rejected inputs.
- Boundary conditions (empty, single element, exact-size, off-by-one).
- Invariants (sums match, references are intact).
Coverage metrics are useful as a floor, not a ceiling — 100% line coverage with weak assertions tests nothing.
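The boundary cases in the list can be sketched against a small hypothetical `chunk` function, which splits a list into pieces of at most `size` elements. The function and its inputs are illustrative assumptions, not part of any library.

```typescript
// Splits a list into consecutive chunks of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    chunks.push(items.slice(start, start + size));
  }
  return chunks;
}

// Boundary conditions, one per line.
const emptyInput = chunk([], 3); // []
const singleElement = chunk([1], 3); // [[1]]
const exactSize = chunk([1, 2, 3], 3); // [[1, 2, 3]]
const oneOver = chunk([1, 2, 3, 4], 3); // [[1, 2, 3], [4]]

// Invariant: the chunks, joined back together, hold every input element in order.
const rejoined = oneOver.reduce<number[]>((all, part) => all.concat(part), []);
```

The error path (a `size` of zero, a negative number, or a fraction) deserves its own check as well, since it is the rejected-input case from the list.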
Property-based tests for invariants
When a function has an invariant that holds for all inputs of a certain shape — sorting produces an ordered output; serialize-then-deserialize is the identity; merging is commutative — a property-based test (Hypothesis, fast-check, QuickCheck) generates random inputs and verifies the invariant. These tests find edge cases that example-based tests miss.
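To show the idea without depending on any of those libraries, here is a hand-rolled sketch of a property check for the sorting invariant. It is not how Hypothesis, fast-check, or QuickCheck are implemented or used; the generator, the seed handling, and every function name are assumptions for illustration, and real libraries add features such as shrinking of counterexamples.

```typescript
// Small linear congruential generator, so a failing run can be replayed from its seed.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // in [0, 1)
  };
}

// Generates an array of 0 to 19 integers between -100 and 100.
function randomIntArray(random: () => number): number[] {
  const length = Math.floor(random() * 20);
  return Array.from({ length }, () => Math.floor(random() * 201) - 100);
}

function isOrdered(values: readonly number[]): boolean {
  return values.every((value, i) => i === 0 || (values[i - 1] as number) <= value);
}

// Property: the output is ordered and has the same length as the input.
// Returns the first counterexample found, or null if every run passed.
function checkSortProperty(
  sortFn: (values: number[]) => number[],
  runs: number,
  seed: number,
): number[] | null {
  const random = seededRandom(seed);
  for (let run = 0; run < runs; run++) {
    const input = randomIntArray(random);
    const output = sortFn([...input]);
    if (!isOrdered(output) || output.length !== input.length) {
      return input;
    }
  }
  return null;
}

const numericSort = (values: number[]) => values.sort((a, b) => a - b);
const counterexample = checkSortProperty(numericSort, 200, 42);
```

Passing a subtly wrong implementation as `sortFn`, for example JavaScript's default `sort()` with no comparator, which orders numbers as strings, is the kind of mistake a check like this is meant to surface.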
Test Structure
Arrange / Act / Assert
The common skeleton:
test('withdraws the requested amount', () => {
  // Arrange
  const account = new Account(100);
  // Act
  account.withdraw(40);
  // Assert
  expect(account.balance).toBe(60);
});
Three sections, in order, with the most code typically in Arrange. When Arrange is large, the unit under test has too many collaborators or too much setup, which is usually a design issue worth fixing.
One assertion per concept
A test should fail for one reason. A single test that asserts ten things makes the failure message ambiguous and the test brittle. When a behavior naturally produces several observable effects, assert them in one test (one concept); when independent behaviors are being verified, separate tests are clearer.
Test names describe behavior
A test name should read as a sentence about what the system does:
withdraw_returnsErrorWhenInsufficientFunds
withdraw_decrementsBalanceOnSuccess
withdraw_emitsAccountUpdatedEvent
These are better than test1, testWithdraw, or testCase42. The name is a free piece of documentation.
Test Doubles
Stubs return canned answers
A stub stands in for a collaborator and returns whatever the test wants:
const userRepo = { findById: () => Promise.resolve({ id: 1, name: 'X' }) };
Useful when the test needs to control inputs to the unit under test.
Fakes have working implementations
A fake is a real implementation that is unsuitable for production but appropriate for tests — an in-memory database, an in-memory queue. Fakes give the test realistic behavior without the cost or fragility of the real thing.
Fakes are usually the best choice when you have to interact with infrastructure in tests; they exercise more of the real interaction than stubs do.
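A sketch of the in-memory queue mentioned above. The `MessageQueue` interface and the `drain` function are hypothetical; the point is that the fake has real first-in, first-out behavior, so the unit under test is exercised against something that acts like a queue.

```typescript
// Hypothetical port: what production code needs from a message queue.
interface MessageQueue {
  publish(topic: string, message: string): void;
  poll(topic: string): string | undefined;
}

// Fake: a working first-in, first-out queue held in memory, with no broker or network.
class InMemoryQueue implements MessageQueue {
  private readonly topics = new Map<string, string[]>();

  publish(topic: string, message: string): void {
    const pending = this.topics.get(topic) ?? [];
    pending.push(message);
    this.topics.set(topic, pending);
  }

  poll(topic: string): string | undefined {
    return this.topics.get(topic)?.shift();
  }
}

// Unit under test: drains a topic and reports how many messages it handled.
function drain(
  queue: MessageQueue,
  topic: string,
  handle: (message: string) => void,
): number {
  let handled = 0;
  for (let message = queue.poll(topic); message !== undefined; message = queue.poll(topic)) {
    handle(message);
    handled++;
  }
  return handled;
}

const queue = new InMemoryQueue();
queue.publish("orders", "order-1");
queue.publish("orders", "order-2");
const seen: string[] = [];
const handledCount = drain(queue, "orders", (message) => seen.push(message));
// handledCount is 2 and seen is ["order-1", "order-2"].
```

A stub that always returned the same message could not express "the queue eventually runs dry", which is the behavior `drain` depends on.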
Mocks verify interactions
A mock asserts that specific calls were made:
expect(emailService.send).toHaveBeenCalledWith(/* ... */);
Mocks are useful for verifying commands (calls with side effects). They are easy to overuse: a test full of mock assertions is testing the implementation, not the behavior, and breaks under any refactor.
The rule of thumb: prefer fakes and stubs; reserve mocks for the few places where the side effect is the contract.
Spies record calls
A spy is a passive observer — it records what was called without changing behavior. Useful when you want to verify a side effect occurred without dictating exactly how.
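A hand-rolled spy might look like the sketch below. The `Mailer` interface, `SpyMailer`, and `registerUser` are hypothetical names; mocking libraries provide equivalents, but writing one out shows how little is involved.

```typescript
interface Mailer {
  send(to: string, subject: string): void;
}

// Spy: records every call, then forwards to the wrapped object unchanged.
class SpyMailer implements Mailer {
  readonly calls: Array<{ to: string; subject: string }> = [];
  private readonly inner: Mailer;

  constructor(inner: Mailer) {
    this.inner = inner;
  }

  send(to: string, subject: string): void {
    this.calls.push({ to, subject });
    this.inner.send(to, subject);
  }
}

// Unit under test: here the welcome email is the observable contract.
function registerUser(email: string, mailer: Mailer): void {
  mailer.send(email, "Welcome");
}

// The wrapped mailer does nothing, so no email leaves the test.
const noopMailer: Mailer = { send: () => {} };
const spy = new SpyMailer(noopMailer);
registerUser("x@y", spy);
// spy.calls now holds one entry: { to: "x@y", subject: "Welcome" }.
```

The test can then assert that one email went to the right address and leave every other detail of how `registerUser` works unspecified.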
Tests as a Design Tool
When tests are hard to write, the code is telling you something:
- A function takes too many parameters. The function probably has too many responsibilities.
- A class needs many collaborators set up. The class is too coupled.
- Tests need to mock many things. The unit is reaching across too many boundaries.
- Tests break on every internal change. The tests are reaching past the public surface.
Listen to these signals. The remedy is usually a design change in the production code, not a more elaborate test setup.
Fixtures and Test Data
- Build fixtures with named constructors. validUser({ email: 'x@y' }) reads better than thirty lines of object literals; the override pattern lets each test specify only what it cares about.
- Avoid shared mutable fixtures. A fixture mutated by one test contaminates the next. Build per-test, or freeze.
- Test the behaviors, not the data. Tests asserting on hardcoded values that have nothing to do with the behavior break for irrelevant reasons.
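A sketch of the named-constructor-with-overrides pattern from the list above. The `UserFixture` shape and its default values are assumptions; only the `validUser` name and the `{ email: 'x@y' }` override come from the text.

```typescript
// Hypothetical shape of a user, for illustration only.
interface UserFixture {
  id: number;
  name: string;
  email: string;
  active: boolean;
}

// Named constructor: valid defaults, with per-test overrides.
// Each call returns a new frozen object, so tests cannot contaminate each other.
function validUser(overrides: Partial<UserFixture> = {}): Readonly<UserFixture> {
  return Object.freeze({
    id: 1,
    name: "Test User",
    email: "user@example.com",
    active: true,
    ...overrides,
  });
}

// Each test states only the field it cares about.
const customEmail = validUser({ email: "x@y" });
const inactive = validUser({ active: false });
```

A reader of the test sees `validUser({ active: false })` and knows immediately which property the test is about; everything else is valid boilerplate that stays out of the way.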
When Not to Test
Not every line deserves a test. Diminishing returns set in for:
- Generated code.
- Trivial property accessors and pure delegation.
- Code that the type system already guarantees.
- Glue code whose only job is to forward to a tested layer (covered by integration tests instead).
Use unit tests where they have leverage; use integration tests for the seams; use end-to-end tests sparingly for critical user-visible flows.
Pre-Commit Checklist
- Are the unit's dependencies injected, not constructed internally?
- Do tests run fast, independently, repeatably, and assert their results automatically?
- Do tests verify behavior, not implementation details?
- Are edge cases (empty, boundary, error) covered?
- Do test names describe the behavior under test?
- Does each test assert one concept?
- When a test was hard to write, did you investigate whether the production code has a design problem?