Steven's Knowledge

Integration Testing

Testing components wired together — what to wire up, what to fake, and how to keep the integration layer fast enough to actually run

Integration tests are where most production bugs that escape unit tests get caught. The bugs they find — wrong SQL, broken DTOs, misconfigured serializers, drifted schemas — are the ones unit tests structurally can't see, because units have been isolated from each other.

They're also the layer that goes wrong most quietly. Too much wired up, and the suite is slow, flaky, and impossible to reason about. Too little, and you've written a slightly larger unit test that proves nothing about integration.

The integration question is: what is the smallest assembly that exercises the seams I'm worried about?

For where integration tests fit in the larger picture, see Testing Strategy. This page is about the practice.

What "Integration" Means

The word covers a wide range:

| Variant | What's wired up | What's faked |
| --- | --- | --- |
| Narrow / sociable unit | A class with its real internal collaborators | Anything across an architectural boundary |
| Component test | A whole module behind its public API | Database, network, time |
| Repository / DAO test | The data layer + a real database | Nothing important |
| Handler / API slice | Routing → handler → service → repository | External services |
| Service-to-service | Two services running together | Third-party APIs |

There is no universal "integration test." The taxonomy matters less than knowing which variant you're writing in any given test, and why.

The useful framing: integration tests test the assembly. Each variant defines the assembly differently. The assembly always crosses a boundary that a unit test wouldn't cross.

What Integration Tests Are For

Things only integration tests can reliably catch:

  • Schema drift. Your ORM model says created_at; the migration produced createdAt. Unit tests stub the DB; only an integration test hits the column.
  • Serialization mismatches. DTO field renamed; consumer breaks. Unit tests of producer and consumer both pass; the integration shows the wire format changed.
  • Transactional behavior. Rollback on error; isolation under concurrent writes; constraint violations.
  • Configuration wiring. The right service is bound to the right interface in the DI container; environment variables map to the right defaults.
  • Library behavior under realistic conditions. Your ORM's lazy-load behavior; the cache library's eviction at the size limit; the HTTP client's connection pooling.
  • Middleware order. Auth runs before logging; CORS runs before auth; error handler catches what others throw.
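
The serialization case is easy to miss because each side's unit test builds its own idea of the payload. A minimal sketch, assuming a hypothetical order DTO with a producer and a consumer (every name and field here is invented for illustration): the check pushes the producer's real output through JSON and into the consumer, so a renamed field fails here even when both unit suites pass.

```typescript
// Hypothetical domain type and wire format; all names are illustrative.
type Order = { id: string; totalCents: number; createdAt: Date };

interface OrderDto {
  id: string;
  total_cents: number;
  created_at: string; // ISO-8601 timestamp
}

// Producer side: what the service actually writes to the wire.
function serializeOrder(order: Order): string {
  const dto: OrderDto = {
    id: order.id,
    total_cents: order.totalCents,
    created_at: order.createdAt.toISOString(),
  };
  return JSON.stringify(dto);
}

// Consumer side: validates the fields it relies on instead of trusting a cast.
function parseOrder(wire: string): Order {
  const raw = JSON.parse(wire) as Record<string, unknown>;
  const { id, total_cents, created_at } = raw;
  if (typeof id !== "string") throw new Error("order.id missing or not a string");
  if (typeof total_cents !== "number") throw new Error("order.total_cents missing or not a number");
  if (typeof created_at !== "string") throw new Error("order.created_at missing or not a string");
  return { id, totalCents: total_cents, createdAt: new Date(created_at) };
}

// The integration check: real producer output goes through the real consumer.
function roundTrip(order: Order): Order {
  return parseOrder(serializeOrder(order));
}
```

If the producer later renames `created_at` to `createdAt`, `roundTrip` throws, while a consumer test that hand-builds its own payload keeps passing.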

Things integration tests are not for:

  • Combinatorial business logic. That belongs in unit tests. Don't test 30 discount rules through HTTP.
  • Cosmetic UI. That's component testing in the frontend.
  • End-to-end user journeys. That's E2E; see E2E Testing.

The Spectrum: How Much to Wire Up

Imagine a slider. On one end, you're testing a single class with its internal collaborators (sociable unit test). On the other, you've wired up your whole backend with a real database, real cache, real message broker (full integration).

Each step right adds realism. Each step also adds:

  • Setup cost. More moving parts to start.
  • Speed cost. More boundaries to cross.
  • Maintenance cost. More things to break for unrelated reasons.
  • Diagnostic cost. When it fails, more places to look.

The right position on the slider depends on what you're trying to catch:

  • Testing repository code? Wire the real DB, fake everything else.
  • Testing handler behavior end-to-end? Wire DB + handler + middleware, fake outbound HTTP.
  • Testing a publisher's contract with a broker? Wire the broker (or a contract-tested fake); skip everything past it.

A common mistake: wire up everything just in case. The result is a slow suite that doesn't tell you which layer broke.

The Database Question

Most integration test pain comes from how the database is handled. Three choices:

In-memory replacement

SQLite for tests, Postgres for prod. Tempting because it's fast and trivial to set up. Usually a mistake:

  • Behavior differs (transactions, types, locking semantics).
  • Features differ (advisory locks and array types have no SQLite equivalent; RETURNING and JSON support exist only in recent SQLite versions and do not behave like Postgres jsonb).
  • Bugs in production-only behavior never surface.

Acceptable when: the data layer uses an extremely narrow SQL subset, and the team accepts the gap. Rare in practice.

Real engine, ephemeral per-test or per-suite

Testcontainers, Docker Compose, a per-CI-job DB. The DB engine is identical to prod; state is reset between tests.

  • Tradeoff: slower startup (container boot), but realistic behavior.
  • The default for serious backends.

Shared dev/test DB

Everyone connects to the same database in CI. Avoid:

  • Tests step on each other under parallelism.
  • "What's in this table?" is now a function of who ran what.
  • Schema migrations on the shared DB break in-flight tests.

Specifics on isolation and state management live in Test Data & Environments and Parallelization.

Test Boundaries: Real vs. Stubbed

A practical rule for picking what to stub at the integration layer:

Stub at architectural boundaries you don't control. Use the real thing for boundaries you own.

| Boundary | Treatment |
| --- | --- |
| Database (you own the schema) | Real |
| Cache (you control invalidation) | Real or in-memory fake |
| Message broker (you produce the messages) | Real or contract-tested fake |
| Third-party API (Stripe, SendGrid, etc.) | Stub at the HTTP client layer |
| Other internal service (microservice) | Contract test against a shared contract |
| The clock | Always inject; never use real time |
| Random sources | Always seed |

The principle: own it → use it real (you can fix what breaks). Don't own it → stub (you can't fix what they break).
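
The clock and random-source rows are the cheapest to get right. A minimal sketch, assuming a hypothetical `Clock` port and an expiry check invented for illustration; the seeded generator is mulberry32, a small well-known PRNG swapped in here as one possible choice, not something this page prescribes.

```typescript
// Hypothetical clock port: production code asks this instead of calling Date.now().
interface Clock {
  now(): Date;
}

// Production implementation, shown for contrast.
const systemClock: Clock = { now: () => new Date() };

// Test clock that only moves when the test says so.
class FixedClock implements Clock {
  private current: Date;
  constructor(start: Date) {
    this.current = start;
  }
  now(): Date {
    return new Date(this.current.getTime());
  }
  advanceMs(ms: number): void {
    this.current = new Date(this.current.getTime() + ms);
  }
}

// Illustrative code under test: something issued at issuedAt expires after ttlMs.
function isExpired(issuedAt: Date, ttlMs: number, clock: Clock): boolean {
  return clock.now().getTime() - issuedAt.getTime() >= ttlMs;
}

// mulberry32: the same seed always yields the same sequence of values in [0, 1).
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
```

A test can then step across an expiry boundary with `advanceMs` instead of sleeping, and log the seed so a failing run can be replayed.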

For service-to-service integration, contract tests are the better tool than spinning up the other service in your CI. See Contract Testing.

Speed

Integration tests are slower than unit tests by definition. They should still be fast enough to run as part of a PR check. Targets:

  • A single integration test: under 500ms.
  • A suite of 200 integration tests: under 2 minutes, parallelized.
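
The two targets fit together: 200 tests at the 500ms ceiling is 100 seconds of test time even on a single worker, which leaves about 20 seconds of the 2-minute budget for setup. A small sketch of that arithmetic, where the worker count and the 10-second boot time are made-up inputs rather than measurements:

```typescript
// Rough wall-clock estimate: one shared boot plus the busiest worker's share of tests.
// workers and bootSeconds are assumptions for illustration only.
function estimateSuiteSeconds(
  testCount: number,
  perTestMs: number,
  workers: number,
  bootSeconds: number,
): number {
  const testsPerWorker = Math.ceil(testCount / workers);
  return bootSeconds + (testsPerWorker * perTestMs) / 1000;
}

const serialSeconds = estimateSuiteSeconds(200, 500, 1, 10); // 110
const parallelSeconds = estimateSuiteSeconds(200, 500, 4, 10); // 35
```

The estimate ignores uneven test durations and worker startup, so treat it as a sanity check on the budget, not a prediction.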

When they're slower, the usual causes:

  • Setup per test instead of per suite. Spinning a container for every test multiplies setup cost. Boot once, isolate state.
  • Sequential DB resets. Truncating every table between tests adds up, especially in schemas with many tables; use transactional rollback where possible.
  • Eager fixtures. Tests loading the entire seed dataset to assert one thing.
  • Tests that wait on time. Replace with controlled clocks; never sleep.
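
Transactional rollback is the cheapest reset when the code under test does not commit on its own. A minimal sketch of the wrapper, using a hypothetical in-memory `FakeConnection` purely so the example runs without a database; against a real engine the same calls would map to BEGIN, the test's statements, and ROLLBACK.

```typescript
// Stand-in for a real connection so the sketch runs anywhere. Illustrative only:
// it copies the committed rows on begin() and discards the copy on rollback().
class FakeConnection {
  private committed: string[] = ["seed-user"];
  private pending: string[] | null = null;

  begin(): void {
    this.pending = [...this.committed];
  }
  insertUser(name: string): void {
    if (this.pending === null) throw new Error("no open transaction");
    this.pending.push(name);
  }
  listUsers(): string[] {
    return [...(this.pending ?? this.committed)];
  }
  rollback(): void {
    this.pending = null;
  }
}

// Runs one test body inside a transaction and always rolls back,
// even when the body throws, so the next test sees the original state.
function inRollback<T>(conn: FakeConnection, body: (conn: FakeConnection) => T): T {
  conn.begin();
  try {
    return body(conn);
  } finally {
    conn.rollback();
  }
}
```

The pattern stops working once the code under test manages its own transactions or uses a second connection; in those cases a targeted cleanup of the rows the test created is the usual fallback.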

A common pattern that hurts: spinning up a real Kafka cluster per test. Use a contract-tested fake or a single shared broker with per-test topics.

Boundary: Don't Drift Into E2E

The most common scope creep: an integration test starts wiring up more services, eventually exercises HTTP through a real browser, then poll-waits for a UI element to appear. At that point, you have an E2E test misfiled as integration.

Symptoms:

  • The test starts a frontend dev server.
  • The test uses a browser driver.
  • The test asserts on rendered DOM.
  • The test takes more than ~3 seconds.

When you find these, either move the test to E2E (where the flakiness budget is different and the tooling is built for it), or strip it back to a real integration test that hits the API directly.

Common Failure Modes

"Integration" tests that are actually unit tests with extra steps

test('OrderService creates an order', () => {
  const repo = new InMemoryOrderRepository();
  const service = new OrderService(repo);
  service.placeOrder({...});
  expect(repo.findAll()).toHaveLength(1);
});

If everything is in-memory and the only "integration" is "two classes in the same process," this is a sociable unit test. That's fine — but don't claim it tests integration.

Cross-test pollution

beforeAll(() => seedDatabase());

test('list users', () => {
  const users = service.listUsers();
  expect(users).toHaveLength(3);  // expects exact seed
});

test('create user', () => {
  service.createUser({...});
  expect(service.listUsers()).toHaveLength(4);  // depends on order
});

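The suite only passes in this order. Each test passes on its own, but if 'create user' runs before 'list users' (shuffled order, a retry, or parallel workers sharing the database), the list test sees four rows and fails. Reset state between tests, or use unique data per test.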
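
One way to make the 'create user' case order-independent is to assert on the row the test itself created instead of on a total count. A minimal sketch, assuming a hypothetical in-memory `UserService` as a stand-in for the real service and a made-up `uniqueEmail` helper:

```typescript
// Stand-in for a service backed by a shared database; illustrative only.
class UserService {
  private users: { email: string }[] = [];
  createUser(user: { email: string }): void {
    this.users.push(user);
  }
  listUsers(): { email: string }[] {
    return [...this.users];
  }
}

let emailCounter = 0;

// Returns a different address on every call, so tests never collide on a row.
function uniqueEmail(prefix: string): string {
  emailCounter += 1;
  return `${prefix}-${emailCounter}@example.test`;
}

// Order-independent "create user" check: looks for this test's own row and
// ignores whatever other tests or seeds have put in the table.
function createUserScenario(service: UserService): boolean {
  const email = uniqueEmail("create-user");
  service.createUser({ email });
  return service.listUsers().some((u) => u.email === email);
}
```

The scenario gives the same answer whether the table holds three seed rows or three hundred, which is what lets it run in parallel with other tests.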

Asserting on infrastructure-specific behavior

expect(error.message).toContain('duplicate key value violates unique constraint "users_email_key"');

This couples the test to a specific DB engine, its error-message wording, and the constraint name. Assert on the application's response to the constraint violation (for example, a typed error or HTTP 409), not on the raw DB error string.
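
A typed error gives the test something stable to assert on. A minimal sketch, assuming the driver exposes the SQLSTATE code on the error object as `code` (node-postgres does; other drivers may differ) and using a hypothetical `DuplicateEmailError` and `translateInsertError` invented for this example:

```typescript
// The fields this sketch reads from a driver error; the shape is an assumption.
interface DriverError {
  code?: string;
  message: string;
}

// Hypothetical application-level error that the HTTP layer can map to 409.
class DuplicateEmailError extends Error {
  readonly status = 409;
  constructor(email: string) {
    super(`email already registered: ${email}`);
    this.name = "DuplicateEmailError";
  }
}

const UNIQUE_VIOLATION = "23505"; // Postgres SQLSTATE for unique_violation

// Called on the user-insert path: turns the engine's error into a typed one.
function translateInsertError(err: DriverError, email: string): Error {
  if (err.code === UNIQUE_VIOLATION) return new DuplicateEmailError(email);
  return new Error(`unexpected database error: ${err.message}`);
}
```

The integration test then triggers a real duplicate insert and checks for `DuplicateEmailError` (or the 409), so a change in the engine's message wording or the constraint's name does not break it.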

Stubbing the database

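test('findById returns the user', async () => {
  // The stub answers every query the same way, so no SQL ever runs.
  const db = { query: vi.fn().mockResolvedValue([{ id: 1 }]) };
  const repo = new UserRepository(db);
  expect(await repo.findById(1)).toEqual({ id: 1 });
});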

The repository is the thing that translates between code and SQL. Stubbing the DB call means you're testing the stub, not the SQL. This belongs in integration with a real DB, or it isn't worth writing.

Real third-party calls in the test path

Tests that hit real Stripe, real SendGrid, real anything that costs money or has rate limits. The first time a sandbox is down, the entire suite goes red for unrelated reasons. Use stubs or contract tests; pay for the integration once in a dedicated contract-test suite.
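
Stubbing at the HTTP client layer keeps your own request-building and response-parsing code in the test while cutting the network out. A minimal sketch, assuming a hypothetical `PaymentGateway` and an outbound `HttpClient` port; the port is synchronous only to keep the example short (a real client would return a Promise), and the URL and payload shape are placeholders, not any provider's actual API.

```typescript
// Hypothetical outbound port; synchronous for brevity.
interface HttpRequest {
  method: string;
  url: string;
  body: string;
}
interface HttpResponse {
  status: number;
  body: string;
}
type HttpClient = (req: HttpRequest) => HttpResponse;

// Code you own: builds the request and interprets the response.
class PaymentGateway {
  private http: HttpClient;
  constructor(http: HttpClient) {
    this.http = http;
  }

  charge(amountCents: number, currency: string): { ok: boolean; chargeId?: string } {
    const res = this.http({
      method: "POST",
      url: "https://payments.example.test/v1/charges", // placeholder URL
      body: JSON.stringify({ amount: amountCents, currency }),
    });
    if (res.status !== 200) return { ok: false };
    const parsed = JSON.parse(res.body) as { id: string };
    return { ok: true, chargeId: parsed.id };
  }
}

// Stub at the HTTP layer: records every request and returns a canned response.
function stubHttp(response: HttpResponse): { client: HttpClient; requests: HttpRequest[] } {
  const requests: HttpRequest[] = [];
  const client: HttpClient = (req) => {
    requests.push(req);
    return response;
  };
  return { client, requests };
}
```

Because the stub records requests, the test can assert on what was sent as well as on how the response was handled; whether the canned response still matches the provider is the job of the separate contract-test suite.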

Pre-commit Checklist

Before an integration test goes in:

  • What boundary does this test actually cross? Is it crossing the boundary you intended to test, and not several others by accident?
  • Does the test depend on other tests having run first?
  • Does it run in under 500ms?
  • If it fails, can a developer tell which layer broke from the message alone?
  • Is the database state reset between tests, deterministically?
  • Are external services stubbed, not real?
  • Would a unit test catch this just as reliably? (If yes, demote.)
  • Would an E2E test catch this more meaningfully? (If yes, consider promoting — or write both at different levels.)

If the test "kind of works most of the time," it's not an integration test — it's a tax.
