Steven's Knowledge

Test Data & Environments

Fixtures, factories, ephemeral environments — how tests get their data, and how the environment they run in stays cheap, isolated, and reproducible

The test suite is the part of the system most people see. The test data and environment layer is the part that determines whether the suite is fast, reliable, and trustworthy. Most teams underinvest in this layer until a flake hunt or a multi-hour CI run forces the issue.

The two failure modes this section is meant to prevent:

  • Fixture rot. A monolithic seed.sql file that nobody understands, that every test depends on in subtle ways, and that breaks unrelated tests whenever it's touched.
  • Environment rot. Tests pass on a developer's machine, on one CI runner, in one region; fail elsewhere. The cause is a config, a binary version, a TZ, that nobody owns.

The fix is the same in both cases: make the data and the environment something the test produces, not something the test inherits.

The Spectrum of Test Data

Four patterns, ordered from data the test owns outright to data it shares with everything else:

Pattern              | What it is                                     | When to use
---------------------|------------------------------------------------|--------------------------------------------------
Inline literals      | Data declared inside the test                  | Pure functions; small inputs
Builders / factories | Code that constructs valid instances on demand | Most unit + integration tests
Shared fixtures      | Pre-loaded data the test reads from            | Read-only reference data; expensive one-time setup
Snapshots / dumps    | A captured DB state restored at test start     | Migrations, large reporting tests

The default is factories. They give each test its own data, with sane defaults and easy overrides.

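// Pattern: factory with override
import cuid from 'cuid'; // assumption: the `cuid` npm package; any unique-ID generator works

// Returns a valid user with sane defaults; overrides replace only the fields a test cares about.
function makeUser(overrides = {}) {
  return {
    id: cuid(),
    name: 'Alice',
    email: `${cuid()}@example.com`,  // unique per call
    createdAt: new Date('2024-01-01'),
    ...overrides,
  };
}

// canAdmin is the function under test (defined elsewhere).
test('blocks admin actions for non-admins', () => {
  const user = makeUser({ role: 'member' });
  expect(canAdmin(user)).toBe(false);
});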

What makes this work: every test reads its requirements in the test itself. No hidden coupling to a shared dataset.

Shared Fixtures: Use Sparingly

Shared fixtures look efficient — load once, every test reads. They go wrong predictably:

  • Test A mutates the fixture; test B reads stale data. Order-dependent flake.
  • The fixture grows to support every test. New requirements add rows; old rows are kept "just in case." After a year, no one knows what's actually needed.
  • Tests assert against fixture data by ID. expect(user.id).toBe(1) — the test breaks the moment the seed reorders.

When shared fixtures earn their place:

  • Read-only reference data. Currency codes, country lists, feature catalogs.
  • Truly expensive setup. A snapshot of a 10M-row warehouse for analytics tests.
  • Cross-test scenarios. "Given this 50-step migration history, do these queries still work?"

Outside those cases, factories almost always win.
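
For the read-only reference data case, one way to keep "read-only" honest is to freeze the fixture, so an accidental mutation cannot leak into the next test. A minimal sketch; deepFreeze and CURRENCIES are illustrative names, not part of any library:

```javascript
// Recursively freeze an object graph so shared reference data cannot be mutated.
// In strict mode and ES modules, writing to a frozen object throws a TypeError;
// in sloppy mode the write is silently ignored. Either way the data is unchanged.
function deepFreeze(value) {
  // The isFrozen guard also stops the recursion on cyclic graphs.
  if (value !== null && typeof value === 'object' && !Object.isFrozen(value)) {
    Object.freeze(value);
    for (const key of Object.keys(value)) {
      deepFreeze(value[key]);
    }
  }
  return value;
}

// Hypothetical reference fixture, loaded once and shared by every test.
const CURRENCIES = deepFreeze([
  { code: 'USD', minorUnits: 2 },
  { code: 'JPY', minorUnits: 0 },
]);
```

A test that tries CURRENCIES[0].code = 'XXX' now fails loudly (or has no effect) instead of quietly changing what the next test reads.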

Builders for Complex Graphs

Once test data involves relationships (an Order has a User, a Cart, line items, a payment), inline construction gets noisy. Builders compose:

const order = orderBuilder()
  .withUser(userBuilder().asAdmin())
  .withItems(3)
  .paid()
  .build();

Properties of a builder that pays off:

  • Sane defaults for everything; tests override only what they care about.
  • Composable — builders can take other builders.
  • Persists only when asked. build() returns an in-memory object; create() also writes it to the database. The two-step split lets tests that don't need the database skip the write.
  • Returns realistic values. Random emails, not test@test.com. Real-looking strings catch unicode and length bugs.

Libraries: Faker (realistic fake values), FactoryBot (Ruby), factory_boy (Python), test-data-bot (JS), Bogus (.NET). They overlap heavily; pick the one that fits your language and test runner.
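
The builder properties listed above can be sketched without a library. This is a minimal illustration, not a recommended implementation: userBuilder, orderBuilder, the field names, and the db.insert persistence hook are all hypothetical, and the IDs are sequential rather than realistic random values to keep the sketch short.

```javascript
// Minimal builder sketch. All names here are hypothetical.
let seq = 0;
const nextId = (prefix) => `${prefix}_${++seq}`;

function userBuilder() {
  const attrs = { role: 'member' }; // sane default
  const api = {
    asAdmin() { attrs.role = 'admin'; return api; },
    build() {
      const id = nextId('user');
      return { id, name: 'Alice', email: `${id}@example.com`, ...attrs };
    },
  };
  return api;
}

function orderBuilder() {
  let user = userBuilder(); // default: a plain member
  let itemCount = 1;
  let status = 'pending';
  const api = {
    withUser(builder) { user = builder; return api; }, // builders take builders
    withItems(n) { itemCount = n; return api; },
    paid() { status = 'paid'; return api; },
    // build() only constructs an in-memory object; nothing is written.
    build() {
      const items = Array.from({ length: itemCount }, () => ({
        id: nextId('item'), sku: 'SKU-1', quantity: 1,
      }));
      return { id: nextId('order'), user: user.build(), items, status };
    },
    // create() is the separate, explicit persistence step.
    async create(db) {
      const order = api.build();
      await db.insert('orders', order);
      return order;
    },
  };
  return api;
}
```

Most tests call build() and never touch the database; only tests that need a persisted row pay for create().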

Databases Between Tests

The most expensive part of test data is keeping the database in a known state between tests. Three patterns:

Transactional rollback (fast)

Each test starts a transaction; everything written rolls back at the end. The DB ends each test in the same state it started.

  • Pros: extremely fast, no truncation cost.
  • Cons: a test can't manage its own transactions, can't exercise code that commits explicitly, and can't easily test behavior that only fires at COMMIT (commit-time triggers, deferred constraints).
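
A minimal sketch of the rollback pattern. It assumes only that the client exposes an async query(sql) method (node-postgres clients have this shape); withRollback is a hypothetical helper name, and real test frameworks usually wire this into their setup and teardown hooks instead.

```javascript
// Wrap a test body in a transaction that is always rolled back.
// `client` is any object with an async query(sql) method.
async function withRollback(client, testBody) {
  await client.query('BEGIN');
  try {
    return await testBody(client);
  } finally {
    // Runs on success and on failure, so no test leaves rows behind.
    await client.query('ROLLBACK');
  }
}
```

The code under test has to use the same connection as the wrapper; anything written through a second connection is outside the transaction and will not be rolled back.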

Truncation between tests (medium)

After each test, truncate all tables (or the affected subset). Re-seed the reference data.

  • Pros: simple mental model, works with any code under test.
  • Cons: slower than rollback; truncate of foreign-keyed tables in the wrong order is its own footgun.
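
One way around the ordering footgun, assuming PostgreSQL: name every table in a single TRUNCATE statement, which PostgreSQL accepts even when the tables reference each other through foreign keys. A small sketch; truncateStatement is a hypothetical helper, and other databases need a different approach.

```javascript
// Build one TRUNCATE statement covering every listed table (PostgreSQL syntax).
// Listing all tables in one command avoids having to order them by foreign key.
function truncateStatement(tables) {
  if (tables.length === 0) throw new Error('no tables to truncate');
  // Double-quote identifiers and escape embedded quotes.
  const quoted = tables.map((t) => `"${t.replace(/"/g, '""')}"`);
  // RESTART IDENTITY resets sequences owned by the truncated tables.
  return `TRUNCATE TABLE ${quoted.join(', ')} RESTART IDENTITY`;
}
```

The generated statement runs in an afterEach hook, followed by re-seeding the reference data.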

Snapshot / restore (slowest)

Take a binary snapshot of an initialized DB; restore it before each test (or each suite).

  • Pros: any test can do anything; restore is atomic.
  • Cons: snapshot restore is slow; storage cost grows.

The default that works for most teams: transactional rollback for most tests; truncation for the few that need committed state. Pair with per-worker databases.
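
Per-worker databases usually come down to deriving the database name from the runner's worker index. A sketch assuming Jest, which exposes a 1-based index as the JEST_WORKER_ID environment variable; workerDatabaseName is a hypothetical helper, and other runners use different variable names.

```javascript
// Derive a database name that is unique to the current test worker.
// JEST_WORKER_ID is set by Jest; treat the variable name as an assumption
// if you use another runner.
function workerDatabaseName(baseName, env = process.env) {
  const workerId = env.JEST_WORKER_ID || '1'; // fall back to 1 outside Jest
  return `${baseName}_w${workerId}`;
}
```

Each worker then creates (or migrates) its own database once at startup, and rollback or truncation runs inside it without touching any other worker's rows.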

Test Environments

A test environment is "everything the test depends on that isn't the code under test." That includes:

  • Language runtime + version.
  • OS + base image.
  • Database, cache, message broker versions.
  • Browser versions.
  • Time zone, locale, system clock.
  • Environment variables.
  • Filesystem layout.

The principle: every variable above is pinned, or the test is non-reproducible.
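
A cheap way to see which of these variables actually differ between two machines is to print them at the start of every run and compare the output. A minimal Node.js sketch; environmentFingerprint is a hypothetical helper and the chosen fields are illustrative, not exhaustive.

```javascript
// Collect the environment facts a test run depends on, so a CI run and a
// local run can be compared field by field.
function environmentFingerprint(env = process.env) {
  const intl = Intl.DateTimeFormat().resolvedOptions();
  return {
    node: process.version,
    platform: `${process.platform}-${process.arch}`,
    timeZone: intl.timeZone,   // the zone Date and Intl will actually use
    locale: intl.locale,
    tzVar: env.TZ ?? '(unset)',
    lang: env.LC_ALL ?? env.LANG ?? '(unset)',
  };
}

console.log(JSON.stringify(environmentFingerprint(), null, 2));
```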

The hierarchy of environments

Level                      | What it is                                          | Lifetime
---------------------------|-----------------------------------------------------|---------------------
Process-local              | A fake or in-memory adapter inside the test process | Per test
Container per test/file    | Testcontainers spinning up a real service           | Per test or per file
Shared container per suite | One container, all tests in the suite share         | Per suite run
Ephemeral environment      | Full app stack provisioned per PR                   | Per PR
Shared staging             | Long-lived environment shared across teams          | Permanent

Closer to "process-local" is faster and more isolated. Closer to "shared staging" is more realistic but more contested.

The rule of thumb: use the closest-to-local level that gives you the realism your test needs. A unit test for a JSON validator does not need a Kubernetes cluster. An E2E test for "checkout flow with real Stripe" can't avoid a fully deployed stack.
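
At the process-local end of that hierarchy, the "environment" is just an object. A sketch of an in-memory adapter, assuming a hypothetical repository interface with save and findByEmail; a real fake would mirror whatever the production adapter exposes.

```javascript
// Process-local level: an in-memory stand-in for a user repository.
// A new instance per test gives full isolation with no cleanup step.
function createInMemoryUserRepo() {
  const byId = new Map();
  return {
    async save(user) {
      byId.set(user.id, { ...user }); // copy, so later caller mutations don't leak in
      return user;
    },
    async findByEmail(email) {
      for (const user of byId.values()) {
        if (user.email === email) return { ...user };
      }
      return null;
    },
  };
}
```

The methods are async even though nothing waits, so code under test sees the same calling convention it would get from a real database adapter.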

Ephemeral Environments

For E2E or contract tests, spinning up a fresh full stack per PR (or per test run) is the gold standard. The full stack lives long enough to test, then disappears.

What "ephemeral" requires:

  • Infrastructure-as-code that can build the stack from scratch.
  • A way to address it (per-PR subdomain, per-PR namespace).
  • Seed data per-environment, not shared.
  • Teardown that actually happens — orphaned environments are how cloud bills spike.
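
Two of these requirements, addressing and teardown, reduce to small pure functions. A sketch under stated assumptions: prNamespace and isExpired are hypothetical helpers, the naming scheme is one of many, and the 63-character DNS-label limit is the constraint Kubernetes applies to namespace names.

```javascript
// Derive a per-PR namespace name that is a valid DNS label:
// lowercase alphanumerics and '-', at most 63 characters,
// starting and ending with an alphanumeric.
function prNamespace(repo, prNumber) {
  const suffix = `-pr-${prNumber}`;
  const slug = repo
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')  // collapse anything else into '-'
    .replace(/^-+|-+$/g, '')      // trim leading/trailing '-'
    .slice(0, 63 - suffix.length) // leave room for the suffix
    .replace(/-+$/g, '');         // slicing may expose a trailing '-'
  return `${slug || 'env'}${suffix}`;
}

// Decide whether an environment has outlived its time-to-live.
// A scheduled reaper job can call this for every live environment.
function isExpired(createdAtMs, nowMs, ttlMinutes) {
  return nowMs - createdAtMs > ttlMinutes * 60 * 1000;
}
```

A reaper driven by isExpired catches the environments whose PR-close hook never fired, which is where orphans usually come from.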

Tools in this space: Vercel/Netlify preview deploys, AWS PR environments via CodeBuild, Kubernetes namespaces per PR, Render preview environments, Coherence, Bunnyshell.

The cost question: ephemeral environments are not free. The math: if a PR keeps the environment up for 30 minutes, and your team merges 50 PRs/day, that's 25 environment-hours per day of compute. Charge it to the right cost center; otherwise it surprises someone.
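
The same arithmetic as a function, so the numbers can be swapped for your own. A small sketch; the function name is illustrative, and multiplying the result by your provider's hourly rate gives the daily spend.

```javascript
// Environment-hours consumed per day by per-PR environments.
function environmentHoursPerDay(prsPerDay, minutesPerPr) {
  return (prsPerDay * minutesPerPr) / 60;
}

// The example from the text: 50 PRs/day, 30 minutes each.
const hoursPerDay = environmentHoursPerDay(50, 30); // 25
```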

Time, Randomness, and the Outside World

The non-data parts of "environment" that bite hardest:

  • Clock. Never let tests read system time. Inject a clock. Tests that say "today is March 1" should do so deterministically.
  • Randomness. Seed every RNG. Math.random() can't be seeded, so inject a seedable generator; a test whose assertion depends on unseeded random values will eventually hit the failing case, and nobody will be able to reproduce it.
  • Timezones. Pin TZ in the runner config. The default is "whichever timezone the runner happens to be in," which is how a test ends up failing only for the dev who lives in Auckland.
  • Locale. Pin LANG/LC_ALL. Number formatting, date formatting, string sorting all depend on it.
  • External services. Never let unit/integration tests hit real third-party APIs. Use a contract-tested fake or VCR-style recordings.
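
The clock and randomness items can both be handled by passing the source in as a parameter. A minimal sketch: fixedClock and isTokenExpired are hypothetical names, and the generator is the small public-domain PRNG commonly known as mulberry32, used here only because it fits in a few lines.

```javascript
// A clock the test controls. Code under test calls clock.now()
// instead of new Date() or Date.now().
function fixedClock(isoString) {
  let current = new Date(isoString).getTime();
  return {
    now: () => new Date(current),
    advance: (ms) => { current += ms; },
  };
}

// Seedable PRNG (mulberry32): the same seed always yields the same sequence
// of floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Example of code written against the injected clock.
function isTokenExpired(token, clock) {
  return token.expiresAt <= clock.now().getTime();
}
```

A test that says "today is March 1" constructs fixedClock('2024-03-01T00:00:00Z') and moves time forward with advance(); a failing randomized test logs its seed so the exact run can be replayed.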

Anti-Patterns

seed.sql for everyone. One file, every test reads. Touching it breaks unrelated tests. The history of changes is a graveyard.

Hard-coded IDs. expect(user.id).toBe(1) — fragile against reordering; reveals nothing about behavior.

Tests that depend on yesterday's run. Setup that assumes "if rows exist, use them; otherwise create" leaks state across runs.

Tests that delete shared data on teardown. Worker 1 deletes "test users" while worker 2's test is mid-flight.

"Just connect to staging." A test that depends on the staging DB will eventually fail because staging changed. Staging is for humans; tests need their own data.

Captured production data, unsanitized. PII in test fixtures. Compliance time bomb.

Long-lived ephemeral environments. "We forgot to tear it down." Three months later, a finance review asks about the $40K/month line item.

Pre-merge Checklist

Before declaring test data & environments healthy:

  • Can a single test be read in isolation and tell the reader what data it needs?
  • Are randomness, time, and timezone deterministic in tests?
  • When a test fails on CI but passes locally, is there a documented list of environment differences to check first?
  • Are databases per-worker, not shared across processes?
  • Do ephemeral environments tear down reliably, with cost visibility?
  • Is there a path from "I need a new kind of test data" to "I added a factory method," not "I edited seed.sql"?

If new test data lands by appending to a shared seed file, the fixture layer is the next thing to bankrupt the suite.
