Steven's Knowledge

Property-Based Testing

Asserting properties that should hold for all inputs, then letting a generator try thousands of cases — and shrinking failures to minimal counter-examples

Example-based tests assert that a specific input produces a specific output: "given [3, 1, 2], sort() returns [1, 2, 3]." That's useful, but it tests one of infinitely many cases. The bug your tests miss is, almost by definition, in a case you didn't think to write.

Property-based testing flips this. Instead of asserting against a fixed example, you assert a property — a fact that should be true for any valid input — and a generator produces hundreds or thousands of random inputs to challenge it. When a counter-example is found, the framework shrinks it to the smallest input that still fails.

Property-based testing catches two kinds of failure that example tests tend to miss:

  • Inputs you didn't think of. Empty arrays, single-element arrays, arrays with duplicates, arrays with NaN, very large arrays.
  • Combinations across parameters. An off-by-one bug that only manifests when one input is empty and another is exactly 7.

Examples test what you remembered to test. Properties test what you didn't.

Properties Worth Knowing

A property is any statement of the form "for all valid inputs x, some condition holds." A handful of patterns cover most useful properties:

Round-trip / inverse

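decode(encode(x)) === x for all x. The same shape applies to parse(serialize(x)) === x. The reverse direction, serialize(parse(text)) === text, only holds when text is already in canonical form (whitespace, key order, and number formatting can all change).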

import fc from 'fast-check';

// encode and decode are the functions under test.
fc.assert(fc.property(fc.string(), (s) => {
  expect(decode(encode(s))).toBe(s);
}));

Useful for: serializers, parsers, codecs, URL encoding, compression, encryption (with a key fixed).
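To make the round-trip idea concrete without any library, here is a minimal dependency-free sketch. The run-length codec, the seeded generator, and every function name are assumptions made for illustration; the hand-rolled loop only stands in for what fc.assert(fc.property(...)) does.

```typescript
// Hypothetical codec under test: run-length encoding over single characters.
type Run = { ch: string; count: number };

function encode(s: string): Run[] {
  const runs: Run[] = [];
  for (const ch of s.split("")) {
    const last = runs.length > 0 ? runs[runs.length - 1] : undefined;
    if (last !== undefined && last.ch === ch) last.count += 1;
    else runs.push({ ch, count: 1 });
  }
  return runs;
}

function decode(runs: Run[]): string {
  let out = "";
  for (const run of runs) {
    for (let i = 0; i < run.count; i++) out += run.ch;
  }
  return out;
}

// Tiny seeded LCG so a failing run can be replayed from its seed.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// A two-letter alphabet makes repeated runs likely, which is where RLE bugs hide.
function randomString(rng: () => number, maxLen: number): string {
  const len = Math.floor(rng() * (maxLen + 1));
  let out = "";
  for (let i = 0; i < len; i++) out += rng() < 0.5 ? "a" : "b";
  return out;
}

// Returns the first input that breaks decode(encode(s)) === s, or null.
function findRoundTripFailure(numRuns: number, seed: number): string | null {
  const rng = makeRng(seed);
  for (let i = 0; i < numRuns; i++) {
    const s = randomString(rng, 20);
    if (decode(encode(s)) !== s) return s;
  }
  return null;
}
```

Calling findRoundTripFailure(1000, 42) returns null for this codec. A bug such as dropping the final run would surface as a short failing string.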

Idempotence

f(f(x)) === f(x). Applying the operation twice is the same as once.

Useful for: normalization (lowercase, trim), deduplication, set operations, applying a migration.
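As an illustration, the sketch below checks idempotence for two hypothetical functions: a whitespace-and-case normalizer that should pass, and a naive escaper that should not. The names, the alphabet, and the seeded loop are assumptions, not part of any framework.

```typescript
// Hypothetical normalizer: trims, collapses whitespace, lowercases.
function normalize(s: string): string {
  return s.trim().replace(/\s+/g, " ").toLowerCase();
}

// Not idempotent: the "&" inside "&amp;" is escaped again on a second pass.
function escapeAmp(s: string): string {
  return s.replace(/&/g, "&amp;");
}

// Tiny seeded LCG so runs are reproducible.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Strings over a small alphabet that includes whitespace, mixed case, and "&".
function randomText(rng: () => number, maxLen: number): string {
  const alphabet = " \tAb&";
  const len = Math.floor(rng() * (maxLen + 1));
  let out = "";
  for (let i = 0; i < len; i++) {
    out += alphabet.charAt(Math.floor(rng() * alphabet.length));
  }
  return out;
}

// Returns the first input where f(f(s)) differs from f(s), or null.
function findNonIdempotent(
  f: (s: string) => string,
  numRuns: number,
  seed: number
): string | null {
  const rng = makeRng(seed);
  for (let i = 0; i < numRuns; i++) {
    const s = randomText(rng, 10);
    const once = f(s);
    if (f(once) !== once) return s;
  }
  return null;
}
```

findNonIdempotent(normalize, 1000, 7) returns null. For escapeAmp, any generated string containing "&" is a counter-example.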

Commutativity / associativity

f(a, b) === f(b, a) (commutativity); f(f(a, b), c) === f(a, f(b, c)) (associativity).

Useful for: aggregations (sum, max, set union), merging operations.
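A small sketch of checking both laws for a binary operation on numbers. The helper names and the integer range are assumptions chosen for illustration. Integers are used on purpose: floating-point addition is commutative but not associative, since (0.1 + 0.2) + 0.3 !== 0.1 + (0.2 + 0.3) in IEEE 754 doubles.

```typescript
type BinaryOp = (a: number, b: number) => number;

// Tiny seeded LCG so runs are reproducible.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Integers in [-1000, 1000]; sums of three stay exactly representable.
function randomInt(rng: () => number): number {
  return Math.floor(rng() * 2001) - 1000;
}

// Returns a description of the first violated law, or null if none was found.
function findLawViolation(op: BinaryOp, numRuns: number, seed: number): string | null {
  const rng = makeRng(seed);
  for (let i = 0; i < numRuns; i++) {
    const a = randomInt(rng);
    const b = randomInt(rng);
    const c = randomInt(rng);
    if (op(a, b) !== op(b, a)) return `not commutative: ${a}, ${b}`;
    if (op(op(a, b), c) !== op(a, op(b, c))) return `not associative: ${a}, ${b}, ${c}`;
  }
  return null;
}

const add: BinaryOp = (a, b) => a + b;
const max: BinaryOp = (a, b) => Math.max(a, b);
const subtract: BinaryOp = (a, b) => a - b; // neither commutative nor associative
```

findLawViolation returns null for add and max on this integer range, and is expected to report a violation for subtract almost immediately.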

Invariants

Some property is preserved across the operation: "after sort(), the array has the same elements as before."

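fc.assert(fc.property(fc.array(fc.integer()), (arr) => {
  const sorted = sort([...arr]);
  // Count occurrences so duplicates are checked; comparing Sets would ignore them.
  const counts = (xs) => xs.reduce((m, x) => m.set(x, (m.get(x) ?? 0) + 1), new Map());
  expect(sorted).toHaveLength(arr.length);
  expect(counts(sorted)).toEqual(counts(arr));
}));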

Useful for: sorting (length and multiset preserved), transformations (no data loss).

Oracles / model comparison

Compare against a known-correct reference implementation: "my fast version produces the same output as the slow obviously-correct version."

Useful for: optimizing existing code, replacing one implementation with another.
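A minimal sketch of the oracle pattern, assuming a hypothetical binary-search membership test as the fast version and a linear scan as the obviously-correct reference. The names and the value range are illustrative only.

```typescript
// Reference implementation: slow but obviously correct.
function slowContains(sorted: number[], target: number): boolean {
  return sorted.indexOf(target) !== -1;
}

// Implementation under test: binary search over a sorted array.
function fastContains(sorted: number[], target: number): boolean {
  let lo = 0;
  let hi = sorted.length - 1;
  while (lo <= hi) {
    const mid = lo + Math.floor((hi - lo) / 2);
    const value = sorted[mid] as number;
    if (value === target) return true;
    if (value < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return false;
}

// Tiny seeded LCG so runs are reproducible.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Values in [-20, 20] so that hits, misses, and duplicates all occur.
function randomSmallInt(rng: () => number): number {
  return Math.floor(rng() * 41) - 20;
}

function randomSortedArray(rng: () => number, maxLen: number): number[] {
  const len = Math.floor(rng() * (maxLen + 1));
  const xs: number[] = [];
  for (let i = 0; i < len; i++) xs.push(randomSmallInt(rng));
  return xs.sort((a, b) => a - b);
}

// Returns the first input where the two implementations disagree, or null.
function findDisagreement(
  numRuns: number,
  seed: number
): { sorted: number[]; target: number } | null {
  const rng = makeRng(seed);
  for (let i = 0; i < numRuns; i++) {
    const sorted = randomSortedArray(rng, 30);
    const target = randomSmallInt(rng);
    if (fastContains(sorted, target) !== slowContains(sorted, target)) {
      return { sorted, target };
    }
  }
  return null;
}
```

The comparison is on a boolean answer rather than on an index. With duplicates present, a binary search and a linear scan can legitimately return different positions for the same target.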

Metamorphic relations

If f(x) = y, then f(transform(x)) = transform_y(y). Example: length(s1 + s2) === length(s1) + length(s2).

Useful when the exact output is hard to predict, but its relationship to other outputs is.
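The sketch below checks two metamorphic relations for a hypothetical character counter: counting over a concatenation equals the sum of the counts, and reversing the input leaves the count unchanged. Neither relation needs the expected count for any particular input. All names are assumptions.

```typescript
// Function under test: counts occurrences of a single character.
function countChar(s: string, ch: string): number {
  let n = 0;
  for (let i = 0; i < s.length; i++) {
    if (s.charAt(i) === ch) n += 1;
  }
  return n;
}

// Tiny seeded LCG so runs are reproducible.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

function randomString(rng: () => number, maxLen: number): string {
  const alphabet = "abc";
  const len = Math.floor(rng() * (maxLen + 1));
  let out = "";
  for (let i = 0; i < len; i++) {
    out += alphabet.charAt(Math.floor(rng() * alphabet.length));
  }
  return out;
}

// Returns a description of the first broken relation, or null.
function findBrokenRelation(numRuns: number, seed: number): string | null {
  const rng = makeRng(seed);
  for (let i = 0; i < numRuns; i++) {
    const a = randomString(rng, 12);
    const b = randomString(rng, 12);
    // Relation 1: count(a + b) === count(a) + count(b).
    if (countChar(a + b, "a") !== countChar(a, "a") + countChar(b, "a")) {
      return `additivity broken for "${a}" + "${b}"`;
    }
    // Relation 2: reversing the input does not change the count.
    const reversed = a.split("").reverse().join("");
    if (countChar(reversed, "a") !== countChar(a, "a")) {
      return `reversal changed the count for "${a}"`;
    }
  }
  return null;
}
```

Additivity holds here because the needle is a single character. For a multi-character needle it would fail at the join: "ab" occurs once in "a" + "b" but zero times in either half.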

Generators

A generator produces random values of a given type. Frameworks supply primitives (integer, string, boolean, array, object) and composition (oneOf, tuple, record, filter, map).

const userGen = fc.record({
  id: fc.uuid(),
  name: fc.string({ minLength: 1, maxLength: 50 }),
  age: fc.integer({ min: 0, max: 120 }),
  email: fc.emailAddress(),
});

fc.assert(fc.property(userGen, (user) => {
  expect(validateUser(user)).toBe(true);
}));

Two skills compound:

  • Generator design. A generator that produces only {id: 'a', name: 'b'} over and over tests very little. A generator that produces values across the realistic distribution catches more.
  • Constraint hygiene. Use filter sparingly. If you filter out 99% of generated values, the framework wastes time generating rejects. Construct valid values directly when possible.
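The cost of heavy filtering can be sketched with two hand-written generators for "multiples of 100". One rejects invalid draws; the other constructs valid values directly. The function names and ranges are assumptions for illustration.

```typescript
// Tiny seeded LCG so runs are reproducible.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Filter style: draw integers in [0, 100000) until one is a multiple of 100.
// Returns the value together with how many draws it took.
function multipleOf100ByFilter(rng: () => number): { value: number; draws: number } {
  let draws = 0;
  for (;;) {
    draws += 1;
    const value = Math.floor(rng() * 100000);
    if (value % 100 === 0) return { value, draws };
  }
}

// Construct style: draw k in [0, 1000) and multiply. Every draw is valid.
function multipleOf100ByConstruction(rng: () => number): number {
  return Math.floor(rng() * 1000) * 100;
}

// Total raw draws the filter style needs to produce `samples` valid values.
function drawsNeededByFilter(samples: number, seed: number): number {
  const rng = makeRng(seed);
  let total = 0;
  for (let i = 0; i < samples; i++) total += multipleOf100ByFilter(rng).draws;
  return total;
}
```

Only 1 in 100 raw integers passes the filter, so the filter style needs on the order of 100 draws per accepted value, while the constructed generator needs exactly one. In fast-check terms this is the difference between a filter on a broad arbitrary and a map on a narrower one.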

Generators for invalid inputs

For functions that should reject bad inputs, write generators for the invalid set explicitly:

fc.assert(fc.property(fc.string().filter(s => !s.includes('@')), (notEmail) => {
  expect(validateEmail(notEmail).ok).toBe(false);
}));

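This catches "my validator passes empty strings" and similar gaps for any string without an @. Note that fc.string() never produces null or undefined, so to catch "I forgot to reject null" those values have to be added explicitly, for example by combining generators with fc.oneof and fc.constant(null).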
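A sketch of an explicit invalid-input generator that covers several classes of bad input, not only strings missing an @. The validator, the chosen classes, and all names are hypothetical.

```typescript
type Result = { ok: boolean };

// Hypothetical validator under test.
function validateEmail(input: unknown): Result {
  if (typeof input !== "string") return { ok: false };
  return { ok: /^[^@\s]+@[^@\s]+$/.test(input) };
}

// Tiny seeded LCG so runs are reproducible.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Each branch targets a distinct class of invalid input.
function invalidEmail(rng: () => number): unknown {
  const pick = Math.floor(rng() * 4);
  if (pick === 0) return null; // wrong type
  if (pick === 1) return ""; // empty
  if (pick === 2) return "   "; // whitespace only
  // Random letters with no "@" at all.
  const len = Math.floor(rng() * 10);
  let out = "";
  for (let i = 0; i < len; i++) {
    out += "abcxyz.".charAt(Math.floor(rng() * 7));
  }
  return out;
}

// Returns the first invalid input the validator wrongly accepts, or null.
function findAcceptedInvalid(numRuns: number, seed: number): { input: unknown } | null {
  const rng = makeRng(seed);
  for (let i = 0; i < numRuns; i++) {
    const input = invalidEmail(rng);
    if (validateEmail(input).ok) return { input };
  }
  return null;
}
```

The result is wrapped in an object so that a counter-example of null can be told apart from "no counter-example found".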

Shrinking

When a property fails on a 500-element array containing weird unicode, that's not a useful bug report. Shrinking is what makes property-based testing tractable: the framework automatically simplifies the failing input until it can't simplify further without the failure going away.

Good shrinking produces minimal counter-examples:

Failed at:    [-2147483648, "abc\uFFFDdef", true, [-1, 0, 0, 1]]
Shrunk to:    [-1, "", false, []]

The shrunk case usually makes the bug obvious. Without shrinking, you'd be debugging the original, possibly mistaking incidental complexity for the cause.

Frameworks shrink reasonably out of the box. Custom generators sometimes need help: provide a shrink function for non-trivial types, or compose from primitives that shrink well.
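To show what "simplify until the failure goes away" means mechanically, here is a minimal greedy shrinker for integer arrays. It is a sketch under simple assumptions (three kinds of simplification, first failing candidate wins) and not how any particular framework implements shrinking.

```typescript
function replaceAt(xs: number[], index: number, value: number): number[] {
  const copy = xs.slice();
  copy[index] = value;
  return copy;
}

// Simpler variants of xs, ordered from most to least aggressive.
function shrinkCandidates(xs: number[]): number[][] {
  const out: number[][] = [];
  // 1. Drop one element.
  xs.forEach((_, i) => {
    out.push(xs.slice(0, i).concat(xs.slice(i + 1)));
  });
  // 2. Halve one element toward zero.
  xs.forEach((x, i) => {
    const half = (x / 2) | 0; // truncating division; fine for 32-bit integers
    if (half !== x) out.push(replaceAt(xs, i, half));
  });
  // 3. Step one element by one toward zero.
  xs.forEach((x, i) => {
    if (x !== 0) out.push(replaceAt(xs, i, x > 0 ? x - 1 : x + 1));
  });
  return out;
}

// Repeatedly move to the first simpler candidate that still fails.
// Stops when no candidate fails, so the result is a local minimum.
function shrink(failing: number[], fails: (xs: number[]) => boolean): number[] {
  let current = failing;
  let improved = true;
  while (improved) {
    improved = false;
    for (const candidate of shrinkCandidates(current)) {
      if (fails(candidate)) {
        current = candidate;
        improved = true;
        break;
      }
    }
  }
  return current;
}

// Hypothetical bug: the code under test breaks once the sum exceeds 20.
function sumExceeds20(xs: number[]): boolean {
  return xs.reduce((acc, x) => acc + x, 0) > 20;
}
```

Starting from [5, -3, 40, 12, 0], shrink(..., sumExceeds20) drops elements until only [40] remains, then steps it down to [21], the smallest single-element array that still fails. Every accepted candidate is strictly smaller in length or magnitude, so the loop terminates.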

What Property-Based Testing Doesn't Replace

  • Specific regression tests. When you fix a bug, add an example test that pins the exact case. Properties may regenerate equivalent failures, but the specific case documents the bug.
  • Tests for the happy path. Property tests check that the spec holds across many inputs; example tests show that the feature exists. A new endpoint should have at least one example test that asserts it does what users expect.
  • Performance tests. Property tests assert correctness, not speed.

The right pairing: examples for the cases you specifically care about; properties for the cases you couldn't enumerate.

When It's Worth Using

Strong fit:

  • Pure functions with clear properties. Parsers, encoders, math, data structures.
  • Stateful systems with invariants. A cache that shouldn't grow past a size; a queue that should preserve FIFO; a state machine with reachable-state rules.
  • Replacing existing code. Run the new implementation against the old one as an oracle.
  • Bug-prone areas. A module that historically gets edge-case bugs benefits disproportionately.

Weak fit:

  • UI rendering. "Some valid props produce a valid render" is a property, but it's vague enough to catch little.
  • Glue code with no clear properties. "A controller calls a service" doesn't have a property; it has a contract.
  • Tests that need elaborate setup. Property tests run many iterations; expensive setup multiplies.

Stateful Property-Based Testing

A more advanced pattern: instead of generating values, generate sequences of operations against a stateful system, and check invariants after each step.

// pseudo-code
sequence: [push(3), push(1), pop(), push(5), pop()]
model:    a plain list, starting empty and updated alongside each step
system:   the real stack
check:    after each step, the model's expected state matches the system's actual state

Useful for testing data structures, state machines, caches, and transactions. Tools: fast-check (fc.commands), Hypothesis (RuleBasedStateMachine), QuickCheck, Hedgehog, PropEr.
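The model-based idea can be sketched without a framework: generate a random command sequence, apply it to both the real system and a trivially correct model, and report the first step where they diverge. The linked-list stack, the command type, and the helper names below are hypothetical.

```typescript
type Cell = { value: number; next: Cell | null };

// Hypothetical system under test: a linked-list stack.
class LinkedStack {
  private head: Cell | null = null;
  private count = 0;

  push(value: number): void {
    this.head = { value, next: this.head };
    this.count += 1;
  }

  pop(): number | undefined {
    if (this.head === null) return undefined;
    const value = this.head.value;
    this.head = this.head.next;
    this.count -= 1;
    return value;
  }

  size(): number {
    return this.count;
  }
}

type Command = { kind: "push"; value: number } | { kind: "pop" };

// Tiny seeded LCG so runs are reproducible.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

function randomCommands(rng: () => number, maxLen: number): Command[] {
  const len = Math.floor(rng() * (maxLen + 1));
  const cmds: Command[] = [];
  for (let i = 0; i < len; i++) {
    if (rng() < 0.6) cmds.push({ kind: "push", value: Math.floor(rng() * 100) });
    else cmds.push({ kind: "pop" });
  }
  return cmds;
}

// Runs the commands against the real stack and an array model.
// Returns the index of the first step where they disagree, or -1.
function firstDivergence(commands: Command[]): number {
  const real = new LinkedStack();
  const model: number[] = [];
  for (let i = 0; i < commands.length; i++) {
    const cmd = commands[i] as Command;
    if (cmd.kind === "push") {
      real.push(cmd.value);
      model.push(cmd.value);
    } else if (real.pop() !== model.pop()) {
      return i;
    }
    if (real.size() !== model.length) return i;
  }
  return -1;
}

// Returns the first failing command sequence, or null.
function findFailingSequence(numRuns: number, seed: number): Command[] | null {
  const rng = makeRng(seed);
  for (let i = 0; i < numRuns; i++) {
    const cmds = randomCommands(rng, 30);
    if (firstDivergence(cmds) !== -1) return cmds;
  }
  return null;
}
```

Popping from an empty stack is deliberately part of the generated sequences. Both the model and the system return undefined there, which is exactly the kind of edge a hand-written example tends to skip.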

Tooling

Framework                | Language                | Notes
Hypothesis               | Python                  | The most polished; great shrinking and stateful support
fast-check               | JavaScript / TypeScript | Solid generators, integrates with Jest/Vitest
QuickCheck               | Haskell (the original)  | The canonical implementation
ScalaCheck               | Scala                   | Mature, integrates with ScalaTest, specs2
PropEr / proper          | Erlang                  | Strong stateful testing
Hedgehog                 | Haskell, F#             | Integrated shrinking (vs. type-class-based in QuickCheck)
jqwik                    | Java                    | The mainstream Java option
Hypothesis-Ruby / Rantly | Ruby                    | Less mature than Hypothesis
Gopter                   | Go                      | Workable but less polished

The quality of the framework matters: a tool with bad shrinking gives failures you can't debug, and gets abandoned. Hypothesis and fast-check are notably strong on this.

Common Failure Modes

Property is too weak

fc.assert(fc.property(fc.string(), (s) => {
  expect(typeof reverse(s)).toBe('string');
}));

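Always passes: the property says nothing about reversal. Strengthen it with reverse(reverse(s)) === s. The identity function also satisfies that round trip, so pair it with a property that pins the order, such as reverse(a + b) === reverse(b) + reverse(a) for a reverse that works per code unit.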
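One way to judge whether a property has teeth is to run it against deliberately broken implementations and see which ones it rejects. The sketch below does that for three candidate properties of reverse. The broken variants, the property names, and the small exhaustive input set (standing in for a random generator) are assumptions for illustration.

```typescript
type Reverse = (s: string) => string;

const correctReverse: Reverse = (s) => s.split("").reverse().join("");
const identityBug: Reverse = (s) => s; // forgot to reverse
const sortBug: Reverse = (s) => s.split("").sort().join(""); // reorders, but wrongly

// All strings over {a, b} up to length 3.
function smallStrings(): string[] {
  let level: string[] = [""];
  const all: string[] = [""];
  for (let len = 1; len <= 3; len++) {
    const next: string[] = [];
    for (const s of level) next.push(s + "a", s + "b");
    for (const s of next) all.push(s);
    level = next;
  }
  return all;
}

type Property = (reverse: Reverse, s: string, t: string) => boolean;

const weak: Property = (reverse, s) => typeof reverse(s) === "string";
const roundTrip: Property = (reverse, s) => reverse(reverse(s)) === s;
const antiDistributes: Property = (reverse, s, t) =>
  reverse(s + t) === reverse(t) + reverse(s);

// True when the property rejects the implementation on at least one input pair.
function catches(property: Property, impl: Reverse): boolean {
  const inputs = smallStrings();
  return inputs.some((s) => inputs.some((t) => !property(impl, s, t)));
}
```

On these inputs the weak property catches neither bug, the round trip catches sortBug but not identityBug, and the concatenation property catches both. None of the three rejects correctReverse.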

Property is too strong

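// sortByKey is the (hypothetical) function under test; it orders records by key.
fc.assert(fc.property(fc.array(fc.record({ key: fc.integer(), tag: fc.string() })), (arr) => {
  expect(sortByKey(arr)).toEqual([...arr].sort((a, b) => a.key - b.key));
}));

Asserts equality with one particular reference ordering. Array.prototype.sort is stable (guaranteed since ES2019), so records with equal keys keep their input order. If sortByKey is not specified as stable, it may order ties differently and the property fails on valid behavior. (With plain integers the same comparison would be a legitimate oracle, because equal integers are indistinguishable.) Properties shouldn't pin implementation details the spec leaves open.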

Generators that hide bugs

fc.assert(fc.property(fc.string().filter(s => s.length > 0), (s) => {
  // function under test fails on empty strings
}));

You filtered out the failing case. The test passes; the bug ships. Generators that filter heavily often filter out the inputs that would expose bugs.

Non-determinism

fc.assert(fc.property(fc.integer(), (x) => {
  return Date.now() > 0;  // technically true, but...
}));

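The problem arises when the function under test reads Date.now, random numbers, or external state. A failure may then not reproduce when the framework replays the same input, and shrinking becomes unreliable because the failure can vanish for reasons unrelated to the input. Inject clocks and seeds, and keep the seed the framework reports for a failing run.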
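A sketch of clock injection, assuming a hypothetical token-expiry check. The clock is passed in as a parameter, so the property controls time exactly and a failure replays identically from the same seed. All names are illustrative.

```typescript
type Clock = () => number;

// Hypothetical function under test. The clock is a parameter instead of a
// direct Date.now() call, so tests decide what "now" is.
function isExpired(issuedAtMs: number, ttlMs: number, now: Clock): boolean {
  return now() - issuedAtMs >= ttlMs;
}

// Tiny seeded LCG; the seed is the only source of randomness in the test.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Property: a token is live one millisecond before its deadline and
// expired exactly at it. Returns the first counter-example, or null.
function findExpiryBug(
  numRuns: number,
  seed: number
): { issuedAtMs: number; ttlMs: number } | null {
  const rng = makeRng(seed);
  for (let i = 0; i < numRuns; i++) {
    const issuedAtMs = Math.floor(rng() * 1e12);
    const ttlMs = 1 + Math.floor(rng() * 1e6);
    const justBefore: Clock = () => issuedAtMs + ttlMs - 1;
    const atDeadline: Clock = () => issuedAtMs + ttlMs;
    if (isExpired(issuedAtMs, ttlMs, justBefore) || !isExpired(issuedAtMs, ttlMs, atDeadline)) {
      return { issuedAtMs, ttlMs };
    }
  }
  return null;
}
```

In production code the caller would pass Date.now as the clock. In the test, every value the function sees derives from the seed, so a reported counter-example can be replayed and shrunk.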

Treating it as slow unit testing

Property tests with 100,000 iterations of a 1ms test take 100 seconds. By default, run ~100 iterations per property in PR checks; reserve high iteration counts for nightly or pre-release.

Properties as documentation

fc.property(fc.string(), (s) => true);  // empty property

A property that always passes is dead weight. Periodically audit that each property has a real assertion that could fail.

Pre-commit Checklist

Before adding a property test:

  • Can you state the property as a single sentence? "For all x, this should hold."
  • Does the property fail when the function is broken in a plausible way? (Try it: deliberately break the implementation, run the test.)
  • Does the generator produce realistic inputs, including edge cases (empty, max, negative)?
  • Do failing cases shrink to something readable, or to incomprehensible random noise?
  • Is the test fast enough that it can run with enough iterations to be meaningful?
  • Is there an example test pinning the specific bugs this property test once found?

Property tests are most useful when paired with examples — the property protects against unknown unknowns; the examples document the bugs you've already learned about.
