Property-Based Testing
Asserting properties that should hold for all inputs, then letting a generator try thousands of cases — and shrinking failures to minimal counter-examples
Example-based tests assert that a specific input produces a specific output: "given [3, 1, 2], sort() returns [1, 2, 3]." That's useful, but it tests one of infinitely many cases. The bug your test missed is, by definition, the one you didn't think to write.
Property-based testing flips this. Instead of asserting against a fixed example, you assert a property — a fact that should be true for any valid input — and a generator produces hundreds or thousands of random inputs to challenge it. When a counter-example is found, the framework shrinks it to the smallest input that still fails.
Property-based testing catches two kinds of failure that example tests tend to miss:
- Inputs you didn't think of. Empty arrays, single-element arrays, arrays with duplicates, arrays with NaN, very large arrays.
- Combinations across parameters. An off-by-one bug that only manifests when one input is empty and another is exactly 7.
Examples test what you remembered to test. Properties test what you didn't.
Properties Worth Knowing
A property is any statement of the form "for all valid inputs x, some condition holds." A handful of patterns cover most useful properties:
Round-trip / inverse
decode(encode(x)) === x for all x. Equivalent: parse-then-serialize should match the original.
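import fc from 'fast-check';
// encode/decode are the functions under test; expect comes from Jest or Vitest.
fc.assert(fc.property(fc.string(), (s) => {
  expect(decode(encode(s))).toBe(s);
}));
Useful for: serializers, parsers, codecs, URL encoding, compression, encryption (with a fixed key).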
Idempotence
f(f(x)) === f(x). Applying the operation twice is the same as once.
Useful for: normalization (lowercase, trim), deduplication, set operations, applying a migration.
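A minimal sketch of an idempotence check. To stay runnable without installing anything, it uses a small hand-rolled seeded generator instead of fast-check; `normalize`, `makeRng`, `randomString`, and `findIdempotenceCounterexample` are hypothetical names invented for this illustration.

```typescript
// Tiny seeded PRNG (linear congruential) so every run sees the same inputs.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // uniform in [0, 1)
  };
}

// Random string of length 0..maxLen drawn from a fixed alphabet.
function randomString(rng: () => number, alphabet: string, maxLen: number): string {
  const len = Math.floor(rng() * (maxLen + 1));
  let out = "";
  for (let i = 0; i < len; i++) {
    out += alphabet.charAt(Math.floor(rng() * alphabet.length));
  }
  return out;
}

// Hypothetical function under test: trim, lowercase, collapse whitespace runs.
function normalize(s: string): string {
  return s.trim().toLowerCase().replace(/\s+/g, " ");
}

// Returns the first input where f(f(x)) !== f(x), or null if none was found.
function findIdempotenceCounterexample(
  f: (s: string) => string,
  runs: number,
  seed: number
): string | null {
  const rng = makeRng(seed);
  for (let i = 0; i < runs; i++) {
    const input = randomString(rng, " \t\nAaBbZz", 12);
    const once = f(input);
    if (f(once) !== once) return input;
  }
  return null;
}

// null means no counter-example was found in 500 generated inputs.
console.log(findIdempotenceCounterexample(normalize, 500, 1));
```

The alphabet is deliberately heavy on whitespace and mixed case, because those are the inputs a normalizer is most likely to get wrong.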
Commutativity / associativity
f(a, b) === f(b, a) (commutativity); f(f(a, b), c) === f(a, f(b, c)) (associativity).
Useful for: aggregations (sum, max, set union), merging operations.
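As a sketch, here are both laws checked for a set-union merge. The generator is hand-rolled so the block runs on its own; `union`, `randomInts`, and `unionLawsHold` are hypothetical names, not part of any framework.

```typescript
// Tiny seeded PRNG (linear congruential) so every run sees the same inputs.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // uniform in [0, 1)
  };
}

// Random array of length 0..maxLen with integers in -10..10.
function randomInts(rng: () => number, maxLen: number): number[] {
  const len = Math.floor(rng() * (maxLen + 1));
  const out: number[] = [];
  for (let i = 0; i < len; i++) out.push(Math.floor(rng() * 21) - 10);
  return out;
}

// Hypothetical merge under test: set union, returned as a sorted array.
function union(a: number[], b: number[]): number[] {
  return Array.from(new Set(a.concat(b))).sort((x, y) => x - y);
}

const sameArray = (a: number[], b: number[]): boolean =>
  a.length === b.length && a.every((x, i) => x === b[i]);

// True when both laws held for every generated triple.
function unionLawsHold(runs: number, seed: number): boolean {
  const rng = makeRng(seed);
  for (let i = 0; i < runs; i++) {
    const a = randomInts(rng, 6);
    const b = randomInts(rng, 6);
    const c = randomInts(rng, 6);
    const commutative = sameArray(union(a, b), union(b, a));
    const associative = sameArray(union(union(a, b), c), union(a, union(b, c)));
    if (!commutative || !associative) return false;
  }
  return true;
}

console.log(unionLawsHold(500, 1));
```

Check that the law really holds before asserting it. Floating-point addition is the classic trap: `(0.1 + 0.2) + 0.3` and `0.1 + (0.2 + 0.3)` are different doubles, so an associativity property over float sums fails for correct code.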
Invariants
Some property is preserved across the operation: "after sort(), the array has the same elements as before."
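// Occurrence counts, so duplicates are compared too (a Set would drop them).
const counts = (xs) => xs.reduce((m, x) => m.set(x, (m.get(x) ?? 0) + 1), new Map());
fc.assert(fc.property(fc.array(fc.integer()), (arr) => {
  const sorted = sort([...arr]);
  expect(sorted).toHaveLength(arr.length);
  expect(counts(sorted)).toEqual(counts(arr));
}));
Useful for: sorting (length and multiset preserved), transformations (no data loss).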
Oracles / model comparison
Compare against a known-correct reference implementation: "my fast version produces the same output as the slow obviously-correct version."
Useful for: optimizing existing code, replacing one implementation with another.
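A sketch of the oracle pattern, assuming a maximum-subarray function as the code being optimized. The quadratic version is the obviously-correct reference and the linear version is the one you want to ship; all names here are hypothetical, and the generator is hand-rolled so the block runs standalone.

```typescript
// Tiny seeded PRNG (linear congruential) so every run sees the same inputs.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // uniform in [0, 1)
  };
}

// Random array of length 0..maxLen with integers in -10..10.
function randomInts(rng: () => number, maxLen: number): number[] {
  const len = Math.floor(rng() * (maxLen + 1));
  const out: number[] = [];
  for (let i = 0; i < len; i++) out.push(Math.floor(rng() * 21) - 10);
  return out;
}

// Oracle: try every contiguous subarray. Slow, but hard to get wrong.
function maxSubarraySlow(xs: number[]): number {
  let best = -Infinity;
  for (let i = 0; i < xs.length; i++) {
    let sum = 0;
    for (let j = i; j < xs.length; j++) {
      sum += xs[j];
      if (sum > best) best = sum;
    }
  }
  return best;
}

// Implementation under test: single pass (Kadane's algorithm).
function maxSubarrayFast(xs: number[]): number {
  let best = -Infinity;
  let current = 0;
  for (const x of xs) {
    current = Math.max(x, current + x);
    best = Math.max(best, current);
  }
  return best;
}

// Returns the first input where the two disagree, or null if they always agreed.
function findMismatch(
  fast: (xs: number[]) => number,
  slow: (xs: number[]) => number,
  runs: number,
  seed: number
): number[] | null {
  const rng = makeRng(seed);
  for (let i = 0; i < runs; i++) {
    const xs = randomInts(rng, 8);
    if (fast(xs) !== slow(xs)) return xs;
  }
  return null;
}

console.log(findMismatch(maxSubarrayFast, maxSubarraySlow, 500, 7));
```

Small arrays and a narrow value range are intentional: they make all-negative and empty inputs common, which is where a fast version most often diverges from the reference.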
Metamorphic relations
If f(x) = y, then f(transform(x)) = transform_y(y). Example: length(s1 + s2) === length(s1) + length(s2).
Useful when the exact output is hard to predict, but its relationship to other outputs is.
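A sketch of one metamorphic relation, assuming a hypothetical `wordCount` function: you may not know the count for a random text, but joining two texts with a space should give the sum of their counts. The generator is hand-rolled so the block is self-contained.

```typescript
// Tiny seeded PRNG (linear congruential) so every run sees the same inputs.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // uniform in [0, 1)
  };
}

// Random string of length 0..maxLen drawn from a fixed alphabet.
function randomString(rng: () => number, alphabet: string, maxLen: number): string {
  const len = Math.floor(rng() * (maxLen + 1));
  let out = "";
  for (let i = 0; i < len; i++) {
    out += alphabet.charAt(Math.floor(rng() * alphabet.length));
  }
  return out;
}

// Hypothetical function under test: number of whitespace-separated words.
function wordCount(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Relation: wordCount(a + " " + b) === wordCount(a) + wordCount(b).
function concatRelationHolds(runs: number, seed: number): boolean {
  const rng = makeRng(seed);
  for (let i = 0; i < runs; i++) {
    const a = randomString(rng, "ab \n", 10);
    const b = randomString(rng, "ab \n", 10);
    if (wordCount(a + " " + b) !== wordCount(a) + wordCount(b)) return false;
  }
  return true;
}

console.log(concatRelationHolds(500, 3));
```

Note that the relation never states what the right count is for any input. It only constrains how two outputs relate, which is what makes it usable when no oracle exists.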
Generators
A generator produces random values of a given type. Frameworks supply primitives (integer, string, boolean, array, object) and composition (oneOf, tuple, record, filter, map).
const userGen = fc.record({
  id: fc.uuid(),
  name: fc.string({ minLength: 1, maxLength: 50 }),
  age: fc.integer({ min: 0, max: 120 }),
  email: fc.emailAddress(),
});
fc.assert(fc.property(userGen, (user) => {
  expect(validateUser(user)).toBe(true);
}));
Two skills compound:
- Generator design. A generator that produces only {id: 'a', name: 'b'} over and over tests very little. A generator that produces values across the realistic distribution catches more.
- Constraint hygiene. Use filter sparingly. If you filter out 99% of generated values, the framework wastes time generating rejects. Construct valid values directly when possible.
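To make the constraint-hygiene point concrete, here is a sketch that generates ranges with lo <= hi two ways: by drawing and rejecting, and by construction. It uses a hand-rolled generator rather than a framework; `rangeByFilter` and `rangeByConstruction` are hypothetical names. In fast-check terms, the first corresponds to `.filter` and the second to building the value with `.map`.

```typescript
// Tiny seeded PRNG (linear congruential) so every run sees the same inputs.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // uniform in [0, 1)
  };
}

type Range = { lo: number; hi: number };

// Integer in min..max, inclusive.
const randInt = (rng: () => number, min: number, max: number): number =>
  min + Math.floor(rng() * (max - min + 1));

// Filter style: draw two independent numbers, retry until lo <= hi.
function rangeByFilter(rng: () => number): { range: Range; rejected: number } {
  let rejected = 0;
  for (;;) {
    const lo = randInt(rng, 0, 100);
    const hi = randInt(rng, 0, 100);
    if (lo <= hi) return { range: { lo, hi }, rejected };
    rejected++;
  }
}

// Construction style: draw lo and a non-negative width, so every draw is valid.
function rangeByConstruction(rng: () => number): Range {
  const lo = randInt(rng, 0, 100);
  const width = randInt(rng, 0, 100 - lo);
  return { lo, hi: lo + width };
}

// Count how many draws the filter style throws away for 1000 valid ranges.
const demoRng = makeRng(2024);
let totalRejected = 0;
for (let i = 0; i < 1000; i++) totalRejected += rangeByFilter(demoRng).rejected;
console.log(`filter style discarded ${totalRejected} draws for 1000 ranges`);
```

The two generators do not sample the same distribution, so construction is a design decision rather than a free optimization: check that the constructed values still cover the edge cases you care about (here, lo === hi and hi === 100).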
Generators for invalid inputs
For functions that should reject bad inputs, write generators for the invalid set explicitly:
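fc.assert(fc.property(fc.string().filter(s => !s.includes('@')), (notEmail) => {
  expect(validateEmail(notEmail).ok).toBe(false);
}));
This catches "my validator passes empty strings." It cannot catch "I forgot to reject null," because fc.string() only produces strings; for that, widen the generator, for example fc.oneof(fc.string().filter(s => !s.includes('@')), fc.constant(null)).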
Shrinking
When a property fails on a 500-element array containing weird unicode, that's not a useful bug report. Shrinking is what makes property-based testing tractable: the framework automatically simplifies the failing input until it can't simplify further without the failure going away.
Good shrinking produces minimal counter-examples:
Failed at: [-2147483648, "abc�def", true, [-1, 0, 0, 1]]
Shrunk to: [-1, "", false, []]
The shrunk case usually makes the bug obvious. Without shrinking, you'd be debugging the original, possibly mistaking incidental complexity for the cause.
Frameworks shrink reasonably out of the box. Custom generators sometimes need help: provide a shrink function for non-trivial types, or compose from primitives that shrink well.
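To show the idea rather than any framework's actual algorithm, here is a deliberately naive greedy shrinker for integer arrays: try dropping elements, then try moving values toward zero, and keep any candidate that still fails. `shrink`, `shrinkCandidates`, and `buggyMax` are hypothetical names for this sketch.

```typescript
// A property returns true when it holds for the given input.
type Property = (xs: number[]) => boolean;

// Simpler variants of xs: one element removed, or one element moved toward zero.
function shrinkCandidates(xs: number[]): number[][] {
  const out: number[][] = [];
  // Structural shrinks first: drop one element at a time.
  for (let i = 0; i < xs.length; i++) {
    out.push(xs.slice(0, i).concat(xs.slice(i + 1)));
  }
  // Then value shrinks: replace one element with something closer to zero.
  for (let i = 0; i < xs.length; i++) {
    const x = xs[i];
    for (const smaller of [0, Math.trunc(x / 2), x - Math.sign(x)]) {
      if (Math.abs(smaller) < Math.abs(x)) {
        out.push(xs.slice(0, i).concat([smaller], xs.slice(i + 1)));
      }
    }
  }
  return out;
}

// Greedy loop: adopt the first simpler candidate that still fails, then repeat.
function shrink(failing: number[], holds: Property): number[] {
  let current = failing;
  let progressed = true;
  while (progressed) {
    progressed = false;
    for (const candidate of shrinkCandidates(current)) {
      if (!holds(candidate)) {
        current = candidate;
        progressed = true;
        break;
      }
    }
  }
  return current;
}

// Hypothetical buggy function: wrong for arrays that contain only negatives.
function buggyMax(xs: number[]): number {
  let best = 0; // bug: should start from the first element
  for (const x of xs) if (x > best) best = x;
  return best;
}

// Property under test; empty arrays are outside its precondition.
const maxMatchesBuiltin: Property = (xs) =>
  xs.length === 0 || buggyMax(xs) === Math.max(...xs);

console.log(shrink([-20, -3, -8], maxMatchesBuiltin)); // [ -1 ]
```

Every adopted candidate is either shorter or has a value strictly closer to zero, so the loop terminates. Real frameworks use smarter strategies, but the contract is the same: the result still fails, and no single simplification step keeps it failing.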
What Property-Based Testing Doesn't Replace
- Specific regression tests. When you fix a bug, add an example test that pins the exact case. Properties may regenerate equivalent failures, but the specific case documents the bug.
- Tests for the happy path. Property tests prove the spec holds; example tests prove the feature exists. A new endpoint should have at least one example test that asserts it does what users expect.
- Performance tests. Property tests assert correctness, not speed.
The right pairing: examples for the cases you specifically care about; properties for the cases you couldn't enumerate.
When It's Worth Using
Strong fit:
- Pure functions with clear properties. Parsers, encoders, math, data structures.
- Stateful systems with invariants. A cache that shouldn't grow past a size; a queue that should preserve FIFO; a state machine with reachable-state rules.
- Replacing existing code. Run the new implementation against the old one as an oracle.
- Bug-prone areas. A module that historically gets edge-case bugs benefits disproportionately.
Weak fit:
- UI rendering. "Some valid props produce a valid render" is a property, but it's vague enough to catch little.
- Glue code with no clear properties. "A controller calls a service" doesn't have a property; it has a contract.
- Tests that need elaborate setup. Property tests run many iterations; expensive setup multiplies.
Stateful Property-Based Testing
A more advanced pattern: instead of generating values, generate sequences of operations against a stateful system, and check invariants after each step.
// pseudo-code
sequence: [push(3), push(1), pop(), push(5), pop()]
model: a plain list, starting empty and updated alongside each step
system: real stack
check: after each step, the model's expected state matches the system's actual state
Useful for testing data structures, state machines, caches, transactions. Tools: fast-check (fc.commands), QuickCheck, Hedgehog, PropEr.
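The same idea as a runnable sketch, using a bounded FIFO queue instead of a stack so the real system and the model are different enough to disagree. Everything here is hypothetical and hand-rolled (`RingQueue`, `Command`, `runAgainstModel`); a framework's stateful API would also shrink the failing command sequence for you.

```typescript
// Tiny seeded PRNG (linear congruential) so every run sees the same inputs.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // uniform in [0, 1)
  };
}

// System under test: fixed-capacity FIFO queue backed by a ring buffer.
class RingQueue {
  private buf: number[];
  private head = 0;
  private size = 0;
  constructor(capacity: number) {
    this.buf = new Array<number>(capacity).fill(0);
  }
  enqueue(x: number): boolean {
    if (this.size === this.buf.length) return false; // full: reject
    this.buf[(this.head + this.size) % this.buf.length] = x;
    this.size++;
    return true;
  }
  dequeue(): number | undefined {
    if (this.size === 0) return undefined;
    const x = this.buf[this.head];
    this.head = (this.head + 1) % this.buf.length;
    this.size--;
    return x;
  }
  length(): number {
    return this.size;
  }
}

type Command = { kind: "enqueue"; value: number } | { kind: "dequeue" };

// Random command sequence of length 0..maxLen.
function randomCommands(rng: () => number, maxLen: number): Command[] {
  const len = Math.floor(rng() * (maxLen + 1));
  const out: Command[] = [];
  for (let i = 0; i < len; i++) {
    out.push(rng() < 0.6
      ? { kind: "enqueue", value: Math.floor(rng() * 100) }
      : { kind: "dequeue" });
  }
  return out;
}

// Runs the commands against both the real queue and a plain-array model,
// comparing results and length after every step.
function runAgainstModel(commands: Command[], capacity: number): boolean {
  const real = new RingQueue(capacity);
  const model: number[] = [];
  for (const cmd of commands) {
    if (cmd.kind === "enqueue") {
      const expected = model.length < capacity;
      if (expected) model.push(cmd.value);
      if (real.enqueue(cmd.value) !== expected) return false;
    } else if (real.dequeue() !== model.shift()) {
      return false;
    }
    if (real.length() !== model.length) return false;
  }
  return true;
}

// True if every generated sequence kept the queue and the model in agreement.
function allSequencesAgree(runs: number, seed: number): boolean {
  const rng = makeRng(seed);
  for (let i = 0; i < runs; i++) {
    if (!runAgainstModel(randomCommands(rng, 30), 3)) return false;
  }
  return true;
}

console.log(allSequencesAgree(300, 11));
```

A small capacity (3) with sequences up to 30 commands forces the buffer to wrap around and hit both the full and empty boundaries many times, which is where ring-buffer bugs live.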
Tooling
| Framework | Language | Notes |
|---|---|---|
| Hypothesis | Python | The most polished; great shrinking and stateful support |
| fast-check | JavaScript / TypeScript | Solid generators, integrates with Jest/Vitest |
| QuickCheck | Haskell (the original) | The canonical implementation |
| ScalaCheck | Scala | Mature, integrates with ScalaTest, specs2 |
| PropEr / proper | Erlang | Strong stateful testing |
| Hedgehog | Haskell, F# | Integrated shrinking (vs. type-class-based in QuickCheck) |
| jqwik | Java | The mainstream Java option |
| Hypothesis-Ruby / Rantly | Ruby | Less mature than Hypothesis |
| Gopter | Go | Workable but less polished |
The quality of the framework matters: a tool with bad shrinking gives failures you can't debug, and gets abandoned. Hypothesis and fast-check are notably strong on this.
Common Failure Modes
Property is too weak
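fc.assert(fc.property(fc.string(), (s) => {
  expect(typeof reverse(s)).toBe('string');
}));
Always passes. The property says nothing about reversal. Strengthen it with reverse(reverse(s)) === s, and since the identity function also satisfies that, pair it with something identity fails, such as reverse(a + b) === reverse(b) + reverse(a).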
Property is too strong
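// sortByKey is a hypothetical sort under test that orders records by `key`.
fc.assert(fc.property(fc.array(fc.record({ key: fc.integer(), tag: fc.string() })), (arr) => {
  expect(sortByKey([...arr])).toEqual([...arr].sort((a, b) => a.key - b.key));
}));
Asserts element-for-element equality with one particular reference sort. Array.prototype.sort is stable (required since ES2019), so if sortByKey is correct but unstable, records with equal keys can come out in a different order and the property fails for valid behavior. With plain integers the difference would be invisible, because equal integers are indistinguishable. Properties shouldn't pin implementation details unless they are part of the contract.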
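One way to state the sort property at the right strength, as a sketch: require that keys never decrease and that the output is a permutation of the input, and say nothing about the order of ties. The `Item` shape and both sort functions are hypothetical, chosen so that two valid sorts disagree on tie order.

```typescript
type Item = { key: number; tag: string };

// Order property: keys never decrease.
function isSortedByKey(items: Item[]): boolean {
  for (let i = 1; i < items.length; i++) {
    if (items[i - 1].key > items[i].key) return false;
  }
  return true;
}

// Multiset property: same items with the same multiplicities, in any order.
function sameMultiset(a: Item[], b: Item[]): boolean {
  if (a.length !== b.length) return false;
  const counts = new Map<string, number>();
  for (const item of a) {
    const id = `${item.key}:${item.tag}`;
    counts.set(id, (counts.get(id) ?? 0) + 1);
  }
  for (const item of b) {
    const id = `${item.key}:${item.tag}`;
    const left = counts.get(id) ?? 0;
    if (left === 0) return false;
    counts.set(id, left - 1);
  }
  return true;
}

// The combined property: accepts any correct sort, stable or not.
const sortProperty = (sortFn: (items: Item[]) => Item[], input: Item[]): boolean => {
  const output = sortFn(input.slice());
  return isSortedByKey(output) && sameMultiset(input, output);
};

// Two valid sorts that order equal keys differently.
const stableSort = (items: Item[]): Item[] =>
  items.slice().sort((a, b) => a.key - b.key);
const tiesReversedSort = (items: Item[]): Item[] =>
  items.slice().sort((a, b) => b.key - a.key).reverse();

const sample: Item[] = [
  { key: 1, tag: "a" },
  { key: 1, tag: "b" },
  { key: 0, tag: "c" },
];

console.log(sortProperty(stableSort, sample), sortProperty(tiesReversedSort, sample));
```

Both sorts satisfy the property even though their outputs differ element for element. If stability is part of your contract, add it as its own explicit property instead of inheriting it from a reference implementation.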
Generators that hide bugs
fc.assert(fc.property(fc.string().filter(s => s.length > 0), (s) => {
  // function under test fails on empty strings
}));
You filtered out the failing case. The test passes; the bug ships. Generators that filter heavily often filter out the inputs that would expose bugs.
Non-determinism
fc.assert(fc.property(fc.integer(), (x) => {
  return Date.now() > 0; // technically true, but...
}));
This becomes a problem when the function under test reads Date.now, random numbers, or external state. Property tests rerun with random inputs, and a reported failure is only reproducible if replaying the same seed replays the same behavior; hidden non-determinism breaks that. Inject clocks and seeds.
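A sketch of what "inject clocks and seeds" can look like, assuming a hypothetical session object with a time-to-live. The code under test never calls Date.now itself; it reads time through a clock the test controls, and the inputs come from a seeded generator so a failing case can be replayed exactly.

```typescript
// Tiny seeded PRNG (linear congruential): same seed, same sequence.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // uniform in [0, 1)
  };
}

type Clock = () => number; // returns the current time in milliseconds

// Hypothetical code under test: reads time only through the injected clock.
function createSession(ttlMs: number, clock: Clock) {
  const issuedAt = clock();
  return {
    isExpired: (): boolean => clock() - issuedAt >= ttlMs,
  };
}

// One generated case: start time, time-to-live, and elapsed time, all in ms.
type Case = { start: number; ttlMs: number; elapsedMs: number };

function generateCases(seed: number, count: number): Case[] {
  const rng = makeRng(seed);
  const cases: Case[] = [];
  for (let i = 0; i < count; i++) {
    cases.push({
      start: Math.floor(rng() * 1e12),
      ttlMs: 1 + Math.floor(rng() * 10000),
      elapsedMs: Math.floor(rng() * 20000),
    });
  }
  return cases;
}

// Property: a session is expired exactly when elapsed time reaches its TTL.
function expiryPropertyHolds(c: Case): boolean {
  let now = c.start; // fake clock state, fully controlled by the test
  const session = createSession(c.ttlMs, () => now);
  now += c.elapsedMs; // advance time without sleeping
  return session.isExpired() === (c.elapsedMs >= c.ttlMs);
}

console.log(generateCases(42, 200).every(expiryPropertyHolds));
```

In production the same function would be called with `Date.now` as the clock. The test never sleeps and never depends on the wall clock, so a failure found with seed 42 fails the same way on every rerun.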
Treating it as slow unit testing
Property tests with 100,000 iterations of a 1ms test take 100 seconds. By default, run ~100 iterations per property in PR checks; reserve high iteration counts for nightly or pre-release.
Properties as documentation
fc.property(fc.string(), (s) => true); // empty property
A property that always passes is dead weight. Periodically audit that each property has a real assertion that could fail.
Pre-commit Checklist
Before adding a property test:
- Can you state the property as a single sentence? "For all x, this should hold."
- Does the property fail when the function is broken in a plausible way? (Try it: deliberately break the implementation, run the test.)
- Does the generator produce realistic inputs, including edge cases (empty, max, negative)?
- Do failing cases shrink to something readable, or to incomprehensible random noise?
- Is the test fast enough that it can run with enough iterations to be meaningful?
- Is there an example test pinning the specific bugs this property test once found?
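The "deliberately break the implementation" check can be scripted. The sketch below runs two properties of a string-reverse function against a correct implementation and against a plausible mutant that forgets to reverse. All names are hypothetical and the generator is hand-rolled so the block runs on its own.

```typescript
// Tiny seeded PRNG (linear congruential) so every run sees the same inputs.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // uniform in [0, 1)
  };
}

// Random string of length 0..maxLen drawn from a fixed alphabet.
function randomString(rng: () => number, alphabet: string, maxLen: number): string {
  const len = Math.floor(rng() * (maxLen + 1));
  let out = "";
  for (let i = 0; i < len; i++) {
    out += alphabet.charAt(Math.floor(rng() * alphabet.length));
  }
  return out;
}

type Reverse = (s: string) => string;

const correctReverse: Reverse = (s) => s.split("").reverse().join("");
const identityMutant: Reverse = (s) => s; // plausible bug: forgot to reverse

// Property 1: reversing twice gives back the input.
const involution = (rev: Reverse, a: string): boolean => rev(rev(a)) === a;

// Property 2: reversing a concatenation swaps the parts.
const antiDistributes = (rev: Reverse, a: string, b: string): boolean =>
  rev(a + b) === rev(b) + rev(a);

// True if the property held on every generated pair of strings.
function survives(
  property: (rev: Reverse, a: string, b: string) => boolean,
  rev: Reverse,
  runs: number,
  seed: number
): boolean {
  const rng = makeRng(seed);
  for (let i = 0; i < runs; i++) {
    const a = randomString(rng, "abc", 6);
    const b = randomString(rng, "abc", 6);
    if (!property(rev, a, b)) return false;
  }
  return true;
}

// A property is only useful against a mutant if the mutant does NOT survive it.
console.log("involution vs identity mutant:", survives(involution, identityMutant, 300, 5));
console.log("concat rule vs identity mutant:", survives(antiDistributes, identityMutant, 300, 5));
```

The identity mutant always survives the involution property, because doing nothing twice also returns the input. That is exactly the kind of gap this check is meant to expose: the property looked strong but could not tell a broken implementation from a working one.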
Property tests are most useful when paired with examples — the property protects against unknown unknowns; the examples document the bugs you've already learned about.