Steven's Knowledge

Concurrency

Principles for writing correct concurrent code

Concurrency multiplies the difficulty of every other concern in this section. A single-threaded function is judged by what it does; a concurrent function is judged by what it does and by what every possible interleaving of other functions with it might do.

The discipline is to keep the concurrent surface small, well-defined, and tested separately from the logic it coordinates.

What Concurrency Buys, and What It Costs

Concurrency is not free. The reasons to introduce it:

  • Throughput. Doing several things at once when they are independent.
  • Responsiveness. Keeping a UI or a server's main thread free to handle other work while a slow operation completes.
  • Resource utilization. Making use of cores, I/O bandwidth, or external services that would otherwise sit idle.

The costs:

  • Correctness. Race conditions, deadlocks, data corruption, lost updates.
  • Reproducibility. Bugs that depend on timing reproduce on the customer's machine and not yours.
  • Reasoning. Every thread of execution makes the state space larger.
  • Testing. Tests that pass under one schedule fail under another.

Reach for concurrency when the benefit is concrete; avoid it when synchronous code suffices.

Separate Concurrency Concerns from Domain Logic

The single most useful structural rule is to keep the concurrent code and the domain code in different modules.

┌──────────────────┐         ┌──────────────────┐
│ Concurrency      │  uses   │ Domain logic     │
│ (thread pools,   │ ──────▶ │ (pure, single-   │
│ queues, locks)   │         │ threaded code)   │
└──────────────────┘         └──────────────────┘

Each side is tested independently:

  • Domain logic is tested without concurrency.
  • Concurrency mechanics are tested with deliberately scheduled inputs.

When the two are mixed, every test of either is a test of both, and most failures are hard to attribute.
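
A minimal Java sketch of the split, assuming a hypothetical pricing domain. `Order` and `Pricing.total` are the domain side and know nothing about threads. `ParallelMap.map` is the concurrency side and knows nothing about pricing. All names are illustrative, and records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Domain logic (hypothetical example): pure, single-threaded, testable on its own.
record Order(long unitPriceCents, int quantity) {}

final class Pricing {
    static long total(Order order) {
        return order.unitPriceCents() * order.quantity();
    }
}

// Concurrency mechanics (hypothetical): a generic parallel map that knows nothing about pricing.
final class ParallelMap {
    static <T, R> List<R> map(List<T> inputs, Function<T, R> fn, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<R>> futures = new ArrayList<>();
            for (T input : inputs) {
                futures.add(CompletableFuture.supplyAsync(() -> fn.apply(input), pool));
            }
            List<R> results = new ArrayList<>();
            for (CompletableFuture<R> f : futures) {
                results.add(f.join()); // waits for each task; results stay in input order
            }
            return results;
        } finally {
            pool.shutdown(); // accept no new tasks and let the worker threads exit
        }
    }
}
```

`Pricing.total` can be tested with plain inputs and no threads. `ParallelMap.map` can be tested with a trivial function such as `x -> x + 1`, so a failure in either test points at one side only.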

Limit Shared Mutable State

The defining cause of concurrency bugs is state that is shared, mutable, and unsynchronized. Eliminate any one of the three properties and most of the difficulty goes with it.

Prefer immutable data

Immutable values are safe to share across threads without synchronization. Functional cores, persistent data structures, and copy-on-write idioms make broad classes of races structurally impossible.
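
A small Java illustration, assuming a hypothetical `Cart` value (a Java 16+ record). Its fields are final and its list is an unmodifiable copy, so any number of threads can read one instance without a lock. "Modifying" it produces a new value. Records are only shallowly immutable, so this relies on the elements (here `String`) being immutable too.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable value: final fields plus an unmodifiable list,
// so an instance can be shared across threads without synchronization.
record Cart(String owner, List<String> items) {
    Cart {
        items = List.copyOf(items); // defensive, unmodifiable snapshot of the input
    }

    // "Changing" a Cart builds a new one; existing readers keep seeing the old value.
    Cart withItem(String item) {
        List<String> next = new ArrayList<>(items);
        next.add(item);
        return new Cart(owner, next);
    }
}
```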

Confine mutable state

When state must mutate, confine it to a single thread, a single actor, or a single owner. The boundaries of the confinement should be visible: a queue receives messages, an actor processes them in order, no other code reaches in.
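
A sketch of thread confinement in Java, assuming a hypothetical `ConfinedCounter`. The `HashMap` is touched only by tasks running on one single-thread executor, which acts as its sole owner. Callers never reach the map directly. They submit work, and for reads they get a `CompletableFuture` back.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical hit counter whose mutable map is confined to one thread.
// Only tasks on `owner` touch `counts`, so the plain HashMap needs no lock.
final class ConfinedCounter implements AutoCloseable {
    private final ExecutorService owner = Executors.newSingleThreadExecutor();
    private final Map<String, Integer> counts = new HashMap<>(); // owner thread only

    void record(String key) {
        owner.execute(() -> counts.merge(key, 1, Integer::sum));
    }

    // Reads also run on the owner thread, after all previously submitted writes.
    CompletableFuture<Integer> countOf(String key) {
        return CompletableFuture.supplyAsync(() -> counts.getOrDefault(key, 0), owner);
    }

    @Override
    public void close() {
        owner.shutdown();
    }
}
```

The confinement boundary is the executor: a single-thread executor runs tasks one at a time in submission order, so a read submitted after three writes from the same caller sees all three.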

Make sharing explicit

When state is genuinely shared, name the synchronization: a lock, a transaction, an atomic primitive, a channel. Implicit sharing — global state, hidden caches, static fields — is the source of most "spooky" bugs.
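
One way to make the sharing explicit in Java, assuming a hypothetical `Inventory`. The lock is a named field, the guarded state is marked as such, and every access goes through the lock. The check and the update therefore happen as one atomic step.

```java
// Hypothetical shared inventory. Every read and write of `stock` holds `lock`,
// so the sharing and its synchronization are visible in the code.
final class Inventory {
    private final Object lock = new Object();
    private int stock; // guarded by `lock`

    Inventory(int initial) {
        this.stock = initial;
    }

    boolean take(int n) {
        synchronized (lock) {
            if (stock < n) {
                return false;
            }
            stock -= n; // check-then-act is safe only because both happen under the lock
            return true;
        }
    }

    int available() {
        synchronized (lock) {
            return stock;
        }
    }
}
```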

Choose the Right Concurrency Model

Different problems call for different models. Avoid mixing them in the same module.

Threads + locks

The classical model: shared memory protected by mutexes. Powerful and ubiquitous, but error-prone:

  • Holding a lock too long blocks other threads.
  • Holding several locks invites deadlock.
  • Forgetting to hold the right lock corrupts state silently.

Useful when low-level performance matters and the team has the expertise. Not the default.

Message passing / actors

State lives inside an actor; other actors send it messages, which it processes one at a time, in order. No shared memory, no locks. The idea appears in many forms: Erlang processes, the Actor model, Go channels with goroutines, JavaScript async/await with task queues.

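Useful when interactions can be expressed as messages and the system is naturally event-driven. The discipline rules out data races on actor state, though actors can still deadlock (two actors each waiting for the other's reply) or mishandle message ordering.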
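
A minimal actor sketch in plain Java, assuming hypothetical message types and Java 16+ (records and `instanceof` patterns). The balance is a local variable of the actor's own thread. Other threads can only enqueue messages, and a query is answered by completing a `CompletableFuture` carried in the message. This is not the API of any actor framework.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical actor: state is owned by one thread; the mailbox is the only way in.
final class AccountActor {
    interface Msg {}
    record Deposit(long amount) implements Msg {}
    record GetBalance(CompletableFuture<Long> reply) implements Msg {}
    record Stop() implements Msg {}

    private final BlockingQueue<Msg> mailbox = new LinkedBlockingQueue<>();

    private AccountActor() {}

    static AccountActor start() {
        AccountActor actor = new AccountActor();
        Thread t = new Thread(actor::loop);
        t.setDaemon(true); // sketch only: don't keep the JVM alive
        t.start();
        return actor;
    }

    void send(Msg msg) {
        mailbox.add(msg);
    }

    private void loop() {
        long balance = 0; // owned exclusively by this thread
        try {
            while (true) {
                Msg msg = mailbox.take(); // one message at a time, in queue order
                if (msg instanceof Deposit d) {
                    balance += d.amount();
                } else if (msg instanceof GetBalance g) {
                    g.reply().complete(balance); // reply by completing the caller's future
                } else if (msg instanceof Stop) {
                    return;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```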

Asynchronous I/O

A single thread juggles many in-flight operations using callbacks, promises, or coroutines. Common in network servers and UIs. Eliminates thread-related races but introduces a different family — interleaved continuations, callback ordering, accidental sharing across await points.

Useful when the workload is I/O-bound (most network programming). Use the language's structured-concurrency primitives (async/await, structured task groups) to keep the code linear.

Parallelism via pure data

When the problem can be expressed as a transformation over independent data, use parallel collections, parallel pipelines, or map/reduce. No shared state, no synchronization — the framework handles distribution.

Useful for batch processing, data pipelines, and CPU-bound work over partitionable data.
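
Java's parallel streams are one instance of this model. The sketch below uses a hypothetical `SumOfSquares` job. Each element is mapped independently and the partial sums are combined by an associative reduction, so no user code shares or locks anything.

```java
import java.util.stream.LongStream;

// Hypothetical CPU-bound job over partitionable data.
// The stream framework splits the range across the common fork-join pool.
final class SumOfSquares {
    static long compute(long n) {
        return LongStream.rangeClosed(1, n)
                .parallel()
                .map(x -> x * x) // pure per-element work: touches no shared state
                .sum();          // associative reduction, safe to combine in any order
    }
}
```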

Common Pitfalls

Race conditions

Two threads read-modify-write the same value without synchronization, and one update is lost.

// Thread A and thread B each do: counter = counter + 1
// If both read the old value before either writes,
// counter increases by 1, not 2.

Mitigations: atomic operations, locks, single-writer patterns, or eliminating the shared state.
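
A runnable Java version of the lost update, paired with the atomic-operation mitigation. `CounterRace` is a hypothetical name. Two threads increment a plain counter and an `AtomicInteger` the same number of times.

```java
import java.util.concurrent.atomic.AtomicInteger;

// plain[0] uses an unsynchronized read-modify-write, so increments can be lost;
// AtomicInteger.incrementAndGet() performs the whole update as one atomic step.
final class CounterRace {
    // Returns {plainCount, atomicCount} after both threads finish.
    static int[] run(int incrementsPerThread) {
        int[] plain = {0};
        AtomicInteger atomic = new AtomicInteger();
        Runnable work = () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                plain[0] = plain[0] + 1;  // racy: both threads may read the same value
                atomic.incrementAndGet(); // safe: no update is lost
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join(); // join() also makes the threads' writes visible to this thread
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {plain[0], atomic.get()};
    }
}
```

The atomic count always equals `2 * incrementsPerThread`. On a multi-core machine the plain count often comes out lower, but it is not guaranteed to, which is exactly why such bugs are hard to reproduce.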

Deadlocks

Thread A holds lock 1 and waits for lock 2; thread B holds lock 2 and waits for lock 1.

Mitigations:

  • Acquire locks in a consistent order across the codebase.
  • Hold locks for the shortest time possible.
  • Avoid calling out to unknown code while holding a lock — the callee may attempt to acquire a lock you already hold.
  • Use higher-level concurrency primitives (semaphores, channels) where they fit.
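
A sketch of the first mitigation, consistent lock ordering, assuming hypothetical `Account` objects that each carry a unique, stable `id`. Every transfer locks the lower id first, so two opposite transfers can never each hold one lock while waiting for the other.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical account; `id` is assumed unique and is used only to order locking.
final class Account {
    final long id;
    final ReentrantLock lock = new ReentrantLock();
    private long balance; // guarded by lock

    Account(long id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    long balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }

    // transfer(a, b) and transfer(b, a) both lock the lower id first: no cycle, no deadlock.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = first == from ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```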

Livelocks and starvation

In a livelock, threads keep reacting to each other and stay busy but never complete useful work. In starvation, some threads never get to run or never acquire the resource they need.

Mitigations: fair scheduling primitives, backoff strategies, bounded retries with jitter.
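
A sketch of bounded retries with exponential backoff and jitter, assuming a hypothetical `Retry.withBackoff` helper and non-negative delays. The delay before each retry is drawn uniformly from zero up to an exponentially growing cap. Contending threads therefore spread out instead of retrying in lockstep, and lockstep retrying is what turns contention into livelock.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.BooleanSupplier;

// Hypothetical helper: at most maxAttempts tries; before retry i, sleep a random
// delay in [0, min(maxMillis, baseMillis * 2^i)] milliseconds.
final class Retry {
    static boolean withBackoff(BooleanSupplier attempt, int maxAttempts,
                               long baseMillis, long maxMillis) {
        for (int i = 0; i < maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (i == maxAttempts - 1) {
                break; // no point sleeping after the last attempt
            }
            long cap = Math.min(maxMillis, baseMillis << Math.min(i, 20));
            try {
                Thread.sleep(ThreadLocalRandom.current().nextLong(cap + 1)); // jitter
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
                return false;
            }
        }
        return false; // bounded: the caller decides what failure means
    }
}
```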

Memory visibility

A write made on one thread is not guaranteed to become visible to another thread unless the two synchronize, no matter how much earlier the write happened. The bug is invisible until it isn't.

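Mitigations: use the language's standard synchronization primitives (mutexes, atomic types, Java's volatile fields, latches) rather than custom signaling through plain, unsynchronized flags. In C and C++, volatile is not a synchronization primitive; use atomics there.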
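
A Java sketch of handing a value between threads with a standard primitive instead of a homemade flag. `Handoff` is a hypothetical name. The `java.util.concurrent` documentation states that actions before `CountDownLatch.countDown()` happen-before actions after a successful return from `await()` in another thread. The reader is therefore guaranteed to see `result` even though the field is not volatile.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical one-shot handoff. A plain `boolean ready` field in place of the
// latch would give the reader no visibility guarantee at all.
final class Handoff {
    private final CountDownLatch done = new CountDownLatch(1);
    private String result; // written before countDown(), read after await()

    void publish(String value) {
        result = value;
        done.countDown(); // publishes every write made before this call
    }

    String awaitResult() throws InterruptedException {
        done.await();
        return result;
    }
}
```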

Unsafe lazy initialization

if (instance == null) {
  instance = new Singleton();
}

Two threads may both observe null and create two instances. Use the language's idiomatic safe-publication patterns (once, lazy_static, double-checked locking with proper memory barriers, or a class-level constant).
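
Two safe replacements for the snippet above, sketched in Java with hypothetical class names. The initialization-on-demand holder relies on the JVM initializing a class at most once, and class initialization is thread-safe. Double-checked locking is correct in Java only because the field is `volatile`, which prevents another thread from seeing a partially constructed object.

```java
// 1. Holder idiom: Holder is initialized on first access to instance(), exactly once.
final class Config {
    private Config() {}

    private static final class Holder {
        static final Config INSTANCE = new Config();
    }

    static Config instance() {
        return Holder.INSTANCE;
    }
}

// 2. Double-checked locking: correct only because `instance` is volatile.
final class Registry {
    private static volatile Registry instance;

    private Registry() {}

    static Registry instance() {
        Registry local = instance; // single volatile read on the fast path
        if (local == null) {
            synchronized (Registry.class) {
                local = instance; // re-check under the lock
                if (local == null) {
                    local = new Registry();
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

The holder idiom is usually the simpler choice. Double-checked locking is mainly useful when the instance is held in an instance field rather than a static one.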

Defensive Practices

Keep critical sections small

The longer a thread holds a lock, the higher the contention and the higher the risk of deadlock. Minimize the work done inside the lock; do not perform I/O, log to a remote service, or call out to extension code while holding one.

Distrust shared library calls under locks

A "harmless" call to a library function while holding a lock is one of the most common deadlock causes — the library may attempt its own synchronization. When in doubt, copy the data you need, release the lock, then call.
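
A Java sketch of "copy, release, then call", assuming a hypothetical `EventSource`. The lock protects only the listener list, and the critical section does nothing but copy it. Listeners (unknown code) run on the snapshot after the lock is released, so a slow or lock-taking listener cannot extend this class's critical section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical event source: short critical section, callbacks outside the lock.
final class EventSource {
    private final Object lock = new Object();
    private final List<Consumer<String>> listeners = new ArrayList<>(); // guarded by lock

    void subscribe(Consumer<String> listener) {
        synchronized (lock) {
            listeners.add(listener);
        }
    }

    void publish(String event) {
        List<Consumer<String>> snapshot;
        synchronized (lock) {
            snapshot = List.copyOf(listeners); // copy only; no I/O or callbacks here
        }
        for (Consumer<String> listener : snapshot) {
            listener.accept(event); // runs without our lock held
        }
    }
}
```

One consequence of the snapshot: a listener subscribed during a `publish` call first receives the next event, not the current one.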

Bound queues and pools

An unbounded queue is a memory leak waiting to happen; an unbounded thread pool is a fork bomb. Always set limits, choose a backpressure or rejection strategy, and instrument the queues so operators can see them filling.
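
A sketch of a bounded pool using the standard `ThreadPoolExecutor`. The wrapper name `BoundedPool` is hypothetical. The pool has a fixed number of workers and an `ArrayBlockingQueue` with fixed capacity. When both are full, `AbortPolicy` throws `RejectedExecutionException`, so overload surfaces to the caller instead of growing memory. `ThreadPoolExecutor.CallerRunsPolicy` is a common alternative that applies backpressure by running the task on the submitting thread.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical factory: at most `threads` workers and `queueCapacity` waiting tasks.
final class BoundedPool {
    static ThreadPoolExecutor create(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,                         // fixed worker count
                0L, TimeUnit.MILLISECONDS,                // no idle timeout for core threads
                new ArrayBlockingQueue<>(queueCapacity),  // bounded queue
                new ThreadPoolExecutor.AbortPolicy());    // reject when saturated
    }

    // Queue depth, suitable for exporting as a metric so operators see it filling.
    static int queued(ThreadPoolExecutor pool) {
        return pool.getQueue().size();
    }
}
```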

Document concurrency assumptions

Every public method should make its concurrency posture clear:

  • Thread-safe (callers may invoke from any thread).
  • Caller-synchronized (callers must hold a particular lock).
  • Single-threaded (do not call from multiple threads).

A method whose posture is unclear forces callers to guess, and guesses are wrong eventually.
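
One lightweight way to state posture is in each method's Javadoc, illustrated on a hypothetical `SessionCache` that has one method of each kind. The wording is a suggestion, not a standard annotation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache; each method's Javadoc states its concurrency posture.
final class SessionCache {
    private final Map<String, String> sessions = new ConcurrentHashMap<>();
    private final StringBuilder audit = new StringBuilder(); // guarded by auditLock
    private final Object auditLock = new Object();

    /** Thread-safe: may be called from any thread. */
    String get(String id) {
        return sessions.get(id);
    }

    /**
     * Caller-synchronized: hold {@code auditLock()} while calling, so a caller
     * can group several lines into one uninterrupted audit entry.
     */
    void appendAudit(String line) {
        audit.append(line).append('\n');
    }

    Object auditLock() {
        return auditLock;
    }

    /**
     * Single-threaded: call only during startup, before the cache is shared.
     * clear() followed by putAll() is not atomic as a whole.
     */
    void load(Map<String, String> initial) {
        sessions.clear();
        sessions.putAll(initial);
    }
}
```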

Testing Concurrent Code

Concurrent code is hard to test deterministically, but it is not untestable:

  • Property tests. Describe invariants that must hold under any interleaving (final balance = sum of deposits − sum of withdrawals) and run them many times; failures often reveal races.
  • Schedule control. Tools that randomize or systematically explore interleavings (deterministic schedulers, model checkers) can catch races that random testing misses.
  • Stress tests. Hammer the system with concurrent load and look for deviations from invariants.
  • Static analysis. Linters and type systems can catch some unsynchronized access patterns.

Tests that depend on sleep() to "give the other thread a chance" are not reliable; they are how flakes enter the test suite.
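
A sketch of the property-test approach in Java, assuming a hypothetical `Ledger` under test. A `CountDownLatch` releases all workers at once to maximize overlap, and `join()` waits for completion, so no `sleep()` is involved. The check is the invariant from the list above: final balance equals deposits minus withdrawals.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical class under test; synchronized methods keep each update atomic.
final class Ledger {
    private long balance;

    synchronized void deposit(long amount) { balance += amount; }
    synchronized void withdraw(long amount) { balance -= amount; }
    synchronized long balance() { return balance; }
}

final class LedgerInvariantTest {
    // One trial: true if the final balance equals deposits minus withdrawals.
    static boolean runOnce(int threads, int opsPerThread) throws InterruptedException {
        Ledger ledger = new Ledger();
        AtomicLong expected = new AtomicLong(); // running deposits minus withdrawals
        CountDownLatch startGate = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    startGate.await(); // all workers start together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int i = 0; i < opsPerThread; i++) {
                    long amount = rnd.nextLong(1, 100);
                    if (rnd.nextBoolean()) {
                        ledger.deposit(amount);
                        expected.addAndGet(amount);
                    } else {
                        ledger.withdraw(amount);
                        expected.addAndGet(-amount);
                    }
                }
            });
            workers[t].start();
        }
        startGate.countDown();
        for (Thread w : workers) {
            w.join(); // wait for completion; no sleep() anywhere
        }
        return ledger.balance() == expected.get();
    }
}
```

Running `runOnce` in a loop is the "run them many times" part. Removing `synchronized` from `Ledger` tends to make the invariant fail within a few runs on a multi-core machine.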

Reasoning About Time

Concurrency is intertwined with time:

  • Temporal coupling. A function only works if it is called after another function. Make the dependency explicit (a state machine, a typed builder), or eliminate it.
  • Clock dependence. Code that depends on wall-clock time is hard to test and behaves differently across machines. Inject a clock; control time in tests.
  • Eventually consistent state. A response that returns "OK" before the change is observable elsewhere is not necessarily wrong — but the contract should say so explicitly.
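
A sketch of clock injection using the standard `java.time.Clock`, assuming a hypothetical `ExpiryChecker`. Production code passes `Clock.systemUTC()`. Tests pin time with `Clock.fixed(...)` and move it with `Clock.offset(...)` instead of sleeping.

```java
import java.time.Clock;
import java.time.Instant;

// Hypothetical component that asks an injected Clock for "now"
// instead of calling Instant.now() directly.
final class ExpiryChecker {
    private final Clock clock;

    ExpiryChecker(Clock clock) {
        this.clock = clock;
    }

    // Expired once the current instant reaches the deadline.
    boolean isExpired(Instant expiresAt) {
        return !clock.instant().isBefore(expiresAt);
    }
}
```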

Pre-Commit Checklist

  • Is the concurrency genuinely necessary, with a measurable benefit?
  • Is the concurrent code separated from the domain logic, each testable on its own?
  • Is shared mutable state minimized — through immutability, confinement, or explicit synchronization?
  • Are locks acquired in a consistent order across the codebase, held briefly, and never held across calls to unknown code?
  • Are queues and pools bounded, with documented backpressure behavior?
  • Does each public method document its concurrency posture?
  • Are tests resilient to scheduling — without sleep() as a synchronization mechanism?
