Steven's Knowledge

FLP Impossibility

Why deterministic consensus under pure asynchrony is impossible, what real systems give up to ship consensus anyway, and why this is the deepest result in distributed systems

FLP Impossibility

In 1985, Fischer, Lynch, and Paterson proved a result that should have ended a career's worth of research: in a purely asynchronous distributed system, no deterministic consensus protocol can guarantee termination if even one node can crash. The proof is short, the conclusion is absolute, and yet etcd, Raft, ZooKeeper, and Paxos all ship and work. Understanding how that paradox is possible — what real systems trade away to "violate" the result — is the difference between a practitioner who can debug a stuck cluster and one who cannot.

This page is about the theoretical core. For the algorithms that work around it, see Raft and Paxos.

The Setting

FLP applies to a specific model. Get any of these wrong and the impossibility does not bite:

  • Asynchronous network. Messages can be delayed arbitrarily long but are eventually delivered. No timeouts. No clocks.
  • Reliable point-to-point channels. No message loss, no duplication, no Byzantine corruption.
  • Crash failures only. A node can stop, but if it runs it follows the protocol correctly.
  • At most one node may crash. Yes — one. The result holds even with this most generous assumption.
  • Deterministic protocol. No randomization, no oracles.

The Statement

In this model, no algorithm can solve consensus if it must satisfy all three of:

  • Agreement — no two non-faulty nodes decide differently.
  • Validity — the decided value was proposed by some node.
  • Termination — every non-faulty node eventually decides.

The "trilemma" is real: any algorithm that maintains agreement and validity can be forced into a schedule where some non-faulty node never decides. Termination is what you cannot have.

Proof Intuition

The proof works by reasoning about configurations and a property called bivalence.

A configuration of the system at some moment is bivalent if both decisions (e.g., 0 and 1) are still reachable from it. It is univalent if only one outcome remains possible.

   Bivalent configuration C

       ├── schedule S1 ──▶ ... ──▶ decides 0

       └── schedule S2 ──▶ ... ──▶ decides 1

   Univalent (0-valent) configuration U

       └── every schedule from here ──▶ decides 0

Decisions are sticky: once any reachable run from C actually decides, you have entered a univalent region. The proof argues that an adversarial schedule can always keep the system in the bivalent region.

The proof shows three things:

  1. An initial bivalent configuration exists. Because validity requires the decision to match some proposal, and proposals can disagree, there must be runs that end in 0 and runs that end in 1. Some starting point is bivalent.

  2. Every step is recoverable. For any bivalent configuration C, there exists a successor configuration that is also bivalent — meaning the adversary scheduling messages can always keep the system in a state where the outcome is not yet decided.

  3. Therefore the schedule that keeps the system bivalent forever exists, by induction. No node decides, so termination fails.

The clever move is step 2: it shows that no matter what action a process is about to take, the adversary can delay one specific message just long enough that the system stays bivalent. Because the system is asynchronous, "delay this message" is always a legal scheduling decision — you cannot tell whether a message is slow or its sender has crashed.

Why This Is Not a Contradiction with Reality

FLP is about the asynchronous model. Real systems run in something less hostile:

  • Partially synchronous (Dwork-Lynch-Stockmeyer, 1988). The network is asynchronous, but eventually becomes synchronous (or it is "mostly" synchronous with occasional bad periods). Algorithms like Paxos and Raft work in this model. The key trick: timeouts. A timeout lets you suspect a node has failed, even though you cannot truly distinguish "crashed" from "slow." If your suspicion is wrong, you might temporarily lose liveness (a re-election, a retry), but you regain it when the network calms down.

  • Randomized (Ben-Or, 1983; Rabin, 1983). Allow the protocol to flip coins. Algorithms like Ben-Or's solve consensus in the asynchronous model with probabilistic termination: the protocol terminates with probability 1, but no fixed bound on rounds. FLP only rules out deterministic termination.

  • Failure detectors (Chandra-Toueg, 1996). Augment the model with an oracle (a "failure detector") that gives hints about which processes have crashed. Even unreliable detectors are enough to circumvent FLP. The eventually strong (◇S) failure detector is sufficient and implementable in real networks.

Every working consensus algorithm uses at least one of these escape hatches. Raft and Paxos use timeouts (partial synchrony). Bitcoin uses randomization (probabilistic safety, very long termination). PBFT uses synchronous bounds for safety and asynchronous progress for liveness.

What This Means for Your System

The practical implication is unavoidable:

A consensus-based system can guarantee safety (agreement, validity) under pure asynchrony. It cannot guarantee liveness (termination). Any system that claims both is making an assumption that you should locate and verify.

Concretely:

  • Raft and Paxos stop committing during a partition. This is correct behavior; FLP says they have no choice. If your cluster cannot reach majority, it stops. That is safety preserved by sacrificing liveness.
  • Election storms are FLP in costume. If election timeouts are too short relative to actual network jitter or GC pauses, the cluster spends all its time electing and none making progress. Each individual election terminates because randomization breaks ties, but the useful work makes no progress — a liveness failure at the application layer.
  • "Eventually consistent" systems do not violate FLP. They give up agreement (decisions can differ on different nodes temporarily), not termination. Different trade-off.
  • Anyone selling you "instant consensus over the public internet" is selling you a violation of FLP. Read the fine print: they are using timeouts, accepting probability-of-correctness, or assuming bounded latency.

Relationship to Other Impossibility Results

FLP is one of three load-bearing impossibilities in distributed systems:

ResultConstraint
FLP (1985)No deterministic consensus in async with one crash
CAP (Gilbert-Lynch, 2002)No linearizable-and-available system under partition
Two Generals (1975)No deterministic agreement over a lossy channel, finite messages

They are complementary, not redundant. FLP is about termination; CAP is about consistency vs availability; Two Generals is about agreement under a different failure model. A system designer should know all three and which is biting on a given decision.

Further Reading

  • Fischer, Lynch, Paterson, Impossibility of Distributed Consensus with One Faulty Process (JACM 1985) — the paper. Twelve pages, surprisingly readable for an impossibility proof.
  • Dwork, Lynch, Stockmeyer, Consensus in the Presence of Partial Synchrony (JACM 1988) — the model real systems use.
  • Chandra & Toueg, Unreliable Failure Detectors for Reliable Distributed Systems (JACM 1996) — the failure-detector escape hatch.
  • Ben-Or, Another advantage of free choice: Completely asynchronous agreement protocols (PODC 1983) — randomized consensus.
  • Aspnes, Notes on Theory of Distributed Systems (Yale lecture notes, freely available) — chapter on FLP is the most accessible modern explanation.

Pre-commit Checklist

  • Does my consensus system have a documented assumption about timing (partial synchrony, randomization, failure detector)? If not, find one — implicitly it has one.
  • What does my system do when it cannot make progress (no majority, election storm)? Have I observed that state in testing?
  • For "exactly-once" or "guaranteed delivery" claims in my docs, have I checked which FLP-adjacent assumption they require?
  • If my failure detector is timeout-based, are the timeouts orders-of-magnitude longer than my worst-case latency and GC pause?
  • Am I confusing safety failures (data divergence) with liveness failures (cluster stuck)? FLP says you can always have one; choose which.

On this page