Classic Paxos, Multi-Paxos, and what the variants actually solve — the algorithm Raft was designed to replace and still cannot fully retire

Paxos

Paxos is the consensus algorithm that everyone cites and almost no one implements from scratch. Lamport published the original paper in 1989 (it took eight years to be accepted), then wrote a follow-up titled Paxos Made Simple in 2001 because no one understood the first one. Raft was designed in 2014 specifically because Paxos Made Simple still was not simple enough.

So why a page on Paxos? Because it is the algorithm Google, Microsoft, and Amazon actually use in their core infrastructure (Chubby, Megastore, Spanner, the original Dynamo paper); because every distributed-systems paper from the last 30 years assumes you have read it; and because the failure modes that show up in production consensus systems are easier to understand once you have seen the original.

This page is the protocol view. If you only intend to use consensus, Raft is the better page to start with.

What Paxos Solves

The problem statement is identical to Raft's: a group of nodes must agree on a single value despite crashes, message loss, and arbitrary delays. The differences are stylistic and pedagogical, not semantic — anything Paxos can decide, Raft can decide, and vice versa. Both work around the FLP impossibility the same way, by assuming partial synchrony.

Paxos works for one decision. To decide a sequence (i.e., build a replicated log), you run Paxos repeatedly — see Multi-Paxos below.

The Three Roles

Every node may play one or more of:

Proposer — proposes a value to be decided.
Acceptor — votes on proposals. The quorum lives here.
Learner — learns the decided value once acceptors agree.

In practice all three roles run on every node. The separation matters because the safety argument only needs to constrain acceptors.

Single-Decree Paxos

The protocol has two phases. Each proposal carries a globally unique, monotonically increasing proposal number n (typically a tuple of (counter, node_id) so different proposers cannot collide).

Phase 1: Prepare / Promise

Proposer ──Prepare(n)─▶  Acceptors

Acceptor's reply:
  IF n > highest n it has promised:
     promise to reject any proposal with number < n
     reply Promise(n, prev_n, prev_v)  where prev_n/prev_v is the
                                       highest-numbered proposal it
                                       previously *accepted*, if any
  ELSE:
     reject (or ignore)

A proposer needs Promise from a majority of acceptors to continue.

Phase 2: Accept / Accepted

Proposer picks the value v:
  IF any Promise included a (prev_n, prev_v):
     v = the prev_v with the highest prev_n        ← MUST reuse it
  ELSE:
     v = whatever the proposer wanted to propose

Proposer ──Accept(n, v)─▶  Acceptors

Acceptor's reply:
  IF it has not promised a higher number:
     accept (n, v), reply Accepted(n, v)
  ELSE:
     reject

When a majority Accepted(n, v), the value v is chosen. Once a value is chosen, no other value can ever be chosen — that is the safety property.

Why It Is Safe

The non-obvious clause is the rule in Phase 2 that says the proposer must reuse the highest previously-accepted value. This is the rule that prevents two competing proposers from each getting a majority for a different value: any majority intersects the previous majority in at least one acceptor, so the new proposer will see the previously chosen value in at least one Promise reply and is forced to re-propose it.

The proof is short but extremely careful; it is the canonical "I think I get it… wait, no" experience for new distributed-systems practitioners. Paxos Made Simple walks through it; do not skip the section called "The Algorithm."

Multi-Paxos

Single-decree Paxos decides one value, in two RPC rounds, with no leader. Running it independently for every log entry would be unbearably slow (every entry pays two round trips) and produces fights among proposers (livelock during contention).

Multi-Paxos optimizes this:

Elect a stable leader by running one Paxos round on "who is the leader for the next many entries?" Once elected, the leader skips Phase 1 for subsequent entries — Phase 1 only needs to run once per leadership term.
Each new log entry takes one round (Phase 2 only) under the stable leader.
A new leader runs Phase 1 for all unfilled log positions to learn what (if anything) was previously chosen there.

If this description sounds suspiciously like Raft, that is because Raft is, in effect, Multi-Paxos with the algorithm structure rearranged for clarity. Diego Ongaro has said as much.

Variants Worth Knowing

The Paxos family has accumulated dozens of variants. The ones that show up in real systems:

Fast Paxos (Lamport, 2005) — saves one round trip in the uncontested case, at the cost of needing a larger quorum (3/4 instead of majority). Used in some Kubernetes-adjacent systems.
Cheap Paxos — uses auxiliary acceptors only when needed, reducing the steady-state cost.
EPaxos (Egalitarian Paxos, 2013) — leaderless. Any node can propose; commands that do not interfere commit in one round trip. Used in some Cassandra-adjacent systems.
Flexible Paxos (Howard et al., 2016) — proves that Phase 1 and Phase 2 quorums need not be the same size, only that they intersect. Opens up new latency / fault-tolerance trade-offs (see Quorum Systems).
Generalized Paxos — exploits commutative operations to commit non-conflicting ones concurrently.

Most production systems use Multi-Paxos with stable leader. The fancier variants are usually overkill.

Production Lessons

The Google paper Paxos Made Live (2007) is the canonical document on what you actually have to deal with when running Paxos. Key takeaways:

Disk corruption matters. Acceptors must remember their promised numbers across crashes. A misbehaving disk that loses or rewrites this state breaks safety. Real implementations checksum, replicate, and treat disk corruption as a first-class fault mode.
Master leases need careful handling. A leader that thinks it is still the master after losing quorum can serve stale reads. Most production systems use bounded clock leases tied to the master's term.
Snapshotting and log compaction are most of the code. The basic protocol is small; making it run forever is large.
Configuration changes (cluster membership) are the bug hotspot. The original Paxos paper barely covers this; production systems use joint consensus (membership-change rounds in both old and new configs) or the Stoppable Paxos / SMART approach.

These lessons apply equally to Raft.

Paxos vs Raft

Identical safety guarantees. Same fault tolerance ((N-1)/2 failures with N nodes). The differences:

Aspect	Paxos	Raft
Pedagogy	Famously confusing	Designed for understandability
Log structure	Implicit (entries can have holes)	Explicit, contiguous
Leader election	Implicit (proposer with highest n wins)	Explicit, with terms and votes
Membership changes	Underspecified; many variants	Joint consensus, well-defined
Default production choice	Older systems (Chubby, Spanner)	Newer systems (etcd, Consul, CockroachDB)

If you are choosing today: pick Raft. If you are reading code from Google, Amazon, or pre-2014 academia: you need to be able to read Paxos.

Pre-commit Checklist

Did I confirm my Paxos library checksums acceptor state and treats corruption as a fault?
Are proposal numbers globally unique and monotonically increasing across restarts? (Tuple of (persistent_counter, node_id) is the standard recipe.)
For Multi-Paxos, does the new leader's Phase 1 cover all unfilled log positions, not just the next one?
If I am using a Paxos variant (Fast / EPaxos / Flexible), do I understand the quorum size implication?
Do I have an answer for "what does this cluster do during a partition?" — the answer should be "stops committing," not "diverges silently."

Paxos

Paxos

What Paxos Solves

The Three Roles

Single-Decree Paxos

Phase 1: Prepare / Promise

Phase 2: Accept / Accepted

Why It Is Safe

Multi-Paxos

Variants Worth Knowing

Production Lessons

Paxos vs Raft

Further Reading

Pre-commit Checklist

On this page