Two-Phase Commit (2PC) & Three-Phase Commit (3PC)
Classic distributed transactions — blocking semantics, why 2PC stalls participants, what 3PC was supposed to fix, and when 2PC is still the right call
2PC is the textbook algorithm for committing a transaction across multiple participants atomically. It does what people expect of a "distributed transaction": either all participants commit, or all abort, with no partial state. It dates to the late 1970s (Lampson & Sturgis 1976; Gray 1978) and is still inside every XA transaction manager, every Postgres PREPARE TRANSACTION, every Kafka transaction coordinator.
It also has one well-known failure mode — blocking — that makes it operationally unattractive at the scales most modern services run at. This page is about how 2PC works, what 3PC adds, and where in the design space these protocols belong relative to Saga and TCC.
What 2PC Solves
The problem: a transaction spans multiple participants — databases, services, message queues. You need an outcome where either all of them commit or all abort. A single-node BEGIN ... COMMIT cannot reach across the network; some coordination protocol is required.
The standard cast:
- Coordinator — the process that runs the protocol. Often the originating client or a transaction manager (XA, Postgres, Kafka).
- Participants — the data stores or services that hold the affected state. Each can independently commit or abort its local part.
The Two Phases
Phase 1: Prepare (Voting)
The coordinator sends PREPARE to every participant.
Each participant:
- Performs the work (acquires locks, validates constraints).
- Persists enough state to commit or abort on demand. This is the prepared state — locks are held, but the transaction is not yet committed.
- Replies YES (ready to commit) or NO (must abort).
If a participant crashes between receiving PREPARE and replying, it must, on recovery, be able to find the prepared transaction and ask the coordinator for the outcome.
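The participant side of Phase 1, including the recovery path, can be sketched as follows. This is a minimal in-memory illustration, not any real transaction manager's API: `Participant`, `durable_log`, and `ask_coordinator` are hypothetical names, and a dictionary stands in for a write-ahead log on disk.

```python
from typing import Callable, Dict


class Participant:
    """Toy participant; `durable_log` stands in for a write-ahead log on disk."""

    def __init__(self, durable_log: Dict[str, str]) -> None:
        # txn_id -> "PREPARED" | "COMMITTED" | "ABORTED"
        self.durable_log = durable_log

    def on_prepare(self, txn_id: str, can_commit: bool) -> str:
        if not can_commit:
            self.durable_log[txn_id] = "ABORTED"
            return "NO"
        # Persist the prepared state BEFORE replying YES, so the vote survives a crash.
        self.durable_log[txn_id] = "PREPARED"
        return "YES"

    def recover(self, ask_coordinator: Callable[[str], str]) -> None:
        """After a restart, resolve every transaction still in the prepared state."""
        for txn_id, state in list(self.durable_log.items()):
            if state == "PREPARED":
                decision = ask_coordinator(txn_id)  # expected: "COMMIT" or "ABORT"
                self.durable_log[txn_id] = (
                    "COMMITTED" if decision == "COMMIT" else "ABORTED"
                )


log: Dict[str, str] = {}
vote = Participant(log).on_prepare("t1", can_commit=True)

# Simulate a crash: the in-memory object is lost, but the log survives.
restarted = Participant(log)
restarted.recover(ask_coordinator=lambda txn_id: "COMMIT")
```

The ordering inside `on_prepare` is the point of the sketch: a participant that replies YES before the prepared record is durable could forget its vote in a crash.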
Phase 2: Commit / Abort
If all participants voted YES, the coordinator writes a commit decision to its own durable log, then sends COMMIT to all participants.
If any voted NO, or any failed to respond, the coordinator writes an abort decision and sends ABORT to all participants.
Each participant, on receiving the decision, applies it locally and acknowledges.
Coordinator Participants
│ │
│── PREPARE ──────────────────────▶ │
│ │ writes prepared state,
│ │ holds locks
│ ◀── YES / NO ──────────────────── │
│ │
│ (decision: all YES → COMMIT) │
│ write COMMIT to local log │
│ │
│── COMMIT ────────────────────────▶ │ apply, release locks
│ │
│ ◀── ACK ───────────────────────── │
The coordinator's local decision log is the point of no return. Once written, even if the coordinator crashes mid-broadcast, it (or its replacement) will replay the same decision.
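The coordinator's side of the two phases might look like the sketch below. It assumes participants are represented by two hypothetical callables (one that votes, one that receives the decision) and that a plain dictionary plays the role of the durable decision log; a real coordinator would also need timeouts, retries, and an fsync at the marked line.

```python
from typing import Callable, Dict, List

Vote = Callable[[str], str]            # txn_id -> "YES" | "NO"; may raise on failure
Deliver = Callable[[str, str], None]   # (txn_id, decision) -> None


def two_phase_commit(
    txn_id: str,
    voters: List[Vote],
    deliverers: List[Deliver],
    decision_log: Dict[str, str],
) -> str:
    # Phase 1: collect votes. A NO or a missing response both force an abort.
    all_yes = True
    for vote in voters:
        try:
            if vote(txn_id) != "YES":
                all_yes = False
        except Exception:
            all_yes = False

    decision = "COMMIT" if all_yes else "ABORT"

    # Point of no return: the decision is logged before anyone is told about it.
    decision_log[txn_id] = decision

    # Phase 2: broadcast the logged decision.
    for deliver in deliverers:
        deliver(txn_id, decision)
    return decision


def replay(decision_log: Dict[str, str], deliverers: List[Deliver]) -> None:
    """After a coordinator restart, re-send every logged decision."""
    for txn_id, decision in decision_log.items():
        for deliver in deliverers:
            deliver(txn_id, decision)


applied: List[str] = []
log: Dict[str, str] = {}
sinks: List[Deliver] = [lambda t, d: applied.append(d)] * 2

first = two_phase_commit("t1", [lambda t: "YES", lambda t: "YES"], sinks, log)
second = two_phase_commit("t2", [lambda t: "YES", lambda t: "NO"], sinks, log)
```

Because `replay` only re-sends what is already in the log, participants must treat a repeated COMMIT or ABORT as a no-op.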
Why 2PC Blocks
The failure mode everyone learns about: the coordinator crashes after participants have voted but before they have received the decision.
Coordinator Participants
│ │
│── PREPARE ──────────────────────▶ │ prepared, locks held
│ ◀── YES ──────────────────────── │
│ │
│ [coordinator crashes here] │
│ │
│ │ waiting...
│ │ waiting...
│ │ locks still held...
The participants cannot decide on their own. They voted YES and must honor that vote until they hear from the coordinator. They cannot abort (the decision might have been commit). They cannot commit (the decision might have been abort). They block, holding locks, accumulating waiters, and refusing other work, until the coordinator recovers.
This is the central operational problem with 2PC. In the steady state it works fine. Under coordinator failure, a participant can be stuck for the time it takes to recover the coordinator and replay its decision log. With cross-region setups or hot-standby coordinators, that is minutes; with manual recovery, hours.
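The blocking rule can be written down as a small decision function. The state names (`WORKING`, `PREPARED`, and so on) are labels chosen for this sketch rather than terms from any particular implementation.

```python
def on_coordinator_timeout(state: str) -> str:
    """What a 2PC participant may do on its own when the coordinator goes silent."""
    if state == "WORKING":
        # Has not voted yet, so the coordinator cannot have decided COMMIT.
        return "ABORT"
    if state == "PREPARED":
        # Voted YES: the decision could be either outcome. Keep locks and wait.
        return "BLOCK"
    if state in ("COMMITTED", "ABORTED"):
        # The decision was already received; nothing left to decide.
        return state
    raise ValueError(f"unknown participant state: {state}")


outcomes = {s: on_coordinator_timeout(s) for s in ("WORKING", "PREPARED", "COMMITTED")}
```

The only state with no safe unilateral action is `PREPARED`, which is exactly the window between voting YES and learning the decision.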
3PC: The Attempt to Fix Blocking
Three-Phase Commit (Skeen, 1981) adds an intermediate phase to give participants enough information to recover without waiting for the coordinator.
The three phases:
- CanCommit: coordinator asks each participant if it can commit. Participants reply YES/NO but do not prepare yet.
- PreCommit: if all said YES, coordinator sends PRE-COMMIT. Participants now prepare (write to log, hold locks) and reply ACK.
- DoCommit: coordinator sends COMMIT once all PRE-COMMIT acks arrive.
The crucial property: in 3PC, a participant in the PRE-COMMIT state knows that every participant voted YES in CanCommit, and the coordinator sends COMMIT only after all of them have acknowledged PRE-COMMIT. If the coordinator crashes after PRE-COMMIT but before DoCommit, participants can time out and agree on commit among themselves, because the PRE-COMMIT they received is evidence that all others were ready.
In theory, this makes 3PC non-blocking.
In practice, 3PC is almost never used. The reasons:
- 3PC assumes a synchronous network. The timeout-and-decide step only works if you can reliably distinguish "participant crashed" from "network is slow." Real networks are partially synchronous, and 3PC is not safe under network partition — it can lead to inconsistent decisions if a partition isolates participants in different phases.
- Extra latency. An extra round trip is paid on every transaction, not just during recovery.
- Higher implementation complexity with little benefit at the scale most systems operate.
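The partition problem in the first point can be made concrete with a toy model of the timeout rule. It assumes the commonly described rule set (a participant that received PRE-COMMIT commits on timeout, one that voted YES but never saw PRE-COMMIT aborts on timeout); the state names are labels for this sketch only.

```python
from typing import Dict


def on_timeout_3pc(state: str) -> str:
    """Unilateral decision of a 3PC participant whose coordinator went silent."""
    if state == "PRECOMMITTED":
        return "COMMIT"   # everyone voted YES, so committing is assumed safe
    if state == "WAITING":
        return "ABORT"    # voted YES in CanCommit but never saw PRE-COMMIT
    raise ValueError(f"unknown participant state: {state}")


# Scenario: the coordinator sends PRE-COMMIT, a partition lets it reach A but
# not B, and then the coordinator crashes. Both participants time out.
states: Dict[str, str] = {"A": "PRECOMMITTED", "B": "WAITING"}
decisions = {name: on_timeout_3pc(state) for name, state in states.items()}
diverged = len(set(decisions.values())) > 1
```

Each participant follows the rules correctly and the outcome is still split, which is why the timeout step is only safe when message delays are bounded.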
The modern approach is to keep 2PC for the steady state and pair it with Paxos / Raft replicating the coordinator state, so coordinator failure no longer means an unrecoverable log. Spanner and CockroachDB combine 2PC across shards with consensus-replicated transaction state; Percolator reaches a similar result by storing the commit decision durably in Bigtable rather than in a coordinator process.
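The idea of a decision log that survives the loss of any single node can be illustrated with a majority-acknowledged write. This is deliberately not Paxos or Raft: there is no leader election, no terms, and no handling of competing proposals. `Replica`, `record_decision`, and `recover_decision` are hypothetical names for the sketch.

```python
from typing import Dict, List, Optional


class Replica:
    def __init__(self) -> None:
        self.up = True
        self.log: Dict[str, str] = {}

    def append(self, txn_id: str, decision: str) -> bool:
        if not self.up:
            return False
        self.log.setdefault(txn_id, decision)  # a recorded decision never changes
        return True

    def read(self, txn_id: str) -> Optional[str]:
        return self.log.get(txn_id) if self.up else None


def record_decision(replicas: List[Replica], txn_id: str, decision: str) -> bool:
    """The decision counts as made only once a majority has stored it."""
    acks = sum(1 for r in replicas if r.append(txn_id, decision))
    return acks > len(replicas) // 2


def recover_decision(replicas: List[Replica], txn_id: str) -> Optional[str]:
    """A replacement coordinator reads from a majority to find the decision."""
    reachable = [r for r in replicas if r.up]
    if len(reachable) <= len(replicas) // 2:
        raise RuntimeError("no majority reachable")
    for r in reachable:
        decision = r.read(txn_id)
        if decision is not None:
            return decision
    return None


replicas = [Replica(), Replica(), Replica()]
replicas[2].up = False                                # one replica is down
durable = record_decision(replicas, "t1", "COMMIT")   # 2 of 3 acknowledge

replicas[0].up = False   # the node hosting the old coordinator is lost
replicas[2].up = True    # the lagging replica returns, without the entry
recovered = recover_decision(replicas, "t1")
```

Any two majorities of the same replica set overlap in at least one node, so a decision stored on a majority is visible to whichever majority the replacement coordinator reaches.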
When 2PC Is Acceptable
Despite its reputation, 2PC is the right answer in a few well-defined contexts:
- Same datacenter, few participants, short transactions. Latency is low, coordinator failure is rare, and locks are held briefly. XA across two databases in the same datacenter is a textbook 2PC use case.
- Coordinator backed by consensus. Spanner-style: the coordinator's decision log is replicated via Paxos/Raft, so the "coordinator crash leaves participants stuck" problem becomes "coordinator failover takes a few hundred milliseconds." Recovery is automatic and bounded.
- You actually need strict atomicity. Money movement between accounts, inventory + payment together when partial state is unacceptable. Trade the operational risk for the correctness guarantee, with eyes open.
When 2PC is not the right answer:
- Long-running workflows. Hours-to-days operations should use Saga, not 2PC. Holding locks for hours is operationally impossible at any scale.
- Cross-organizational transactions. You will not get participants from another company to participate in your 2PC protocol. Sagas with compensation, or TCC with explicit reservation, are the right tools.
- High-throughput / low-latency systems. Every cross-shard 2PC costs at least 2 RTTs plus durable writes on every participant. At very high throughput this becomes a dominant cost.
- Operations where eventual consistency is acceptable. Most product analytics, recommendation pipelines, and non-financial systems do not need atomic commits. Pay the right cost for the right requirement.
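The latency argument can be checked with back-of-envelope arithmetic. The model below is an assumption made for illustration: two round trips plus three sequential durable writes (participant prepare, coordinator decision, participant commit), with participants contacted in parallel. The RTT and fsync figures are made-up inputs, not measurements.

```python
def commit_latency_ms(rtt_ms: float, fsync_ms: float) -> float:
    """Rough 2PC latency: two round trips plus three sequential durable writes."""
    return 2 * rtt_ms + 3 * fsync_ms


def max_serial_tps(lock_hold_ms: float) -> float:
    """Upper bound on transactions per second that contend for the same lock."""
    return 1000.0 / lock_hold_ms


same_dc = commit_latency_ms(rtt_ms=1.0, fsync_ms=0.5)         # 3.5 ms
cross_region = commit_latency_ms(rtt_ms=80.0, fsync_ms=0.5)   # 161.5 ms

# If locks are held for roughly the whole protocol, a single hot row tops out
# near 1000 / latency transactions per second.
hot_row_same_dc = max_serial_tps(same_dc)
hot_row_cross_region = max_serial_tps(cross_region)
```

Under these assumed inputs, the cross-region case holds locks about 46 times longer than the same-datacenter case, so a contended row goes from a few hundred transactions per second to single digits.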
2PC vs Saga vs TCC
The three families of "distributed transactions" are not interchangeable; each occupies a different cell in the design matrix.
| Property | 2PC | Saga | TCC |
|---|---|---|---|
| Isolation | Strong (locks held until commit) | None (intermediate states visible) | Weak (resources reserved, not locked) |
| Duration | Seconds | Hours to days | Seconds to minutes |
| Coupling | Tight (synchronous, same protocol) | Loose (events / orchestration) | Medium (explicit try/confirm/cancel) |
| Compensation | Built-in (abort phase) | Application-defined per step | Application-defined cancel |
| Failure recovery | Coordinator-driven, can block | Saga-coordinator-driven, never blocks | Coordinator-driven, never blocks if confirm/cancel are idempotent |
| Use case | ACID across databases | Long-running business workflows | Cross-service operations with reservable resources |
The honest framing: 2PC for "must commit atomically right now," Saga for "long process tolerant of intermediate states," TCC for "we need atomicity but cannot afford locks."
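The "reserved, not locked" distinction in the TCC column can be sketched with a toy inventory service. All names here (`Inventory`, `try_reserve`, `confirm`, `cancel`) are hypothetical; the point is only that each step is a short local operation and that confirm and cancel are safe to repeat.

```python
from typing import Dict


class Inventory:
    def __init__(self, stock: int) -> None:
        self.available = stock
        self.reserved: Dict[str, int] = {}   # txn_id -> quantity held
        self.finished: Dict[str, str] = {}   # txn_id -> "CONFIRMED" | "CANCELLED"

    def try_reserve(self, txn_id: str, qty: int) -> bool:
        """Try: set stock aside for this transaction without a long-lived lock."""
        if txn_id in self.reserved:
            return True                      # repeated Try is a no-op
        if txn_id in self.finished:
            return self.finished[txn_id] == "CONFIRMED"
        if qty > self.available:
            return False
        self.available -= qty
        self.reserved[txn_id] = qty
        return True

    def confirm(self, txn_id: str) -> None:
        """Confirm: consume the reservation. Repeating it changes nothing."""
        if self.reserved.pop(txn_id, None) is not None:
            self.finished[txn_id] = "CONFIRMED"

    def cancel(self, txn_id: str) -> None:
        """Cancel: return reserved stock. Repeating it changes nothing."""
        qty = self.reserved.pop(txn_id, None)
        if qty is not None:
            self.available += qty
            self.finished[txn_id] = "CANCELLED"


inv = Inventory(stock=5)
got_t1 = inv.try_reserve("t1", 3)   # succeeds, 2 left for others
got_t2 = inv.try_reserve("t2", 3)   # fails immediately instead of waiting
inv.cancel("t1")
inv.cancel("t1")                    # retry after a lost ack: stock is not double-returned
```

A competing transaction sees reduced availability and fails fast rather than queueing behind a lock, which is the weaker isolation the table describes.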
Production Lessons
- In-doubt transactions are an operational reality. A coordinator failure in 2PC leaves prepared transactions in participants' logs. Have runbooks for finding and resolving them. Postgres has pg_prepared_xacts; Oracle has DBA_2PC_PENDING. Know the equivalent for your stack.
- Lock duration is the killer. A 2PC transaction holds locks across the full prepare-decide-commit cycle. Network latency or coordinator slowness directly multiplies lock duration, which multiplies contention.
- Coordinator failover must be automatic. A 2PC coordinator that requires manual recovery is the worst case: locks are held until a human notices. Make coordinator recovery automatic, or use a system (CockroachDB, Spanner) where it already is.
- Avoid 2PC across regions. The latency makes it operationally impractical. If you must span regions, run each 2PC within a single region, with the coordinator replicated inside that region, and replicate the results out asynchronously.
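A runbook step for in-doubt transactions could be partly automated along these lines. The sketch assumes the caller has already fetched `(gid, prepared)` pairs from Postgres's `pg_prepared_xacts` view and has access to the coordinator's decision log as a dictionary; the function name, the age threshold, and the log format are all hypothetical. It only produces `COMMIT PREPARED` / `ROLLBACK PREPARED` statements for a human or a later step to run.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple


def resolution_statements(
    prepared_rows: List[Tuple[str, datetime]],
    decision_log: Dict[str, str],
    now: datetime,
    max_age: timedelta,
) -> List[str]:
    """Suggest a resolution for each prepared transaction older than max_age."""
    statements = []
    for gid, prepared_at in prepared_rows:
        if now - prepared_at < max_age:
            continue  # young enough to still be in flight
        quoted = gid.replace("'", "''")  # escape quotes for a SQL string literal
        decision = decision_log.get(gid)
        if decision == "COMMIT":
            statements.append(f"COMMIT PREPARED '{quoted}'")
        elif decision == "ABORT":
            statements.append(f"ROLLBACK PREPARED '{quoted}'")
        else:
            # No recorded decision: do not guess, escalate to an operator.
            statements.append(f"-- NEEDS REVIEW: no recorded decision for '{quoted}'")
    return statements


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    ("tx-a", now - timedelta(minutes=10)),
    ("tx-b", now - timedelta(minutes=10)),
    ("tx-c", now - timedelta(seconds=5)),
    ("tx-d", now - timedelta(hours=1)),
]
plan = resolution_statements(
    rows, {"tx-a": "COMMIT", "tx-b": "ABORT"}, now, max_age=timedelta(minutes=1)
)
```

Resolving a prepared transaction against anything other than the coordinator's recorded decision risks breaking atomicity, which is why the unknown case is flagged instead of rolled back.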
Further Reading
- Gray, Notes on Data Base Operating Systems (1978) — the original framing of 2PC.
- Lampson & Sturgis, Crash Recovery in a Distributed Data Storage System (1976) — predates Gray, contains the same idea.
- Skeen, Non-Blocking Commit Protocols (SIGMOD 1981) — the 3PC paper.
- Bernstein, Hadzilacos, Goodman, Concurrency Control and Recovery in Database Systems (1987) — textbook treatment; still the most thorough.
- Corbett et al., Spanner: Google's Globally-Distributed Database (OSDI 2012) — production 2PC with Paxos-replicated coordinator.
- Peng & Dabek, Large-scale Incremental Processing Using Distributed Transactions and Notifications (OSDI 2010) — Percolator. 2PC at Bigtable scale.
Pre-commit Checklist
- For each 2PC transaction in my system, do I know what happens if the coordinator crashes mid-protocol? Is recovery automatic?
- Are my participant locks held for milliseconds or seconds, not minutes?
- Do I have runbooks (and monitoring) for in-doubt prepared transactions?
- Have I considered whether Saga or TCC would meet my actual requirement at lower operational cost?
- For cross-region operations, am I avoiding cross-region 2PC?
- Is the coordinator itself fault-tolerant (consensus-backed, not a single process)?