Steven's Knowledge

Distributed Locks

Why "just use Redlock" is rarely the answer — locks across nodes, fencing tokens, lease semantics, and when not to use a lock at all

Distributed Locks

A distributed lock is mutual exclusion across machines: at most one node holds the lock at a time, and the lock holder believes it has exclusive access to some resource. In theory, this is exactly the same as a Mutex in your local process. In practice, it is one of the most subtly broken primitives in distributed systems, and most production uses of it are wrong in ways that surface only under load or partial failure.

This page is about what a distributed lock can actually guarantee, the famous Redlock argument, fencing tokens as the real solution, and the surprisingly common case where you should not be using a lock at all.

What Makes Them Hard

Single-process mutex correctness assumes:

  • The locker either holds the lock or it does not.
  • Lock state is consistent across observers.
  • Time advances at a single rate.

Across machines none of these hold:

  • A client thinks it holds the lock; the lock service thinks the lock expired and gave it to someone else.
  • Different observers (the lock service, the client, the resource) can disagree on who has the lock at any moment.
  • Clocks drift, processes pause for GC, VMs get migrated. "Five seconds" on the client and on the lock server are different durations.

A correct distributed lock therefore needs:

  • Mutual exclusion — at most one client believes it holds the lock at any time, or the resource itself enforces the exclusion.
  • Liveness — eventually the lock can be acquired (no permanent deadlock from a crashed holder).
  • Fault tolerance — survives some node failures.

The first two pull in opposite directions. To get liveness, locks need a timeout / lease. To get mutual exclusion under a lease, the resource needs to reject operations from a stale lock holder — which the lock service alone cannot enforce.

The Lease-Based Model

Almost every production lock service (Chubby, ZooKeeper, etcd, Consul) is lease-based:

  1. Client acquires a lock with a TTL (e.g., 30 seconds).
  2. Client renews the lease before it expires.
  3. If the client crashes or partitions away, the lease expires and the lock becomes available.

The model is sound but subject to a sharp failure mode: GC pause, slow disk, VM migration. The lock holder pauses for 60 seconds, the lease expires at 30 seconds, another client acquires the lock, and then the original holder wakes up still believing it has the lock and proceeds to write to the shared resource. Two clients are now writing concurrently.

Client A: acquires lock, TTL=30s
          [GC pause begins at t=10s]
          
Lock svc: lease expires at t=30s, available

Client B: acquires lock at t=31s
Client B: writes to resource

Client A: [GC pause ends at t=70s]
Client A: still has its "I hold the lock" memory
Client A: writes to resource          ← collision

No amount of cleverness in the lock service prevents this. The fix has to live at the resource.

The Redlock Controversy

Salvatore Sanfilippo proposed Redlock in 2014: acquire the same lock on a majority of independent Redis instances, with carefully chosen TTLs and clock-bound assumptions. It became widely used because Redis was already deployed everywhere.

Martin Kleppmann argued in 2016 that Redlock is broken: under the GC-pause scenario above, it provides no more safety than a single-instance lease. The exchange between Sanfilippo and Kleppmann is worth reading in full; the technical core is:

  • Sanfilippo's claim: Redlock provides safety as long as clocks are bounded and processes do not pause for too long.
  • Kleppmann's counter: "as long as clocks are bounded" is exactly the assumption that fails in real systems, and a lock primitive whose safety depends on bounded clocks and bounded pauses is too weak to use for any operation where two-writer collision is unacceptable.

The practical takeaway: if you need a lock to enforce safety on a resource, the resource must use fencing tokens, regardless of which lock service you choose. Redlock vs etcd-lock vs Chubby is a secondary question; fencing is the primary one.

Fencing Tokens

Kleppmann's prescribed solution. The lock service returns not just "you have the lock," but a monotonically increasing token with each acquisition. The resource records the highest token it has ever processed; any operation with a lower token is rejected.

Lock svc gives client A: token=33  (then A pauses)
Lock svc gives client B: token=34
Client B writes to resource with token=34: accepted, resource records 34
Client A wakes up, writes with token=33: REJECTED (33 < 34)

This works regardless of GC pauses, clock skew, or which lock service is underneath. The resource is the source of truth about who held the lock most recently.

Requirements for fencing:

  • The token must be monotonically increasing. Most lock services provide this (ZooKeeper's czxid, etcd's revision, Consul's modify index).
  • The resource must store and check the token. This is the constraint that bites: many storage systems do not provide CAS on a token, so applications fake it with extra columns or wrappers. Implement carefully.
  • The token must be passed through to the resource on every operation. A lock holder cannot rely on "I checked at the start"; it must include the token in every write.

Lock Services in Practice

  • Chubby (Google, internal) — the canonical lease-based lock service. Built on Paxos. Provides fencing via "instance ID" returned with each lock acquisition.
  • ZooKeeper — ephemeral znodes for locks; czxid provides the fencing token. Wide adoption, mature.
  • etcd — leases with revision numbers. Modern Go ecosystem default. Used by Kubernetes for leader election.
  • Consul — sessions and lock APIs with index numbers.
  • Redis (single instance) — fast and simple, but no replication safety; Redlock attempts to fix this.

If you need a real distributed lock, pick from this list. Build on top of CAS in your existing database before reaching for a separate lock service if you can — fewer moving parts.

When You Should Not Use a Lock

Many uses of distributed locks have better alternatives:

  • Idempotency. If the operation is idempotent (Idempotency), you do not need exclusive access — duplicate execution is safe by construction. "Lock the user record while charging" → "use an idempotency key on the charge."
  • Compare-and-swap. "Only update if version matches" gives single-key exclusion without a lock service. Most databases provide this; SQL UPDATE ... WHERE version = ? is the cheapest distributed lock you will ever build.
  • Single-writer partition. Route all writes for key K to the same node (consistent hashing, partition assignment). Single-writer per partition is structurally safer than locking across writers.
  • Optimistic concurrency. Read the current state, compute the change, write conditionally. Retry on conflict. No lock involved.
  • Leader election for a service-wide singleton. This is what lock services are good at — ZooKeeper or etcd, with fencing tokens for the leader. Different from per-record locking.

The instinct to reach for a distributed lock is often a signal that the design has too much shared state. Restructuring to remove the shared state (sharding, idempotency, CAS) is usually cheaper than building a correct lock-based system.

Common Mistakes

  • Using a TTL-based lock without fencing. The system seems to work in testing. It will fail under load when GC happens at the wrong moment.
  • Renewing the lease in the same critical section as the work. If the work itself blocks (slow database query), the lease expires while you are still working. Renewal should be on a separate timer with conservative thresholds.
  • Using wall-clock time to compute TTL. Clock jumps or NTP corrections can expire your lease early. Use monotonic clocks (CLOCK_MONOTONIC on Linux) for relative timing.
  • Relying on the lock for correctness, not just performance. A lock used only to prevent thundering-herd is much easier than one used for safety. If two writers cannot collide, you need fencing or you need to redesign.
  • Cross-region locks. A lock acquired in us-east-1 whose resource lives in eu-west-1 has 100ms RTT and partition risk. Almost always a sign the design is wrong.

Further Reading

  • Burrows, The Chubby Lock Service for Loosely-Coupled Distributed Systems (OSDI 2006) — Google's foundational paper. Required reading.
  • Kleppmann, How to do distributed locking (2016) — the fencing-token argument; the post that made the case mainstream.
  • Sanfilippo, Is Redlock safe? (2016) — Redlock author's reply. Read both sides.
  • Hunt et al., ZooKeeper: Wait-free coordination for Internet-scale systems (USENIX ATC 2010) — the ZooKeeper paper.
  • Junqueira & Reed, ZooKeeper: Distributed Process Coordination (2013) — the book; in-depth practitioner material.

Pre-commit Checklist

  • Does the resource being protected check fencing tokens, or am I trusting "the lock service said I have it"?
  • Are TTLs and renewals on monotonic clocks, not wall-clock?
  • Is lease renewal in a separate timer from the work, with conservative margin?
  • Did I consider alternatives (idempotency, CAS, single-writer partitioning) before reaching for a lock?
  • For "this lock guarantees safety" claims, what happens if the holder pauses for 60 seconds? Have I tested that?
  • If I cannot add fencing to the resource, can I redesign so that two-writer collision is not catastrophic?

On this page