Event Sourcing
Storing the log, not the snapshot: what event sourcing actually requires, what it costs, and the workloads where it is genuinely worth the cost
The default way to persist state is to store the current state: a row in a table that represents what the entity looks like right now. Each update overwrites the previous value. The history of how the entity got there is either lost or kept in audit logs as a side concern.
Event sourcing inverts this. Instead of storing current state, the system stores the sequence of events that produced the current state. The events are the system of record. Current state is a function of events: replay them in order and derive whatever shape you need. Nothing is overwritten; the past is immutable.
This is one of the most powerful and most over-applied patterns in software architecture. Done right, in the right place, it produces systems with audit trails, time-travel debugging, and retroactive analytics that no other pattern can match. Done wrong, in the wrong place, it produces a complex distributed log that the team cannot operate and whose benefits never materialize. This page is about telling the difference.
What Event Sourcing Is
A traditional CRUD service stores a snapshot of state:
table: accounts
| id | balance |
|----|---------|
| 7 | 130 |

An event-sourced service stores the events that produced that state:
table: account_events
| seq | account_id | event_type | payload | timestamp |
|-----|------------|-----------------|------------------|-----------|
| 1 | 7 | AccountOpened | {initial: 100} | t=00:00 |
| 2 | 7 | Deposited | {amount: 50} | t=00:01 |
| 3 | 7 | Withdrawn | {amount: 20} | t=00:02 |

To answer "what is account 7's balance now?", the service replays all events for account 7:
state = empty
event 1 (AccountOpened, 100) → state = { balance: 100 }
event 2 (Deposited, 50) → state = { balance: 150 }
event 3 (Withdrawn, 20) → state = { balance: 130 }

The balance is derived, not stored.
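The replay above is a left fold over the event list. A minimal sketch in Python, assuming the three event types from the table; the `Event` class and the `apply` and `replay` function names are illustrative, not taken from any particular framework:

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Event:
    seq: int
    account_id: int
    event_type: str
    payload: dict


def apply(state: dict, event: Event) -> dict:
    """Return the next state; the input state is never mutated."""
    if event.event_type == "AccountOpened":
        return {"balance": event.payload["initial"]}
    if event.event_type == "Deposited":
        return {**state, "balance": state["balance"] + event.payload["amount"]}
    if event.event_type == "Withdrawn":
        return {**state, "balance": state["balance"] - event.payload["amount"]}
    raise ValueError(f"unknown event type: {event.event_type}")


def replay(events) -> dict:
    """Fold the events, in sequence order, into current state."""
    ordered = sorted(events, key=lambda e: e.seq)
    return reduce(apply, ordered, {})


# The three events for account 7 from the table above.
ACCOUNT_7 = [
    Event(1, 7, "AccountOpened", {"initial": 100}),
    Event(2, 7, "Deposited", {"amount": 50}),
    Event(3, 7, "Withdrawn", {"amount": 20}),
]

print(replay(ACCOUNT_7))  # {'balance': 130}
```

Keeping `apply` a pure function is a design choice: the same events then always produce the same state, which is what makes replay trustworthy.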
What Makes It Different
The pattern looks superficially similar to keeping an audit log, but the difference is foundational:
| Aspect | Audit log | Event sourcing |
|---|---|---|
| Current state | The source of truth | A derivation |
| The log | A secondary record | The source of truth |
| Restore from log | Possible but fragile | Built in |
| Add new derivation | Requires backfill | Replay |
| Schema of "now" | Fixed in the table | Determined by the replay code |
In an audit log system, the table is the truth and the log is a transcript. In event sourcing, the log is the truth and any "table" is a current-state cache (called a projection). You can throw away every projection and rebuild them from the events.
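To make "throw away every projection and rebuild" concrete, here is a small sketch. It assumes an in-memory log of tuples and a hypothetical `BalanceProjection` class; a real system would read from an event store and persist the projection elsewhere.

```python
# Hypothetical event log as (seq, account_id, event_type, payload) tuples.
EVENT_LOG = [
    (1, 7, "AccountOpened", {"initial": 100}),
    (2, 8, "AccountOpened", {"initial": 2000}),
    (3, 7, "Deposited", {"amount": 50}),
    (4, 8, "Withdrawn", {"amount": 500}),
    (5, 7, "Withdrawn", {"amount": 20}),
]


class BalanceProjection:
    """A current-state cache: account_id -> balance, plus the last seq applied."""

    def __init__(self):
        self.balances = {}
        self.last_seq = 0

    def handle(self, event):
        seq, account_id, event_type, payload = event
        if seq <= self.last_seq:
            return  # already applied, so redelivery is harmless
        if event_type == "AccountOpened":
            self.balances[account_id] = payload["initial"]
        elif event_type == "Deposited":
            self.balances[account_id] += payload["amount"]
        elif event_type == "Withdrawn":
            self.balances[account_id] -= payload["amount"]
        self.last_seq = seq


def rebuild(log):
    """Build a fresh projection by replaying the whole log in order."""
    projection = BalanceProjection()
    for event in sorted(log, key=lambda e: e[0]):
        projection.handle(event)
    return projection


live = rebuild(EVENT_LOG)
live.balances.clear()       # simulate a lost or corrupted projection
live = rebuild(EVENT_LOG)   # recovery is just another replay
print(live.balances)        # {7: 130, 8: 1500}
```

Tracking `last_seq` inside the projection is one common way to make event handling idempotent; it is an assumption of this sketch, not a requirement of the pattern.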
What Event Sourcing Buys You
The wins, where they are real:
- Full history, by construction. Every change is recorded. Auditors, debuggers, and regulators get answers without the team building an audit feature.
- Retroactive views. "We need to know how many users were active during each weekly window over the past year." A traditional database has lost this if you did not capture it; event sourcing can derive it from the existing log.
- Time-travel debugging. "What was the system state when this bug occurred?" Replay events up to that point.
- Correctable history. A bug in your business logic charged customers wrong amounts for two months. With event sourcing, you can rerun the corrected logic against the original events and recompute what the state should be.
- Multiple read models from one source of truth. A new dashboard, a new analytics query, a new ML training set — derive each from the event log without changing the write side.
- Natural fit for event-driven downstream. The event log is, by definition, an event stream. Other services subscribing to your events get the same view your projections do.
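Two of the wins above, time travel and retroactive views, come down to replaying a subset of the log or folding it in a new way. A minimal sketch, assuming a simplified log of `(seq, account_id, event_type, amount)` tuples; the function names are hypothetical:

```python
from collections import Counter

# Hypothetical log entries: (seq, account_id, event_type, amount).
LOG = [
    (1, 7, "AccountOpened", 100),
    (2, 7, "Deposited", 50),
    (3, 7, "Withdrawn", 20),
    (4, 7, "Deposited", 5),
]

# How each event type moves the balance.
SIGN = {"AccountOpened": 1, "Deposited": 1, "Withdrawn": -1}


def balance_as_of(log, account_id, up_to_seq):
    """Time travel: replay only the events with seq <= up_to_seq."""
    return sum(
        SIGN[event_type] * amount
        for seq, acct, event_type, amount in log
        if acct == account_id and seq <= up_to_seq
    )


def deposit_counts(log):
    """A retroactive view nobody planned for: number of deposits per account."""
    return Counter(
        acct for _, acct, event_type, _ in log if event_type == "Deposited"
    )


print(balance_as_of(LOG, 7, 2))  # 150
print(balance_as_of(LOG, 7, 4))  # 135
print(deposit_counts(LOG))       # Counter({7: 2})
```

Neither function required a change to what was written; both are new readers of the same log.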
What Event Sourcing Costs
The honest other side, often understated in the pattern's enthusiastic literature:
Operational Complexity
- Storage grows monotonically. Without intervention, every event ever written is kept forever. Snapshots (below — and see Distributed Snapshots for the formal underpinning used by stream processors like Flink) help but add their own complexity.
- Schema evolution of events. An event written six months ago must still be replayable today. The replay code must handle every historical event shape. There is no `ALTER TABLE` to fix a poorly-named field.
- Projections can fall behind, get corrupted, or need rebuilding. A bug in projection code produces wrong reads until detected and fixed. Recovery procedure is rebuilding from events: fine if you have planned for it, hours of downtime if you have not.
- Distributed projections are an eventual-consistency problem. Reads against a projection lag the write side. (CQRS discusses this.)
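One common way to handle historical event shapes is upcasting: stored events keep their original shape, and a chain of converters brings them to the latest version at read time. The sketch below assumes a hypothetical history in which version 1 of `Deposited` used an `amt` field and version 2 renamed it to `amount` and added `currency`; the field names and the `USD` default are illustrative assumptions.

```python
def upcast_deposited_v1_to_v2(payload: dict) -> dict:
    # Assumption for illustration: v1 events were all in USD.
    return {"amount": payload["amt"], "currency": "USD"}


# (event type, stored version) -> (next version, converter)
UPCASTERS = {
    ("Deposited", 1): (2, upcast_deposited_v1_to_v2),
}
LATEST = {"Deposited": 2}


def upcast(event: dict) -> dict:
    """Return the event in its latest shape; the stored event is not modified."""
    event_type = event["type"]
    version = event["version"]
    payload = event["payload"]
    while version < LATEST[event_type]:
        version, convert = UPCASTERS[(event_type, version)]
        payload = convert(payload)
    return {"type": event_type, "version": version, "payload": payload}


old = {"type": "Deposited", "version": 1, "payload": {"amt": 50}}
print(upcast(old))
# {'type': 'Deposited', 'version': 2, 'payload': {'amount': 50, 'currency': 'USD'}}
```

With this arrangement the replay code only has to understand the latest shape, and each schema change adds one small converter instead of another branch in every handler.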
Cognitive Complexity
- Engineers must think in events, not state. A new developer's first instinct is to update a row; event sourcing requires modeling "what happened" instead. Onboarding is harder.
- Domain modeling is harder upfront. Events must be well-named and well-shaped before they are written, because they cannot be changed afterward. Get the event design wrong and you live with it forever.
- Querying is harder. "Show me all accounts with balance > 1000" is a SQL query against a CRUD table; against an event store it requires either a maintained projection or a full replay.
Specific Hard Problems
- Deleting personal data (GDPR, CCPA). "Right to be forgotten" conflicts with "the log is immutable." The standard answer is crypto-shredding: store sensitive fields encrypted with a per-user key, and delete the key when the user requests deletion. The events remain but are unreadable. This works, but it is real engineering: keys must be stored, looked up on every read, and deleted everywhere, including backups.
- Cross-aggregate consistency. Two events that should be atomic ("debit from A, credit to B") cannot be a single event because each belongs to its own aggregate. The solution is a saga — which is itself complex.
- Bug fixes can rewrite history if you are not careful. Event sourcing tempts the worst version of "let's just re-run with the fix and update the events." Events are immutable. Corrections are new events ("Reversal", "Adjustment") added to the log.
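The last point, correcting by appending rather than editing, can be sketched as follows. The `Reversal` event shape, the `reverses` field, and the helper names are assumptions made for illustration; real systems often use domain-specific correction events instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    seq: int
    event_type: str                  # "AccountOpened", "Deposited", "Withdrawn", "Reversal"
    amount: int
    reverses: Optional[int] = None   # seq of the event that a Reversal cancels


def append_reversal(log, bad_seq):
    """Return a new log with a Reversal appended; the faulty event is left untouched."""
    bad = next(e for e in log if e.seq == bad_seq)
    next_seq = max(e.seq for e in log) + 1
    return log + [Event(next_seq, "Reversal", bad.amount, reverses=bad.seq)]


def balance(log):
    """Replay the log; a Reversal applies the opposite effect of the event it cancels."""
    by_seq = {e.seq: e for e in log}
    total = 0
    for e in log:
        kind, sign = e.event_type, 1
        if kind == "Reversal":
            kind, sign = by_seq[e.reverses].event_type, -1
        if kind in ("AccountOpened", "Deposited"):
            total += sign * e.amount
        elif kind == "Withdrawn":
            total -= sign * e.amount
    return total


log = [
    Event(1, "AccountOpened", 100),
    Event(2, "Withdrawn", 20),   # charged in error
]
corrected = append_reversal(log, 2)

print(balance(log))        # 80
print(balance(corrected))  # 100
```

The erroneous withdrawal is still in the log after the fix, so the history shows both the mistake and its correction.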
Snapshots
Replaying years of events on every read is impractical. Snapshots are periodic captures of derived state, stored alongside the events. To get current state, load the most recent snapshot and replay only the events since then.
event 1, event 2, ..., event 100, [snapshot at event 100],
event 101, ..., event 153
To get current state: load snapshot at 100, replay events 101..153.

Snapshots are an optimization, not part of the source of truth. Throwing away snapshots and rebuilding them is a valid operational step. The event log remains the truth.
The snapshot interval is a tuning parameter: too frequent and you pay storage; too rare and replays are slow.
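The load-snapshot-then-replay step can be sketched like this, using the same 153-event, snapshot-at-100 shape as the example above. The `Snapshot` class, the helper names, and the generated event data are all hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    last_seq: int   # seq of the last event folded into `balance`
    balance: int


def apply(balance, event):
    _, event_type, amount = event
    return balance + amount if event_type == "Deposited" else balance - amount


def load(events, snapshot=None):
    """Return (balance, events replayed), starting from the snapshot if one is given."""
    balance = snapshot.balance if snapshot else 0
    start = snapshot.last_seq if snapshot else 0
    replayed = 0
    # A real store would query only events with seq > start instead of scanning.
    for event in events:
        if event[0] > start:
            balance = apply(balance, event)
            replayed += 1
    return balance, replayed


def take_snapshot(events, up_to_seq):
    """Derive a snapshot purely from events, so it can always be rebuilt."""
    balance, _ = load([e for e in events if e[0] <= up_to_seq])
    return Snapshot(up_to_seq, balance)


# 153 made-up events as (seq, event_type, amount); every 5th one is a withdrawal.
EVENTS = [
    (seq, "Withdrawn" if seq % 5 == 0 else "Deposited", 10)
    for seq in range(1, 154)
]
snap = take_snapshot(EVENTS, 100)

print(load(EVENTS))        # (930, 153): full replay
print(load(EVENTS, snap))  # (930, 53): snapshot plus events 101..153
```

Both paths reach the same balance, which is the property to test for: a snapshot may change how much is replayed, never the result.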
When Event Sourcing Is Worth It
The patterns where event sourcing pays back its cost:
- Audit is a hard requirement. Financial systems, healthcare, regulated industries. The audit log is not a feature; it is the deliverable. Building event sourcing in is cheaper than maintaining a parallel audit system.
- The history itself is the product. Version control (a Git commit history is an append-only record of what changed, which is event sourcing in spirit). Bookkeeping (a ledger is event-sourced). Insurance claim processing. Order management systems.
- Retroactive analytics matter. You routinely need to answer questions about historical state shape that were not anticipated when the system was built. Event sourcing makes this a query problem rather than a data-loss problem.
- Domain is genuinely event-shaped. Workflows where "things happen and then more things happen" is the natural mental model. Trading systems, IoT, supply chain.
When Event Sourcing Is Not Worth It
A long list:
- CRUD apps with no audit requirement. A user record, a product catalog, a CMS. The audit log is not the product; event sourcing is overhead.
- Small teams. The operational burden — schema evolution discipline, projection rebuild procedures, replay infrastructure — is real labor that small teams cannot afford.
- Domains you do not understand yet. Event sourcing freezes the event vocabulary. Getting it wrong is expensive. If you do not yet know the domain, do not commit to immutable event shapes.
- Read-heavy systems where freshness matters. Eventual consistency between events and projections means reads lag writes. If your users notice, design around it explicitly or do not adopt the pattern.
- You think it sounds cool. This is the most common reason event sourcing gets adopted, and the most common reason it fails.
Experienced practitioners consistently report that event sourcing is right for a minority of services, even within systems that benefit from it overall.
CQRS and Event Sourcing
The two are often paired but are separate choices. See CQRS for the relationship in detail. Briefly:
- CQRS without event sourcing: common. Write side updates state directly; projections build read models.
- Event sourcing without CQRS: rare. Events stored, but one read model.
- CQRS with event sourcing: the "full" pattern. Events as truth, projections as read models.
The combination is more complex than either alone. Adopt it only when both are independently justified.
Common Mistakes
- Treating events as commands or DTOs. Events describe what happened in business terms (`OrderPlaced`, `PaymentReceived`). Naming them like commands (`CreateOrder`, `ProcessPayment`) is a sign the model has not been thought through.
- Storing technical events. `RowUpdated`, `FieldChanged`. These are not domain events; they are an audit log of the database. The point of event sourcing is to capture meaning, not bytes.
- Mutable events. Editing an old event "to fix" something. The whole premise of the pattern is immutability. Corrections are new events.
- No snapshot strategy. Replay times grow linearly with history. Without snapshots, eventually every read is slow.
- Skipping schema versioning. Events from a year ago must replay against today's code. Without versioning, you cannot evolve.
- Adopting it for the whole system. Event source the few aggregates where it earns its cost; leave the rest as CRUD.
Further Reading
- Greg Young, Event Sourcing (talk and writings, 2010s) — the canonical practitioner. Has spent the past 15 years refining and clarifying.
- Vaughn Vernon, Implementing Domain-Driven Design — Chapter 8 on Domain Events; pairs with the broader DDD context.
- Martin Fowler, Event Sourcing (2005) — early influential essay.
- Mathias Verraes, blog posts on event modeling — the most accessible practitioner writing in the past decade.
- Adam Dymitruk & Greg Young, Event Modeling — a methodology for designing event-sourced systems.
- Marten / EventStoreDB / Axon documentation — production-grade frameworks; reading their docs reveals what the pattern actually requires.
Pre-commit Checklist
- For each event-sourced aggregate, is event sourcing earning its cost via audit, retroactive analytics, or event-shaped domain — not just "it sounded better"?
- Are events named in past-tense business terms, not as commands or technical operations?
- Have I planned for schema versioning of events from day one?
- Is there a snapshot strategy and a rebuild procedure for projections?
- For deletion-of-personal-data requirements, do I have crypto-shredding or equivalent in place?
- For "let me just fix this old event" instincts: am I writing a corrective new event instead?
- Is the rest of the system CRUD, or have I event-sourced things that did not need it?