Event-Driven Architecture
Notification vs event-carried state transfer vs event sourcing — the three flavors of "event-driven" that get conflated, and what each one actually buys
"Event-driven" is one of the most overloaded terms in modern architecture. It can mean a single webhook, a Kafka pipeline carrying full state changes, a system that derives its truth from an immutable log, or anything in between. Each of these has different consistency properties, different operational requirements, and different failure modes. Calling them all "event-driven" hides the choice that actually matters.
This page is about the three distinct flavors Martin Fowler identified in his 2017 essay — notification, event-carried state transfer, and event sourcing — what each gives you, what each costs, and how to recognize which one you actually need.
What an Event Is
An event is a record of something that has happened. Past tense matters: an event is a fact about the world, not a request. UserSignedUp, OrderPlaced, PaymentFailed. The producer of the event does not know or care who will consume it; the event simply is.
Contrast with commands, which are imperative — CreateUser, PlaceOrder, ChargeCard — sent to a specific recipient who is expected to act. Commands and events look similar on the wire but have different semantics:
- A command can be rejected. An event is already true.
- A command has one recipient. An event has zero or many.
- A command is phrased in the imperative ("do this"). An event is phrased in the past tense ("this happened").
Mixing these up — sending events as commands or commands as events — is a common source of architectural confusion. Be deliberate about which one you are publishing.
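The distinction can be made concrete with a small sketch. The class names reuse the examples above; the handler, the rejection rule, and the `publish` helper are hypothetical and exist only to illustrate the semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceOrder:
    """Command: imperative, addressed to one handler, may be rejected."""
    order_id: int
    total: int


@dataclass(frozen=True)
class OrderPlaced:
    """Event: past tense, an immutable fact, zero or many subscribers."""
    order_id: int
    total: int


def handle_place_order(cmd: PlaceOrder) -> OrderPlaced:
    # A command handler is allowed to say no (hypothetical validation rule).
    if cmd.total <= 0:
        raise ValueError("rejected: order total must be positive")
    # Once the command is accepted, the resulting event is simply true.
    return OrderPlaced(order_id=cmd.order_id, total=cmd.total)


def publish(event: OrderPlaced, subscribers: list) -> int:
    # Subscribers cannot reject an event; they can only react to it.
    for react in subscribers:
        react(event)
    return len(subscribers)
```

Note that `publish` works with an empty subscriber list: an event with no listeners is still a valid fact, whereas a command with no handler is an error.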
The Three Flavors
The "event-driven" term covers three patterns that share message-passing infrastructure but differ fundamentally in what the event carries and what consumers do with it.
Flavor 1: Notification
The event signals that something happened. Consumers who care about the event have to ask the producer for more information.
OrderService: publish OrderPlaced { orderId: 7 }
ShippingService:
    receives OrderPlaced
    → GET /orders/7 (calls back to OrderService)
    → use response to schedule shipment
Properties:
- The event is small — just an identifier and enough context to look up the rest.
- The producer remains the source of truth; consumers query it as needed.
- Tight coupling on the producer's API: a consumer cannot work without the producer being available.
- Low data duplication, simple replay semantics.
Use when: you want the timing benefits of asynchrony (decouple when work happens) but the source of truth must stay with the producer. Webhooks are notifications. CloudEvents in their minimal form are notifications.
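A minimal sketch of a notification consumer, assuming an in-memory stand-in for OrderService. `ORDERS`, `get_order`, and `on_order_placed` are hypothetical names; in a real system `fetch` would be an HTTP call to the producer.

```python
from typing import Callable

# Hypothetical stand-in for OrderService's data behind GET /orders/{id}.
ORDERS = {7: {"orderId": 7, "shippingAddress": "1 Main St", "items": ["ABC"]}}


def get_order(order_id: int) -> dict:
    """Plays the role of GET /orders/{id} on the producer."""
    return ORDERS[order_id]


def on_order_placed(event: dict, fetch: Callable[[int], dict]) -> str:
    # The notification carries only an identifier...
    order = fetch(event["orderId"])
    # ...so everything else comes from the callback to the producer.
    return f"ship {order['items']} to {order['shippingAddress']}"
```

If `fetch` raises because the producer is down, the consumer cannot proceed. That is the availability coupling listed in the properties above.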
Flavor 2: Event-Carried State Transfer
The event carries enough state for consumers to act without calling back to the producer.
OrderService: publish OrderPlaced {
    orderId: 7,
    userId: 42,
    items: [{ sku: "ABC", qty: 2, price: 50 }],
    shippingAddress: {...},
    total: 100
}
ShippingService:
    receives OrderPlaced
    → uses event payload directly, no callback needed
    → schedules shipment
Properties:
- The event is larger — contains the relevant state at the time of the event.
- Consumers can work without the producer being available.
- Each consumer typically maintains its own materialized view of the data it cares about.
- Stronger decoupling, more data duplication.
Use when: you want consumers to operate independently of producer availability, and the data shape needed by consumers is reasonably stable. This is the most common shape in modern microservice systems.
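A sketch of the consumer side, assuming the payload shape shown above. `ShippingView` is a hypothetical class, and a real materialized view would live in the consumer's own database rather than in memory.

```python
class ShippingView:
    """Hypothetical local read model owned by ShippingService."""

    def __init__(self) -> None:
        # order id -> shipping address, built only from received events
        self.addresses = {}

    def on_order_placed(self, event: dict) -> None:
        # Copy only the fields this consumer cares about; ignore the rest.
        self.addresses[event["orderId"]] = event["shippingAddress"]

    def address_for(self, order_id: int) -> str:
        # Answered from the local copy, with no call to OrderService.
        return self.addresses[order_id]
```

The view keeps working while OrderService is down, at the price of holding a second copy of the address that is only as fresh as the last event processed.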
Flavor 3: Event Sourcing
The events are the system of record. State is derived by replaying events. See Event Sourcing for the dedicated treatment.
OrderService stores no "Order" table.
It stores: OrderCreated, ItemAdded, ItemAdded, AddressUpdated, PaymentReceived, ...
To answer "what is the current state of order 7?":
    replay all events for orderId=7 in order
    → derive current state
Properties:
- The log is the truth. Current state is a projection.
- Full audit trail by construction.
- "Time-travel debugging" — you can replay to any point in history.
- Significantly higher complexity than the other flavors.
Use when: the audit log is a first-class requirement (financial systems, regulated industries), or you genuinely need to derive new views from history (analytics, retroactive corrections). Almost never the right starting point.
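The replay step can be sketched as a fold over the event list. The event type names come from the example above; the `OrderState` fields and the event payload keys (`sku`, `address`) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class OrderState:
    items: list = field(default_factory=list)
    address: str = ""
    paid: bool = False


def apply_event(state: OrderState, event: dict) -> OrderState:
    kind = event["type"]
    if kind == "ItemAdded":
        state.items.append(event["sku"])
    elif kind == "AddressUpdated":
        state.address = event["address"]
    elif kind == "PaymentReceived":
        state.paid = True
    # OrderCreated and unknown types leave the state unchanged.
    return state


def replay(events: list, upto=None) -> OrderState:
    """Derive state from the first `upto` events (all of them if None)."""
    state = OrderState()
    for event in events[:upto]:
        state = apply_event(state, event)
    return state
```

Passing `upto` is the "time-travel" property in miniature: the same log yields the state as of any earlier point. Production systems usually add snapshots so that a replay does not have to start from the first event every time.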
Comparing the Three
| Property | Notification | State Transfer | Event Sourcing |
|---|---|---|---|
| Event size | Tiny (ID + metadata) | Substantial | Whatever the change was |
| Coupling | Consumers depend on producer's API | Loose | None to producer; coupled to event schema |
| Data duplication | None | Per-consumer materialized views | The log itself is the data |
| Producer availability for consumers | Required | Not required | Not required |
| Complexity | Low | Medium | High |
| Audit trail | If you build it | If you build it | Built in |
| Replay semantics | Re-fetch from producer | Re-process the event stream | Re-derive any historical state |
Many real systems mix flavors. A typical pipeline uses state-transfer events for inter-service communication and adds a notification webhook for external integrations. Event sourcing can be used in a specific service while the rest uses state-transfer events.
Pub/Sub vs Queue Semantics
Orthogonal to the three flavors is the delivery model:
- Pub/sub: the broker fans out each event to every interested consumer. Each consumer maintains its own subscription state and offset. Adding a new consumer is a configuration change at the broker, not at the producer. Examples: Kafka, NATS, EventBridge.
- Queue: each message is delivered to exactly one worker from a pool. Used for distributing work, not for broadcasting facts. Examples: SQS, RabbitMQ in default mode, and traditional message queues.
Event-driven architectures usually want pub/sub: the producer should not know who is listening, and adding consumers should not require producer changes. Queues are appropriate for command-like patterns ("process this work item once") more than event-like patterns.
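The difference between the two delivery models fits in a toy in-memory sketch. Both functions are hypothetical, and the round-robin assignment in `compete` is only one way a real queue might pick a worker.

```python
from itertools import cycle


def fan_out(messages, subscribers):
    """Pub/sub: every subscriber receives every message."""
    for msg in messages:
        for inbox in subscribers.values():
            inbox.append(msg)


def compete(messages, workers):
    """Queue: each message goes to exactly one worker (round-robin here)."""
    turn = cycle(workers.values())
    for msg in messages:
        next(turn).append(msg)
```

With two subscribers, `fan_out` delivers two copies of each message; with two workers, `compete` delivers one copy in total. Adding a subscriber to `fan_out` changes nothing for the others, while adding a worker to `compete` changes who processes what.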
Benefits of Event-Driven
The wins that justify the cost, in the cases where the cost is justified:
- Temporal decoupling. The producer does not wait for consumers. A spike in event volume becomes a queue depth issue, not a cascading-failure issue.
- Add consumers without producer changes. A new analytics pipeline subscribes to an existing event stream without the producer team's involvement.
- Natural audit trail. The event stream is, by definition, a log of what happened. Whether you keep it for hours or years is a retention decision.
- Asynchronous patterns become natural. Background jobs, fan-out to N services, retry on failure — these are first-class concerns in an event-driven system instead of bolt-on hacks.
- Replay for debugging or recovery. If a consumer has a bug, fix it and replay from the appropriate offset. Without an event log, this option does not exist.
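Replay from an offset, the last benefit above, can be sketched with an append-only list. `Log` and `consume` are hypothetical stand-ins for a broker topic and a consumer loop, not any particular broker's API.

```python
class Log:
    """Append-only in-memory log; a stand-in for a retained event stream."""

    def __init__(self):
        self._events = []

    def append(self, event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read_from(self, offset: int):
        # Yield (offset, event) pairs starting at the given offset.
        for i in range(offset, len(self._events)):
            yield i, self._events[i]


def consume(log, handler, start=0):
    """Run handler over the log from `start`; return the next offset to read."""
    next_offset = start
    for offset, event in log.read_from(start):
        handler(event)
        next_offset = offset + 1
    return next_offset
```

After fixing a consumer bug, calling `consume` again with an earlier `start` reprocesses the same events with the corrected handler. This only works if the log still retains those events and the handler tolerates seeing them again.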
Costs of Event-Driven
The honest other side:
- Eventual consistency is the default. Consumers process events at their own pace. A user who places an order and immediately checks "my orders" may not see it yet. The application has to account for this explicitly, for example by showing the order as pending until the consumer catches up.
- Debugging is harder. A request that fails in a synchronous system has a stack trace. An event that fails to process has to be reconstructed from logs across consumers. Distributed tracing helps but is more expensive.
- Schema evolution matters more. A consumer reading events from six months ago needs to handle the schema that existed then. Either events are versioned or every change must be backward-compatible.
- Ordering becomes a concern. Brokers typically guarantee order only within a narrow scope, such as a single partition or a single queue; across partitions they usually cannot. Code must not assume strict order without verifying the guarantee the broker actually provides.
- Idempotency is not optional. At-least-once delivery is the operational reality (see Exactly-Once Semantics). Every consumer needs to handle duplicates (see Idempotency).
- The infrastructure tax. Running a broker (Kafka cluster, RabbitMQ cluster, or managed service) is real operational work. Plus the per-team work to use it correctly.
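Of these costs, idempotency is the easiest to show in code. The sketch below assumes every event carries a unique `eventId` field, which is not something the examples above define, and it keeps seen ids in memory where a real consumer would need durable storage.

```python
class IdempotentConsumer:
    """Wraps a handler so that redelivered events are processed only once."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # assumption: durable storage in a real system

    def handle(self, event: dict) -> bool:
        event_id = event["eventId"]  # hypothetical unique id on every event
        if event_id in self._seen:
            return False  # duplicate delivery: skip the side effect
        self._handler(event)
        # Record the id only after the handler succeeds, so a failure
        # in the handler leads to a retry rather than a lost event.
        self._seen.add(event_id)
        return True
```

In a real consumer the side effect and the "seen" record should be committed atomically, for example in the same database transaction. Otherwise a crash between the two steps reintroduces the duplicate.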
When Event-Driven Is the Right Default
- Multi-service systems where many consumers care about a small number of state changes. Event streams scale this pattern cleanly; direct calls do not.
- Workloads with bursty traffic. Async absorbs bursts; sync passes them through.
- Analytics / audit / ML pipelines downstream of operational systems. The same event stream feeds operational and analytical needs.
- Workflows that span minutes to days. Async messaging plus Saga handles long-running coordination better than holding open connections.
When Event-Driven Is the Wrong Default
- Small systems with few interactions. A two-service system communicating synchronously is easier to reason about than the same system communicating via a broker.
- Operations needing strong consistency on read-after-write. "Place order, immediately see order" requires either synchronous coupling or careful UI work.
- Teams without the operational maturity for a broker. Brokers are critical infrastructure; treating them as "just install Kafka" leads to outages.
Patterns You Will Compose With
If you go event-driven, several patterns become recurrent rather than situational:
- CQRS — splitting reads from writes pairs naturally with event-driven writes.
- Event Sourcing — when the event stream becomes the source of truth.
- Saga Pattern — multi-service workflows over an event bus.
- Outbox Pattern — the standard way to publish events reliably from a service that owns its own database.
- Idempotency — required at every consumer.
- Exactly-Once Semantics — the actual guarantees you can achieve.
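As one example from this list, the core of the Outbox Pattern fits in a short sketch using Python's built-in `sqlite3`. The table layout, the function names, and the polling `relay` are assumptions for illustration; production setups often use change data capture instead of polling.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER)")
    db.execute(
        "CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL, published INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def place_order(db: sqlite3.Connection, order_id: int, total: int) -> None:
    # One transaction: the order row and its event commit or roll back together.
    with db:
        db.execute("INSERT INTO orders (id, total) VALUES (?, ?)", (order_id, total))
        event = {"type": "OrderPlaced", "orderId": order_id, "total": total}
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay(db: sqlite3.Connection, publish) -> int:
    """Publish pending outbox rows in order; mark each one after it is sent."""
    rows = db.execute(
        "SELECT seq, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, payload in rows:
        publish(json.loads(payload))  # a crash after this line causes a resend
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

Because a row is marked published only after `publish` returns, a crash in between resends the event on the next pass. The outbox therefore gives at-least-once delivery, which is why it pairs with idempotent consumers.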
Common Mistakes
- Conflating the three flavors. "We use events" without distinguishing notification vs state transfer vs sourcing produces architectures that combine the costs of all three with the benefits of none.
- Treating events as commands. Imperative messages on an event topic are an organizational smell — the producer thinks it knows what consumers should do.
- Skipping schema versioning. Without it, the first refactor of an event payload breaks every consumer. Plan for versioning from day one.
- No idempotency at consumers. At-least-once delivery + non-idempotent consumer = production incidents.
- Adopting an event broker for a 3-service system. The infrastructure tax pays off at scale; below scale it is a net negative.
- Using a queue when you want pub/sub (or vice versa). The delivery model matters; pick the one that matches your pattern.
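One common way to avoid the schema-versioning mistake is an "upcaster" that normalizes old events to the latest shape before the consumer logic runs. The sketch below is hypothetical throughout: the `version` field, the rule that unversioned events count as v1, and the `currency` field added in v2 are all invented for illustration.

```python
def upcast(event: dict) -> dict:
    """Return the event in the latest known shape (v2) without mutating it."""
    version = event.get("version", 1)  # assumption: unversioned events are v1
    if version == 1:
        # Hypothetical change: v2 added a currency; v1 totals were all USD.
        event = {**event, "version": 2, "currency": "USD"}
        version = 2
    if version != 2:
        raise ValueError(f"unsupported event version: {version}")
    return event
```

Consumer code then handles only the latest shape, and each new version adds one more step to the chain instead of a branch in every handler.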
Further Reading
- Martin Fowler, What do you mean by "Event-Driven"? (2017) — the canonical essay that named the three flavors.
- Gregor Hohpe & Bobby Woolf, Enterprise Integration Patterns (2003) — the foundational pattern catalogue. Still the best reference for the underlying mechanics.
- Sam Newman, Building Microservices (2nd ed., 2021), Chapter 4 — practical event-driven patterns in a microservices context.
- Vaughn Vernon, Implementing Domain-Driven Design — covers domain events in the DDD context.
- Confluent's blog — many practical write-ups of event-driven patterns in production.
Pre-commit Checklist
- For each event in my system, do I know whether it is a notification, state transfer, or sourcing event?
- Is each event in past tense and describing a fact, not a command?
- Are my consumers idempotent — they can process the same event twice without harm?
- Have I planned for schema versioning, or am I betting on never needing to evolve event payloads?
- Do I have distributed tracing in place before the system grows large enough to need it for debugging?
- For each "we'll just use events" decision, can I name the consumers and what each one will do — or am I broadcasting to an empty room?