When to Split
The operational signals that justify going from monolith to services — pressure-based, not aspirational, and how to extract the first one
The decision to break a system into more deployable units — extracting a service from a monolith, splitting a modular monolith, decomposing further — should be driven by operational pressure, not by architectural fashion. This page is about the specific signals that justify a split, the signals that look like justification but are not, and a low-risk extraction process for when the case is real.
If you have not read Monolith, Modular Monolith, and Microservices, read those first. This page is about the decision, not the destination.
The Premise
Every split has a cost: the distribution tax, the operational burden, the coordination overhead, the new failure modes. The split must pay for itself with a benefit that the monolith — even a well-organized one — could not deliver.
The default is to stay. A split has to justify itself, not the other way around. A team that splits by default is the team that ends up with a distributed monolith.
Real Signals
The signals that justify splitting are operational, measurable, and specific:
1. Deploy Pain Has Become Real
Releases are slow, scary, and infrequent. Engineers schedule deploys. Hotfixes carry unrelated work. Rollbacks affect unrelated features. The monolith has become a coordination bottleneck.
Measure: time from commit to production, deploys per week, rollback frequency.
What to do: if deploy pain is the symptom, splitting helps because each service deploys independently. But before splitting, exhaust deployment-pipeline improvements (feature flags, canary deploys, blue-green) that may resolve the pain at lower cost.
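A minimal sketch of computing these three numbers, assuming your CD system can export one record per production release with the time of the oldest commit it contained, the time it reached production, and whether it was rolled back. The `Deploy` record and `deploy_metrics` function are hypothetical names, not part of any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deploy:
    # Hypothetical record shape; adapt to whatever your CD system exports.
    committed_at: datetime  # earliest commit included in the release
    deployed_at: datetime   # when the release reached production
    rolled_back: bool       # whether this release was rolled back


def deploy_metrics(deploys: list[Deploy], weeks: float) -> dict[str, float]:
    """Summarize deploy pain over an observation window of `weeks` weeks."""
    if not deploys or weeks <= 0:
        raise ValueError("need at least one deploy and a positive window")
    # Commit-to-production lead time for each release, in hours.
    lead_times_h = [
        (d.deployed_at - d.committed_at) / timedelta(hours=1) for d in deploys
    ]
    return {
        "median_lead_time_hours": median(lead_times_h),
        "deploys_per_week": len(deploys) / weeks,
        "rollback_rate": sum(d.rolled_back for d in deploys) / len(deploys),
    }
```

Recording these before and after a pipeline change (feature flags, canaries, blue-green) shows whether the cheaper fix worked before anyone proposes a split.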
2. Test Cycle Time Is Hurting Velocity
The test suite takes 30+ minutes. Engineers commit, walk away, and come back to merge conflicts. Pre-merge tests are sampled rather than exhaustive. CI capacity is a constant constraint.
Measure: CI run time, time engineers spend waiting per day.
What to do: test parallelization, selective test execution by changed files, and proper test isolation often help more than splitting. Split only if the test surface area itself is the problem, not the configuration.
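Selective test execution can be sketched as a mapping from changed paths to the test targets that cover them. This is a minimal illustration assuming a hand-written, hypothetical `TEST_MAP`; real setups usually derive the mapping from a build graph or coverage data, because hand-maintained maps drift.

```python
from fnmatch import fnmatch

FULL_SUITE = "tests"

# Hypothetical mapping from source-path globs to the test targets covering them.
TEST_MAP: dict[str, list[str]] = {
    "billing/*": ["tests/billing"],
    "catalog/*": ["tests/catalog"],
    "shared/*": [FULL_SUITE],  # shared code can break anything
}


def select_tests(changed_files: list[str], test_map: dict[str, list[str]]) -> list[str]:
    """Pick test targets for a change; unmapped files fall back to the full suite."""
    selected: set[str] = set()
    for path in changed_files:
        matched = False
        for pattern, targets in test_map.items():
            if fnmatch(path, pattern):  # fnmatch's * also matches "/"
                selected.update(targets)
                matched = True
        if not matched:
            return [FULL_SUITE]  # be conservative about files nobody mapped
    if FULL_SUITE in selected:
        return [FULL_SUITE]
    return sorted(selected)
```

The conservative fallback matters: a selector that silently skips tests for unmapped files trades CI time for escaped bugs.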
3. One Component Has Different Scaling Needs
The recommendation engine needs 50 replicas; the rest of the system needs 3. Scaling the whole monolith to 50 wastes 47 copies of everything else. The image-processing module CPU-spikes during uploads; the rest of the system is memory-bound.
Measure: CPU/memory/IO profile per component, hours of over-provisioned capacity per month.
What to do: this is one of the strongest cases for splitting. The component with divergent scaling needs becomes its own service, sized independently.
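The over-provisioning in the recommendation-engine example is simple arithmetic. A sketch under a deliberately simplified model: each component's memory adds linearly within a replica, there is no shared runtime overhead, and the per-copy memory figures (2 GiB for recommendations, 6 GiB for everything else) are illustrative assumptions, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    replicas_needed: int      # replicas this component alone needs at peak
    mem_gib_per_copy: float   # memory one copy of this component uses


def capacity_gib(components: list[Component]) -> tuple[float, float]:
    """Return (monolith_gib, split_gib) under a linear memory model."""
    # Monolith: every replica carries every component, so the replica
    # count is set by the most demanding component.
    replicas = max(c.replicas_needed for c in components)
    monolith = replicas * sum(c.mem_gib_per_copy for c in components)
    # Split: each component is sized independently.
    split = sum(c.replicas_needed * c.mem_gib_per_copy for c in components)
    return monolith, split


example = [Component("recommendations", 50, 2.0), Component("rest", 3, 6.0)]
monolith_gib, split_gib = capacity_gib(example)
```

With those numbers the monolith reserves 400 GiB where independently sized deployments need 118 GiB; the 282 GiB gap, multiplied by hours at peak, is the over-provisioned capacity to measure. A real comparison should also count the per-service overhead (runtime, sidecars, baseline memory) that the split adds.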
4. Team Size Has Outgrown Single-Codebase Coordination
More than ~30-50 engineers committing to one codebase. Merge conflicts are constant. Release coordination requires meetings. Code ownership is unclear. Onboarding takes months because the system is too entangled to learn.
Measure: number of engineers active in the codebase, hours per week spent on coordination overhead, time-to-first-meaningful-commit for new hires.
What to do: splitting along team boundaries lets teams move in parallel. Pair this with Conway's Law reasoning: design the team structure first, and the service structure follows.
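One way to get the first measurement, assuming a local clone and that author emails roughly identify people (bots and people committing under several addresses will skew it). The `git log --since=... --format=%ae` invocation is standard git; the helper names and the `[bot]` filter are hypothetical choices for illustration.

```python
import subprocess


def git_author_emails(repo_path: str, since: str = "90 days ago") -> str:
    """Return one author email per commit since `since` (requires git on PATH)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}", "--format=%ae"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def active_engineers(log_output: str, ignore: tuple[str, ...] = ("[bot]",)) -> int:
    """Count distinct, case-insensitive author emails, skipping bot accounts."""
    emails = {
        line.strip().lower()
        for line in log_output.splitlines()
        if line.strip() and not any(marker in line for marker in ignore)
    }
    return len(emails)
```

Usage: `active_engineers(git_author_emails("."))`. The count is a proxy; the coordination hours and time-to-first-commit figures still have to come from the team.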
5. Failure Isolation Is Required
A bug in module A takes down the entire system. Memory leaks in one component crash the rest. Compliance or regulatory requirements demand isolation.
Measure: incident attribution, blast radius of past outages.
What to do: splitting the component that requires isolation into a separate service contains the failure. Service crashes are local; one bad release affects one service.
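Blast radius can be measured from incident records, assuming each postmortem records the component whose fault started the incident and the components whose users saw degradation. A minimal sketch; the `Incident` shape and function name are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    origin: str               # component whose fault started the incident
    affected: frozenset[str]  # components whose users saw degradation


def escaped_blast_radius(incidents: list[Incident]) -> Counter[str]:
    """Per origin component, count incidents that spread beyond that component."""
    return Counter(i.origin for i in incidents if i.affected - {i.origin})
```

A component that repeatedly tops this count is the one the failure-isolation argument applies to; a component whose incidents stay local is not a candidate on these grounds.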
6. Stack Divergence Has Become Necessary
A subsystem needs a language, runtime, or library the monolith cannot accommodate: ML inference in Python alongside a JVM monolith, a real-time component in Go alongside a Rails app, a specialized database for graph queries.
Measure: what would be required to add the new stack to the monolith vs split it out.
What to do: splitting the divergent component into a separate service is often easier than retrofitting the monolith to support the new stack.
7. Independent Release Cadence Is Genuinely Needed
This module needs to ship 10 times per day; the rest ships weekly. Coupling them either holds back the fast module or destabilizes the slow ones.
Measure: ideal release frequency per component vs actual.
What to do: the component needing fast iteration is a natural split candidate.
False Signals
These look like justification but rarely are:
"Microservices Are More Modern"
Modernity is not a benefit. Industry leaders adopt a pattern because they have particular problems it solves; you may not share those problems.
"We Want to Be Cloud Native"
Cloud-native is an operational style, not an architecture. A monolith deployed via container orchestration is cloud-native. Splitting because of buzzword alignment does not solve any specific problem.
"We Heard Monoliths Don't Scale"
Empirically wrong. Shopify, Stack Overflow, GitHub, Basecamp, and many others run very large monoliths in production. Scale is a pressure source; it is not a forcing function for splitting.
"Resume / Career Development"
Engineers wanting microservices on their resume is a real human dynamic but a terrible architectural reason. The cost of the wrong architecture outlasts any individual's tenure.
"We Want Better Testability"
Testability is achieved by good module boundaries, not by network calls. A modular monolith with clean boundaries tests better than tangled microservices.
"It Will Help Us Avoid Conflicts"
Merge conflicts are usually a tooling and discipline problem, not a topology problem. Splitting introduces new coordination costs (API versioning, contract negotiation) that can be larger than the conflicts they were meant to avoid.
The Extraction Process
When a real signal justifies splitting, the lower-risk path:
1. Pick the One Component With the Clearest Case
Resist the urge to redesign the whole system. Extract one component that has the strongest single justification. The first extraction is also the most expensive — operational infrastructure (CI/CD, observability, deployment pipeline, service mesh) has to be built. Subsequent extractions reuse what you built.
2. Make Sure It Is Already a Module
A component you can extract cleanly is one that already has a clean internal boundary. If the codebase is a big ball of mud, the first step is to refactor toward a modular monolith, then extract from that. Trying to extract from tangled code produces a mess in both places.
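Whether a component is "already a module" can be checked against its import graph. A sketch assuming you can export (importer, imported) pairs of dotted module names from a static-analysis tool, and that the candidate exposes a designated public package; `boundary_report` and the `billing`/`billing.api` names are hypothetical. Inbound leaks are callers reaching past the public API; outbound dependencies are what would become network calls after extraction.

```python
def in_package(module: str, package: str) -> bool:
    """True if `module` is `package` itself or nested under it (dotted names)."""
    return module == package or module.startswith(package + ".")


def boundary_report(
    imports: list[tuple[str, str]], candidate: str, public_api: str
) -> dict[str, list]:
    """Report how cleanly `candidate` is separated from the rest of the codebase."""
    # Outside code importing the candidate's internals instead of its API.
    leaks = sorted(
        (src, dst)
        for src, dst in imports
        if not in_package(src, candidate)
        and in_package(dst, candidate)
        and not in_package(dst, public_api)
    )
    # Top-level packages the candidate itself depends on.
    outbound = sorted(
        {
            dst.split(".")[0]
            for src, dst in imports
            if in_package(src, candidate) and not in_package(dst, candidate)
        }
    )
    return {"inbound_leaks": leaks, "outbound_dependencies": outbound}
```

A non-empty `inbound_leaks` list is refactoring work to finish inside the monolith before extraction starts.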
3. Define the New Service's Boundary in the Monolith First
Make the planned service's interface explicit inside the monolith. Turn its internal calls into calls to the planned interface. Run for weeks with the interface in place but no network hop. Once the interface is stable, the extraction is mechanical.
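A minimal sketch of this step, using a hypothetical pricing component: the interface is declared with `typing.Protocol`, the current implementation is plain in-process code, and callers depend only on the interface. Passing only plain, serializable values (a SKU string, integer cents) rather than shared ORM objects is an assumption worth adopting early, because those are what can later cross a network.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Quote:
    sku: str
    price_cents: int


class PricingService(Protocol):
    """The planned service's interface, defined inside the monolith."""

    def quote(self, sku: str, quantity: int) -> Quote: ...


class InProcessPricing:
    """Current implementation: ordinary function calls, no network hop."""

    def __init__(self, unit_prices_cents: dict[str, int]) -> None:
        self._prices = unit_prices_cents

    def quote(self, sku: str, quantity: int) -> Quote:
        return Quote(sku, self._prices[sku] * quantity)


def checkout_total(pricing: PricingService, cart: dict[str, int]) -> int:
    # Callers see only the interface, never pricing internals.
    return sum(pricing.quote(sku, qty).price_cents for sku, qty in cart.items())
```

Once every call site goes through `PricingService`, swapping in a remote implementation touches one constructor instead of every caller.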
4. Build the Operational Foundation
The first extraction is where you pay for two-service operations: separate deployment pipelines, distributed tracing, contract versioning, service discovery, on-call runbooks, idempotency. This is non-trivial work that pays off over many future extractions.
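Idempotency is the item on that list most easily shown in code. A minimal in-memory sketch, assuming clients attach a unique idempotency key to each logical request and reuse it on retries (a common convention, not something this page prescribes): the first call runs the handler, and retries with the same key get the stored result instead of a second side effect. The class name is hypothetical.

```python
import threading
from typing import Callable


class IdempotentHandler:
    """Replays the stored result when a request with a seen key is retried."""

    def __init__(self, handler: Callable[[dict], dict]) -> None:
        self._handler = handler
        self._results: dict[str, dict] = {}
        self._lock = threading.Lock()

    def handle(self, idempotency_key: str, request: dict) -> dict:
        with self._lock:
            if idempotency_key in self._results:
                return self._results[idempotency_key]  # retry: no new side effect
            result = self._handler(request)
            self._results[idempotency_key] = result
            return result
```

Holding the lock while the handler runs keeps the sketch simple but serializes all requests. A production version would persist keys in a durable store (for example behind a unique constraint) so they survive restarts and work across replicas, and would expire them after a retention window.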
5. Extract With a Strangler Pattern
The monolith calls the new service through the same interface, but now the implementation makes a network call instead of an in-process call. Both implementations can coexist during transition; switch with a feature flag, roll back if needed.
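A sketch of the coexistence step, using a hypothetical pricing component: the in-process and HTTP implementations share one method signature, and a router consults a flag on every call, so rollback is flipping the flag back. The `/quote` endpoint, its JSON shape, and all class names are assumptions; the HTTP client uses only the standard library (`urllib.request`).

```python
import json
import urllib.request
from typing import Callable, Protocol


class PricingService(Protocol):
    def quote_cents(self, sku: str, quantity: int) -> int: ...


class InProcessPricing:
    """Existing implementation inside the monolith."""

    def __init__(self, unit_prices_cents: dict[str, int]) -> None:
        self._prices = unit_prices_cents

    def quote_cents(self, sku: str, quantity: int) -> int:
        return self._prices[sku] * quantity


class HttpPricing:
    """Same interface; the implementation now makes a network call."""

    def __init__(self, base_url: str, timeout_s: float = 2.0) -> None:
        self._base_url = base_url.rstrip("/")
        self._timeout_s = timeout_s

    def quote_cents(self, sku: str, quantity: int) -> int:
        body = json.dumps({"sku": sku, "quantity": quantity}).encode()
        req = urllib.request.Request(
            f"{self._base_url}/quote",  # hypothetical endpoint
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=self._timeout_s) as resp:
            return int(json.load(resp)["price_cents"])


class FlaggedPricing:
    """Routes each call to the new service or the legacy code based on a flag."""

    def __init__(
        self,
        legacy: PricingService,
        remote: PricingService,
        use_remote: Callable[[], bool],
    ) -> None:
        self._legacy, self._remote, self._use_remote = legacy, remote, use_remote

    def quote_cents(self, sku: str, quantity: int) -> int:
        impl = self._remote if self._use_remote() else self._legacy
        return impl.quote_cents(sku, quantity)
```

Reading the flag per call (rather than at startup) is what makes rollback immediate; a percentage-based flag turns the same router into a gradual cutover.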
6. Run It Long Enough to Find What Hurts
Resist extracting more services until you have operated the first split for at least several months. The pain of the first extraction reveals what your team's actual capacity for distributed-systems work is. Plan further extractions based on what you learned, not what you originally intended.
7. Only Extract More When the Case Repeats
The same operational signals apply to subsequent extractions. The fact that you successfully extracted one service does not mean every other module needs to be a service. Most modules should stay in the monolith.
Decision Heuristics
When the case is unclear, these heuristics help:
- "What problem will this split solve that we cannot solve in the monolith?" If you cannot answer in one sentence, do not split.
- "What will get worse?" The split will make some things worse — name them up front so the choice is informed.
- "Can we test this with a modular monolith first?" If the answer is yes, do that. The modular monolith reveals whether the boundary you want is even the right one.
- "What is our distributed-systems maturity?" A team without distributed tracing, idempotency discipline, and operational on-call rotation will struggle. Build the foundation before splitting, or your first incident will be bad.
- "Does the team boundary match?" If the proposed service has no clear owning team, you have not finished the design (Conway's Law).
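When these heuristics serve as a review gate (for example in a design-doc template), they can be made explicit. A hypothetical sketch; the field names are illustrative, and the check can only confirm that each question was answered, not that the answers are good.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SplitProposal:
    problem: str  # one-sentence answer to "what will this split solve?"
    gets_worse: list[str] = field(default_factory=list)
    modular_monolith_tried: bool = False
    has_distributed_tracing: bool = False
    has_idempotency_discipline: bool = False
    has_oncall_rotation: bool = False
    owning_team: Optional[str] = None


def open_questions(p: SplitProposal) -> list[str]:
    """Return the heuristics this proposal has not yet answered."""
    issues = []
    if not p.problem.strip():
        issues.append("no stated problem the monolith cannot solve")
    if not p.gets_worse:
        issues.append("nothing named that will get worse")
    if not p.modular_monolith_tried:
        issues.append("boundary not tested as a modular monolith first")
    if not (p.has_distributed_tracing and p.has_idempotency_discipline
            and p.has_oncall_rotation):
        issues.append("distributed-systems foundation incomplete")
    if not p.owning_team:
        issues.append("no clear owning team")
    return issues
```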
Common Mistakes
- Splitting because "the monolith feels wrong." Feelings are not signals. Find the specific operational pain.
- Splitting before the bounded contexts are clear. Premature splits freeze the wrong cuts. See DDD Strategic Design.
- Splitting all at once. A "let's go to microservices" project that extracts 15 services in 6 months almost never goes well. Extract one, learn, then plan more.
- Not investing in operational foundation. Splitting without distributed tracing, idempotency discipline, contract versioning, and on-call rotation produces incidents the team is not equipped to handle.
- Splitting along data tables. A users-service and an orders-service because those are the tables is not bounded-context-driven design. It is a guarantee of a distributed monolith.
- No criterion for staying. A team that intends to "eventually be all microservices" will not recognize when staying monolithic is correct.
Relation to Other Pages
- Monolith — the default. Splits depart from this.
- Modular Monolith — the better destination than microservices for most extractions.
- Microservices — the destination when the case is genuinely strong.
- Conway's Law — splits without team alignment do not deliver autonomy.
- DDD Strategic Design — bounded contexts are the natural unit of a split.
- Anti-Patterns — what happens when splits are done badly.
Further Reading
- Martin Fowler, MonolithFirst (2015) — the foundational essay for the "stay monolith until forced" position.
- Sam Newman, Monolith to Microservices (2019) — book-length treatment of extraction.
- Susan Fowler, Microservices in Production — lessons from running microservices at scale.
- Adam Tornhill, Your Code as a Crime Scene — empirical methods for identifying coupling that suggests split candidates.
- Skelton & Pais, Team Topologies — the team-structure side of when to split.
Pre-commit Checklist
- For each proposed split, can I name a specific operational signal (deploy pain, test cycle time, scaling mismatch, team size, failure isolation, stack divergence, release cadence), not just "microservices are better"?
- Have I exhausted cheaper alternatives (deployment pipeline improvements, modular monolith, better tooling) before splitting?
- Is the component I want to extract already a clean module — or do I need to refactor toward modularity first?
- Does the proposed service have a clear owning team?
- Have I built the operational foundation (distributed tracing, idempotency, contract versioning, on-call rotation) before extracting?
- Am I extracting one service first, then planning further splits based on what I learn?
- Have I considered whether some modules should stay in the monolith — not "eventually all microservices"?