Circuit breakers, retries, rate limiting, graceful degradation — patterns that keep your service alive when things fail

Resilience

In production, things fail. Databases go down, downstream services time out, networks partition, disks fill up. A resilient system does not prevent failures — it handles them gracefully. The user gets a degraded experience instead of an error page. The on-call engineer gets an alert instead of a wake-up call.

This page covers the patterns that make backend services survive the failures that production will throw at them.

Circuit Breaker

A circuit breaker prevents your service from repeatedly calling a failing downstream. Like an electrical circuit breaker, it "trips" after too many failures and stops sending requests — giving the downstream time to recover.

States

        ┌──────────┐
        │  CLOSED  │  (normal — requests pass through)
        └────┬─────┘
             │ failure threshold exceeded
        ┌────▼─────┐
        │   OPEN   │  (tripped — requests fail immediately)
        └────┬─────┘
             │ timeout expires
        ┌────▼─────┐
        │HALF-OPEN │  (testing — let one request through)
        └────┬─────┘
             │ success → CLOSED
             │ failure → OPEN

Implementation

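// Thrown when the breaker rejects a call without invoking the downstream
class CircuitOpenError extends Error {}

class CircuitBreaker {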
  private state: 'CLOSED' | 'OPEN' | 'HALF_OPEN' = 'CLOSED';
  private failureCount = 0;
  private lastFailureTime = 0;

  constructor(
    private readonly threshold: number = 5,
    private readonly resetTimeout: number = 30_000,
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailureTime > this.resetTimeout) {
        this.state = 'HALF_OPEN';
      } else {
        throw new CircuitOpenError('Circuit is open');
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess() {
    this.failureCount = 0;
    this.state = 'CLOSED';
  }

  private onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    if (this.failureCount >= this.threshold) {
      this.state = 'OPEN';
    }
  }
}

// Usage
const paymentCircuit = new CircuitBreaker(5, 30_000);

async function chargeUser(userId: string, amount: number) {
  try {
    return await paymentCircuit.call(() =>
      paymentService.charge(userId, amount)
    );
  } catch (err) {
    if (err instanceof CircuitOpenError) {
      // Fallback: queue for later processing
      await paymentQueue.add({ userId, amount });
      return { status: 'queued' };
    }
    throw err;
  }
}

Key decisions: failure threshold (too low = false trips, too high = too many failed requests before tripping), reset timeout (too short = downstream still recovering, too long = unnecessary downtime).
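The implementation above moves to HALF_OPEN once the timeout expires but then admits every concurrent caller, whereas the state diagram describes a single trial request. A minimal sketch of a single-probe variant follows. The class name `ProbingBreaker`, its `acquire`/`recordSuccess`/`recordFailure` methods, and the injected `now` clock are hypothetical, introduced here only for illustration.

```typescript
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

// Hypothetical error type, mirroring the CircuitOpenError used above.
class CircuitOpenError extends Error {
  constructor(message = 'Circuit is open') {
    super(message);
    this.name = 'CircuitOpenError';
  }
}

class ProbingBreaker {
  private state: BreakerState = 'CLOSED';
  private failureCount = 0;
  private openedAt = 0;
  private probeInFlight = false;

  constructor(
    private readonly threshold: number = 5,
    private readonly resetTimeout: number = 30_000,
    // Injected clock (an assumption of this sketch) so tests need not wait.
    private readonly now: () => number = () => Date.now(),
  ) {}

  /** Call before the downstream request; throws if the call must not proceed. */
  acquire(): void {
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.resetTimeout) {
      this.state = 'HALF_OPEN';
      this.probeInFlight = false;
    }
    if (this.state === 'OPEN') {
      throw new CircuitOpenError();
    }
    if (this.state === 'HALF_OPEN') {
      if (this.probeInFlight) {
        throw new CircuitOpenError('Circuit is half-open; probe in flight');
      }
      this.probeInFlight = true; // this caller is the single trial request
    }
  }

  recordSuccess(): void {
    this.failureCount = 0;
    this.probeInFlight = false;
    this.state = 'CLOSED';
  }

  recordFailure(): void {
    this.failureCount++;
    // A failed probe reopens immediately; otherwise trip at the threshold.
    if (this.state === 'HALF_OPEN' || this.failureCount >= this.threshold) {
      this.state = 'OPEN';
      this.openedAt = this.now();
      this.probeInFlight = false;
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}
```

A caller would invoke `acquire()` before the downstream request and then exactly one of `recordSuccess()` or `recordFailure()` afterwards. While a probe is in flight, other callers are rejected the same way as when the circuit is open.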

Retry with Exponential Backoff + Jitter

Retries handle transient failures — network blips, temporary overloads. Without backoff, retries create a thundering herd that makes the problem worse.

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  options: {
    maxRetries?: number;
    baseDelay?: number;
    maxDelay?: number;
    retryableErrors?: (error: Error) => boolean;
  } = {}
): Promise<T> {
  const {
    maxRetries = 3,
    baseDelay = 1000,
    maxDelay = 30_000,
    retryableErrors = isTransient,
  } = options;

  let lastError: Error;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error as Error;

      if (attempt === maxRetries || !retryableErrors(lastError)) {
        throw lastError;
      }

      // Exponential backoff with full jitter
      const exponentialDelay = baseDelay * Math.pow(2, attempt);
      const delay = Math.random() * Math.min(exponentialDelay, maxDelay);

      await sleep(delay); // sleep(ms) resolves after ms milliseconds, e.g. via setTimeout
    }
  }

  throw lastError!;
}

function isTransient(error: Error): boolean {
  if ('statusCode' in error) {
    const code = (error as any).statusCode;
    return code === 429 || code === 502 || code === 503 || code === 504;
  }
  return error.message.includes('ECONNRESET')
      || error.message.includes('ETIMEDOUT');
}

// Usage
const user = await retryWithBackoff(
  () => userService.getById(userId),
  { maxRetries: 3, baseDelay: 500 }
);

Why Jitter Matters

Without jitter, all clients retry at the same time:

No jitter:      [all retry at 1s] [all retry at 2s] [all retry at 4s]
Full jitter:    [retries spread across 0-1s] [0-2s] [0-4s]

Full jitter (random between 0 and the calculated delay) spreads the load. Decorrelated jitter is even better for some patterns — but full jitter is good enough for most cases.
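To make the ranges above concrete, here is a minimal sketch that separates the backoff ceiling from the jittered delay. The helper names `backoffCeiling` and `fullJitterDelay` and the injectable `random` parameter are illustrative assumptions; `sleep` is one plausible definition of the helper that `retryWithBackoff` calls but this page does not define.

```typescript
/** Upper bound for the delay after failed attempt number `attempt` (0-based). */
function backoffCeiling(attempt: number, baseDelay: number, maxDelay: number): number {
  return Math.min(baseDelay * Math.pow(2, attempt), maxDelay);
}

/** Full jitter: a uniformly random delay between 0 and the ceiling. */
function fullJitterDelay(
  attempt: number,
  baseDelay: number,
  maxDelay: number,
  random: () => number = Math.random, // injectable so the spread can be tested
): number {
  return random() * backoffCeiling(attempt, baseDelay, maxDelay);
}

/** Promise-based sleep, a plausible definition of the `sleep` helper. */
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Ceilings for baseDelay = 1000 ms and maxDelay = 30 000 ms:
// attempt 0 → 1000, 1 → 2000, 2 → 4000, 3 → 8000, 4 → 16000, 5 → 30000 (capped)
```

Each client draws its own random delay below the ceiling, so clients that failed at the same moment no longer retry at the same moment.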

Timeout Policies

Every outbound call needs a timeout. No exceptions. An unbounded call can hold a thread/connection forever, eventually exhausting your server's resources.

// Per-request timeout with AbortController
async function fetchWithTimeout(url: string, timeoutMs: number) {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), timeoutMs);

  try {
    const response = await fetch(url, { signal: controller.signal });
    return response;
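  } catch (error) {
    // fetch rejects with an error named 'AbortError' when the signal fires.
    // `error` is `unknown` under strict TypeScript, so narrow it before reading `.name`.
    // TimeoutError is an application-defined Error subclass.
    if (error instanceof Error && error.name === 'AbortError') {
      throw new TimeoutError(`Request to ${url} timed out after ${timeoutMs}ms`);
    }
    throw error;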
  } finally {
    clearTimeout(timeout);
  }
}

// Cascading timeouts: each layer gets a budget
// HTTP handler:    5000ms total
//   → DB query:   2000ms
//   → API call:   3000ms
//     → Retry 1:  1500ms
//     → Retry 2:  1500ms

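Cascading timeouts: if your handler has a 5-second budget, don't give a single downstream call a 5-second timeout, because that leaves no time for fallback logic. Budget your time across all operations, and leave headroom: in the breakdown above, two 1500 ms attempts already fill the 3000 ms API budget, so any backoff delay between them has to come out of the attempt timeouts.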
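One way to enforce such a budget in code is to carry a deadline through the handler and derive each call's timeout from the time that is left. The sketch below is a minimal illustration; `Deadline`, `startDeadline`, `timeoutFor`, and the `reserveMs` parameter are hypothetical names, not part of any library.

```typescript
interface Deadline {
  expiresAt: number; // epoch milliseconds
}

function startDeadline(totalMs: number, now: number = Date.now()): Deadline {
  return { expiresAt: now + totalMs };
}

/**
 * Timeout to use for the next call: the smaller of what the call wants and
 * what is left in the budget after holding back `reserveMs` for fallback work.
 */
function timeoutFor(
  deadline: Deadline,
  wantedMs: number,
  reserveMs: number = 0,
  now: number = Date.now(),
): number {
  const available = deadline.expiresAt - now - reserveMs;
  if (available <= 0) {
    throw new Error('Deadline exceeded before the call started');
  }
  return Math.min(wantedMs, available);
}
```

A handler could create `startDeadline(5000)` on entry and pass `timeoutFor(deadline, 3000, 500)` as the `timeoutMs` argument of `fetchWithTimeout`, so a call that starts late gets a shorter timeout instead of overrunning the handler's budget.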

Bulkhead Pattern

Isolate failures so one slow dependency doesn't consume all your resources and bring down unrelated endpoints.

// Semaphore-based bulkhead: limit concurrent calls per dependency
class Bulkhead {
  private active = 0;
  private queue: Array<() => void> = [];

  constructor(
    private readonly maxConcurrent: number,
    private readonly maxQueue: number = 100,
  ) {}

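  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      if (this.queue.length >= this.maxQueue) {
        throw new BulkheadFullError('Bulkhead queue is full');
      }
      // Wait for a slot. The finishing call hands its slot over directly,
      // so a newly arriving caller cannot take it before this waiter resumes.
      await new Promise<void>((resolve) => this.queue.push(resolve));
    } else {
      this.active++;
    }

    try {
      return await fn();
    } finally {
      const next = this.queue.shift();
      if (next) {
        next(); // transfer the slot: `active` stays unchanged
      } else {
        this.active--;
      }
    }
  }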
}

// Separate bulkheads per dependency
const paymentBulkhead = new Bulkhead(10);   // max 10 concurrent payment calls
const inventoryBulkhead = new Bulkhead(20); // max 20 concurrent inventory calls

// If payments are slow, inventory calls are unaffected

Without a bulkhead, a slow payment service can consume your entire connection pool, causing inventory checks and user lookups to fail too.

Rate Limiting

Rate limiting protects your service from abuse and overload. Three common algorithms:

Token Bucket

Allows bursts up to a limit, then enforces a steady rate:

class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,
    private readonly refillRate: number, // tokens per second
  ) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  tryConsume(tokens: number = 1): boolean {
    this.refill();

    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return true;
    }

    return false;
  }

  private refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }
}

// 100 requests capacity, refill 10 per second
const limiter = new TokenBucket(100, 10);

Sliding Window Log

Tracks exact timestamps — precise but memory-intensive for high-volume endpoints:

class SlidingWindowLog {
  private timestamps: number[] = [];

  constructor(
    private readonly windowMs: number,
    private readonly maxRequests: number,
  ) {}

  tryConsume(): boolean {
    const now = Date.now();
    const windowStart = now - this.windowMs;

    // Remove expired entries
    this.timestamps = this.timestamps.filter(t => t > windowStart);

    if (this.timestamps.length < this.maxRequests) {
      this.timestamps.push(now);
      return true;
    }

    return false;
  }
}

Fixed Window Counter

Simple, memory-efficient, but allows bursts at window boundaries:

class FixedWindowCounter {
  private count = 0;
  private windowStart = Date.now();

  constructor(
    private readonly windowMs: number,
    private readonly maxRequests: number,
  ) {}

  tryConsume(): boolean {
    const now = Date.now();
    if (now - this.windowStart >= this.windowMs) {
      this.count = 0;
      this.windowStart = now;
    }

    if (this.count < this.maxRequests) {
      this.count++;
      return true;
    }

    return false;
  }
}
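The boundary burst is easy to reproduce. The sketch below repeats the counter with an injected clock (the `ClockedFixedWindow` name and the `now` parameter are assumptions made for testability) and shows a client getting twice the limit through within a few milliseconds.

```typescript
class ClockedFixedWindow {
  private count = 0;
  private windowStart: number;

  constructor(
    private readonly windowMs: number,
    private readonly maxRequests: number,
    private readonly now: () => number, // injected clock, hypothetical
  ) {
    this.windowStart = now();
  }

  tryConsume(): boolean {
    const current = this.now();
    if (current - this.windowStart >= this.windowMs) {
      this.count = 0;
      this.windowStart = current;
    }
    if (this.count < this.maxRequests) {
      this.count++;
      return true;
    }
    return false;
  }
}

// Limit: 5 requests per 1000 ms window, first window starting at t = 0.
let clock = 0;
const counter = new ClockedFixedWindow(1000, 5, () => clock);

let accepted = 0;
clock = 998; // just before the first window ends
for (let i = 0; i < 5; i++) if (counter.tryConsume()) accepted++;
clock = 1001; // just after the next window starts
for (let i = 0; i < 5; i++) if (counter.tryConsume()) accepted++;

console.log(accepted); // 10 requests accepted within 3 ms
```

Both batches are individually within the limit, yet together they are double the nominal rate over a very short interval.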

Choosing an Algorithm

Algorithm	Burst handling	Memory	Precision	Best for
Token bucket	Allows controlled bursts	Low	Good	API rate limiting
Sliding window log	No boundary bursts	High	Exact	Low-volume, strict limits
Fixed window counter	Boundary bursts possible	Very low	Approximate	High-volume, approximate limits

Rate Limit Response

Always tell the client what is happening:

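// Assumes `limiter` is a per-key limiter exposing tryConsume(key), remaining(key),
// and resetTime(key). The single-bucket classes above do not have this interface.
function rateLimitMiddleware(req, res, next) {
  const key = req.ip; // or req.user.id for authenticated endpoints
  const allowed = limiter.tryConsume(key);

  res.setHeader('X-RateLimit-Limit', '100');
  res.setHeader('X-RateLimit-Remaining', limiter.remaining(key).toString());
  res.setHeader('X-RateLimit-Reset', limiter.resetTime(key).toString());

  if (!allowed) {
    // Seconds the client should wait; ideally derived from the limiter rather than fixed
    res.setHeader('Retry-After', '60');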
    return res.status(429).json({
      error: { code: 'RATE_LIMITED', message: 'Too many requests' },
    });
  }

  next();
}
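The middleware calls `tryConsume(key)`, `remaining(key)`, and `resetTime(key)`, none of which the single-bucket `TokenBucket` class provides. A minimal sketch of a keyed wrapper follows. `KeyedTokenBucket`, the injected clock, and `retryAfterSeconds` (offered here as a computed alternative to a hard-coded `Retry-After`) are assumptions, not an established API.

```typescript
interface BucketState {
  tokens: number;
  lastRefill: number; // epoch milliseconds
}

class KeyedTokenBucket {
  private buckets = new Map<string, BucketState>();

  constructor(
    private readonly capacity: number,
    private readonly refillRate: number, // tokens per second
    private readonly now: () => number = () => Date.now(),
  ) {}

  /** Returns the bucket for `key`, created on first use and refilled to the current time. */
  private bucketFor(key: string): BucketState {
    const current = this.now();
    let bucket = this.buckets.get(key);
    if (!bucket) {
      bucket = { tokens: this.capacity, lastRefill: current };
      this.buckets.set(key, bucket);
    }
    const elapsedSec = (current - bucket.lastRefill) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillRate);
    bucket.lastRefill = current;
    return bucket;
  }

  tryConsume(key: string, tokens: number = 1): boolean {
    const bucket = this.bucketFor(key);
    if (bucket.tokens < tokens) return false;
    bucket.tokens -= tokens;
    return true;
  }

  /** Whole tokens currently available for this key. */
  remaining(key: string): number {
    return Math.floor(this.bucketFor(key).tokens);
  }

  /** Seconds until one more token is available (0 if one is available now). */
  retryAfterSeconds(key: string): number {
    const missing = 1 - this.bucketFor(key).tokens;
    return missing <= 0 ? 0 : Math.ceil(missing / this.refillRate);
  }
}
```

This sketch keeps every key in memory forever and only limits a single process. A real deployment would likely need eviction of idle keys and, with several instances, a shared store.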

Graceful Degradation

When a dependency fails, serve a reduced experience instead of an error:

async function getProductPage(productId: string) {
  const product = await productService.getById(productId); // Required — fail if this fails

  // Non-critical: degrade gracefully
  const [reviews, recommendations, inventory] = await Promise.allSettled([
    reviewService.getForProduct(productId),
    recommendationService.getForProduct(productId),
    inventoryService.getStock(productId),
  ]);

  return {
    product,
    reviews: reviews.status === 'fulfilled' ? reviews.value : [],
    recommendations: recommendations.status === 'fulfilled' ? recommendations.value : [],
    inStock: inventory.status === 'fulfilled' ? inventory.value > 0 : null, // null = unknown
  };
}

Promise.allSettled is your friend. Unlike Promise.all, it does not reject on the first failure. Each result has a status of 'fulfilled' or 'rejected', letting you handle each dependency independently.
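The repeated `status === 'fulfilled'` checks can be folded into a small helper. A minimal sketch, where `valueOr` and the `Settled` type alias are hypothetical names and the optional `onError` callback is an assumption added so that swallowed failures can still be logged:

```typescript
// Structurally the same shape as the built-in PromiseSettledResult<T>.
type Settled<T> =
  | { status: 'fulfilled'; value: T }
  | { status: 'rejected'; reason: unknown };

/** Returns the fulfilled value, or `fallback` if the promise was rejected. */
function valueOr<T, F>(
  result: Settled<T>,
  fallback: F,
  onError?: (reason: unknown) => void,
): T | F {
  if (result.status === 'fulfilled') {
    return result.value;
  }
  if (onError) onError(result.reason); // surface the failure instead of hiding it
  return fallback;
}
```

With this helper, the `reviews` field in the example becomes `valueOr(reviews, [])`, and passing a logging callback as the third argument keeps degraded responses visible in monitoring.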

Health Checks

Two types of health check serve different purposes:

Liveness

"Is the process alive?" If this fails, the orchestrator (Kubernetes) should restart the container.

app.get('/healthz', (req, res) => {
  // Only check that the process can respond
  res.status(200).json({ status: 'ok' });
});

Keep liveness checks trivial. Do not check database connectivity here. If the database is down, restarting your container will not fix it — and you will create a restart loop.

Readiness

"Can this instance serve traffic?" If this fails, the load balancer should stop sending requests to this instance.

app.get('/readyz', async (req, res) => {
  const checks = {
    database: false,
    cache: false,
  };

  try {
    await db.query('SELECT 1');
    checks.database = true;
  } catch {}

  try {
    await redis.ping();
    checks.cache = true;
  } catch {}

  const ready = checks.database; // Cache is optional, DB is required
  res.status(ready ? 200 : 503).json({ status: ready ? 'ready' : 'not ready', checks });
});

Putting It Together

A resilient request handler combines multiple patterns:

const paymentCircuit = new CircuitBreaker(5, 30_000);
const paymentBulkhead = new Bulkhead(10);

async function processPayment(orderId: string, amount: number) {
  // Bulkhead: limit concurrency
  return paymentBulkhead.execute(async () => {
    // Circuit breaker: stop calling if downstream is down
    return paymentCircuit.call(async () => {
      // Retry: handle transient failures
      return retryWithBackoff(
        () => fetchWithTimeout(
          `${PAYMENT_URL}/charge`,
          3000, // Timeout: 3 seconds
        ),
        { maxRetries: 2, baseDelay: 500 }
      );
    });
  });
}

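The order matters, from outermost to innermost: bulkhead (limit how many calls are in flight), then circuit breaker (fail fast if the downstream is down), then retry (handle transient errors), then timeout (bound each individual attempt). Because the retry runs inside the circuit breaker, the breaker counts one failure per exhausted retry sequence, not one per attempt.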
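One caveat about the example above: `fetch` resolves normally for HTTP error statuses such as 503 and only rejects on network failures or aborts. As written, a downstream that answers with 5xx responses would therefore not trigger retries or count toward the circuit breaker's threshold. A minimal sketch of a bridge follows. `HttpError`, `ResponseLike`, and `ensureOk` are hypothetical names; the `statusCode` property is chosen to match what `isTransient` inspects.

```typescript
class HttpError extends Error {
  constructor(public readonly statusCode: number, message?: string) {
    super(message ?? `HTTP ${statusCode}`);
    this.name = 'HttpError';
  }
}

/** The minimal part of a fetch Response that this helper needs. */
interface ResponseLike {
  ok: boolean;
  status: number;
}

/** Returns the response unchanged if it is 2xx, otherwise throws HttpError. */
function ensureOk<R extends ResponseLike>(response: R): R {
  if (!response.ok) {
    throw new HttpError(response.status);
  }
  return response;
}
```

Chaining it onto the call, for example `fetchWithTimeout(url, 3000).then(ensureOk)`, turns error responses into rejections that the retry and circuit breaker layers can act on.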

In .NET: Polly

The hand-rolled patterns above are educational, but in production you reach for a battle-tested library. In the .NET world that is Polly, and since v8 it composes strategies into a single ResiliencePipeline — the same bulkhead → circuit breaker → retry → timeout ordering, declared instead of imperatively wired:

var pipeline = new ResiliencePipelineBuilder()
    .AddConcurrencyLimiter(permitLimit: 10)           // bulkhead
    .AddCircuitBreaker(new CircuitBreakerStrategyOptions
    {
        FailureRatio = 0.5,
        MinimumThroughput = 10,
        BreakDuration = TimeSpan.FromSeconds(30)
    })
    .AddRetry(new RetryStrategyOptions
    {
        MaxRetryAttempts = 3,
        BackoffType = DelayBackoffType.Exponential,
        UseJitter = true,                              // decorrelated jitter, built in
        Delay = TimeSpan.FromMilliseconds(500)
    })
    .AddTimeout(TimeSpan.FromSeconds(3))               // per-attempt timeout
    .Build();

var order = await pipeline.ExecuteAsync(
    async ct => await _payments.ChargeAsync(orderId, ct),
    cancellationToken);

Strategies execute in the order they are added, outermost first: the timeout wraps each individual attempt, the retry sits outside it, and the circuit breaker sits outside the retry, so it tracks the failure ratio of whole operations rather than of single attempts. With exponential backoff, UseJitter gives you decorrelated jitter for free, so you do not hand-roll the Math.random() spread shown earlier.

For the common case of an outbound HttpClient, you do not even build the pipeline by hand. Microsoft.Extensions.Http.Resilience adds a standard pipeline to a typed client in one call, so every request through it inherits retries, a circuit breaker, and timeouts:

builder.Services.AddHttpClient<IPaymentClient, PaymentClient>()
    .AddStandardResilienceHandler();

This is the idiomatic .NET answer to the "Putting It Together" example — resilience as configuration on the client, not control flow scattered through every call site.

Checklist

Before shipping to production:

Every outbound HTTP call has a timeout.
Retries use exponential backoff with jitter.
Critical dependencies have a circuit breaker.
Non-critical dependencies degrade gracefully (no error page because the recommendation engine is down).
Rate limiting is in place for public endpoints.
Health checks distinguish liveness from readiness.
Failed requests return useful error messages and appropriate status codes.
Resilience behavior is observable: circuit state, retry counts, and rate-limit hits are logged or exported as metrics.
