Infrastructure
The platforms beneath the product — compute, network, storage, observability, delivery, security, reliability
Infrastructure is the layer your application code sits on top of. It is not a single tool category but a stack of opinionated choices about where code runs, how traffic reaches it, where state lives, how you see what it is doing, how you ship changes to it, how you keep attackers out, and how you survive failure.
The material is organized by the questions you ask when designing a system:
- Where does my code run? — compute & runtime
- How does traffic get to it? — networking & edge
- Where does state live? — data & storage
- How do I know it is healthy? — observability
- How do I ship changes safely? — delivery & platform
- How do I keep attackers out and policies enforced? — security & governance
- How do I survive failure and avoid waste? — reliability & operations
Topics
Compute & Runtime
Where your code actually executes — containers, queues, edge, hosts.
- Containerization — Docker and Kubernetes; the default unit of deployment for almost everything that is not edge or serverless.
- Background Jobs — Sidekiq, BullMQ, Celery, Temporal; task queues, retries, scheduled work, durable workflows.
- Edge Functions — Cloudflare Workers, Vercel Edge, Deno Deploy; small, low-latency code running at the CDN edge.
- Static Site Hosting — Cloudflare Pages, Vercel, Netlify; frontends with edge CDN, preview deploys, and edge functions.
- Workflow Orchestration — Airflow, Dagster, Prefect, Argo Workflows; scheduling and dependencies for DAG-based pipelines.
Networking & Edge
How clients reach your services, and how services reach each other.
- API Gateway — Kong, Envoy, Traefik; the front door — auth, rate limits, routing.
- Service Mesh — Istio, Linkerd; mTLS, traffic management, and observability between services.
- CDN — Cloudflare, Fastly, CloudFront; global edge caching, image optimization, DDoS protection.
- DNS — Records, zones, TTL, anycast, propagation — the original distributed system, still in your critical path.
- Email & Communication — Resend, SendGrid, SES, Postmark, Twilio; transactional email, SMS, push, deliverability.
Data & Storage
Where bytes and rows live, and how they are queried at scale.
- Cache — Redis and Memcached; in-memory caching patterns and classic pitfalls.
- Object Storage — S3, R2, GCS, MinIO; blob storage at effectively unlimited scale.
- Search — Algolia, Meilisearch, Typesense; typo-tolerant instant search beyond Elasticsearch.
- Vector Databases — Pinecone, Qdrant, Weaviate, pgvector; the storage layer for embeddings, RAG, semantic search.
- Time-Series Databases — InfluxDB, TimescaleDB, VictoriaMetrics, QuestDB; purpose-built storage for timestamps, metrics, sensors.
- Data Warehouses & Lakehouses — Snowflake, BigQuery, Databricks, ClickHouse, Redshift; columnar storage and compute for analytics.
- Stream Processing — Flink, Kafka Streams, Materialize; continuous computation over events in motion.
- Message Queues — Kafka, RabbitMQ; durable, asynchronous messaging between services.
Observability
Knowing what your system is doing in production, and why.
- Monitoring — Prometheus and Grafana; metrics, dashboards, and alerting.
- Tracing — OpenTelemetry, Jaeger, Tempo; see how a request flows across services.
- ELK Stack — Elasticsearch, Logstash, Kibana; log management and search.
- Observability Pipelines — Vector, OpenTelemetry Collector, Fluent Bit, Cribl; route, transform, sample, and reduce telemetry.
Delivery & Platform
Getting code, configuration, and models from a commit to production — repeatably.
- CI/CD Platforms — GitHub Actions and GitLab CI in depth; pipelines, OIDC, runners, deployment patterns.
- GitOps — ArgoCD, Flux, Jenkins X; declarative continuous delivery with Git as the source of truth.
- Infrastructure as Code — Terraform, Ansible; provisioning and configuration defined as version-controlled code.
- Feature Flags — LaunchDarkly, Unleash, OpenFeature; decouple deployment from release with targeted rollouts.
- Internal Developer Platforms — Backstage, Port, Cortex, Humanitec; service catalogs, golden paths, self-service.
- MLOps & AI Infrastructure — MLflow, Kubeflow, Ray, BentoML, vLLM, SageMaker; training, serving, and lifecycle for ML/AI workloads.
Security & Governance
Keeping attackers out, and proving the rules are followed.
- Identity & Auth — Auth0, Clerk, WorkOS, Keycloak; user identity, SSO, OAuth/OIDC, SCIM.
- Secret Management — HashiCorp Vault; centralized secret storage, dynamic credentials, rotation.
- VPN & Zero Trust — Tailscale, WireGuard, Cloudflare Tunnel; private networking without a traditional perimeter VPN.
- WAF, DDoS & Bot Management — Cloudflare, AWS WAF, Akamai, Imperva; edge security for the public web.
- Container Runtime Security — Falco, Tetragon, Tracee; eBPF-based detection that catches what static checks miss.
- Supply Chain Security — Sigstore, Cosign, SBOM, SLSA, in-toto; prove what is in your artifacts and where they came from.
- Policy as Code — OPA, Kyverno, Cedar, Sentinel; express security, compliance, and operational rules as version-controlled code.
Reliability & Operations
Surviving failure, and not wasting money doing it.
- Chaos Engineering — Chaos Mesh, Litmus, Gremlin; inject controlled failure to find weaknesses before they find you.
- Disaster Recovery & Backup — Velero, Restic, snapshot patterns, cross-region replication, RTO/RPO.
- FinOps & Cloud Cost — OpenCost, Kubecost, Vantage, Cloudability; aligning engineering, finance, and product on cloud spend.
Scope and Boundaries
| Concern | Lives here | Lives elsewhere |
|---|---|---|
| Containers, orchestration, edge runtimes, background jobs | ✓ | — |
| Edge networking, API gateways, service mesh, DNS, CDN | ✓ | — |
| Caches, object storage, search, vector / time-series stores, warehouses, streaming | ✓ | — |
| Monitoring, tracing, logging, observability pipelines | ✓ | — |
| CI/CD, GitOps, IaC, feature flags, IDPs, MLOps platforms | ✓ | — |
| Identity, secrets, network security, runtime/supply-chain security, policy | ✓ | — |
| Chaos, DR/backup, FinOps | ✓ | — |
| Relational databases, query optimization, indexing, transactions | — | software-development/database |
| Server frameworks, API design (REST / GraphQL / gRPC), back-end patterns | — | software-development/back-end |
| Test pipelines, test sharding, flake triage | — | software-development/testing/automation |
| Application-level performance, profiling, hot paths | — | software-development/code-craft/quality/performance |
| Distributed systems theory (CAP, consensus, consistency) | — | software-development/architecture |
| Model training, prompt engineering, agent design | — | artificial-intelligence |
| Vendor landscape, industry players, cloud market structure | — | industry-landscape |
The split with database/ is the most common point of confusion. database/ covers what you do inside a relational system (schema, indexes, query plans, transactions). Everything else that stores bytes (caches, blobs, search, vectors, time-series, warehouses, streams) lives here, because operationally it behaves more like infrastructure than like SQL.