Model Registry
Version, stage, and promote models like you version code
A model registry is the single source of truth for "which model is in production and how did it get there." Without one, promotion to production is a Slack message and a prayer. With one, it's a recorded, auditable, rollback-safe operation.
Why You Need One
The moment you have more than one model version — which is immediately — you need answers to:
- Which version is serving traffic right now?
- What data and code produced it?
- Who approved it for production?
- Can I roll back to the previous version in under a minute?
A model registry answers all four. A folder of .pkl files answers none.
Core Concepts
Every registry worth using has these primitives:
- Model — a named entity ("fraud-detector", "embedding-model-v2").
- Version — an immutable snapshot of weights, config, and metadata. Every training run can produce a version.
- Stage — a label on a version: "staging", "production", "archived". Moving between stages is the promotion workflow.
- Lineage — the link from a model version back to the training run, dataset, and code commit that produced it. This is what makes audits possible.
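To make the four primitives concrete, here is a minimal in-memory sketch. The class names (`Stage`, `Lineage`, `ModelVersion`, `RegisteredModel`) are hypothetical and not any particular registry's API. The design point it illustrates is that a version is frozen once created, while its stage is mutable metadata stored beside it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Stage(Enum):
    NONE = "none"
    STAGING = "staging"
    PRODUCTION = "production"
    ARCHIVED = "archived"


@dataclass(frozen=True)
class Lineage:
    code_commit: str   # git commit hash of the training code
    dataset_hash: str  # content hash or version ID of the training data
    run_id: str        # ID of the run in the experiment tracker


@dataclass(frozen=True)
class ModelVersion:
    # frozen=True: assigning to any field after creation raises FrozenInstanceError
    model_name: str
    version: int
    artifact_uri: str  # pointer to the weights in object storage
    lineage: Lineage


@dataclass
class RegisteredModel:
    name: str
    versions: List[ModelVersion] = field(default_factory=list)
    # Stages live beside the versions, so promotion never mutates a version
    stages: Dict[int, Stage] = field(default_factory=dict)

    def register(self, artifact_uri: str, lineage: Lineage) -> ModelVersion:
        mv = ModelVersion(self.name, len(self.versions) + 1, artifact_uri, lineage)
        self.versions.append(mv)
        self.stages[mv.version] = Stage.NONE
        return mv

    def set_stage(self, version: int, stage: Stage) -> None:
        if version not in self.stages:
            raise KeyError(f"{self.name} has no version {version}")
        self.stages[version] = stage
```

Registering twice on a `RegisteredModel("fraud-detector")` yields versions 1 and 2, both starting in `Stage.NONE`; promoting one is a single `set_stage` call that leaves the version object itself untouched.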
The Tools
- MLflow Model Registry — the most common open-source option. Tight integration with MLflow tracking. Supports aliases, tags, annotations, and webhooks; the older stage transitions are deprecated (since MLflow 2.9) in favor of aliases and tags. Good if you're already in the MLflow ecosystem.
- Hugging Face Hub — the default for anything transformer-based. Model cards, versioning via git, community sharing built in. Less suited for internal production workflows unless you use a private Hub.
- Vertex AI Model Registry — GCP-native. Strong if you're on Google Cloud and want managed infrastructure.
- SageMaker Model Registry — AWS-native. Same idea, different cloud.
- Custom registries — many mature teams build a thin registry layer over their own object storage + metadata DB. More work, more control.
Pick based on your cloud and existing stack. Don't build custom unless the managed options genuinely don't fit.
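The "thin registry layer" in the custom option can be surprisingly small. The sketch below assumes SQLite as the metadata DB and stores only a URI pointer per version; the table layout, the `ThinRegistry` class, and its method names are illustrative, not a standard schema. A real one would add lineage columns, access control, and a transition log.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS model_versions (
    model_name   TEXT    NOT NULL,
    version      INTEGER NOT NULL,
    artifact_uri TEXT    NOT NULL,  -- pointer into object storage; weights never go in the DB
    stage        TEXT    NOT NULL DEFAULT 'none',
    created_at   TEXT    NOT NULL,
    PRIMARY KEY (model_name, version)
);
"""


class ThinRegistry:
    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.executescript(SCHEMA)

    def register(self, model_name: str, artifact_uri: str) -> int:
        """Insert the next version number for model_name and return it."""
        with self.conn:  # commits on success, rolls back on exception
            (version,) = self.conn.execute(
                "SELECT COALESCE(MAX(version), 0) + 1 FROM model_versions WHERE model_name = ?",
                (model_name,),
            ).fetchone()
            self.conn.execute(
                "INSERT INTO model_versions (model_name, version, artifact_uri, created_at) "
                "VALUES (?, ?, ?, ?)",
                (model_name, version, artifact_uri, datetime.now(timezone.utc).isoformat()),
            )
        return version

    def set_stage(self, model_name: str, version: int, stage: str) -> None:
        with self.conn:
            cur = self.conn.execute(
                "UPDATE model_versions SET stage = ? WHERE model_name = ? AND version = ?",
                (stage, model_name, version),
            )
            if cur.rowcount != 1:
                raise KeyError(f"{model_name} v{version} is not registered")

    def latest_uri(self, model_name: str, stage: str) -> Optional[str]:
        """Artifact pointer of the newest version currently in the given stage."""
        row = self.conn.execute(
            "SELECT artifact_uri FROM model_versions "
            "WHERE model_name = ? AND stage = ? ORDER BY version DESC LIMIT 1",
            (model_name, stage),
        ).fetchone()
        return row[0] if row else None
```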
Artifact Storage
Model weights are big. The registry tracks metadata and pointers; the actual artifacts live in:
- Object storage (S3, GCS, Azure Blob) — the standard. Cheap, durable, and with per-object size limits (5 TB on S3) well beyond a typical weights file.
- Artifact stores within tracking tools (W&B Artifacts, MLflow Artifacts) — convenient if you already use the tool; may hit limits at scale.
Version your artifacts with content hashes, not timestamps. If two training runs produce byte-identical weights, they should point to the same artifact.
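A minimal sketch of content-hash versioning, assuming a local directory stands in for the object store (with S3 or GCS the same key would become the object name). `content_hash` and `store_artifact` are hypothetical helpers, and SHA-256 is one reasonable hash choice, not a requirement.

```python
import hashlib
import shutil
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large weight files never sit fully in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def store_artifact(src: Path, store_root: Path) -> str:
    """Copy src into a content-addressed store and return its key.

    The key depends only on the file's bytes: byte-identical weights from two
    runs get the same key and are stored once.
    """
    digest = content_hash(src)
    dest = store_root / "sha256" / digest[:2] / digest  # two-char prefix keeps directories small
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        tmp = dest.with_name(digest + ".tmp")
        shutil.copyfile(src, tmp)
        tmp.replace(dest)  # rename into place so a crashed copy never appears under the final key
    return f"sha256:{digest}"
```

A timestamp-based key would give those two runs different names for the same bytes, which is exactly the duplication and ambiguity the hash avoids.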
The Promotion Workflow
A healthy promotion flow looks like:
- Training pipeline produces a new model version and registers it as "none" or "candidate".
- Automated validation runs: eval metrics, latency benchmarks, bias checks, data-slice analysis.
- If validation passes, promote to "staging" — serve shadow traffic or run A/B tests.
- If staging looks good, promote to "production".
- Previous production version moves to "archived" but remains available for instant rollback.
Every stage transition should be logged with who did it, when, and why. Manual promotions are fine early on; automate the gates as you mature.
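The flow above, reduced to a sketch: an automated gate checks metrics against thresholds, and every stage change appends an audit record with who, when, and why. The `PromotionWorkflow` class, the stage strings, and the metric names (`auc`, `p99_latency_ms`) are assumptions for illustration; a real setup would delegate stage updates to whichever registry you picked and persist the log.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class Transition:
    version: int
    from_stage: str
    to_stage: str
    actor: str   # who
    at: str      # when (UTC, ISO 8601)
    reason: str  # why


def gate_failures(metrics: Dict[str, float], minimums: Dict[str, float],
                  maximums: Dict[str, float]) -> List[str]:
    """Return the failed checks; a metric missing from `metrics` counts as a failure."""
    failures = [f"{k} < {v}" for k, v in minimums.items() if metrics.get(k, float("-inf")) < v]
    failures += [f"{k} > {v}" for k, v in maximums.items() if metrics.get(k, float("inf")) > v]
    return failures


@dataclass
class PromotionWorkflow:
    stages: Dict[int, str] = field(default_factory=dict)
    audit_log: List[Transition] = field(default_factory=list)

    def _move(self, version: int, to_stage: str, actor: str, reason: str) -> None:
        now = datetime.now(timezone.utc).isoformat()
        self.audit_log.append(Transition(version, self.stages[version], to_stage, actor, now, reason))
        self.stages[version] = to_stage

    def register(self, version: int) -> None:
        self.stages[version] = "candidate"

    def validate_and_stage(self, version: int, metrics: Dict[str, float], *,
                           minimums: Dict[str, float], maximums: Dict[str, float],
                           actor: str) -> None:
        failures = gate_failures(metrics, minimums, maximums)
        if failures:
            raise ValueError(f"v{version} failed validation: {failures}")
        self._move(version, "staging", actor, f"passed gates with {metrics}")

    def promote_to_production(self, version: int, *, actor: str, reason: str) -> None:
        if self.stages.get(version) != "staging":
            raise ValueError(f"v{version} must be in staging before production")
        # Archive (not delete) the current production version so it stays available for rollback.
        for old in [v for v, s in self.stages.items() if s == "production"]:
            self._move(old, "archived", actor, f"superseded by v{version}")
        self._move(version, "production", actor, reason)
```

Treating a missing metric as a failure is deliberate: a gate that silently passes when an eval job forgot to report is worse than no gate.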
Model Lineage
Lineage is the killer feature of a proper registry. For any production model you should be able to trace:
- Code — exact commit hash.
- Data — exact dataset version or hash.
- Training run — link to the experiment tracker.
- Evaluation results — what metrics looked like before promotion.
- Dependencies — library versions, base model versions.
This sounds like overhead until the first time a model misbehaves in production and you need to answer "what changed?"
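One way to capture the five lineage fields at registration time, as a sketch. It shells out to `git` for the commit (and flags uncommitted changes, since a commit hash alone doesn't describe a dirty working tree) and reads installed package versions with `importlib.metadata`. The record layout and the `build_lineage` name are assumptions; the dataset version or hash is expected to come from your data pipeline, and the run URL from your tracker.

```python
import platform
import subprocess
from importlib import metadata
from typing import Dict, Iterable, Optional


def _git(*args: str) -> Optional[str]:
    """Run a git command and return stdout, or None if git or the repo is unavailable."""
    try:
        result = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def package_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # record the gap rather than silently dropping it
    return versions


def build_lineage(dataset_version: str, run_url: str, eval_metrics: Dict[str, float],
                  packages: Iterable[str], base_model: Optional[str] = None) -> dict:
    status = _git("status", "--porcelain")
    return {
        "code": {
            "commit": _git("rev-parse", "HEAD") or "unknown",
            # None means "could not tell" (e.g. not a git checkout)
            "dirty": None if status is None else bool(status),
        },
        "data": {"version": dataset_version},
        "training_run": run_url,
        "evaluation": dict(eval_metrics),
        "dependencies": {
            "python": platform.python_version(),
            "packages": package_versions(packages),
            "base_model": base_model,
        },
    }
```

The result is plain JSON-serializable data, so it can be stored as a metadata blob next to the version in whatever registry you use.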
Common Mistakes
- No registry, just a folder — works for one person, breaks for teams.
- Skipping lineage — you register the model but not the provenance. Audits become archaeology.
- Manual artifact management — copying weight files by hand is how you get version mismatches in production.
- No rollback plan — if you can't revert to the previous model version in minutes, your deployment is fragile.
- Registering too late — register on every training run, not just the ones you think are good. You want the full history.
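On the rollback point: if the archived version's artifact is still in storage, rollback is only a matter of re-pointing the serving reference, which is why it can take seconds rather than a retraining cycle. A minimal sketch with a hypothetical `ProductionPointer` that keeps a stack of versions that have served plus an append-only event log; it assumes the serving layer resolves the current version through this pointer.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ProductionPointer:
    served: List[int] = field(default_factory=list)  # versions promoted to production, oldest first
    events: List[Tuple[str, int]] = field(default_factory=list)  # append-only: never edited

    @property
    def current(self) -> Optional[int]:
        return self.served[-1] if self.served else None

    def promote(self, version: int) -> None:
        self.served.append(version)
        self.events.append(("promote", version))

    def rollback(self) -> int:
        """Drop the current version and point back at the one it replaced."""
        if len(self.served) < 2:
            raise RuntimeError("no previous production version to roll back to")
        bad = self.served.pop()
        self.events.append(("rollback", bad))
        return self.served[-1]
```

Rolling back twice walks back two releases, while the event log still shows every promotion and rollback for the audit trail.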