Hybrid Search
Combining dense and sparse retrieval because neither alone is enough
Pure vector search sounds elegant, but it has a well-known blind spot: it is bad at exact matches. Search for "error code E-4012" and a pure embedding search will happily return results about error codes in general, missing the one document that contains the exact string. Sparse retrieval (BM25, TF-IDF) nails exact matches but misses semantic paraphrases. Hybrid search combines both, and it is the default you should start with for any production RAG system.
Why Pure Vector Search Fails
Embedding models compress a passage into a single dense vector. This is great for meaning but lossy for specifics:
- Exact keywords and identifiers — product codes, error IDs, function names, proper nouns. The embedding may not preserve these.
- Rare terms — words that appear infrequently in the training data get poor embeddings.
- Negation and subtle modifiers — "not working" and "working" can have similar embeddings.
- Short queries — a 2-word query produces a thin embedding that matches too broadly.
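BM25 handles most of these well because it operates on literal token overlap: an identifier or rare term either appears in a document or it does not. (Negation is the partial exception, since "not" only counts if your stopword list keeps it.) But BM25 fails on paraphrase: "how to fix a broken pipe" won't match a document titled "repairing damaged plumbing."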
You need both signals.
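To make the contrast concrete, here is a minimal sketch of BM25 scoring in plain Python. The whitespace tokenizer, the k1 and b defaults, and the three toy documents are assumptions for illustration, not a production analyzer. It shows the exact-identifier query ranking the one matching document first, while the paraphrase query scores zero against the plumbing document because the two share no tokens.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive lowercase + whitespace split; real systems use a proper analyzer.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no literal overlap, no contribution
            # Lucene-style IDF, which stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "troubleshooting error code E-4012 on the controller",
    "general guide for error codes and diagnostics",
    "repairing damaged plumbing",
]
exact = bm25_scores("error code E-4012", docs)
paraphrase = bm25_scores("how to fix a broken pipe", docs)
print(exact)       # first document scores highest
print(paraphrase)  # every score is 0.0: no shared tokens
```

The second result is the paraphrase failure in miniature: the relevant document is in the corpus, but token overlap alone cannot see it.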
The Architecture
The standard hybrid search pipeline:
- Sparse retrieval — run BM25 (or SPLADE, a learned sparse model) over the corpus. Get top-k1 candidates with scores.
- Dense retrieval — run embedding similarity search. Get top-k2 candidates with scores.
- Fusion — combine the two ranked lists into one.
- (Optional) Reranking — run a cross-encoder over the fused list.
The fusion step is where the magic and the pain live.
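The four stages above can be sketched as one function with pluggable parts. Everything here is hypothetical scaffolding: `hybrid_search`, the stub retrievers, and the placeholder fusion function are illustrative names, not any library's API. The point is only the shape of the data flow.

```python
from typing import Callable, List, Optional, Tuple

Ranked = List[Tuple[str, float]]  # (doc_id, score), best first

def hybrid_search(
    query: str,
    sparse: Callable[[str, int], Ranked],
    dense: Callable[[str, int], Ranked],
    fuse: Callable[[List[Ranked]], List[str]],
    rerank: Optional[Callable[[str, List[str]], List[str]]] = None,
    k_sparse: int = 50,
    k_dense: int = 50,
    top_n: int = 10,
) -> List[str]:
    sparse_hits = sparse(query, k_sparse)    # stage 1: BM25 or SPLADE
    dense_hits = dense(query, k_dense)       # stage 2: embedding similarity
    fused = fuse([sparse_hits, dense_hits])  # stage 3: fusion
    if rerank is not None:
        fused = rerank(query, fused)         # stage 4: optional cross-encoder
    return fused[:top_n]

# Stub retrievers with made-up scores, standing in for real indexes.
def fake_sparse(query, k):
    return [("doc_a", 7.2), ("doc_b", 3.1)][:k]

def fake_dense(query, k):
    return [("doc_b", 0.83), ("doc_c", 0.80)][:k]

def union_in_order(lists):
    # Placeholder fusion: first-seen order across lists, duplicates dropped.
    seen = {}
    for hits in lists:
        for doc_id, _score in hits:
            seen.setdefault(doc_id, None)
    return list(seen)

result = hybrid_search("error code E-4012", fake_sparse, fake_dense, union_in_order)
print(result)  # ['doc_a', 'doc_b', 'doc_c']
```

Keeping the retrievers and the fusion function as arguments makes it cheap to swap fusion strategies during evaluation without touching the retrieval code.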
Fusion Strategies
Reciprocal Rank Fusion (RRF)
The simplest and most robust approach. For each document, compute:
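RRF_score(d) = sum over retrievers i of 1 / (k + rank_i(d)), where rank_i(d) is the document's 1-based rank in retriever i's list and k is a constant (usually 60). A document that is missing from a retriever's list contributes nothing for that retriever.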
- Strength: no score normalization needed, works across any number of retrievers, parameter-free except for k.
- Weakness: treats all retrievers equally. If one is much better than the other, you might want weighted fusion.
- Use this as your default unless you have a strong reason not to.
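A minimal sketch of RRF, assuming each retriever returns a list of document ids ordered best first. The function name `rrf_fuse` and the toy rankings are illustrative.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of doc ids with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based; ids absent from a list simply add nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]
fused = rrf_fuse([bm25_ranking, dense_ranking])
print([doc_id for doc_id, _ in fused])
# ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

doc_b wins because it ranks near the top of both lists (1/62 + 1/61), edging out doc_a (1/61 + 1/63). Only ranks enter the computation, which is why the raw BM25 and cosine scores never need to be normalized.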
Weighted score fusion
Normalize scores from each retriever to [0, 1], then combine with weights: alpha * dense_score + (1 - alpha) * sparse_score.
- Strength: lets you bias toward the retriever that works better for your domain.
- Weakness: score normalization is tricky (BM25 scores are unbounded; embedding cosine scores cluster in a narrow range). Weights need tuning.
- Use this when you have evaluation data to tune alpha.
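A sketch of weighted fusion, assuming min-max normalization per retriever and a score of 0 for documents a retriever did not return. Both are choices made here for illustration; min-max in particular is sensitive to outliers, and other normalizations are possible.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every hit as equally strong.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def weighted_fuse(dense_scores, sparse_scores, alpha=0.5):
    """alpha * dense + (1 - alpha) * sparse over normalized scores."""
    dense_n, sparse_n = min_max(dense_scores), min_max(sparse_scores)
    fused = {}
    for doc_id in dense_n.keys() | sparse_n.keys():
        # A document missing from one retriever gets 0 from that side.
        fused[doc_id] = (alpha * dense_n.get(doc_id, 0.0)
                         + (1 - alpha) * sparse_n.get(doc_id, 0.0))
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))

# Cosine scores cluster tightly; BM25 scores are unbounded.
dense_scores = {"doc_a": 0.82, "doc_b": 0.80, "doc_c": 0.78}
sparse_scores = {"doc_b": 12.4, "doc_d": 3.1}
ranking = weighted_fuse(dense_scores, sparse_scores, alpha=0.5)
print([doc_id for doc_id, _ in ranking])
# ['doc_b', 'doc_a', 'doc_c', 'doc_d']
```

Note how the 0.02 gap between dense scores gets stretched to half the normalized range. That stretching is the normalization hazard described above, and it is why alpha should be tuned against evaluation data rather than guessed.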
Learned fusion
Train a small model to combine features from both retrievers. Overkill for most teams but used at scale (e.g., search engines).
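For a sense of what "a small model" can mean, here is a toy sketch: logistic regression over two features per candidate (normalized dense score, normalized sparse score), trained with plain gradient descent. The feature choice, the labels, and the hyperparameters are all invented for illustration; a real system would use many more features and judged query logs.

```python
import math

def predict(weights, bias, features):
    """Probability that a candidate is relevant, given its retriever features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train_fusion_model(rows, labels, lr=1.0, epochs=2000):
    """Fit logistic regression with batch gradient descent."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        # Compute all errors first so every update uses the same model state.
        errors = [predict(weights, bias, x) - y for x, y in zip(rows, labels)]
        for j in range(len(weights)):
            weights[j] -= lr * sum(e * x[j] for e, x in zip(errors, rows)) / n
        bias -= lr * sum(errors) / n
    return weights, bias

# Each row: [normalized dense score, normalized sparse score]. Toy labels.
rows = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.7], [0.2, 0.1], [0.3, 0.3], [0.1, 0.2]]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_fusion_model(rows, labels)

strong = predict(weights, bias, [0.9, 0.8])
weak = predict(weights, bias, [0.1, 0.1])
print(strong > weak)  # True
```

With only two features this learns little more than a tuned alpha plus a threshold, which is one reason learned fusion rarely pays off until you have richer features and plenty of labeled data.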
Dense + Sparse In Practice
BM25 + embedding (the standard stack)
Most vector databases now support this natively:
- Weaviate — built-in BM25 + vector hybrid with configurable alpha.
- Qdrant — sparse vectors support lets you store BM25/SPLADE alongside dense vectors.
- Pinecone — sparse-dense vectors in a single index.
- Elasticsearch — dense vector fields + traditional BM25 in one query.
If your vector DB supports hybrid natively, use it. That avoids the complexity of running two separate systems.
SPLADE as a better sparse retriever
SPLADE is a learned sparse model: it expands queries and documents with related terms and assigns each term a learned weight. It outperforms BM25 on many benchmarks while still being a sparse retriever (fast, exact-match friendly).
Use SPLADE over BM25 when:
- You can afford the model inference at indexing and query time.
- Your queries benefit from expansion (short queries, domain-specific vocabulary).
When to Use Hybrid Search
Almost always for production RAG. The cases where pure vector search is sufficient:
- All your queries are semantic / paraphrase-heavy (rare in practice).
- Your corpus is small and homogeneous.
- You've measured and confirmed BM25 adds nothing for your query distribution.
In every other case, the 10–30 minutes it takes to set up hybrid search, assuming your vector DB supports it natively, is the best investment you'll make.
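"Measured and confirmed" can be as simple as comparing recall@k for dense-only and hybrid runs over a set of judged queries. A minimal sketch, assuming you have relevance judgments as sets of document ids; the query ids and rankings below are made up.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant doc ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def mean_recall(runs, judgments, k=10):
    """Average recall@k over all judged queries."""
    total = sum(recall_at_k(runs.get(q, []), rel, k) for q, rel in judgments.items())
    return total / len(judgments)

# Toy judgments: one paraphrase-style query, one keyword-style query.
judgments = {"q_semantic": {"d1"}, "q_keyword": {"d7"}}
dense_runs = {"q_semantic": ["d1", "d2"], "q_keyword": ["d3", "d4"]}
hybrid_runs = {"q_semantic": ["d1", "d2"], "q_keyword": ["d7", "d3"]}

print(mean_recall(dense_runs, judgments, k=2))   # 0.5
print(mean_recall(hybrid_runs, judgments, k=2))  # 1.0
```

If the two numbers match on a query set that reflects your real traffic, including keyword-heavy queries, pure vector search may be enough. If they diverge, the gap tells you what the sparse side is contributing.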
Practical Tips
- Start with RRF (k=60), which has no alpha to tune. If you later switch to weighted fusion, start at alpha=0.5 and tune once you have eval data.
- Pre-process for BM25. Tokenize, lowercase, strip stopwords. BM25 benefits from classic text normalization that embedding models don't need.
- Watch for score distribution mismatch. If one retriever dominates the fused list, your normalization or weights are off.
- Test on keyword-heavy queries specifically. That is where hybrid search earns its keep.
- Combine hybrid search with reranking for the best pipeline: BM25 + embeddings -> fuse -> cross-encoder rerank. Three stages, maximum quality.
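The pre-processing tip above can be sketched in a few lines. The stopword list and the token pattern are assumptions chosen for illustration: the pattern keeps hyphenated identifiers such as "E-4012" as single tokens, and "not" is deliberately left out of the stopword list so negation survives.

```python
import re

# Small illustrative stopword list; "not" is intentionally absent.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "on",
                       "for", "is", "how"})

# Alphanumeric runs, optionally joined by hyphens (keeps "e-4012" whole).
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

def preprocess(text):
    """Lowercase, tokenize, and drop stopwords for BM25 indexing and querying."""
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("How to fix the Error Code E-4012?"))
# ['fix', 'error', 'code', 'e-4012']
```

Apply the same function to documents at index time and to queries at search time; a mismatch between the two silently breaks exact matching, which is the main thing the sparse side is there to provide.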