The trilemma — pick two — Build a vector database (Pinecone / Weaviate / pgvector style)

Build a vector database (Pinecone / Weaviate / pgvector style) (15 scenes)

Scene 10 · The trilemma — pick two

Recall, latency, memory: Flat maxes recall, HNSW speed-at-recall, IVFPQ memory — no index wins all three. Every deployment is a chosen point on the Pareto frontier.

Previously

Four indexes, three axes, one chart — and the picture is unambiguous: no index is best at recall AND latency AND memory at once. Choosing an index is choosing which axis to sacrifice. With the index-choice arc complete, we now leave the clean geometry behind and hit where production actually gets hard: real queries aren't just 'similar to X'.

Scene 10

The trilemma — pick two

Watch

Diagram

The one chart the whole arc has been building. Each dot is an index you've met: recall@10 goes UP the chart (higher = better), latency goes RIGHT (further right = slower), and the bubble's SIZE is memory (bigger = more RAM). The soft curve through the best dots is the frontier — the edge of what's possible. The filled, ringed dot is YOUR live config; drag the knobs and it slides along that edge but can never jump past it.

Sources

Flat: perfect recall, huge & slow

IVFPQ: tiny & fast, lower recall

Every dot the arc dropped, finally on one chart. *Flat* sits top-left: recall 1.0 — but it's slow and its bubble is huge (it keeps every full vector). *HNSW* is fast and high-recall, but its bubble is just as big — it stores full vectors PLUS a graph of edges. *IVFPQ* has a tiny bubble (compressed) and is fast, but it sits lower — it gave up recall. *IVF* lands in the middle. Look across all three axes at once: no dot is high AND far-left AND small. The curve through the best dots is the *frontier* — the boundary of what any index can reach. Nothing lives above-and-left-of it.

Implementation

buyRecall (Flat / re-rank path)

exact distances on full vectors — pays in latency + memory

1def rank_exact(query, candidates):
2    scored = []
3    for v in candidates:           # full float32 vectors
4        d = l2(query, v.full)      # no approximation
5        scored.append((d, v.id))
6    return topk(scored)            # recall ceiling = Flat
7 
8# re-rank: rescue recall an approx index conceded
9approx = index.search(query, k * 10)
10return rank_exact(query, approx)   # exact, but slower + holds vectors

buySpeed (IVF probe / HNSW walk)

skip most of the data — pays in recall (boundary misses)

1def search_fast(query, nprobe, ef_search):
2    cells = nearest_centroids(query, nprobe)  # IVF: scan few cells
3    cand = scan(cells)
4    # OR HNSW greedy walk: enter top layer, hop down
5    cand = greedy_walk(query, ef_search)
6    return topk(cand)
7    # tighter nprobe / ef_search = faster, lossier:
8    # a true neighbor across an unprobed boundary is missed

buyMemory (Product Quantization)

store tiny codes, not vectors — pays in recall (lossy codes)

1def encode(v, m, codebooks):       # split into m subvectors
2    return bytes(
3        nearest_centroid_id(sub, codebooks[i])  # 1 byte each
4        for i, sub in enumerate(split(v, m))
5    )                              # 6144 bytes -> m bytes
6 
7def adc(query, code):              # query stays full-precision
8    return sum(dist_table[i][code[i]] for i in range(m))
9    # lossy: estimated distance, so recall sags vs Flat

Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.

PreviousIVFPQ — skip most cells, compress the rest NextFiltered search — pre vs post