Build a vector database (Pinecone / Weaviate / pgvector style) (15 scenes)
Scene 10 · The trilemma — pick two
Recall, latency, memory: Flat maxes recall, HNSW speed-at-recall, IVFPQ memory — no index wins all three. Every deployment is a chosen point on the Pareto frontier.
Previously
Four indexes, three axes, one chart — and the picture is unambiguous: no index is best at recall AND latency AND memory at once. Choosing an index is choosing which axis to sacrifice. With the index-choice arc complete, we now leave the clean geometry behind and hit where production actually gets hard: real queries aren't just 'similar to X'.
Scene 10
The trilemma — pick two
Diagram
The one chart the whole arc has been building. Each dot is an index you've met: recall@10 goes UP the chart (higher = better), latency goes RIGHT (further right = slower), and the bubble's SIZE is memory (bigger = more RAM). The soft curve through the best dots is the frontier — the edge of what's possible. The filled, ringed dot is YOUR live config; drag the knobs and it slides along that edge but can never jump past it.
Flat: perfect recall, huge & slow
IVFPQ: tiny & fast, lower recall
Every dot the arc dropped, finally on one chart. *Flat* sits top-left: recall 1.0 — but it's slow and its bubble is huge (it keeps every full vector). *HNSW* is fast and high-recall, but its bubble is just as big — it stores full vectors PLUS a graph of edges. *IVFPQ* has a tiny bubble (compressed) and is fast, but it sits lower — it gave up recall. *IVF* lands in the middle. Look across all three axes at once: no dot is high AND far-left AND small. The curve through the best dots is the *frontier* — the boundary of what any index can reach. Nothing lives above-and-left-of it.
Implementation
buyRecall (Flat / re-rank path)
exact distances on full vectors — pays in latency + memory
1def rank_exact(query, candidates):2 scored = []3 for v in candidates: # full float32 vectors4 d = l2(query, v.full) # no approximation5 scored.append((d, v.id))6 return topk(scored) # recall ceiling = Flat78# re-rank: rescue recall an approx index conceded9approx = index.search(query, k * 10)10return rank_exact(query, approx) # exact, but slower + holds vectors
buySpeed (IVF probe / HNSW walk)
skip most of the data — pays in recall (boundary misses)
1def search_fast(query, nprobe, ef_search):2 cells = nearest_centroids(query, nprobe) # IVF: scan few cells3 cand = scan(cells)4 # OR HNSW greedy walk: enter top layer, hop down5 cand = greedy_walk(query, ef_search)6 return topk(cand)7 # tighter nprobe / ef_search = faster, lossier:8 # a true neighbor across an unprobed boundary is missed
buyMemory (Product Quantization)
store tiny codes, not vectors — pays in recall (lossy codes)
1def encode(v, m, codebooks): # split into m subvectors2 return bytes(3 nearest_centroid_id(sub, codebooks[i]) # 1 byte each4 for i, sub in enumerate(split(v, m))5 ) # 6144 bytes -> m bytes67def adc(query, code): # query stays full-precision8 return sum(dist_table[i][code[i]] for i in range(m))9 # lossy: estimated distance, so recall sags vs Flat
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.