Build a vector database (Pinecone / Weaviate / pgvector style) (15 scenes)
Scene 09 · IVFPQ — skip most cells, compress the rest
Stack IVF (skip most cells) and PQ (compress the residual): the FAISS billion-scale workhorse — tiny memory and a fraction-of-N scan, paid for with compounded approximation.
Previously
PQ shrank the vectors but still touched all of them; IVF skipped most vectors but stored them whole. IVFPQ does both — open only nearby cells, then store each survivor as a tiny compressed residual — and becomes the only index that fits a billion vectors in RAM at speed. We now have four very different dots on the chart, and it's time to see what they collectively prove.
Scene 09
IVFPQ — skip most cells, compress the rest
Diagram
Left: the IVF mini-map — the 14 songs carved into 4 Voronoi cells, with only the nprobe cells nearest the query opened (the rest dimmed). Right: for the songs in those opened cells, each is stored not as its raw vector but as its RESIDUAL (the point minus its cell's centroid) PQ-coded into a handful of bytes. The byte counter runs 6,144 → ~96 live, and the inset Pareto chart drops IVFPQ's dot into the small-memory, fast, lower-recall corner that neither Flat nor HNSW can reach.
residual = point − its cell's centroid
only nearby cells opened
6,144 bytes → a handful
We're not inventing anything new here — we're stacking two tools you already have. IVF carves the 14 songs into a few cells and, for the query 'Now Playing', opens only the nearby ones (everything else stays dimmed and unscanned). Then, for the songs that survive in those opened cells, PQ shrinks each one. The twist: PQ doesn't compress the raw vector — it compresses the RESIDUAL, which just means 'how far this point sits from the centre of its cell.' Because every point in a cell is already huddled near that centre, the residual is a small wobble, and small wobbles round to a prototype far more accurately than a whole vector would. Watch the byte counter collapse from 6,144 bytes (1536 dimensions × 4 bytes each) down to a handful, while only the nearby cells light up.
Implementation
IVFPQ.add
store each vector as (cell id, PQ code of its residual)
1def add(vec):2 c = nearest_centroid(vec) # IVF: which cell3 residual = vec - centroids[c] # the small leftover4 code = pq.encode(residual, m) # m one-byte ids5 cells[c].append((id_of(vec), code))
IVFPQ.search
open nprobe cells, then score the survivors
1def search(q, k, nprobe, rerank):2 near = nprobe_nearest_cells(q, nprobe)3 cands = []4 for c in near: # the rest stay closed5 cands += pq_score(q, c)6 short = top(cands, 4 * k) # cheap shortlist7 if rerank:8 short = exact_rescore(q, short)9 return top(short, k)
PQ.score (ADC)
score codes against the query residual, no decompression
1def pq_score(q, c):2 qr = q - centroids[c] # query residual for this cell3 # one table per subvector: qr-chunk -> each prototype4 table = adc_tables(qr, m)5 out = []6 for (id, code) in cells[c]:7 d = sum(table[j][code[j]] for j in range(m))8 out.append((id, d))9 return out
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.