Build a vector database (Pinecone / Weaviate / pgvector style) (15 scenes)
Scene 09 · IVFPQ — skip most cells, compress the rest
Stack IVF (skip most cells) and PQ (compress the residual): the FAISS billion-scale workhorse — tiny memory and a fraction-of-N scan, paid for with compounded approximation.
Previously

PQ shrank the vectors but still touched all of them; IVF skipped most vectors but stored them whole. IVFPQ does both — open only nearby cells, then store each survivor as a tiny compressed residual — and becomes the only index that fits a billion vectors in RAM at speed. We now have four very different dots on the chart, and it's time to see what they collectively prove.

Scene 09
IVFPQ — skip most cells, compress the rest
Diagram
Left: the IVF mini-map — the 14 songs carved into 4 Voronoi cells, with only the nprobe cells nearest the query opened (the rest dimmed). Right: for the songs in those opened cells, each is stored not as its raw vector but as its RESIDUAL (the point minus its cell's centroid) PQ-coded into a handful of bytes. The byte counter runs 6,144 → ~96 live, and the inset Pareto chart drops IVFPQ's dot into the small-memory, fast, lower-recall corner that neither Flat nor HNSW can reach.
VECTOR · 1,536 dims · 32 subvectorsd0d1d2d3d4d5d6d7d8d9d10d11d12d13d14d15d16d17d18d19d20d21d22d2332each chunk = 48 dims = 192 raw bytesCODEBOOK · 256 prototypesnearest prototype → 1 byte (id 0–255)PQ CODE · 32 bytes177981919611738215136572341557600000000000032BYTES PER VECTORraw float326,144PQ code32192×smallerRECALL0.59lossyPQ approx onlyIVF CELLS · probe nearestPQ encodes the RESIDUAL (point − centroid)RECALL vs LATENCYrecall ↑latency →FlatHNSWIVFPQIVF opens only the nearby cells; PQ stores each survivor's residual as a few bytes.
residual = point − its cell's centroid
only nearby cells opened
6,144 bytes → a handful
We're not inventing anything new here — we're stacking two tools you already have. IVF carves the 14 songs into a few cells and, for the query 'Now Playing', opens only the nearby ones (everything else stays dimmed and unscanned). Then, for the songs that survive in those opened cells, PQ shrinks each one. The twist: PQ doesn't compress the raw vector — it compresses the RESIDUAL, which just means 'how far this point sits from the centre of its cell.' Because every point in a cell is already huddled near that centre, the residual is a small wobble, and small wobbles round to a prototype far more accurately than a whole vector would. Watch the byte counter collapse from 6,144 bytes (1536 dimensions × 4 bytes each) down to a handful, while only the nearby cells light up.
Implementation
IVFPQ.add
store each vector as (cell id, PQ code of its residual)
1def add(vec):
2 c = nearest_centroid(vec) # IVF: which cell
3 residual = vec - centroids[c] # the small leftover
4 code = pq.encode(residual, m) # m one-byte ids
5 cells[c].append((id_of(vec), code))
IVFPQ.search
open nprobe cells, then score the survivors
1def search(q, k, nprobe, rerank):
2 near = nprobe_nearest_cells(q, nprobe)
3 cands = []
4 for c in near: # the rest stay closed
5 cands += pq_score(q, c)
6 short = top(cands, 4 * k) # cheap shortlist
7 if rerank:
8 short = exact_rescore(q, short)
9 return top(short, k)
PQ.score (ADC)
score codes against the query residual, no decompression
1def pq_score(q, c):
2 qr = q - centroids[c] # query residual for this cell
3 # one table per subvector: qr-chunk -> each prototype
4 table = adc_tables(qr, m)
5 out = []
6 for (id, code) in cells[c]:
7 d = sum(table[j][code[j]] for j in range(m))
8 out.append((id, d))
9 return out
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.