Build a vector database (Pinecone / Weaviate / pgvector style) (15 scenes)
Scene 05 · The boundary miss — one lonely point
A cell wall is hard: a true neighbor a hair across it is invisible at low nprobe, even though it's closer than points you do return. Proximity in space ≠ membership in a probed cell.
Previously

IVF's speedup comes with a signature failure, and now we've named it: the boundary miss, where a hard cell wall hides an obviously close neighbor. Raising nprobe patches it, but every extra cell you probe puts latency back on the bill. That raises a sharper question: is there a way to navigate toward the query's true neighbors directly, instead of carving the space into rigid cells with brittle walls?

Scene 05
The boundary miss — one lonely point
Diagram
Zoomed onto the wall between the query's cell and the cell beside it. Each centroid owns a cell (the colored region); a point's color is its cell. Scanned cells are drawn solid; un-scanned cells are grayed. The red point is 'Lonely Synth' (#14), the query's true #4 neighbor — it sits a hair across the boundary, in a gray (un-probed) cell, so it cannot be returned even though it is closer than 'Study Beats' (#13), which is inside the query's cell and IS returned.
[Diagram labels. Axes: acoustic → electronic, calm → energetic. Centroids: c0, c1, c2, c3. Points: Lo-fi Rain, Acoustic Suns…, Campfire Folk, Coffeehouse, Indie Drive, Synth Dawn, Neon City, Club Pulse, Rave Peak, Bass Drop, Midnight Drive, Garage Beat, Study Beats, Lonely Synth. Query: Now Playing. Caption: Lonely Synth (#14, red) is closer than Study Beats (#13), but its cell is gr… Inset chart: RECALL vs LATENCY (axes: recall, slower →; series: Flat, IVF).]
cell boundary — the hard wall
Lonely Synth (#14): closer, but across the wall →
Study Beats (#13): farther, but inside — returned →
Freeze the frame from the last scene and zoom in on the wall. The query 'Now Playing' sits beside a cell boundary; 'Lonely Synth' (#14, the red point) sits just across it, in the next cell over. Measure honestly and #14 is the query's true #4-closest song. Yet at nprobe=1 — scanning only the query's own cell — IVF skips #14 entirely and fills its slot with 'Study Beats' (#13), which is farther from the query but happens to live INSIDE the scanned cell. The red point is closer and still lost. Notice it: the miss is caused by the line, not by the distance.
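The point that the line, not the distance, causes the miss can be checked with a few numbers. The sketch below is only an illustration: the two centroids and the song coordinates are hypothetical values chosen to reproduce the situation in the frame, not the scene's actual data.

```python
import math

# Hypothetical 2-D coordinates, chosen only to mirror the situation in the scene.
centroids = [(0.0, 0.0),   # cell 0: the query's cell
             (4.0, 0.0)]   # cell 1: the cell next door; the wall is the line x = 2
query = (1.9, 0.0)
points = {
    "Study Beats (#13)": (0.5, 0.0),   # farther from the query, inside cell 0
    "Lonely Synth (#14)": (2.1, 0.0),  # closer to the query, a hair across the wall
}

def cell_of(v):
    """Index of the nearest centroid, i.e. the cell that owns v."""
    return min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))

query_cell = cell_of(query)
report = {}
for name, p in points.items():
    report[name] = {
        "dist": math.dist(query, p),                       # distance to the query
        "cell": cell_of(p),                                # which cell owns the point
        "scanned_at_nprobe_1": cell_of(p) == query_cell,   # visible to a 1-cell probe?
    }
    print(name, report[name])
```

With these coordinates the closer point lands in cell 1 while the query lands in cell 0, so a one-cell probe never computes its distance at all.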
Implementation
IVF.build
k-means draws nlist cells once — every wall is a future miss
def build(vectors, nlist):
    centroids = kmeans(vectors, k=nlist)     # the walls
    members = {c: [] for c in centroids}
    for v in vectors:
        c = nearest_centroid(v, centroids)   # Voronoi cell
        members[c].append(v)
    return centroids, members
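The build step above leaves `kmeans` and `nearest_centroid` undefined. Here is a minimal runnable sketch of the same idea, with the assumptions labeled: vectors are tuples, `kmeans` is plain Lloyd's iteration with a fixed seed, and cells are keyed by centroid index rather than by the centroid itself. None of these choices are confirmed by the scene, and the `iters` and `seed` parameters are hypothetical additions.

```python
import math
import random

def nearest_centroid(v, centroids):
    """Index of the centroid closest to v (assumption: cells are keyed by index)."""
    return min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; a stand-in for the scene's undefined kmeans helper."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)           # start from k of the data points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:                        # assignment step
            groups[nearest_centroid(v, centroids)].append(v)
        for i, g in enumerate(groups):           # update step
            if g:                                # keep the old centroid if a cell empties
                centroids[i] = tuple(sum(xs) / len(g) for xs in zip(*g))
    return centroids

def build(vectors, nlist):
    centroids = kmeans(vectors, k=nlist)         # the walls are fixed here, once
    members = {i: [] for i in range(nlist)}
    for v in vectors:
        members[nearest_centroid(v, centroids)].append(v)
    return centroids, members

# Two well-separated toy clusters (hypothetical data).
vectors = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, members = build(vectors, nlist=2)
print({i: len(m) for i, m in members.items()})
```

Keying by index avoids relying on centroids being hashable, which matters as soon as the vectors become arrays. The walls are implicit: nothing stores them, they are simply wherever the nearest centroid changes.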
IVF.search
scan only the nprobe nearest cells — the boundary miss lives here
def search(q, k, nprobe):
    cells = nearest_centroids(q, nprobe)
    candidates = []
    for c in cells:                  # ONLY probed cells
        candidates += members[c]     # un-probed cells invisible
    candidates.sort(key=lambda p: dist(q, p))
    return candidates[:k]            # nearness off-cell ignored
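To see the miss and the nprobe patch side by side, the search above can be run against an exact (flat) scan on a toy dataset. This is a sketch under stated assumptions: the centroids are placed by hand instead of coming from k-means, all coordinates are hypothetical, and the names `near-1` to `near-3`, `far-1`, `far-2`, `flat_search`, and `ivf_search` are illustrative. Only Lonely Synth and Study Beats come from the scene.

```python
import math

# Hypothetical 2-D data: two hand-placed centroids, so the wall is the line x = 2.
centroids = [(0.0, 0.0), (4.0, 0.0)]
songs = {
    "near-1": (1.4, 0.2),
    "near-2": (1.2, -0.3),
    "near-3": (1.0, 0.4),
    "Lonely Synth": (2.2, 0.0),   # true #4, just across the wall
    "Study Beats": (0.6, 0.0),    # true #5, inside the query's cell
    "far-1": (3.5, 0.5),
    "far-2": (4.2, -0.4),
}
query = (1.5, 0.0)

def nearest_centroids(v, n):
    """Indices of the n centroids closest to v."""
    order = sorted(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))
    return order[:n]

# Build: each song joins the cell of its nearest centroid.
members = {i: [] for i in range(len(centroids))}
for name, vec in songs.items():
    members[nearest_centroids(vec, 1)[0]].append(name)

def flat_search(q, k):
    """Exact search: rank every song by distance to q."""
    return sorted(songs, key=lambda n: math.dist(q, songs[n]))[:k]

def ivf_search(q, k, nprobe):
    """Approximate search: rank only the songs in the nprobe nearest cells."""
    candidates = []
    for c in nearest_centroids(q, nprobe):
        candidates += members[c]
    candidates.sort(key=lambda n: math.dist(q, songs[n]))
    return candidates[:k], len(candidates)

truth = flat_search(query, k=4)
results = {}
for nprobe in (1, 2):
    top, scanned = ivf_search(query, k=4, nprobe=nprobe)
    recall = len(set(top) & set(truth)) / len(truth)
    results[nprobe] = (top, scanned, recall)
    print(f"nprobe={nprobe} scanned={scanned} recall={recall:.2f} top={top}")
```

On this data, nprobe=1 scans 4 songs and returns Study Beats in the fourth slot (recall 0.75); nprobe=2 scans all 7 and recovers Lonely Synth (recall 1.00). The extra recall is paid for with the extra distance computations.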