Build a vector database (Pinecone / Weaviate / pgvector style) (15 scenes)
Scene 06 · HNSW — hop along a proximity graph
Link each vector to its M nearest neighbors and search by greedy hops — no hard cell walls, so the lonely boundary point is just one more link away.
Previously
Rigid cells gave us the boundary miss. Linking each vector to its nearest neighbors and hopping greedily toward the query removes the wall entirely — the lonely point is just one more hop away. But a flat graph still has a problem: from a random start, hopping one short link at a time across a huge dataset is slow. How do we cover big distances fast and still land precisely?
Scene 06
HNSW — hop along a proximity graph
Diagram
The same 14-song map, but the Voronoi walls are gone. Each song now has edges to its few nearest neighbors. The animated path is a greedy walk: it starts at an entry node and, at each step, hops to whichever linked neighbor is closest to the query 'Now Playing', stopping when no neighbor is closer. 'Lonely Synth' (#14, red) — the point that fell across a cell wall last scene — is now reachable because it's linked to nodes the walk passes through. The small inset is the running recall-vs-latency chart; HNSW's dot sits up-and-right of IVF.
Last scene, a hard cell wall hid 'Lonely Synth' from the query. Now watch a completely different idea. The cell boundaries are gone. Instead, every song is connected by short links to the few songs nearest it in the space — a web of neighbors. To search, we start at one node and keep hopping to whichever linked neighbor is *closest to the query*, stepping downhill until no neighbor is any closer. Follow the path: Indie Drive → Midnight Drive → Synth Dawn → and the last hop lands on the very point the cells missed.
Implementation
Index.build_graph
wire each vector to its M nearest neighbors (the kNN graph)
1def build_graph(vectors, M):2 edges = defaultdict(set)3 for v in vectors:4 # M = links per node, the only knob here5 near = nearest(v, vectors, k=M)6 for u in near:7 edges[v].add(u) # link is symmetric8 edges[u].add(v)9 return edges
Index.greedy_search
hop to the linked neighbor closest to the query, stop downhill
1def greedy_search(query, edges, entry):2 current = entry3 while True:4 best = min(edges[current],5 key=lambda u: dist(u, query))6 if dist(best, query) >= dist(current, query):7 return current # no neighbor is closer8 current = best # step downhill
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.