Build a vector database (Pinecone / Weaviate / pgvector style) (15 scenes)
Scene 03 · The Flat index — the 100%-recall baseline
Wrap the brute-force scan as an index called Flat: recall 1.0 by definition. It's the yardstick every faster index is scored against — and it defines the two axes, recall@k and latency.
Previously
With a distance metric we can name the true neighbors; now we wrap the brute-force scan into an index called Flat so we have a perfect-recall yardstick. The next problem is obvious: Flat is correct but far too slow, so how do we avoid comparing the query against every single vector?
Scene 03
The Flat index — the 100%-recall baseline
Diagram
A scoreboard chart you'll see in every scene from here on. The vertical axis is how CORRECT a search is (how many of the genuinely-closest songs it returned); the horizontal axis is how SLOW it is. The good corner is top-left: very correct and very fast. Today there's exactly one dot — 'Flat', the plain scan — parked at the very top (perfectly correct) but far to the right (slow). Every faster trick in later scenes drops its own dot onto this same chart, and you'll watch them spread out.
Two scenes ago you watched a *brute-force scan*: compare the query to every stored song, one at a time, and keep the closest. That scan was never given a name — but it is, in fact, a perfectly real way to run a search. Engineers call it the *Flat index*: 'flat' because it keeps every vector in one plain list and walks the whole list on every query. Watch it run over our 14 songs. It touches all of them, then ranks them, and returns the genuine three closest to 'Now Playing': Midnight Drive (#11), Lonely Synth (#14), Synth Dawn (#6). Because it checked everything, it cannot be wrong — those ARE the true nearest neighbors. On the chart to the side, a single dot appears for Flat: pinned to the very top (perfectly correct) and far to the right (slow, because it looked at everything).
Implementation
FlatIndex.search
compare the query to every vector — exact top-k
1def search(query, k):2 scored = []3 for v in all_vectors: # touches EVERY vector4 scored.append((distance(query, v), v.id))5 scored.sort() # nearest first6 return [id for _, id in scored[:k]]
recall_at_k
the scorecard — measured AGAINST Flat's result
1def recall_at_k(approx_ids, true_ids, k):2 true_set = set(true_ids[:k]) # Flat's exact top-k3 hits = len(set(approx_ids[:k]) & true_set)4 return hits / k # 1.0 means perfect5 # Flat scored against itself is always 1.0
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.