Build a vector database (Pinecone / Weaviate / pgvector style) (15 scenes)
Scene 03 · The Flat index — the 100%-recall baseline
Wrap the brute-force scan as an index called Flat: recall 1.0 by definition. It's the yardstick every faster index is scored against — and it defines the two axes, recall@k and latency.
Previously

With a distance metric we can name the true neighbors; now we wrap the brute-force scan into an index called Flat so we have a perfect-recall yardstick. The next problem is obvious: Flat is correct but far too slow, so how do we avoid comparing the query against every single vector?

Scene 03
The Flat index — the 100%-recall baseline
Diagram
A scoreboard chart you'll see in every scene from here on. The vertical axis is how CORRECT a search is (how many of the genuinely-closest songs it returned); the horizontal axis is how SLOW it is. The good corner is top-left: very correct and very fast. Today there's exactly one dot — 'Flat', the plain scan — parked at the very top (perfectly correct) but far to the right (slow). Every faster trick in later scenes drops its own dot onto this same chart, and you'll watch them spread out.
↖ ideal: high recall · low latency0.000.000.250.250.500.500.750.751.001.00recall@k (up = more correct)latency (right = slower)Flat · recall@3…INDEXFlat · recall@3…current configFlat owns the top — perfect — but lives on the …Flat returns the true top-3 [11, 14, 6…] over 14 songs. One dot today; every faster index drops its own dot h…
Two scenes ago you watched a *brute-force scan*: compare the query to every stored song, one at a time, and keep the closest. That scan was never given a name — but it is, in fact, a perfectly real way to run a search. Engineers call it the *Flat index*: 'flat' because it keeps every vector in one plain list and walks the whole list on every query. Watch it run over our 14 songs. It touches all of them, then ranks them, and returns the genuine three closest to 'Now Playing': Midnight Drive (#11), Lonely Synth (#14), Synth Dawn (#6). Because it checked everything, it cannot be wrong — those ARE the true nearest neighbors. On the chart to the side, a single dot appears for Flat: pinned to the very top (perfectly correct) and far to the right (slow, because it looked at everything).
Implementation
FlatIndex.search
compare the query to every vector — exact top-k
1def search(query, k):
2 scored = []
3 for v in all_vectors: # touches EVERY vector
4 scored.append((distance(query, v), v.id))
5 scored.sort() # nearest first
6 return [id for _, id in scored[:k]]
recall_at_k
the scorecard — measured AGAINST Flat's result
1def recall_at_k(approx_ids, true_ids, k):
2 true_set = set(true_ids[:k]) # Flat's exact top-k
3 hits = len(set(approx_ids[:k]) & true_set)
4 return hits / k # 1.0 means perfect
5 # Flat scored against itself is always 1.0
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.