The Flat index — the 100%-recall baseline — Build a vector database (Pinecone / Weaviate / pgvector style)

Build a vector database (Pinecone / Weaviate / pgvector style) (15 scenes)

Scene 03 · The Flat index — the 100%-recall baseline

Wrap the brute-force scan as an index called Flat: recall 1.0 by definition. It's the yardstick every faster index is scored against — and it defines the two axes, recall@k and latency.

Previously

With a distance metric we can name the true neighbors; now we wrap the brute-force scan into an index called Flat so we have a perfect-recall yardstick. The next problem is obvious: Flat is correct but far too slow, so how do we avoid comparing the query against every single vector?

Scene 03

The Flat index — the 100%-recall baseline

Watch

Diagram

A scoreboard chart you'll see in every scene from here on. The vertical axis is how CORRECT a search is (how many of the genuinely-closest songs it returned); the horizontal axis is how SLOW it is. The good corner is top-left: very correct and very fast. Today there's exactly one dot — 'Flat', the plain scan — parked at the very top (perfectly correct) but far to the right (slow). Every faster trick in later scenes drops its own dot onto this same chart, and you'll watch them spread out.

Sources

Two scenes ago you watched a *brute-force scan*: compare the query to every stored song, one at a time, and keep the closest. That scan was never given a name — but it is, in fact, a perfectly real way to run a search. Engineers call it the *Flat index*: 'flat' because it keeps every vector in one plain list and walks the whole list on every query. Watch it run over our 14 songs. It touches all of them, then ranks them, and returns the genuine three closest to 'Now Playing': Midnight Drive (#11), Lonely Synth (#14), Synth Dawn (#6). Because it checked everything, it cannot be wrong — those ARE the true nearest neighbors. On the chart to the side, a single dot appears for Flat: pinned to the very top (perfectly correct) and far to the right (slow, because it looked at everything).

Implementation

FlatIndex.search

compare the query to every vector — exact top-k

1def search(query, k):
2    scored = []
3    for v in all_vectors:        # touches EVERY vector
4        scored.append((distance(query, v), v.id))
5    scored.sort()                # nearest first
6    return [id for _, id in scored[:k]]

recall_at_k

the scorecard — measured AGAINST Flat's result

1def recall_at_k(approx_ids, true_ids, k):
2    true_set = set(true_ids[:k])  # Flat's exact top-k
3    hits = len(set(approx_ids[:k]) & true_set)
4    return hits / k               # 1.0 means perfect
5    # Flat scored against itself is always 1.0

Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.

PreviousThe vector and the distance metric NextIVF — carve the space into cells