Build a vector database (Pinecone / Weaviate / pgvector style) (15 scenes)
Scene 02 · The vector and the distance metric
'Closest' isn't one thing: L2 is the gap between arrowtips, cosine is the angle between arrows, and a metric that doesn't match the embedding silently returns garbage. Normalize once and they agree.
Previously
Now that we can measure how close any two vectors are, we finally have a precise definition of 'the true nearest neighbors'. The next problem: that brute-force scan from scene 1 is the only thing that gets those neighbors perfectly right — so before we make anything faster, we need a way to MEASURE how much correctness a faster method gives up.
Diagram
Each song is drawn as an ARROW from the origin, so both its direction (angle) and its length (magnitude) are visible at once. The 'Now Playing' query is one arrow; a candidate is another. The overlay shows the active metric: L2 as the straight gap between the two arrowtips, cosine as the angle wedge between the two arrows, dot product as both together. Flip 'normalize' and every arrow snaps onto the unit circle — once all arrows are the same length, the three metrics rank the neighbors identically.
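The claim that normalized arrows rank identically under all three metrics can be read off the expansion of the squared L2 distance. Here $q$ is the query, $v$ a stored vector, $q \cdot v$ their dot product and $\theta$ the angle between them:

```latex
\begin{aligned}
\lVert q - v \rVert^2 &= \lVert q \rVert^2 + \lVert v \rVert^2 - 2\,(q \cdot v), \qquad \cos\theta = \frac{q \cdot v}{\lVert q \rVert\,\lVert v \rVert} \\[4pt]
\lVert q \rVert = \lVert v \rVert = 1 \;\Longrightarrow\; \lVert q - v \rVert^2 &= 2 - 2\,(q \cdot v), \qquad \cos\theta = q \cdot v
\end{aligned}
```

With every stored vector at unit length, $\lVert q \rVert^2$ is the same for all candidates, so a larger dot product always means a smaller L2 distance; normalizing the query as well makes the dot product equal the cosine. Without normalization the $\lVert v \rVert^2$ term pushes long arrows away in L2 even as their dot product grows, which is how a long arrow pointing the right way can win on dot product yet lose on L2.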
A *vector* (the list of numbers we turned each song into last scene) can be drawn as an arrow from the origin — its direction and its length both carry meaning. Watch the blue 'Now Playing' query arrow and the 'Big Wave' arrow beside it: Big Wave points almost the SAME direction but is much longer. Now ask 'how close are these?' two honest ways at once. The violet wedge measures only the *angle between the arrows* — tiny, so it says 'near-perfect match'. The amber line measures the *gap between the two arrowtips* — big, so it says 'far apart'. Same two arrows, two opposite verdicts. There is no single 'closest'; there's a CHOICE of metric.
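A minimal sketch of that disagreement in plain Python. The coordinates are hypothetical, picked only to mimic the picture: 'Big Wave' points almost the same way as the query but is about four times longer, while 'Slow Drift' (an invented second song) is short and angled further away, yet its tip sits close to the query's tip.

```python
from math import sqrt

def l2(q, v):
    # gap between the two arrowtips
    return sqrt(sum((qi - vi) ** 2 for qi, vi in zip(q, v)))

def dot(q, v):
    return sum(qi * vi for qi, vi in zip(q, v))

def cosine(q, v):
    # angle only: both lengths are divided out
    return dot(q, v) / (sqrt(dot(q, q)) * sqrt(dot(v, v)))

# Hypothetical 2-D coordinates, chosen only to reproduce the picture
now_playing = (1.0, 0.2)
candidates = {
    "Big Wave": (4.0, 0.9),    # nearly the same direction, about 4x longer
    "Slow Drift": (0.6, 0.7),  # hypothetical: shorter, wider angle, tip nearby
}

for name, v in candidates.items():
    print(f"{name:10s}  cosine={cosine(now_playing, v):.3f}  l2={l2(now_playing, v):.3f}")

best_by_cosine = max(candidates, key=lambda n: cosine(now_playing, candidates[n]))
best_by_l2 = min(candidates, key=lambda n: l2(now_playing, candidates[n]))
print("cosine picks:", best_by_cosine)  # Big Wave
print("l2 picks:", best_by_l2)          # Slow Drift
```

Cosine crowns Big Wave while L2 crowns Slow Drift. Neither is a bug; they answer different questions about the same arrows.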
Implementation
Metric.score
the three rules that turn two vectors into one number
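from math import sqrt
from numpy.linalg import norm  # vector length (Euclidean norm)

def l2(q, v):  # tip gap: straight-line distance between the arrowtips
    return sqrt(sum((qi - vi) ** 2 for qi, vi in zip(q, v)))

def dot(q, v):  # angle and length
    return sum(qi * vi for qi, vi in zip(q, v))

def cosine(q, v):  # angle only: both lengths divided out
    return dot(q, v) / (norm(q) * norm(v))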
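Scoring one pair at a time is clear but slow in pure Python. A common variant, sketched here under the assumption that the stored vectors sit as rows of a single NumPy matrix (`score_all` and the row layout are illustrative, not part of the index described here), applies each rule to every row in one vectorized call:

```python
import numpy as np

def score_all(query, matrix, metric):
    """Score one query against every row of `matrix`; higher means closer."""
    q = np.asarray(query, dtype=float)
    m = np.asarray(matrix, dtype=float)
    dots = m @ q  # one dot product per stored row
    if metric == "dot":
        return dots
    if metric == "cosine":
        return dots / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    if metric == "l2":
        return -np.linalg.norm(m - q, axis=1)  # negated so higher = closer
    raise ValueError(f"unknown metric: {metric!r}")

stored = np.array([[4.0, 0.9], [0.6, 0.7]])  # hypothetical rows
query = np.array([1.0, 0.2])
for metric in ("l2", "dot", "cosine"):
    print(metric, score_all(query, stored, metric).round(3))
```

Negating L2 keeps the "higher score = closer" convention, so one ranking routine can serve all three metrics.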
Index.normalizeAtIngest
snap every stored vector to length 1, once
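import numpy as np
from numpy.linalg import norm

NORMALIZE = True     # illustrative index-wide setting
stored_vectors = []  # illustrative in-memory storage
store = stored_vectors.append

def ingest(vectors):
    for v in vectors:
        v = np.asarray(v, dtype=float)
        if NORMALIZE:
            v = v / norm(v)  # onto the unit circle (assumes v is nonzero)
        store(v)
    # on unit-length vectors:
    #   cosine(q, v) == dot(q, v)
    #   argmax dot == argmin l2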
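The two comments at the end of `ingest` can be checked empirically. A minimal sketch on hypothetical random data (1,000 vectors in 8 dimensions with deliberately varied lengths, no zero vectors): it compares the top-10 lists produced by L2, dot product and cosine before and after normalizing.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: random 8-D vectors, each scaled to a different length
raw = rng.normal(size=(1000, 8)) * rng.uniform(0.1, 5.0, size=(1000, 1))
query = rng.normal(size=8)

def top10(scores):
    """Indices of the 10 highest scores, best first."""
    return np.argsort(-scores, kind="stable")[:10]

def top10_by_metric(vectors, q):
    dots = vectors @ q
    cos = dots / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    neg_l2 = -np.linalg.norm(vectors - q, axis=1)  # negated so higher = closer
    return top10(neg_l2), top10(dots), top10(cos)

# Raw vectors: the lists can differ, because length leaks into L2 and dot
l2_raw, dot_raw, cos_raw = top10_by_metric(raw, query)
print("raw agree:", np.array_equal(l2_raw, dot_raw), np.array_equal(dot_raw, cos_raw))

# Normalize every stored vector once, and the query at search time
unit = raw / np.linalg.norm(raw, axis=1, keepdims=True)
q_unit = query / np.linalg.norm(query)
l2_top, dot_top, cos_top = top10_by_metric(unit, q_unit)
print("unit agree:", np.array_equal(l2_top, dot_top), np.array_equal(dot_top, cos_top))
```

Normalizing at ingest pays the division once per vector instead of once per query-vector pair, which is why the agreement is worth buying up front.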
Index.topK
score the query against every vector, rank, return k
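import numpy as np
from numpy.linalg import norm

# NORMALIZE, METRIC, stored_vectors, l2, dot and cosine come from the blocks above
def top_k(query, k):
    query = np.asarray(query, dtype=float)
    q = query / norm(query) if NORMALIZE else query
    scored = []
    for v in stored_vectors:
        if METRIC == 'l2':    s = -l2(q, v)  # negated: smaller distance = higher score
        elif METRIC == 'dot': s = dot(q, v)
        else:                 s = cosine(q, v)
        scored.append((s, v))
    # sort on the score only; on a tie, comparing two NumPy vectors would raise
    scored.sort(key=lambda pair: pair[0], reverse=True)  # higher score = closer
    return scored[:k]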
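Sorting every score costs O(n log n) even when only k results are wanted. One common alternative, sketched here as an assumption rather than how this index works, keeps only the k best with the standard library's `heapq.nlargest`, which runs in O(n log k). The helper name `top_k_heap` is hypothetical, and it returns (score, index) pairs so ties never need to compare the vectors themselves.

```python
import heapq

def dot(q, v):
    return sum(qi * vi for qi, vi in zip(q, v))

def top_k_heap(query, stored_vectors, k):
    """Return the k (score, index) pairs with the highest dot-product score."""
    scored = ((dot(query, v), i) for i, v in enumerate(stored_vectors))
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])

stored = [(0.6, 0.8), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]  # hypothetical unit vectors
print(top_k_heap((1.0, 0.0), stored, k=2))  # [(1.0, 1), (0.6, 0)]
```

Both versions still touch every stored vector; the heap only trims the ranking step, so the scan itself remains the brute-force baseline.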