Build a vector database (Pinecone / Weaviate / pgvector style) (15 scenes)
Scene 12 · When the filter disconnects the graph
Pre-filtering an HNSW search makes only matching nodes walkable — and if the path to a valid neighbor ran through a filtered-out node, the greedy walk dead-ends and recall craters.
Previously
Pre-filtering avoided starvation, but on an HNSW graph it can sever the very paths the greedy walk needs — and now we've seen exactly how: the bridge node got filtered out and the walk dead-ended. Engines fix it with extra edges, two-hop jumps, or a brute-force fallback. But there's a different limit filters can't touch: similarity itself sometimes misses the exact word the user typed.
Scene 12
When the filter disconnects the graph
Diagram
The same 14-song graph from the HNSW scenes, but now a metadata filter ('genre = electronic') has grayed out every non-matching song. Only the solid nodes are walkable. The greedy walk starts at an entry node and hops toward the query 'Now Playing' — but the short edge it needs runs to 'Synth Dawn' (#6), which got filtered out. With that bridge gone, the walk has no walkable step closer to the query and DEAD-ENDS (the ✕). 'Lonely Synth' (#14, red) is a song that matches the filter and sits right by the query, yet it's stranded on the far side of the missing bridge. The inset is the running recall-vs-latency chart; the filtered run's dot drops far down the recall axis until a fix is applied.
Last scene, pre-filtering looked like the safe answer: remove the non-matching songs first, then search only what's left — no starvation. Here is the trap. We're searching the HNSW graph from scenes 6-7, but first we apply the filter 'genre = electronic'. About 80% of the songs don't match, so they gray out and the walk is only allowed to step on the solid ones. Watch the greedy walk start at Indie Drive (#5) and head for the query. To get closer it needs to hop to 'Synth Dawn' (#6) — but #6 isn't electronic, so it was filtered out. With that one node gone, there's no walkable step any closer to the query. The walk dead-ends. And 'Lonely Synth' (#14, red) — which DOES match the filter and sits right next to the query — is stranded on the far side of the missing bridge, never reached.
Implementation
Index.search
the dispatcher: scan tiny match-sets, else walk the graph
1def search(query, predicate, k):2 matching = nodes_where(predicate)3 # below the cutoff the graph isn't worth it4 if len(matching) <= flatSearchCutOff:5 return exact_topk(query, matching, k)6 return filtered_greedy_walk(7 query, matching, k,8 )
HNSW.filteredGreedyWalk
best-first hops, but only onto nodes that match the filter
1def filtered_greedy_walk(query, matching, k):2 frontier = [entry_node] # size ef_search3 while frontier.improving():4 cur = frontier.closest_to(query)5 for nbr in neighbors(cur):6 if nbr in matching:7 frontier.add(nbr) # walkable8 elif two_hop:9 for far in neighbors(nbr): # ACORN10 if far in matching:11 frontier.add(far) # skip the gap12 return frontier.topk(k)
HNSW.buildEdges
wiring the graph — filterable HNSW adds same-category links
1def build_edges(node):2 link(node, nearest_neighbors(node, M))3 # filterable HNSW: also link to same-category4 # nodes so a walk inside one category never5 # depends on a node another filter removes6 if filterable_hnsw:7 peers = same_category(node)8 link(node, nearest(peers, M))
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.