Build a vector database (Pinecone / Weaviate / pgvector style) (15 scenes)
Scene 12 · When the filter disconnects the graph
Pre-filtering an HNSW search makes only matching nodes walkable — and if the path to a valid neighbor ran through a filtered-out node, the greedy walk dead-ends and recall craters.
Previously

Pre-filtering avoided starvation, but on an HNSW graph it can sever the very paths the greedy walk needs — and now we've seen exactly how: the bridge node got filtered out and the walk dead-ended. Engines fix it with extra edges, two-hop jumps, or a brute-force fallback. But there's a different limit filters can't touch: similarity itself sometimes misses the exact word the user typed.

Scene 12
When the filter disconnects the graph
Diagram
The same 14-song graph from the HNSW scenes, but now a metadata filter ('genre = electronic') has grayed out every non-matching song. Only the solid nodes are walkable. The greedy walk starts at an entry node and hops toward the query 'Now Playing' — but the short edge it needs runs to 'Synth Dawn' (#6), which got filtered out. With that bridge gone, the walk has no walkable step closer to the query and DEAD-ENDS (the ✕). 'Lonely Synth' (#14, red) is a song that matches the filter and sits right by the query, yet it's stranded on the far side of the missing bridge. The inset is the running recall-vs-latency chart; the filtered run's dot drops far down the recall axis until a fix is applied.
acoustic → electroniccalm → energeticLo-fi RainAcoustic Suns…Campfire FolkCoffeehouseIndie DriveSynth DawnNeon CityClub PulseRave PeakBass DropMidnight DriveGarage BeatStudy BeatsLonely SynthNow PlayingPre-filter grays out non-matching songs — the walk needs a node that's gone.RECALL vs LATENCYrecallslower →FlatHNSW (no …Filtered …
Last scene, pre-filtering looked like the safe answer: remove the non-matching songs first, then search only what's left — no starvation. Here is the trap. We're searching the HNSW graph from scenes 6-7, but first we apply the filter 'genre = electronic'. About 80% of the songs don't match, so they gray out and the walk is only allowed to step on the solid ones. Watch the greedy walk start at Indie Drive (#5) and head for the query. To get closer it needs to hop to 'Synth Dawn' (#6) — but #6 isn't electronic, so it was filtered out. With that one node gone, there's no walkable step any closer to the query. The walk dead-ends. And 'Lonely Synth' (#14, red) — which DOES match the filter and sits right next to the query — is stranded on the far side of the missing bridge, never reached.
Implementation
Index.search
the dispatcher: scan tiny match-sets, else walk the graph
1def search(query, predicate, k):
2 matching = nodes_where(predicate)
3 # below the cutoff the graph isn't worth it
4 if len(matching) <= flatSearchCutOff:
5 return exact_topk(query, matching, k)
6 return filtered_greedy_walk(
7 query, matching, k,
8 )
HNSW.filteredGreedyWalk
best-first hops, but only onto nodes that match the filter
1def filtered_greedy_walk(query, matching, k):
2 frontier = [entry_node] # size ef_search
3 while frontier.improving():
4 cur = frontier.closest_to(query)
5 for nbr in neighbors(cur):
6 if nbr in matching:
7 frontier.add(nbr) # walkable
8 elif two_hop:
9 for far in neighbors(nbr): # ACORN
10 if far in matching:
11 frontier.add(far) # skip the gap
12 return frontier.topk(k)
HNSW.buildEdges
wiring the graph — filterable HNSW adds same-category links
1def build_edges(node):
2 link(node, nearest_neighbors(node, M))
3 # filterable HNSW: also link to same-category
4 # nodes so a walk inside one category never
5 # depends on a node another filter removes
6 if filterable_hnsw:
7 peers = same_category(node)
8 link(node, nearest(peers, M))
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.