Build a graph database (Neo4j / Dgraph-style) (16 scenes)
Scene 12 · Search indexes bolted on the side
Index-free adjacency only serves EXPAND, so finding start nodes by value — full-text, range, geo, vector — is served by classic secondary indexes (Lucene, B-tree) maintained as separate structures riding shotgun, with the usual write-amplification and staleness costs.
Previously
Sharding placed the edges; it never solved finding the anchor by value. That job goes to classic indexes — full-text, B-tree, vector — bolted on the side, feeding start nodes into the native pointer-chase, with their own staleness costs. So now we have the complete picture for LOCAL work: SEEK an anchor, then EXPAND cheaply. But every query we've built lights up only a few nodes. What about the questions that need the WHOLE graph?
Diagram
The native pointer-chase store is in the CENTER — that's index-free adjacency, which only serves EXPAND (following edges you already hold). Bolted on the SIDE are separate structures that find a start node by VALUE: a full-text index (Lucene) for free-text search, a B-tree for exact/range lookups, and a vector index for nearest-by-embedding. Each one is a secondary index: it answers 'which node matches this value?' and hands ONE start-node id into the core's SEEK; the graph engine then expands natively. A write into the core has to be re-applied to every side box too — boxes that don't update on the write path lag behind and go STALE.
The native pointer-chase store sits in the center — that's the index-free adjacency you built: follow edges you already hold, O(1) per hop. But it can ONLY expand from a node you already have. To start 'from the product whose description says cordless drill', or 'from users aged 30-40', or 'from the photo most similar to this embedding', the engine asks a SEPARATE box on the side. Watch each search box light up and hand exactly one start-node id into the core's SEEK — then the native walk takes over.
Implementation
Query.run
value-search SEEKs an anchor (side index), then EXPANDs natively
def run(query):
    # SEEK: find the start node BY VALUE (not possible natively)
    index = pick_index(query.predicate)    # lucene | btree | vector
    start_id = index.lookup(query.value)   # value -> node id
    # EXPAND: index-free adjacency, O(1) per hop
    node = store.load(start_id)
    return traverse(node, query.pattern)   # follow pointers
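To make the SEEK-then-EXPAND split concrete, here is a minimal runnable sketch. Everything in it is an assumption made for illustration: the ADJACENCY dict stands in for the native pointer-chase store, NAME_INDEX stands in for one side index, and seek, expand, and max_hops are hypothetical names that do not come from the lesson's engine.

```python
from collections import deque

# Hypothetical toy data: each node keeps its neighbors directly,
# standing in for index-free adjacency in the core store.
ADJACENCY = {
    1: [2, 3],
    2: [4],
    3: [],
    4: [],
}

# Hypothetical side index (value -> node id), kept outside the core.
NAME_INDEX = {"drill": 1, "battery": 2, "charger": 3, "case": 4}


def seek(value):
    """SEEK: resolve a start node by value through the side index."""
    return NAME_INDEX[value]


def expand(start_id, max_hops):
    """EXPAND: breadth-first walk that only follows edges a node already holds."""
    seen = {start_id}
    frontier = deque([(start_id, 0)])
    order = []
    while frontier:
        node_id, hops = frontier.popleft()
        order.append(node_id)
        if hops == max_hops:
            continue  # hop budget spent: do not follow this node's edges
        for neighbor in ADJACENCY[node_id]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return order


def run(value, max_hops):
    # The side index is consulted exactly once; the walk never touches it again.
    return expand(seek(value), max_hops)
```

Calling run("drill", 1) consults the index once to get node 1, then visits its direct neighbors using only the adjacency lists.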
Store.write
a write must fan out to every secondary index (amp / staleness)
def write(node, change):
    store.apply(change)                   # the core mutation
    for index in secondary_indexes:
        if index.sync:
            index.reindex(node)           # on the write path -> write-amp
        else:
            enqueue_async(index, node)    # lags -> stale window
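The two costs of this fan-out can be watched in a small runnable sketch. It assumes a toy model in which every index is a plain dict. SideIndex, Store, drain, pending, and physical_writes are hypothetical names introduced here, not part of any real engine.

```python
class SideIndex:
    """Hypothetical value -> node id map maintained outside the core."""

    def __init__(self, name, sync):
        self.name = name
        self.sync = sync
        self.map = {}

    def reindex(self, node_id, old_value, new_value):
        # Drop the old entry (if any), then point the new value at the node.
        self.map.pop(old_value, None)
        self.map[new_value] = node_id


class Store:
    def __init__(self, indexes):
        self.values = {}           # core: node id -> value
        self.indexes = indexes
        self.pending = []          # queued updates for async indexes
        self.physical_writes = 0   # core updates plus index updates

    def write(self, node_id, new_value):
        old_value = self.values.get(node_id)
        self.values[node_id] = new_value
        self.physical_writes += 1
        for index in self.indexes:
            if index.sync:
                # On the write path: always fresh, but costs an extra write now.
                index.reindex(node_id, old_value, new_value)
                self.physical_writes += 1
            else:
                # Deferred: cheap now, but the index is stale until drain().
                self.pending.append((index, node_id, old_value, new_value))

    def drain(self):
        """Apply queued async updates, closing the stale window."""
        for index, node_id, old_value, new_value in self.pending:
            index.reindex(node_id, old_value, new_value)
            self.physical_writes += 1
        self.pending.clear()
```

With one sync and one async index, a single logical write costs two physical writes up front. The async index stays empty until drain() runs, and that gap is the stale window.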
Planner.pickIndex
each value predicate routes to its OWN side structure — none of it is the native walk
def pick_index(predicate):
    # the core can only EXPAND, so route by value-kind
    if predicate.kind == TEXT:
        return lucene     # words -> node ids (free-text)
    if predicate.kind == RANGE:
        return btree      # value/range -> node ids
    if predicate.kind == VECTOR:
        return vector     # nearest embedding -> node ids
    raise NoIndex         # else: full label scan
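As an illustration of what the RANGE branch hands back, the sketch below uses a sorted list with Python's bisect module as a stand-in for a B-tree. This is a deliberate simplification: a real B-tree is a paged on-disk structure. SortedRangeIndex, add, and range are hypothetical names chosen for this example.

```python
import bisect


class SortedRangeIndex:
    """Hypothetical stand-in for a B-tree: sorted (value, node_id) pairs."""

    def __init__(self):
        self.entries = []

    def add(self, value, node_id):
        # insort keeps the list ordered by value, then by node id.
        bisect.insort(self.entries, (value, node_id))

    def range(self, low, high):
        """Return node ids whose value lies in [low, high], in value order."""
        # (low,) sorts before every (low, node_id), so the lower bound is inclusive.
        start = bisect.bisect_left(self.entries, (low,))
        # (high, inf) sorts after every (high, node_id), so the upper bound is inclusive.
        end = bisect.bisect_right(self.entries, (high, float("inf")))
        return [node_id for _, node_id in self.entries[start:end]]
```

A query such as 'users aged 30-40' becomes range(30, 40). It can return several candidate start nodes, which matches the "-> node ids" comments in pick_index.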
Index.lookup
a side index returns whatever it currently maps — fresh only if it re-applied the last write
def lookup(self, value):
    # this structure is maintained SEPARATELY from the core
    node_id = self.map.get(value)   # value -> node id
    # if a write hasn't been re-applied here yet,
    # self.map still holds the pre-write entry
    return node_id                  # may point at a since-changed node