Build a graph database (Neo4j / Dgraph-style) (16 scenes)
Scene 12 · Search indexes bolted on the side
Index-free adjacency only serves EXPAND, so finding start nodes by value — full-text, range, geo, vector — is served by classic secondary indexes (Lucene, B-tree) maintained as separate structures riding shotgun, with the usual write-amplification and staleness costs.
Previously

Sharding placed the edges; it never solved finding the anchor by value. That job goes to classic indexes — full-text, B-tree, vector — bolted on the side, feeding start nodes into the native pointer-chase, with their own staleness costs. So now we have the complete picture for LOCAL work: SEEK an anchor, then EXPAND cheaply. But every query we've built lights up only a few nodes. What about the questions that need the WHOLE graph?

Diagram
The native pointer-chase store is in the CENTER — that's index-free adjacency, which only serves EXPAND (following edges you already hold). Bolted on the SIDE are separate structures that find a start node by VALUE: a full-text index (Lucene) for free-text search, a B-tree for exact/range lookups, and a vector index for nearest-by-embedding. Each one is a secondary index: it answers 'which node matches this value?' and hands ONE start-node id into the core's SEEK; the graph engine then expands natively. A write into the core has to be re-applied to every side box too — boxes that don't update on the write path lag behind and go STALE.
[Diagram labels: SECONDARY INDEXES (bolted on the side) feed an anchor id into SEEK on the NATIVE CORE (native pointer-chase store, EXPAND only; index-free adjacency, O(1)/hop). Caption: Each search box hands ONE start-node id to the core, then the native pointer-chase takes over (EXPAND).]
The native pointer-chase store sits in the center — that's the index-free adjacency you built: follow edges you already hold, O(1) per hop. But it can ONLY expand from a node you already have. To start 'from the product whose description says cordless drill', or 'from users aged 30-40', or 'from the photo most similar to this embedding', the engine asks a SEPARATE box on the side. Watch each search box light up and hand exactly one start-node id into the core's SEEK — then the native walk takes over.
Implementation
Query.run
value-search SEEKs an anchor (side index), then EXPANDs natively
def run(query):
    # SEEK: find the start node BY VALUE (not possible natively)
    index = pick_index(query.predicate)    # lucene | btree | vector
    start_id = index.lookup(query.value)   # value -> node id
    # EXPAND: index-free adjacency, O(1) per hop
    node = store.load(start_id)
    return traverse(node, query.pattern)   # follow pointers
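To make the two phases concrete, here is a minimal runnable sketch of SEEK followed by EXPAND. Everything in it is a stand-in: the toy `adjacency` and `names` data, the `name_index` dict playing the role of a side index, and the `expand` helper are hypothetical, and a real engine would hold adjacency as on-disk pointers rather than Python lists.

```python
# Toy core: each node id maps straight to its neighbor ids (adjacency held on the node).
adjacency = {
    1: [2, 3],   # drill -> battery, charger
    2: [4],      # battery -> cell
    3: [],
    4: [],
}
names = {1: "cordless drill", 2: "battery", 3: "charger", 4: "cell"}

# Hypothetical side index: value -> node id, kept outside the core.
name_index = {name: node_id for node_id, name in names.items()}


def expand(start_id, hops):
    """Follow edges outward from start_id for a fixed number of hops."""
    frontier, seen = [start_id], {start_id}
    for _ in range(hops):
        next_frontier = []
        for node_id in frontier:
            for neighbor in adjacency[node_id]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return seen


def run(value, hops):
    start_id = name_index[value]     # SEEK: by value, via the side index
    return expand(start_id, hops)    # EXPAND: by pointer, inside the core


reachable = run("cordless drill", 2)
print(sorted(names[n] for n in reachable))
```

The only value-based step is the single dictionary lookup in `run`; after that, `expand` never consults the index again.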
Store.write
a write must fan out to every secondary index (amp / staleness)
def write(node, change):
    store.apply(change)                     # the core mutation
    for index in secondary_indexes:
        if index.sync:
            index.reindex(node)             # on the write path -> write-amp
        else:
            enqueue_async(index, node)      # lags -> stale window
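The write-amplification cost of the synchronous branch can be counted directly. The sketch below is an illustration under assumptions, not the engine's code: `CountingIndex`, the three example indexes, and the amplification ratio (total physical writes divided by logical writes) are all hypothetical choices made for the demonstration.

```python
class CountingIndex:
    """Hypothetical side index that records how often it is rewritten."""

    def __init__(self, name, key_fn):
        self.name = name
        self.key_fn = key_fn    # extracts the indexed value from a node
        self.map = {}           # value -> node id
        self.writes = 0

    def reindex(self, node_id, node):
        # Old entries are not removed here; this sketch only counts writes.
        self.map[self.key_fn(node)] = node_id
        self.writes += 1


core = {}          # node id -> properties
core_writes = 0
indexes = [
    CountingIndex("by_name", lambda n: n["name"]),
    CountingIndex("by_age", lambda n: n["age"]),
    CountingIndex("by_city", lambda n: n["city"]),
]


def write(node_id, props):
    global core_writes
    core[node_id] = props       # the one logical mutation
    core_writes += 1
    for index in indexes:       # fan-out: every side index sits on the write path
        index.reindex(node_id, props)


write(1, {"name": "ada", "age": 36, "city": "london"})
write(2, {"name": "bob", "age": 41, "city": "paris"})

index_writes = sum(index.writes for index in indexes)
amplification = (core_writes + index_writes) / core_writes
print(core_writes, index_writes, amplification)
```

With three synchronous indexes, each logical write becomes four physical writes in this toy model; moving an index to the async branch removes it from the write path at the price of a stale window.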
Planner.pickIndex
each value predicate routes to its OWN side structure — none of it is the native walk
def pick_index(predicate):
    # the core can only EXPAND, so route by value-kind
    if predicate.kind == TEXT:
        return lucene     # words -> node ids (free-text)
    if predicate.kind == RANGE:
        return btree      # value/range -> node ids
    if predicate.kind == VECTOR:
        return vector     # nearest embedding -> node ids
    raise NoIndex         # no side index -> full label scan
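The three kinds of side structure can each be approximated in a few lines. The sketch below uses deliberately simple stand-ins, all of them assumptions: an inverted dict instead of Lucene, a sorted list searched with `bisect` instead of a B-tree, and brute-force cosine similarity instead of an approximate vector index. The sample data and function names are hypothetical.

```python
import bisect
import math

docs = {1: "cordless drill with battery", 2: "corded drill", 3: "garden hose"}
ages = {10: 25, 11: 34, 12: 38, 13: 52}
embeddings = {20: (1.0, 0.0), 21: (0.6, 0.8), 22: (0.0, 1.0)}

# Full-text stand-in: inverted index, word -> set of node ids.
inverted = {}
for node_id, text in docs.items():
    for word in text.split():
        inverted.setdefault(word, set()).add(node_id)


def text_lookup(query):
    """Node ids whose text contains every word of the query."""
    sets = [inverted.get(word, set()) for word in query.split()]
    return set.intersection(*sets) if sets else set()


# B-tree stand-in: sorted (value, node id) pairs searched by bisection.
sorted_ages = sorted((age, node_id) for node_id, age in ages.items())


def range_lookup(low, high):
    """Node ids with low <= value <= high, in value order."""
    start = bisect.bisect_left(sorted_ages, (low, -math.inf))
    end = bisect.bisect_right(sorted_ages, (high, math.inf))
    return [node_id for _, node_id in sorted_ages[start:end]]


# Vector stand-in: exhaustive cosine similarity over every embedding.
def nearest(query_vec):
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))
    return max(embeddings, key=lambda node_id: cosine(embeddings[node_id], query_vec))


print(text_lookup("cordless drill"), range_lookup(30, 40), nearest((0.7, 0.7)))
```

Note that the text and range lookups can return several candidate ids, which matches the "node ids" comments in `pick_index`; each returned id is a separate anchor the core can expand from.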
Index.lookup
a side index returns whatever it currently maps — fresh only if it re-applied the last write
def lookup(self, value):
    # this structure is maintained SEPARATELY from the core
    node_id = self.map.get(value)    # value -> node id
    # if a write hasn't been re-applied here yet,
    # self.map still holds the pre-write entry
    return node_id                   # may point at a since-changed node
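The stale window is easy to reproduce with a deferred index update. This is a minimal sketch under assumptions: the `pending` queue, the `drain` function standing in for a background indexer, and the rename scenario are all hypothetical and only illustrate the behavior described above.

```python
from collections import deque

core = {}            # node id -> name (source of truth)
name_index = {}      # name -> node id (side index, updated asynchronously)
pending = deque()    # index updates not yet applied


def write(node_id, new_name):
    old_name = core.get(node_id)
    core[node_id] = new_name                         # core is updated immediately
    pending.append((node_id, old_name, new_name))    # index update is deferred


def drain():
    """Apply all queued index updates (the background indexer catching up)."""
    while pending:
        node_id, old_name, new_name = pending.popleft()
        if old_name is not None and name_index.get(old_name) == node_id:
            del name_index[old_name]
        name_index[new_name] = node_id


write(7, "drill")
drain()
write(7, "cordless drill")                    # rename; index not yet updated

stale_hit = name_index.get("drill")           # still maps the old name
missing = name_index.get("cordless drill")    # new name not findable yet
drain()
fresh_hit = name_index.get("cordless drill")
gone = name_index.get("drill")
print(stale_hit, missing, fresh_hit, gone)
```

Between the second `write` and the second `drain`, the index both returns a hit for a name the node no longer has and misses the name it now has; both errors disappear once the queue is applied.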