Build a graph database (Neo4j / Dgraph-style) (16 scenes)
Scene 05 · You still need an index to start
Index-free adjacency only covers EXPAND (traversing from a node you hold); to SEEK the anchor — 'start from Alice' — you still need a regular index, and without one finding the start node is a full label scan over every :User.
Previously
The doubly-linked chain lets us EXPAND from a node we hold — but we still had to find Alice. That first step, the SEEK, needs an ordinary index; index-free adjacency never covered it. Pull that index out and watch the start node turn into a full scan. So the store has TWO jobs with two cost models — and that raises a deeper storage question we've dodged.
Scene 05
You still need an index to start
Diagram
The same social graph. A query runs in two phases. First the SEEK: an ordinary B-tree index on User.name is consulted to find the anchor — the node the query starts from, Alice (ringed). Then the EXPAND: from Alice the engine follows relationship pointers — the index-free O(1)-per-hop walk from earlier scenes — lighting only the few nodes it reaches. The toggle removes the index: with no index the SEEK can't jump to Alice, so it falls back to a full label scan that lights EVERY :User node before EXPAND can even begin. The inset keeps the spine: SEEK lights one anchor, EXPAND a few.
Last scene we walked a node's relationship chain — but we ASSUMED we were already standing on Alice. How did we get there? Watch the two phases. First the SEEK: a plain B-tree index on User.name lights up and jumps straight to Alice (the ringed node) — one O(log N) lookup, the same kind of index a relational store uses. Only now do we hold a node. Second the EXPAND: from Alice we follow relationship pointers — the index-free O(1) walk — hop by hop to her friends and their friends. Two phases, two completely different cost models.
Implementation
Query.run
every query is two phases — SEEK the anchor, then EXPAND from it
1def run(label, prop, value, depth):2 # phase 1: SEEK — find the start node by value (once)3 anchor = seek_anchor(label, prop, value)4 # phase 2: EXPAND — index-free pointer walk from it5 return expand(anchor, depth)
Query.seek_anchor
find the start node by value — indexed, or a full label scan
1def seek_anchor(label, prop, value):2 idx = index_for(label, prop)3 if idx is not None:4 return idx.lookup(value) # O(log N) once5 # no index → full label scan6 for node in all_nodes_with(label): # O(N) every query7 if node[prop] == value:8 return node
Index.create
what the toggle does: register a B-tree so seek_anchor can look up, not scan
1def create_index(label, prop):2 btree = BTree()3 for node in all_nodes_with(label): # one-time build4 btree.insert(node[prop], node)5 catalog[(label, prop)] = btree # now index_for() finds it6 return btree
Query.expand
from the anchor, follow relationship pointers (index-free)
1def expand(anchor, depth):2 frontier = [anchor]3 for _ in range(depth):4 nxt = []5 for node in frontier:6 rel = node.first_rel # pointer, not an index7 while rel is not None: # O(1) per edge8 nxt.append(rel.end_node)9 rel = rel.next_rel10 frontier = nxt11 return frontier
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.