You still need an index to start

Build a graph database (Neo4j / Dgraph-style) (16 scenes)

Scene 05 · You still need an index to start

Index-free adjacency only covers EXPAND (traversing from a node you hold); to SEEK the anchor — 'start from Alice' — you still need a regular index, and without one finding the start node is a full label scan over every :User.

Previously

The doubly-linked chain lets us EXPAND from a node we hold — but we still had to find Alice. That first step, the SEEK, needs an ordinary index; index-free adjacency never covered it. Pull that index out and watch the start node turn into a full scan. So the store has TWO jobs with two cost models — and that raises a deeper storage question we've dodged.

Scene 05

You still need an index to start

Watch

Diagram

The same social graph. A query runs in two phases. First the SEEK: an ordinary B-tree index on User.name is consulted to find the anchor — the node the query starts from, Alice (ringed). Then the EXPAND: from Alice the engine follows relationship pointers — the index-free O(1)-per-hop walk from earlier scenes — lighting only the few nodes it reaches. The toggle removes the index: with no index the SEEK can't jump to Alice, so it falls back to a full label scan that lights EVERY :User node before EXPAND can even begin. The inset keeps the spine: SEEK lights one anchor, EXPAND a few.

Sources

Last scene we walked a node's relationship chain — but we ASSUMED we were already standing on Alice. How did we get there? Watch the two phases. First the SEEK: a plain B-tree index on User.name lights up and jumps straight to Alice (the ringed node) — one O(log N) lookup, the same kind of index a relational store uses. Only now do we hold a node. Second the EXPAND: from Alice we follow relationship pointers — the index-free O(1) walk — hop by hop to her friends and their friends. Two phases, two completely different cost models.

Implementation

Query.run

every query is two phases — SEEK the anchor, then EXPAND from it

1def run(label, prop, value, depth):
2    # phase 1: SEEK — find the start node by value (once)
3    anchor = seek_anchor(label, prop, value)
4    # phase 2: EXPAND — index-free pointer walk from it
5    return expand(anchor, depth)

Query.seek_anchor

find the start node by value — indexed, or a full label scan

1def seek_anchor(label, prop, value):
2    idx = index_for(label, prop)
3    if idx is not None:
4        return idx.lookup(value)      # O(log N) once
5    # no index → full label scan
6    for node in all_nodes_with(label):  # O(N) every query
7        if node[prop] == value:
8            return node

Index.create

what the toggle does: register a B-tree so seek_anchor can look up, not scan

1def create_index(label, prop):
2    btree = BTree()
3    for node in all_nodes_with(label):  # one-time build
4        btree.insert(node[prop], node)
5    catalog[(label, prop)] = btree    # now index_for() finds it
6    return btree

Query.expand

from the anchor, follow relationship pointers (index-free)

1def expand(anchor, depth):
2    frontier = [anchor]
3    for _ in range(depth):
4        nxt = []
5        for node in frontier:
6            rel = node.first_rel       # pointer, not an index
7            while rel is not None:     # O(1) per edge
8                nxt.append(rel.end_node)
9                rel = rel.next_rel
10        frontier = nxt
11    return frontier

Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.

PreviousStore records and the relationship chain NextLinked list vs CSR adjacency