Build a graph database (Neo4j / Dgraph-style) (16 scenes)
Scene 09 · The supernode breaks the promise
A supernode is a vertex with a huge degree, and its relationship chain is so long that any traversal THROUGH it must scan millions of edges — so the O(1)-per-hop promise dies on that one node, and the fix is to expand from the low-degree side.
Previously
The planner anchors on the few, not the many — but what if the 'many' is unavoidable because a single node has millions of edges? That's a supernode, and expanding through its giant relationship chain scans millions of records: the O(1)-per-hop promise collapses on that one node. Reads are only half the damage, though — what happens when everyone tries to ADD an edge to that same celebrity at once?
Scene 09
The supernode breaks the promise
Diagram
The same social graph. Erin (node 5, badge '5') is a normal node — five relationship records. The Rock (node 11) is a supernode (a dense node): a vertex with a huge degree — degree is just the number of edges on a node — drawn as a fan of ~20 edges, each labeled ×10M to stand in for its 232M-record chain (a real hub is too big to draw). Expanding INTO that chain ('who follows The Rock?') must scan the whole fan, so the violet edges flood and the scan never finishes — that is the drowning. The inset is the persistent spine: a normal hop lights a few nodes; a hop through the supernode lights its entire chain.
Two nodes, side by side. Erin has five relationship records — a normal node; a traversal that grazes her lights just a few neighbors. The Rock has so many followers that we can only draw a fan, each edge labeled ×10M (it really has ~232M). Watch a traversal try to expand INTO The Rock: it has to walk its entire relationship chain. The scan floods the fan… and stalls. It can't finish — every hop the curriculum promised was O(1) just turned into 'scan millions of edges' on this one node.
Implementation
Traversal.expand
a hop walks the node's relationship chain — O(degree), not O(1)
1def expand(node, rel_type, direction):2 out = []3 # walk the doubly-linked relationship chain for this side4 rel = node.first_rel(rel_type, direction)5 while rel is not None: # O(degree) — the whole cost6 out.append(rel.other_end(node))7 rel = rel.next_in_chain # one pointer follow per edge8 return out910# inbound on The Rock: ~232M iterations → drowns11# outbound on The Rock: ~3 iterations → instant
Planner.pickExpandSide
expand cost is O(degree) on that side, so walk whichever direction is shorter
1def pick_expand_side(node, rel_type):2 # cost of a hop = how many edges hang off this side3 in_deg = node.degree(rel_type, INBOUND)4 out_deg = node.degree(rel_type, OUTBOUND)5 # walk from the low-degree side — fewer pointer follows6 if in_deg <= out_deg:7 return INBOUND8 return OUTBOUND910# The Rock: in_deg ~232M, out_deg ~3 → pick OUTBOUND11# (same node, opposite cost — degree, not the node, is the bill)
Store.denseNodeGroups
a dense node splits its one mixed chain into per-(type, direction) groups
1DENSE_THRESHOLD = 50 # ≥ this many edges → 'dense' layout23def first_rel(node, rel_type, direction):4 if node.degree() < DENSE_THRESHOLD:5 return node.head_of_single_chain # one mixed chain6 # dense: jump straight to the right group's head7 grp = node.group_record(rel_type, direction)8 return grp.chain_head # no scan to locate it910# so EXPAND lands on the short chain directly,11# never touching the giant one beside it
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.