Build a graph database (Neo4j / Dgraph-style) (16 scenes)
Scene 01 · Friends of friends melts a join
The canonical graph workload — 'friends of friends of friends' — is one self-join per hop, and the intermediate row count explodes combinatorially until a relational engine can't finish; the same query as a pointer walk visits only the people you actually reach.
Scene 01
Friends of friends melts a join
Diagram
Two ways to answer one question — 'who are Alice's friends of friends of friends?' LEFT: a relational store keeps friendships in a join table (USER_FRIEND), so each extra hop is another self-join of that table to itself; the intermediate-row counter explodes 50 → 2,500 → 6.25M → 312M until the engine can't finish. RIGHT: the SAME question answered as a pointer walk that starts on Alice and steps from person to person — each step is a HOP (traverse one edge), and the walk only ever lights up the people it actually reaches. A store built so that a hop is a step you FOLLOW rather than a table you re-join is a GRAPH DATABASE — its cost tracks the reachable set it visits, not the table's cartesian product. (The faint greyed corner is the course's spine: a local walk lights a few nodes, but whole-graph work lights up everything — the tension the finale resolves.)
Relational: one self-join per hopGraph database: follow pointersUSER_FRIEND(user, friend) · deg 50SELECT f4.friend FROM user_friend…depth d = 1 (friends)intermediate rows5050d12.5Kd2125Kd36.25Md4312Md5AliceBobCarolDaveErinFrankGraceHeidiIvanJudyThe RockThe MatrixInceptionGothamhoppeople you actually reach: 4Neo4j · d4: 1.3sMySQL · d4: 1543sMySQL · d5: did not finishDepth 1: the table self-joins to ~50 rows; the walk reached 4 people.whole-graph lights all
One question: 'who are Alice's friends of friends of friends?' On the LEFT, a relational store answers it by joining the friendship table to itself once per hop — watch the intermediate-row counter as the depth climbs. On the RIGHT, the same question is answered by walking from Alice, person to person, lighting up only the people the walk actually reaches. Watch the two numbers pull apart.
Implementation
Relational.friendsOfFriends
answering it as a table: each hop joins USER_FRIEND to itself once more
1def friends_of_friends(start, depth):
2 rows = [(start,)] # depth 0: just Alice
3 for hop in range(depth): # ONE self-join per hop
4 # join USER_FRIEND again: every row fans to its friends
5 rows = [(*r, f)
6 for r in rows
7 for f in friends_of(r[-1])] # ~degree each
8 return dedup(last_col(rows)) # ...after materializing rows
Walk.expand
answering it as a walk: step edge by edge, never revisiting a person
1def expand(start, depth):
2 seen = {start}
3 frontier = [start]
4 for hop in range(depth): # one step per hop
5 nxt = []
6 for person in frontier:
7 for friend in neighbors(person): # follow an edge
8 if friend not in seen: # skip the visited
9 seen.add(friend); nxt.append(friend)
10 frontier = nxt # only the newly reached
11 return seen - {start}
cost(depth)
the two cost curves side by side, as the depth knob drives them
1def relational_cost(depth):
2 return DEGREE ** depth # 50^d candidate rows
3
4def walk_cost(depth):
5 # bounded by distinct people reachable, not 50^d
6 return len(reachable_within(start, depth))
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.