Build a graph database (Neo4j / Dgraph-style) (16 scenes)
Scene 01 · Friends of friends melts a join
The canonical graph workload — 'friends of friends of friends' — is one self-join per hop, and the intermediate row count explodes combinatorially until a relational engine can't finish; the same query as a pointer walk visits only the people you actually reach.
Scene 01
Friends of friends melts a join
Diagram
Two ways to answer one question — 'who are Alice's friends of friends of friends?' LEFT: a relational store keeps friendships in a join table (USER_FRIEND), so each extra hop is another self-join of that table to itself; the intermediate-row counter explodes 50 → 2,500 → 6.25M → 312M until the engine can't finish. RIGHT: the SAME question answered as a pointer walk that starts on Alice and steps from person to person — each step is a HOP (traverse one edge), and the walk only ever lights up the people it actually reaches. A store built so that a hop is a step you FOLLOW rather than a table you re-join is a GRAPH DATABASE — its cost tracks the reachable set it visits, not the table's cartesian product. (The faint greyed corner is the course's spine: a local walk lights a few nodes, but whole-graph work lights up everything — the tension the finale resolves.)
One question: 'who are Alice's friends of friends of friends?' On the LEFT, a relational store answers it by joining the friendship table to itself once per hop — watch the intermediate-row counter as the depth climbs. On the RIGHT, the same question is answered by walking from Alice, person to person, lighting up only the people the walk actually reaches. Watch the two numbers pull apart.
Implementation
Relational.friendsOfFriends
answering it as a table: each hop joins USER_FRIEND to itself once more
1def friends_of_friends(start, depth):2 rows = [(start,)] # depth 0: just Alice3 for hop in range(depth): # ONE self-join per hop4 # join USER_FRIEND again: every row fans to its friends5 rows = [(*r, f)6 for r in rows7 for f in friends_of(r[-1])] # ~degree each8 return dedup(last_col(rows)) # ...after materializing rows
Walk.expand
answering it as a walk: step edge by edge, never revisiting a person
1def expand(start, depth):2 seen = {start}3 frontier = [start]4 for hop in range(depth): # one step per hop5 nxt = []6 for person in frontier:7 for friend in neighbors(person): # follow an edge8 if friend not in seen: # skip the visited9 seen.add(friend); nxt.append(friend)10 frontier = nxt # only the newly reached11 return seen - {start}
cost(depth)
the two cost curves side by side, as the depth knob drives them
1def relational_cost(depth):2 return DEGREE ** depth # 50^d candidate rows34def walk_cost(depth):5 # bounded by distinct people reachable, not 50^d6 return len(reachable_within(start, depth))
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.