Build a graph database (Neo4j / Dgraph-style) (16 scenes)
Scene 13 · Think like a vertex: Pregel / BSP
Whole-graph algorithms have no locality to exploit, so the model flips to Pregel/BSP — every vertex runs a small function, sends messages to neighbors, then all vertices wait at a synchronization barrier before the next superstep — and a supernode at the barrier stalls everyone.
Previously
Local queries light up a few nodes and everything we built makes them fly. PageRank lights up EVERY node, every round — no locality, so index-free adjacency buys nothing. The model flips to thinking like a vertex: compute, message neighbors, wait at a barrier, repeat. And the supernode is back one last time, stalling the barrier while everyone waits. You've now seen every force in graph databases — time to make the calls yourself.
Scene 13
Think like a vertex: Pregel / BSP
Diagram
A tiny 5-vertex graph running PageRank. In each round every vertex pushes its score along its out-edges (the message arrows), then ALL vertices stop at the horizontal line — the synchronization barrier — and only once everyone has arrived do scores update and the next round begins. One such round (compute → message → barrier → update) is a superstep — the Pregel/BSP unit of work. The whole graph lights up every round: that's the spine inset's contrast — a local k-hop query from scene 1 lit only a few nodes; PageRank lights ALL of them, every superstep. Wire in The Rock and watch the barrier wait on the one straggler with millions of messages — the pass runs as slow as its slowest vertex.
PageRank over a tiny 5-vertex graph. Watch one round: every vertex sends its current score, split across its out-edges, to its neighbors. Then everyone STOPS at the horizontal line — nobody reads the next round's numbers until all the messages from this round have landed. Only then do scores update and the next round begins. Notice the whole graph lights up — unlike the k-hop query from scene 1 that lit only a couple of nodes.
Implementation
Pregel.run
the BSP driver: run supersteps until convergence, barrier between each
1def run(graph, compute):2 init_scores(graph, 1 / N)3 for step in range(MAX_SUPERSTEPS): # ~20-50 for PageRank4 for vertex in graph: # every vertex, every round5 inbox = messages_for(vertex)6 compute(vertex, inbox) # send new messages7 barrier() # ALL must arrive before the next superstep starts8 if converged(graph): break
PageRank.compute (per vertex)
pr = 0.15/N + 0.85·Σ(incoming share); a supernode's inbox is huge
1def compute(vertex, inbox):2 incoming = sum(inbox) # supernode: millions of messages3 vertex.score = 0.15 / N + 0.85 * incoming4 share = vertex.score / out_degree(vertex)5 for neighbor in vertex.out_edges:6 send(neighbor, share) # push along every out-edge
Barrier.await (per superstep)
every worker signals arrival; the round ends only when the LAST one does
1def await(workers):2 for w in workers:3 w.signal_arrived() # 'I finished this round'4 while not all_arrived(workers):5 wait() # everyone idles for the laggard6 release() # next superstep may begin
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.