Think like a vertex: Pregel / BSP

Build a graph database (Neo4j / Dgraph-style) (16 scenes)

Scene 13 · Think like a vertex: Pregel / BSP

Whole-graph algorithms have no locality to exploit, so the model flips to Pregel/BSP — every vertex runs a small function, sends messages to neighbors, then all vertices wait at a synchronization barrier before the next superstep — and a supernode at the barrier stalls everyone.

Previously

Local queries light up a few nodes and everything we built makes them fly. PageRank lights up EVERY node, every round — no locality, so index-free adjacency buys nothing. The model flips to thinking like a vertex: compute, message neighbors, wait at a barrier, repeat. And the supernode is back one last time, stalling the barrier while everyone waits. You've now seen every force in graph databases — time to make the calls yourself.

Scene 13

Think like a vertex: Pregel / BSP

Watch

Diagram

A tiny 5-vertex graph running PageRank. In each round every vertex pushes its score along its out-edges (the message arrows), then ALL vertices stop at the horizontal line — the synchronization barrier — and only once everyone has arrived do scores update and the next round begins. One such round (compute → message → barrier → update) is a superstep — the Pregel/BSP unit of work. The whole graph lights up every round: that's the spine inset's contrast — a local k-hop query from scene 1 lit only a few nodes; PageRank lights ALL of them, every superstep. Wire in The Rock and watch the barrier wait on the one straggler with millions of messages — the pass runs as slow as its slowest vertex.

Sources

PageRank over a tiny 5-vertex graph. Watch one round: every vertex sends its current score, split across its out-edges, to its neighbors. Then everyone STOPS at the horizontal line — nobody reads the next round's numbers until all the messages from this round have landed. Only then do scores update and the next round begins. Notice the whole graph lights up — unlike the k-hop query from scene 1 that lit only a couple of nodes.

Implementation

Pregel.run

the BSP driver: run supersteps until convergence, barrier between each

1def run(graph, compute):
2    init_scores(graph, 1 / N)
3    for step in range(MAX_SUPERSTEPS):  # ~20-50 for PageRank
4        for vertex in graph:            # every vertex, every round
5            inbox = messages_for(vertex)
6            compute(vertex, inbox)      # send new messages
7        barrier()  # ALL must arrive before the next superstep starts
8        if converged(graph): break

PageRank.compute (per vertex)

pr = 0.15/N + 0.85·Σ(incoming share); a supernode's inbox is huge

1def compute(vertex, inbox):
2    incoming = sum(inbox)            # supernode: millions of messages
3    vertex.score = 0.15 / N + 0.85 * incoming
4    share = vertex.score / out_degree(vertex)
5    for neighbor in vertex.out_edges:
6        send(neighbor, share)        # push along every out-edge

Barrier.await (per superstep)

every worker signals arrival; the round ends only when the LAST one does

1def await(workers):
2    for w in workers:
3        w.signal_arrived()        # 'I finished this round'
4    while not all_arrived(workers):
5        wait()                    # everyone idles for the laggard
6    release()                     # next superstep may begin

Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.

PreviousSearch indexes bolted on the side NextDesign your graph database