Build a graph database (Neo4j / Dgraph-style) (16 scenes)
Scene 11 · The cut turns hops into RPCs
Split a graph across two machines and any cut severs edges, so a traversal that crosses the cut turns each pointer dereference into a network round trip — the boundary is exactly where O(1) becomes O(network).
Previously
On one primary every pointer was local. Split the graph and the cut severs edges — a hop across it becomes a network RPC, so O(1) becomes O(network). That's unavoidable for a connected graph. But HOW you cut decides how MUCH crosses — and on real, power-law graphs, the obvious cut (assign each vertex to a machine) lands every one of a celebrity's edges on a single node.
Scene 11
The cut turns hops into RPCs
Diagram
The same social graph, now split across two machines by a vertical slice. The partition cut is that slice — the line of severed edges whose two endpoints landed on opposite machines. A hop ALONG an edge that stays on one machine is a local pointer dereference (a fast solid arrow, ~1 µs). A cross-partition hop (network RPC) is a hop along a cut-crossing edge: it leaves the machine and makes a network round trip (a dashed 'RPC' arrow stamped ~1 ms). Left = machine 0, right = machine 1; Alice is the anchor every walk starts from. The inset is the persistent spine: the same few nodes light up, but locality is now a PHYSICAL property — whether a lit node sits on your machine or across the wire.
On one machine, a friends-of-friends walk from Alice was three pointer dereferences — microseconds. Now watch what splitting the graph across two machines does. First the slice drops and the edges it severs flash. That severed slice is the partition cut. Then the same 3-hop walk runs — but now it zig-zags across the cut, and each crossing is no longer a pointer follow. It's a network round trip: a cross-partition hop, an RPC. The hop count is identical; the cost is a thousand times higher.
Implementation
Traversal.follow_edge
one hop: pointer dereference if local, network RPC if it crosses the cut
1def follow_edge(cur, edge):2 if machine_of(edge.other_end) == this_machine:3 # local hop: index-free adjacency, ~1 µs4 return deref_pointer(edge) # pointer follow5 else:6 # the edge crosses the partition cut7 # ~1 ms: send a request, wait for the reply8 return rpc(machine_of(edge.other_end), edge)
Traversal.walk_cost
3 hops: total time = local follows + cut-crossing RPCs
1def walk_cost(path):2 cost = 03 for hop in path: # always 3 hops here4 if crosses_cut(hop):5 cost += RPC_MS # ~1 ms network round trip6 else:7 cost += LOCAL_US # ~1 µs pointer follow8 return cost # boundary = where O(1) becomes O(network)
Partitioner.edge_cut_assign
assign each vertex to a machine — the edges that span machines ARE the cut
1def edge_cut_assign(graph):2 for v in graph.vertices:3 machine[v] = pick_machine(v) # 1 vertex → 1 machine4 cut = []5 for (a, b) in graph.edges:6 if machine[a] != machine[b]: # endpoints split7 cut.append((a, b)) # severed: a hop here is an RPC8 return machine, cut # connected graph ⇒ cut is never empty
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.