Build a graph database (Neo4j / Dgraph-style) (16 scenes)
Scene 11.5 · Edge-cut, vertex-cut, predicate sharding
Edge-cut assigns each vertex to a machine and chokes on power-law hubs; vertex-cut splits the hub across machines to balance edges; predicate-sharding keeps all edges of one TYPE together so a common expand stays on one machine.
Previously
Cutting is unavoidable; the question was which cut crosses fewest edges. Edge-cut hotspots on hubs; vertex-cut splits the hub to balance edges; predicate-sharding keeps each edge type together so common expands stay local. The supernode broke naive partitioning, just as it broke reads and writes. But all of this is about TRAVERSAL by edges — what about finding a start node by its VALUE across a distributed store?
Scene 11.5
Edge-cut, vertex-cut, predicate sharding
Diagram
The same social graph, now split across two machines with a cut line down the middle. EDGE-CUT assigns each vertex to one machine and cuts the edges that span machines — so The Rock and all its edges land on machine 1 as a red hotspot. VERTEX-CUT assigns each EDGE to a machine and slices a high-degree vertex into mirrors across machines, balancing edges instead of vertices. PREDICATE SHARDING keeps all edges of one TYPE on one machine (all :FRIEND together, all :FOLLOWS together) so a 'give me all of X's friends' expand reads exactly one machine. Crossing edges render as dashed RPC hops with a latency stamp — a cut hop is a network round trip, not a pointer follow.
Same social graph, but it no longer fits on one machine — we have to cut it across two. Watch the simplest cut first: give every PERSON to one machine and cut whatever edges span the two. Step 1, the cut line drops. Step 2, look what happens to The Rock — it's a hub, so when its vertex lands on machine 1, ALL of its edges land there too. One machine ends up carrying half the graph's edges: a hotspot.
Implementation
Partitioner.place
where does each vertex / edge live? (the slider picks the branch)
1def place(graph, mode, M): # M machines2 if mode == 'edge-cut':3 for v in graph.vertices: # vertex → one machine4 home[v] = hash(v) % M # a hub's edges all follow v5 elif mode == 'vertex-cut':6 for e in graph.edges: # edge → one machine7 home[e] = hash(e) % M # split hubs: mirror across M8 elif mode == 'predicate':9 for e in graph.edges: # edge TYPE → one shard10 home[e] = shard_of[e.type] # all of one type together11 return home
Query.expand_friends
'all of X's friends' — how many machines does it touch?
1def expand_friends(X, mode):2 if mode == 'predicate':3 return read(shard_of['FRIEND']) # ONE machine, local4 # otherwise X's :FRIEND edges may live on several machines5 machines = {home[e] for e in edges(X, 'FRIEND')}6 return rpc_fanout(machines) # cross-cut network hops
StreamingPartitioner.assign
no cut crosses zero edges — min-cut is NP-hard, so we greedily place and accept some crossings
1def assign(graph, M): # one pass, heuristic2 for v in stream(graph.vertices):3 # score each machine: keep neighbors together, stay balanced4 best = argmax(m in machines,5 neighbors_on(v, m) - lambda * load[m])6 home[v] = best # greedy, not optimal7 load[best] += 18 # whatever's left spanning machines is the unavoidable cut:9 crossings = count(e for e in edges10 if home[e.from] != home[e.to])11 return home, crossings # crossings > 0 always
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.