Build a graph database (Neo4j / Dgraph-style) (16 scenes)
Scene 14 · Design your graph database
Every graph-DB deployment is a deliberate set of choices: storage layout, index strategy, supernode handling, single-node ACID vs distributed, and a traversal-heavy vs aggregate-heavy workload. The right configuration for a fraud-ring traversal is wrong for a PageRank pipeline, even though the primitives are identical.
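The primitives really are shared: the PageRank side of that contrast still walks out-edges, but it walks all of them every round. Below is a minimal sketch of that aggregate-heavy shape. It assumes a single-process simulation of Pregel-style vertex-centric supersteps. The function `pregel_pagerank` and its `out_edges` dict are hypothetical names. There is no worker partitioning or message combiner, a fixed superstep count stands in for vote-to-halt, and the graph is assumed to have no dangling vertices, which would leak rank.

```python
def pregel_pagerank(out_edges, supersteps=30, damping=0.85):
    """Single-process simulation of vertex-centric PageRank in supersteps.

    out_edges maps each vertex to the list of vertices it links to.
    Assumes every vertex has at least one out-edge (no dangling vertices).
    """
    vertices = list(out_edges)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(supersteps):
        # send phase: every vertex is active and messages each out-neighbour
        inbox = {v: 0.0 for v in vertices}
        for v in vertices:
            share = rank[v] / len(out_edges[v])
            for target in out_edges[v]:
                inbox[target] += share
        # barrier, then compute phase: each vertex folds its inbox into a new rank
        rank = {v: (1 - damping) / n + damping * inbox[v] for v in vertices}
    return rank


ranks = pregel_pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})
# 'c' has no in-links, so it settles at (1 - 0.85) / 3 = 0.05
print({v: round(r, 3) for v, r in ranks.items()})
```

Every vertex does work in every superstep, which is the whole-graph pattern. A fraud-ring query that needs only a few hops from one account never pays that cost.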
Previously
You've felt every force: index-free adjacency makes local traversal fly, the supernode and the partition cut break it, ACID anchors correctness on one machine, and Pregel inverts the payoff for whole-graph work. The capstone is choosing deliberately for a real workload — every knob traceable to the scene that justified it, with the local-vs-global spine as your compass.
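Index-free adjacency is what makes local traversal fly: each node holds direct references to its neighbours, so a k-hop query costs roughly the size of the neighbourhood, not the size of the graph. A minimal in-memory sketch, assuming Python object references stand in for on-disk record pointers; `Node`, `connect` and `k_hop` are hypothetical names.

```python
from collections import deque


class Node:
    """A vertex that holds direct references to its neighbours, so a hop is
    a pointer dereference rather than a lookup in a global index."""

    def __init__(self, key):
        self.key = key
        self.neighbours = []


def connect(a, b):
    # undirected edge, stored on both endpoints
    a.neighbours.append(b)
    b.neighbours.append(a)


def k_hop(start, k):
    """Breadth-first expansion up to k hops; returns the set of keys touched."""
    seen = {start.key}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand past the hop limit
        for nb in node.neighbours:
            if nb.key not in seen:
                seen.add(nb.key)
                frontier.append((nb, depth + 1))
    return seen


# a 1,000-node path graph: 0 - 1 - 2 - ... - 999
nodes = [Node(i) for i in range(1000)]
for left, right in zip(nodes, nodes[1:]):
    connect(left, right)

print(sorted(k_hop(nodes[500], 2)))  # [498, 499, 500, 501, 502]
```

The 2-hop query touches 5 of the 1,000 nodes. A supernode breaks that bound: one hop into it fans out to its entire degree, which is why supernode handling gets its own knob.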
Diagram
The capstone canvas. Four workload cards are docked across the top, each carrying a one-line summary of its constraints (mutating or read-mostly, fits one box or too big, supernodes or not, local hops or whole-graph). One card is live at a time. Down the side sits a palette of every knob the arc earned, in six groups: storage layout (doubly-linked chains vs a CSR array), index strategy (the anchor you SEEK before you EXPAND), supernode handling (expand the cheap direction, dense-node grouping, relationship-chain locks), consistency (single-primary ACID vs distributed with weaker isolation), distribution (single node, edge-cut, vertex-cut, predicate sharding), and workload model (on-demand local traversal vs whole-graph Pregel). As you choose knobs, the verifier grades each one against the live workload: green 'fits'; amber 'wasteful' (you are paying on an axis this workload doesn't constrain); red 'violation' (it breaks a hard limit). Every verdict cites the scene that justifies it. The compass is the local-vs-global spine: a local query lights a few nodes; a whole-graph pass lights them all.
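The verifier's three verdicts can be sketched as a pure function of a workload's constraints. The rules below are hypothetical simplifications written for illustration, not the canvas's actual grading logic, and the constraint values given to the two sample workloads are assumptions rather than the cards' real summaries. The `Workload` fields mirror the four questions each card answers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    mutating: bool      # does the graph change while it is being queried?
    fits_one_box: bool  # does the whole graph fit on a single machine?
    supernodes: bool    # are there celebrity, very-high-degree nodes?
    whole_graph: bool   # True: each job touches every node; False: a few local hops


def grade(knob: str, w: Workload) -> str:
    """Grade one knob as 'fits', 'wasteful' or 'violation' (hypothetical rules)."""
    if knob == "csr":
        # a packed CSR array has no slack for inserts, so live writes force rebuilds
        return "violation" if w.mutating else "fits"
    if knob == "single_node":
        # a graph that does not fit on one box breaks a hard limit
        return "fits" if w.fits_one_box else "violation"
    if knob in ("edge_cut", "vertex_cut"):
        # partitioning pays cross-machine hops a one-box workload never needed
        return "wasteful" if w.fits_one_box else "fits"
    if knob == "dense_node_grouping":
        # only pays off when some nodes have enormous degree
        return "fits" if w.supernodes else "wasteful"
    if knob == "pregel":
        # a whole-graph pass lights every node to answer a local question
        return "fits" if w.whole_graph else "wasteful"
    raise ValueError(f"unknown knob: {knob}")


fraud = Workload("fraud-ring traversal", mutating=True, fits_one_box=True,
                 supernodes=True, whole_graph=False)
pagerank = Workload("nightly PageRank", mutating=False, fits_one_box=False,
                    supernodes=True, whole_graph=True)

for knob in ("csr", "single_node", "pregel"):
    print(knob, grade(knob, fraud), grade(knob, pagerank))
# csr violation fits
# single_node fits violation
# pregel wasteful fits
```

The same three knobs flip verdicts between the two workloads, which is the point of the capstone: the toolkit is fixed and only the constraints change.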
Sources
- Robinson, Webber & Eifrem, Graph Databases (O'Reilly, 2nd ed.)
- Relationship Chain Locks: Don't Block the Rock (Neo4j blog)
- PowerGraph: distributed graph-parallel computation (the morning paper blog)
- Dgraph design concepts: minimizing network calls (Dgraph docs)
- Malewicz et al., Pregel: A System for Large-Scale Graph Processing
Here is everything the arc built, in one place. Four workload cards are docked across the top: *fraud-ring traversal*, *knowledge graph*, *social feed*, and *nightly PageRank*. Each card states its real constraints: does the graph mutate or sit read-mostly, does it fit one box or sprawl across machines, are there celebrity supernodes, and does a query touch a few nodes or every node. Down the side is the palette: *storage layout* (mutable doubly-linked chains vs compact CSR), the *index* anchor you SEEK before you EXPAND, *supernode handling*, *consistency* (single-primary ACID vs distributed), *distribution* (single node, edge-cut, vertex-cut, predicate sharding), and the *workload model* (on-demand local traversal vs whole-graph Pregel). Every one of those knobs was earned in an earlier scene. The job now is not to learn anything new; it is to read each workload off the local-vs-global spine and pick the honest configuration. Same toolkit, four different right answers.
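Two of those knobs, the compact CSR layout and the SEEK-then-EXPAND pattern, fit in a few lines. A minimal in-memory sketch, assuming integer node ids and a plain dict standing in for a property index; `build_csr`, `neighbours` and `name_index` are hypothetical names, and no particular engine lays out its bytes this way.

```python
def build_csr(num_nodes, edges):
    """Pack directed (src, dst) edges into compressed sparse row form.

    The out-neighbours of v are targets[offsets[v]:offsets[v + 1]].
    """
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1            # count out-degree, shifted by one slot
    for v in range(num_nodes):
        offsets[v + 1] += offsets[v]     # prefix sum turns counts into start positions
    targets = [0] * len(edges)
    cursor = offsets[:num_nodes]         # next free slot per source (slicing copies)
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbours(offsets, targets, v):
    # EXPAND: one contiguous slice, no pointer chasing
    return targets[offsets[v]:offsets[v + 1]]


names = ["alice", "bob", "carol", "dave"]
name_index = {name: i for i, name in enumerate(names)}  # hypothetical property index
offsets, targets = build_csr(len(names), [(0, 1), (0, 2), (1, 2), (2, 3)])

start = name_index["alice"]                              # SEEK: one index lookup
print([names[t] for t in neighbours(offsets, targets, start)])  # ['bob', 'carol']
```

Adding an edge out of node 0 here would shift every later entry in `targets` and every later offset. That is why a packed layout suits read-mostly or batch-rebuilt graphs, while a graph that mutates under live queries favours the doubly-linked chain layout.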