Build a graph database (Neo4j / Dgraph-style) (16 scenes)
Scene 06 · Linked list vs CSR adjacency
Doubly-linked relationship lists are cache-hostile but mutate in place; Compressed Sparse Row packs a node's neighbors into one contiguous slice that streams through cache but rebuilds the whole array to add a single edge — the OLTP-graph vs OLAP-graph storage fork.
Previously
The doubly-linked list mutates cheaply, but it scatters across memory — every hop is a potential cache miss. CSR packs a node's neighbors into one contiguous slice that flies through cache, at the price of never being able to cheaply add an edge. That's the storage fork; now both layouts are just bytes. The thing missing is a way to ASK questions of them without hand-coding pointer walks.
Scene 06
Linked list vs CSR adjacency
Diagram
Alice's four FRIEND neighbors, drawn two ways. LEFT — a doubly-linked relationship list: each neighbor is a record sitting at its own scattered heap address, chained by prev/next pointers, so walking the chain jumps all over memory (a cache miss per hop). RIGHT — CSR (Compressed Sparse Row): one offset array plus one packed neighbor array, where Alice's neighbors are the contiguous slice neighborArray[offset[v]:offset[v+1]] that streams straight through cache. The 'insert a friend' action shows the fork: the list splices one cell in place, while CSR must rebuild the whole array. The three-way trade legend reads off mutability vs compactness vs cache-locality.
Last scene the relationship chain was a doubly-linked list because it's easy to insert and delete. But pointer-chasing across the heap is cache-hostile. Watch Alice's four friends drawn first as that scattered list — each cell at its own heap address, chained by prev/next — then drawn a second way: packed into one contiguous slice that reads straight through cache. Same four neighbors, two completely different memory layouts.
Implementation
CSR.neighbors
a vertex's neighbors are one contiguous slice — a single cache stream
1def neighbors(v, offset, neighbor_array):2 start = offset[v]3 end = offset[v + 1]4 # contiguous run — streams through cache5 return neighbor_array[start:end]
insert_edge
in-place splice on the list vs full rebuild on CSR
1def insert_list(cell, after): # doubly-linked list2 cell.next = after.next # rewire two pointers3 after.next = cell # O(1) — splice in place45def insert_csr(v, u, offset, arr): # CSR — no free slot6 arr = rebuild_with_edge(v, u) # O(total edges) rebuild7 return arr
List.walk
the read-side cost the left pane draws — each hop follows a pointer to a new heap address
1def walk_list(head): # doubly-linked relationship chain2 cell = head3 while cell is not None:4 visit(cell.id) # use this neighbor5 cell = cell.next # pointer → some other heap address6 # next cell is elsewhere → likely a cache miss
Store.pickLayout
match the layout to whether the graph mutates or just gets read
1def pick_layout(workload):2 if workload.mutates_constantly:3 return LINKED_LIST # splice edges in place, skip rebuilds4 else: # read-mostly: never inserts5 return CSR # contiguous neighbors stream through cache
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.