All scenes
Build a B-tree storage engine (SQLite-style)
11 scenes · ~77 min · build the primitive
Build your own B-tree storage engine (SQLite-style)
What actually happens when you run INSERT INTO users(...). One file of fixed-size pages, organized as B-trees, with a write-ahead log that turns commits into appends. Build it from a SQL writer's perspective and feel why every knob exists.
- 01Find the row — why a file scan won't doA SQL engine has to answer point, range, and indexed lookups against ONE file — the only family of structures that can do all three is an ordered tree.~7 min
- 02The file is a strip of pagesThe DB file is a strip of identical fixed-size pages — every read/write is one whole page because the disk and OS work in pages.~7 min
- 03Pages linked into a treePages are linked into a tree where leaves hold rows and interior pages hold keys + pointers — branching factor is hundreds, so 3 levels = ~64M rows.~7 min
- 04Searching — a cursor descends the treeA point query reads exactly ONE page per tree level — three I/Os to find one row in a tree of millions, even on a miss.~7 min
- 05Insert and split — the tree growsAn INSERT writes one cell when there's room; when full, the leaf splits and a separator promotes to the parent — cascades grow the tree.~7 min
- 05aDelete and VACUUM — the file doesn't shrinkDELETE doesn't shrink the file — pages get freeblocks but never merge; only VACUUM rebuilds the file (at 2x disk cost).~7 min
- 06Indexes — a second B-treeEach CREATE INDEX builds a second B-tree; reads do TWO descents, but every write must update every index.~7 min
- 07The page cache — RAM is 1000x faster than diskThe pager caches pages in RAM with LRU; you're fast iff the working set fits, slow the moment it doesn't.~7 min
- 08WAL — durability without rewriting the file every commitCommits append page-images to the WAL and fsync; the main DB file is left alone until checkpoint — fast, and readers don't block.~7 min
- 09Checkpoint — and the 20 GB WALCheckpoints copy frames back to the DB file and rewind the WAL — but a long-lived reader can pin the WAL open forever, blowing it up.~7 min
- 10Design canvas — size a SQLite deploymentWorkloads pick different knob combinations; each knob has a 'too low' and 'too high' failure mode you can name from earlier scenes.~7 min