Build a B-tree storage engine (SQLite-style) (11 scenes)
Scene 09 · Checkpoint — and the 20 GB WAL
Checkpoints copy WAL frames back to the DB file and rewind the WAL, but a long-lived reader can pin the WAL indefinitely, so the file grows without bound.
Previously
Every commit appends to the WAL and the DB file 'catches up later.' Checkpointing is what 'later' means — and it has one famous failure mode.
Diagram
Top: **users.db** as the page strip — pages light up as the checkpoint copies WAL frames home. Middle: **users.db-wal**, drawn as a row of frames; a **checkpoint pointer** walks across it copying each frame to its matching DB page. A **long reader** badge can sit on a frame partway down the WAL — checkpoints can still **flush** past it, but cannot **rewind** the WAL past that pin. Bottom: a **WAL size** readout and trajectory; **STARVED** lights up when the file grows without bound.
Auto-checkpoint fires when the WAL hits its threshold. Watch the checkpoint pointer walk every frame and copy it into the matching DB page (the strip lights up). Because no reader pins the WAL, it then rewinds to zero frames.
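The WAL size readout can be approximated with a toy model. The sketch below is not part of the scene's engine: `wal_bytes` and `simulate` are hypothetical helpers, the model assumes one frame per commit, and the byte counts rely on SQLite's documented WAL layout (a 32-byte file header and a 24-byte header in front of each page image).

```python
WAL_HEADER_BYTES = 32    # fixed header at the start of an SQLite WAL file
FRAME_HEADER_BYTES = 24  # header in front of every page image


def wal_bytes(frames: int, page_size: int = 4096) -> int:
    """Approximate bytes occupied by a WAL holding `frames` frames."""
    return WAL_HEADER_BYTES + frames * (FRAME_HEADER_BYTES + page_size)


def simulate(commits: int, threshold: int, reader_pinned: bool) -> list:
    """Return the live WAL frame count after each commit.

    Toy assumptions: every commit appends exactly one frame, and the
    checkpoint that fires at `threshold` rewinds the WAL to zero frames
    unless a reader pins it.
    """
    frames = 0
    trajectory = []
    for _ in range(commits):
        frames += 1
        if frames >= threshold and not reader_pinned:
            frames = 0  # checkpoint copied everything home and rewound
        trajectory.append(frames)
    return trajectory


healthy = simulate(commits=10, threshold=4, reader_pinned=False)
starved = simulate(commits=10, threshold=4, reader_pinned=True)
```

In this model `healthy` is a sawtooth that never exceeds the threshold, while `starved` climbs by one frame per commit. With 4 KiB pages, a 20 GB WAL corresponds to roughly 5.2 million frames that were never rewound.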
Implementation
Checkpoint.passive
default mode — copy what you can, never block writers
def checkpoint_passive(wal, db, readers):
    # Oldest snapshot still held; idle readers (snapshot_end is None) pin nothing
    floor = min((r.snapshot_end for r in readers if r.snapshot_end is not None), default=None)
    for frame in wal.frames:
        db.write_page(frame.page_number, frame.bytes)
    if floor is None:
        wal.rewind_to_start()  # next commit overwrites from f1
    # else: cannot rewind past floor; WAL keeps growing
    return
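To run the passive logic on its own, the `wal`, `db`, and `readers` arguments need concrete stand-ins. The sketch below supplies minimal in-memory ones. `Frame`, `Reader`, `ToyWal`, and `ToyDb` are hypothetical names invented for this example (the frame payload is called `data` here to avoid shadowing Python's `bytes`), and the boolean return value is an addition for easier inspection.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    page_number: int
    data: bytes


@dataclass
class Reader:
    # Last WAL frame this reader's snapshot may see;
    # None means the reader holds no open read transaction.
    snapshot_end: Optional[int] = None


@dataclass
class ToyWal:
    frames: list = field(default_factory=list)

    def rewind_to_start(self) -> None:
        # Stand-in for resetting the write position: old frames are dropped.
        self.frames.clear()


@dataclass
class ToyDb:
    pages: dict = field(default_factory=dict)

    def write_page(self, page_number: int, data: bytes) -> None:
        self.pages[page_number] = data


def checkpoint_passive(wal: ToyWal, db: ToyDb, readers: list) -> bool:
    """Copy every frame home; rewind only if no reader pins the WAL.

    Returns True when the WAL was rewound.
    """
    pins = [r.snapshot_end for r in readers if r.snapshot_end is not None]
    for frame in wal.frames:
        db.write_page(frame.page_number, frame.data)
    if pins:
        return False  # a reader still needs these frames
    wal.rewind_to_start()
    return True


def demo() -> tuple:
    db = ToyDb()
    wal = ToyWal([Frame(1, b"a"), Frame(2, b"b"), Frame(1, b"c")])
    rewound_pinned = checkpoint_passive(wal, db, [Reader(snapshot_end=2)])
    frames_while_pinned = len(wal.frames)
    rewound_free = checkpoint_passive(wal, db, [Reader()])
    return rewound_pinned, frames_while_pinned, rewound_free, len(wal.frames), db.pages[1]
```

Calling `demo()` runs one checkpoint with a pinned reader and one without. The first leaves all three frames in place; the second rewinds the WAL. In both cases page 1 ends up holding the newest image, because later frames overwrite earlier ones for the same page.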
Checkpoint.truncate
FULL-equivalent flush, then physically zero the WAL file
def checkpoint_truncate(wal, db, readers):
    wait_for_writers_to_finish()  # FULL semantics
    for frame in wal.frames:  # transfer EVERY frame
        db.write_page(frame.page_number, frame.bytes)
    if any(r.snapshot_end is not None for r in readers):
        return  # pinned reader → cannot rewind, file stays full
    wal.rewind_to_start()
    wal.truncate_file_to_zero_bytes()  # physical shrink
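The same behavior can be observed against real SQLite through Python's standard `sqlite3` module. This is a sketch under a few assumptions: the temporary directory supports WAL mode, the helper name `truncate_with_and_without_reader` is made up for this example, and `timeout=0` is chosen so a blocked checkpoint reports busy immediately instead of waiting. The first column of the row returned by `PRAGMA wal_checkpoint(TRUNCATE)` is the busy flag.

```python
import os
import sqlite3
import tempfile


def truncate_with_and_without_reader() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "users.db")
        # isolation_level=None: autocommit, so transactions are explicit.
        writer = sqlite3.connect(path, timeout=0, isolation_level=None)
        writer.execute("PRAGMA journal_mode=WAL").fetchall()
        writer.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        writer.execute("INSERT INTO users (name) VALUES ('ada')")

        # A second connection opens a read transaction and keeps it open.
        reader = sqlite3.connect(path, timeout=0, isolation_level=None)
        reader.execute("BEGIN")
        reader.execute("SELECT count(*) FROM users").fetchall()  # snapshot starts here

        # This commit lands in the WAL after the reader's snapshot.
        writer.execute("INSERT INTO users (name) VALUES ('grace')")

        busy_pinned = writer.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()[0]
        size_pinned = os.path.getsize(path + "-wal")

        reader.execute("COMMIT")  # release the snapshot
        busy_free = writer.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()[0]
        size_free = os.path.getsize(path + "-wal")

        reader.close()
        writer.close()
        return {
            "busy_pinned": busy_pinned,
            "size_pinned": size_pinned,
            "busy_free": busy_free,
            "size_free": size_free,
        }
```

While the reader holds its snapshot, the checkpoint should report busy and leave `users.db-wal` non-empty. Once the reader commits, the same pragma should succeed and shrink the file to zero bytes.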
Pager.onCommit
what each commit does — and what fires the checkpoint
def on_commit(wal, shm, db, readers, synchronous, threshold=1000):
    wal.append_commit_frame()  # page image + commit marker
    if synchronous in (FULL, EXTRA):
        wal.fsync()  # durability point
    shm.publish_new_end_mark(len(wal.frames))
    if len(wal.frames) >= threshold:  # threshold is counted in WAL frames
        # auto-checkpoint runs PASSIVE on this connection
        checkpoint_passive(wal, db, readers)
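The interaction between the commit path and the auto-checkpoint threshold can also be measured on real SQLite. The sketch below uses only the standard `sqlite3` module; the helper name `wal_size_after_commits`, the threshold of 10 pages, and the use of `synchronous=NORMAL` (to avoid a sync on every commit and keep the demo fast) are choices made for this example, not settings taken from the scene.

```python
import os
import sqlite3
import tempfile


def wal_size_after_commits(pin_reader: bool, commits: int = 100) -> int:
    """Return the size in bytes of the -wal file after `commits` small commits."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "users.db")
        # isolation_level=None: autocommit, so each UPDATE is its own commit.
        writer = sqlite3.connect(path, isolation_level=None)
        writer.execute("PRAGMA journal_mode=WAL").fetchall()
        writer.execute("PRAGMA synchronous=NORMAL")
        writer.execute("PRAGMA wal_autocheckpoint=10").fetchall()  # threshold in pages
        writer.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, visits INTEGER)")
        writer.execute("INSERT INTO users VALUES (1, 0)")

        reader = None
        if pin_reader:
            reader = sqlite3.connect(path, isolation_level=None)
            reader.execute("BEGIN")
            reader.execute("SELECT visits FROM users").fetchall()  # pins a snapshot

        for _ in range(commits):
            writer.execute("UPDATE users SET visits = visits + 1 WHERE id = 1")

        # Measure before closing: closing the last connection checkpoints
        # and removes the WAL file.
        size = os.path.getsize(path + "-wal")
        if reader is not None:
            reader.close()
        writer.close()
        return size


unpinned_size = wal_size_after_commits(pin_reader=False)
pinned_size = wal_size_after_commits(pin_reader=True)
```

Without a reader, each auto-checkpoint lets the next commit reuse the WAL from its start, so the file should stay near the threshold. With the reader holding its snapshot, every commit extends the file, so `pinned_size` should come out well above `unpinned_size`.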