Build a Bitcask-style KV store (9 scenes)
Scene 05 · Merge — GC without stopping writes
Merge scans only immutable files, writes a deduplicated copy, then atomically retargets keydir pointers. The active file is never touched.
Previously

RAM held one entry per live key. Disk held every record ever written. Merge is what makes the disk remember its dead — without ever asking the writer to slow down.

Scene 05
Merge — GC without stopping writes
Diagram
BEFORE (top) and AFTER (bottom) share one keydir. Pointers retarget atomically at the swap tick; the active file is identical in both panels.
BEFORE — immutable + active001.data001.dataimmutableuser:42 = alice (superseded by later write)use… = alicart:9 = [3,7] (superseded by later write)car… = [3,user:42 = alice2 (superseded by later write)use… = aliuser:42 = alice3 (live)use… = alisess:7 = S1 (superseded by later write)ses… = S1002.data002.dataimmutableold:1 = TOMB (delete marker)old… = cart:9 = [3,7,9] (superseded by later write)car… = [3,cart:9 = [3,7,9,11] (live)car… = [3,sess:7 = S2 (superseded by later write)ses… = S2003.data003.dataimmutableuser:42 = alice-old (superseded by later write)use… = alistale = TOMB (delete marker)sta… = sess:7 = S3-old (superseded by later write)ses… = S3-ghost = g (superseded by later write)gho… = gsess:7 = S4 (live)ses… = S4004.data (active)004.data(active)active · acceptingwriterevt:101 = click (live)evt… = clievt:102 = scroll (live)evt… = screvt:103 = purchase (live)evt… = purlivedeadtombdead bytes 60% of total — merge becomes eligible above the trigger thresholdDEAD BYTES60%merge: idleAFTER — merged + hint + activemerged-001.datamerged-001.datamerge outputmerge output004.data (active)004.data(active)active · acceptingwriterevt:101 = click (live)evt… = clievt:102 = scroll (live)evt… = screvt:103 = purchase (live)evt… = purSHARED KEYDIRpre-swapkeybefore → afteruser:42: 001@240 → merged-001@0 (pending)user:42001@240merged-001@0cart:9: 002@96 → merged-001@56 (pending)cart:9002@96merged-001@56sess:7: 003@312 → merged-001@128 (pending)sess:7003@312merged-001@128dead bytes 60% — merge triggers at 60%MERGE TRIGGERdead 60%· trigger 60%above trigger — merge eligiblebelow trigger — merge waits0 live writes accepted during merge (window: always)LIVE WRITES DURING MERGEwindow: always0active file keeps accepting writes · never blockedDISK USAGEbefore 2.7 MB → after 2.7 MB (0% reclaimed)before2.7 MBafter2.7 MBreclaimed: 0 Bmerge idle — dead bytes accumulating
Three immutable files at ~60% dead bytes — overwrites and tombstones marked dead in red. Below them, the active file is still accepting appends (the 'live writes during merge' counter ticks throughout). AFTER will hold a single merged data file with its hint sibling — only the latest record per key. Watch merge walk idle → scanning → emitting → swap → done; every keydir row flips at the swap tick, never half.
Implementation
Bitcask.should_merge
the trigger — operator-tunable, OR of two conditions
1def should_merge(stats, cfg, clock):
2 if not within_window(clock, cfg.merge_window):
3 return False
4 frag = stats.dead_bytes / stats.total_bytes
5 if frag >= cfg.frag_merge_trigger:
6 return True # default 60%
7 if stats.dead_bytes >= cfg.dead_bytes_merge_trigger:
8 return True # default 512 MB
9 return False
Bitcask.merge
scan immutables, emit deduped copy + hint, swap, delete
1def merge(files, keydir, active_file):
2 # scope: immutable files only — active file excluded
3 inputs = [f for f in files if f is not active_file]
4 out = open_new_data_file()
5 hint = open_new_hint_file()
6 new_pointers = {}
7 for f in inputs:
8 for rec in f.scan():
9 if live_record(rec, keydir):
10 pos = out.append(rec)
11 hint.append(rec.key, pos, rec.vsz)
12 new_pointers[rec.key] = (out.id, pos, rec.vsz)
13 fsync(out); fsync(hint)
14 atomic_keydir_swap(keydir, new_pointers)
15 for f in inputs: delete(f)
Bitcask.live_record
a record is live iff the keydir still points at exactly it
1def live_record(rec, keydir):
2 entry = keydir.get(rec.key)
3 if entry is None:
4 return False # tombstoned or never existed
5 # newer write would have moved the pointer elsewhere
6 return (
7 entry.file_id == rec.file_id
8 and entry.value_pos == rec.value_pos
9 )
Bitcask.atomic_keydir_swap
every key flips at one tick — never half
1def atomic_keydir_swap(keydir, new_pointers):
2 with keydir.write_lock: # held only for the swap
3 for key, ptr in new_pointers.items():
4 current = keydir.get(key)
5 # a concurrent put may have moved the key into the
6 # active file while we were emitting — keep that.
7 if current is None:
8 continue
9 if current.file_id == new_pointers_origin(key):
10 keydir[key] = ptr
11 # writers were never blocked from appending to active_file