Build a Bitcask-style KV store (9 scenes)
Scene 06 · Crash recovery — scan or hint
On startup the keydir is gone (RAM is gone). Hint files (a merge byproduct) drop cold-start time ~10x; per-record CRCs make a torn tail self-truncating.
Previously
Merge produced a hint file as a byproduct. That byproduct is the difference between a 30-minute restart and a 3-minute one — and the per-record CRC is what makes the surviving log parseable in either case.
Scene 06
Crash recovery — scan or hint
Diagram
Left: data files (.hint siblings drop scan cost ~100×). Right: keydir rebuilds row-by-row. Bottom: time-to-ready meter with no-hints baseline comparator.
Bitcask recovery — scanning data files. 0 B of 5.40 GB scanned. Elapsed 0 ms.
The process was killed mid-write. Watch the startup scanner sweep each file: 001 has no hint sibling so it crawls record-by-record, 002 has a .hint file from merge so it flies through metadata only, and 003 (the active file) hits a CRC mismatch on its torn tail and truncates to the last valid offset. The keydir rebuilds on the right, with later writes overwriting earlier ones for the same key. Every record carries a per-record CRC32 — that's why the surviving log is always parseable even after a hard crash.
Implementation
Bitcask.rebuild_keydir
the open() path that runs before the first GET can be served
1def rebuild_keydir(data_dir):2 keydir = {} # empty — RAM is gone3 files = sorted(data_dir.list('*.data'))4 for f in files:5 hint = data_dir.path(f.id + '.hint')6 if hint.exists():7 entries = read_hints(hint) # metadata only8 else:9 entries = scan_data_file(f) # full crawl10 for e in entries:11 # later tstamp wins on overwrite12 if e.key not in keydir or e.tstamp > keydir[e.key].tstamp:13 keydir[e.key] = (f.id, e.vsz, e.vpos, e.tstamp)14 return keydir
Bitcask.scan_data_file
full crawl with torn-tail truncation via per-record CRC
1def scan_data_file(f):2 pos = 03 while pos < f.size:4 header = f.pread(pos, HEADER_SIZE)5 crc, tstamp, ksz, vsz = unpack(header)6 kv = f.pread(pos + HEADER_SIZE, ksz + vsz)7 if crc32(header[4:] + kv) != crc:8 f.truncate(pos) # drop torn tail9 return # stop, log ends here10 key = kv[:ksz]11 vpos = pos + HEADER_SIZE + ksz12 yield Entry(key, tstamp, vsz, vpos)13 pos += HEADER_SIZE + ksz + vsz
OnDisk.record_formats
data records carry values; hint records omit them
1# Data record — written by every put() to the active file:2data_record = struct(3 crc: u32, # checksum over (tstamp..value)4 tstamp: u64,5 ksz: u32,6 vsz: u32,7 key: bytes[ksz],8 value: bytes[vsz], # the value bytes themselves9)1011# Hint record — written by merge alongside each merged data file:12hint_record = struct(13 tstamp: u64,14 ksz: u32,15 vsz: u32,16 vpos: u64, # where the value lives in the .data file17 key: bytes[ksz],18 # no value bytes — that is the entire point19)