All scenes
Build a Bitcask-style KV store
9 scenes · ~63 min · build the primitive
Build your own Bitcask-style KV store
The simplest possible KV store that still works: an append-only log on disk + an in-memory hash index. Build it from first principles and feel which trade-offs every later store inherits.
- 01Log on disk, hash in RAMA Bitcask is exactly two structures: an append-only log of records on disk + a single in-memory hash (the keydir). Every other property follows from this split.~7 min
- 02Append, fsync, update, ackEvery put is four ordered steps on one open writer. Concurrent writers are rejected at open(), not at write time.~7 min
- 03One hash lookup, one preadget is a constant-time keydir lookup + at most one pread; bounded by one disk seek, zero when the page cache is hot.~7 min
- 04RAM is paid per key, not per byteEvery live key consumes a fixed-size keydir entry (~44.5 B + key length). RAM scales with key count alone — fat values are free, tiny tags OOM.~7 min
- 05Merge — GC without stopping writesMerge scans only immutable files, writes a deduplicated copy, then atomically retargets keydir pointers. The active file is never touched.~7 min
- 06Crash recovery — scan or hintOn startup the keydir is gone (RAM is gone). Hint files (a merge byproduct) drop cold-start time ~10x; per-record CRCs make a torn tail self-truncating.~7 min
- 06afsync — pick two of safe, fast, simplesync_strategy is a three-way trade (o_sync / interval / none). fsync controls RECENCY; CRC controls VALIDITY — two separate guarantees.~7 min
- 07Tombstones — deletes that come backIf merge GCs a tombstone before every older segment containing the key has been merged away, restart resurrects the deleted key. The riak_kv #925 fix.~7 min
- 08Design your KV — and feel its limitsCapstone: every choice is a trade against the hash-in-RAM aesthetic. The absence of range/prefix iteration is the ceiling that bridges to LSM.~7 min