Build a Bitcask-style KV store
9 scenes · ~63 min · build the primitive

Build your own Bitcask-style KV store

The simplest possible KV store that still works: an append-only log on disk + an in-memory hash index. Build it from first principles and feel which trade-offs every later store inherits.

  1. 01
  2. 02
  3. 03
  4. 04
  5. 05
  6. 06
  7. 06a
  8. 07
  9. 08
  1. 01
    Log on disk, hash in RAM
    A Bitcask is exactly two structures: an append-only log of records on disk + a single in-memory hash (the keydir). Every other property follows from this split.
    ~7 min
  2. 02
    Append, fsync, update, ack
    Every put is four ordered steps on one open writer. Concurrent writers are rejected at open(), not at write time.
    ~7 min
  3. 03
    One hash lookup, one pread
    get is a constant-time keydir lookup + at most one pread; bounded by one disk seek, zero when the page cache is hot.
    ~7 min
  4. 04
    RAM is paid per key, not per byte
    Every live key consumes a fixed-size keydir entry (~44.5 B + key length). RAM scales with key count alone — fat values are free, tiny tags OOM.
    ~7 min
  5. 05
    Merge — GC without stopping writes
    Merge scans only immutable files, writes a deduplicated copy, then atomically retargets keydir pointers. The active file is never touched.
    ~7 min
  6. 06
    Crash recovery — scan or hint
    On startup the keydir is gone (RAM is gone). Hint files (a merge byproduct) drop cold-start time ~10x; per-record CRCs make a torn tail self-truncating.
    ~7 min
  7. 06a
    fsync — pick two of safe, fast, simple
    sync_strategy is a three-way trade (o_sync / interval / none). fsync controls RECENCY; CRC controls VALIDITY — two separate guarantees.
    ~7 min
  8. 07
    Tombstones — deletes that come back
    If merge GCs a tombstone before every older segment containing the key has been merged away, restart resurrects the deleted key. The riak_kv #925 fix.
    ~7 min
  9. 08
    Design your KV — and feel its limits
    Capstone: every choice is a trade against the hash-in-RAM aesthetic. The absence of range/prefix iteration is the ceiling that bridges to LSM.
    ~7 min