Build a Bitcask-style KV store (9 scenes)
Scene 04 · RAM is paid per key, not per byte
Every live key consumes a keydir entry of about 44.5 B of fixed overhead plus the key itself. RAM scales with key count and key length, never with value size: fat values are free, while a flood of tiny tags can OOM.
Previously
Reads were cheap because the keydir was in RAM. The bill for that design is paid in keys, not bytes, and many workloads don't realise which side of that asymmetry they're on until they OOM.
Diagram
Two bars — RAM (keyCount × (44.5 + avgKeyLen)) vs Disk (keyCount × avgValueSize) — under a 256 MB budget line with a fits/tight/OOM verdict.
Default workload: 1M keys, 32 B keys, 4 KB values. Watch the RAM bar settle around 76 MB while the disk bar climbs to ~4 GB: same key count, two very different bars. Only the RAM bar is measured against the 256 MB budget line, and it fits; the ~4 GB of values lives on disk and does not count against that budget.
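As a rough check of the default workload, the two bars can be computed directly. This is a sketch under stated assumptions: decimal units for the totals (1 MB = 10^6 B), 4 KB taken as 4096 B, and helper names `ram_bar` and `disk_bar` that are illustrative rather than part of the original.

```python
# Sketch of the diagram's two bars; helper names and unit choices are assumptions.
STATIC_OVERHEAD = 44.5  # bytes per keydir entry, excluding the key itself


def ram_bar(key_count, avg_key_len):
    """Keydir RAM in bytes: depends on key count and key length only."""
    return key_count * (STATIC_OVERHEAD + avg_key_len)


def disk_bar(key_count, avg_value_size):
    """Live value bytes on disk (ignores record headers and stale entries)."""
    return key_count * avg_value_size


ram = ram_bar(1_000_000, 32)
disk = disk_bar(1_000_000, 4096)
print(f"RAM  ~ {ram / 1e6:.1f} MB")
print(f"Disk ~ {disk / 1e9:.1f} GB")
```

Changing the value size in `disk_bar` moves only the disk figure; the RAM figure stays put.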
Implementation
KeydirEntry (per live key)
what's actually stored in RAM for each key
struct KeydirEntry {
    file_id: u32,   // which data file holds the value
    value_sz: u32,  // bytes to pread
    value_pos: u64, // offset within file_id
    tstamp: u32,    // for tombstone / merge ordering
    // 4 + 4 + 8 + 4 = 20 B payload
    // + ~24.5 B Erlang term + HashMap bookkeeping
    // → 44.5 B static + len(key) per entry
}
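To sanity-check the 20 B payload figure, the four fields can be packed with Python's `struct` module. This only illustrates the arithmetic: the little-endian, unpadded layout and the sample field values are assumptions, and the remaining ~24.5 B of per-entry overhead is a runtime cost that this sketch does not model.

```python
import struct

# Assumed layout: little-endian, no padding ("<"), fields in the order listed above.
# I = u32, Q = u64  ->  file_id, value_sz, value_pos, tstamp
ENTRY_FORMAT = "<IIQI"

# Sample values are made up for illustration.
payload = struct.pack(ENTRY_FORMAT, 3, 4096, 1_048_576, 1_700_000_000)
file_id, value_sz, value_pos, tstamp = struct.unpack(ENTRY_FORMAT, payload)

print(struct.calcsize(ENTRY_FORMAT))  # 4 + 4 + 8 + 4 bytes
print(len(payload))
```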
estimateKeydirRam
the one-line sizing rule for a Bitcask deployment
def estimateKeydirRam(keyCount, avgKeyLen):
    static_overhead = 44.5  # Riak capacity calculator figure
    bytes_per_entry = static_overhead + avgKeyLen
    # value_size DOES NOT appear: the keydir only stores
    # a pointer to the value, never the value itself.
    return keyCount * bytes_per_entry
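Turning the rule around gives the largest key count a given RAM budget can hold. The helper below is a hypothetical addition (`max_keys_for_budget` is not part of the original), and it assumes the budget is expressed in bytes with decimal megabytes.

```python
import math

STATIC_OVERHEAD = 44.5  # bytes per entry, same figure as estimateKeydirRam


def max_keys_for_budget(ram_budget, avg_key_len):
    """Largest key count whose keydir estimate still fits in ram_budget bytes."""
    return math.floor(ram_budget / (STATIC_OVERHEAD + avg_key_len))


budget = 256 * 10**6  # 256 MB, decimal units assumed
for key_len in (16, 32, 128):
    print(key_len, max_keys_for_budget(budget, key_len))
```

Value size never enters the calculation, so the ceiling is the same whether values are 100 B or 1 MB; only longer keys lower it.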
verdict
the budget check operators run before deploying
def verdict(keyCount, avgKeyLen, ramBudget):
    needed = estimateKeydirRam(keyCount, avgKeyLen)
    if needed > ramBudget:
        return "OOM"    # use an LSM instead
    if needed > 0.75 * ramBudget:
        return "TIGHT"  # one traffic spike from OOM
    return "FITS"
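A quick sweep over key counts shows where the thresholds fall for 32 B keys under a 256 MB budget. Both functions are restated so the snippet runs on its own; the string return values and the decimal megabyte are assumptions made for this sketch.

```python
def estimateKeydirRam(keyCount, avgKeyLen):
    # 44.5 B static overhead per entry plus the key bytes
    return keyCount * (44.5 + avgKeyLen)


def verdict(keyCount, avgKeyLen, ramBudget):
    needed = estimateKeydirRam(keyCount, avgKeyLen)
    if needed > ramBudget:
        return "OOM"
    if needed > 0.75 * ramBudget:
        return "TIGHT"
    return "FITS"


BUDGET = 256 * 10**6  # 256 MB in bytes (decimal units assumed)
for keys in (1_000_000, 3_000_000, 4_000_000):
    needed_mb = estimateKeydirRam(keys, 32) / 1e6
    print(f"{keys:>9,} keys -> {needed_mb:6.1f} MB -> {verdict(keys, 32, BUDGET)}")
```

With these inputs the estimate is 76.5 MB at 1M keys, 229.5 MB at 3M (above the 192 MB tight threshold), and 306 MB at 4M (over budget).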