Build a Bitcask-style KV store (9 scenes)
Scene 04 · RAM is paid per key, not per byte
Every live key costs one keydir entry: about 44.5 B of fixed overhead plus the key itself. RAM scales with key count and key length, never with value size. Fat values cost no RAM; a billion tiny tags OOM.
Previously

Reads were cheap because the keydir was in RAM. The bill for that is paid in keys, not bytes, and most workloads don't realise which side of that asymmetry they're on until they OOM.

Diagram
Two bars — RAM (keyCount × (44.5 + avgKeyLen)) vs Disk (keyCount × avgValueSize) — under a 256 MB budget line with a fits/tight/OOM verdict.
[Interactive diagram: workload sizing, RAM (keydir) vs disk (live records), against a 256 MB RAM budget line.]
Workload knobs: key count (1k to 1B), avg key length (8 B to 256 B, default 32 B), avg value size (16 B to 1 MB, default 4.0 KB).
Keydir overhead: 44.5 B/key (Riak capacity calculator), so RAM = N × (44.5 + keylen).
Workload pins (driven by the sliders):
session cache: 1M keys, 32 B keys, 4 KB values. Bitcask's sweet spot.
fat blobs: 1M keys, 32 B keys, 1 MB values. RAM tiny, disk huge.
tiny tags: 1B keys, 16 B keys, 100 B values. RAM blows the budget.
Streaming writes: RAM grows with keys, disk grows with keys × value size.
Default workload: 1M keys, 32 B keys, 4 KB values. Watch the RAM bar settle around 76 MB while the disk bar climbs to ~4 GB: same key count, two very different bars. Only the RAM bar is held to the 256 MB budget line, and at about 76 MB it fits comfortably.
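The two bars can be reproduced with a few lines of arithmetic. The sketch below applies the diagram's two formulas to its three workload pins (session cache, fat blobs, tiny tags). The function names and the choice of decimal units (1 KB = 1000 B) are assumptions made for illustration, not part of the tutorial's own code.

```python
KEYDIR_OVERHEAD = 44.5  # bytes per key, the figure quoted in the diagram


def keydir_ram(key_count, avg_key_len):
    """RAM held by the keydir: one entry per live key."""
    return key_count * (KEYDIR_OVERHEAD + avg_key_len)


def live_disk(key_count, avg_value_size):
    """Disk held by live values, using the diagram's N x value size."""
    return key_count * avg_value_size


# (name, key count, avg key length in B, avg value size in B)
# Decimal units are an assumption; the diagram does not say KB vs KiB.
PRESETS = [
    ("session cache", 1_000_000, 32, 4_000),
    ("fat blobs", 1_000_000, 32, 1_000_000),
    ("tiny tags", 1_000_000_000, 16, 100),
]

for name, n, key_len, value_size in PRESETS:
    ram_mb = keydir_ram(n, key_len) / 1e6
    disk_gb = live_disk(n, value_size) / 1e9
    print(f"{name:14s} RAM {ram_mb:>10,.1f} MB   disk {disk_gb:>8,.1f} GB")
```

Session cache and fat blobs need the same 76.5 MB of keydir even though their disk footprints differ by a factor of 250, while tiny tags need about 60.5 GB of RAM for only 100 GB of data.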
Implementation
KeydirEntry (per live key)
what's actually stored in RAM for each key
struct KeydirEntry {
    file_id: u32,    // which data file holds the value
    value_sz: u32,   // bytes to pread
    value_pos: u64,  // offset within file_id
    tstamp: u32,     // for tombstone / merge ordering
    // 4 + 4 + 8 + 4 = 20 B of fields
    // + ~24.5 B Erlang term + HashMap bookkeeping
    // → 44.5 B static + len(key) per entry
}
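To make the 20 B payload and the read path concrete, here is a minimal Python sketch. It is not Bitcask's actual code. It packs the four fields at the widths listed above, then uses one entry to fetch a value with a single seek and read. The `ENTRY_FORMAT` string, the `read_value` helper, and the temporary data file are hypothetical, introduced only for illustration.

```python
import struct
import tempfile
from dataclasses import dataclass

# u32 file_id, u32 value_sz, u64 value_pos, u32 tstamp.
# "<" selects little-endian with no alignment padding.
ENTRY_FORMAT = "<IIQI"


@dataclass
class KeydirEntry:
    file_id: int
    value_sz: int
    value_pos: int
    tstamp: int

    def pack(self) -> bytes:
        return struct.pack(ENTRY_FORMAT, self.file_id, self.value_sz,
                           self.value_pos, self.tstamp)


def read_value(data_file, entry: KeydirEntry) -> bytes:
    """One seek plus one read: the keydir already knows where the value is."""
    data_file.seek(entry.value_pos)
    return data_file.read(entry.value_sz)


with tempfile.TemporaryFile() as data_file:
    data_file.write(b"....header....")  # stand-in for earlier records
    pos = data_file.tell()
    data_file.write(b"hello bitcask")
    data_file.flush()

    # The keydir maps each live key to the location of its latest value.
    keydir = {b"greeting": KeydirEntry(file_id=0, value_sz=13,
                                       value_pos=pos, tstamp=0)}
    entry = keydir[b"greeting"]
    payload_size = len(entry.pack())
    value = read_value(data_file, entry)
    print(payload_size, value)
```

The value never enters the keydir. Only its location does, which is why a 16 B value and a 1 MB value cost the same RAM.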
estimateKeydirRam
the one-line sizing rule for a Bitcask deployment
def estimateKeydirRam(keyCount, avgKeyLen):
    static_overhead = 44.5  # Riak capacity calculator figure
    bytes_per_entry = static_overhead + avgKeyLen
    # value_size DOES NOT appear: the keydir only stores
    # a pointer to the value, never the value itself.
    return keyCount * bytes_per_entry
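The same rule can be turned around to answer the capacity-planning question "how many keys fit in this much RAM?". The sketch below is one way to do that. The helper name `max_keys_for_budget` is hypothetical, and the 256 MB budget is taken as 256,000,000 B, which is an assumption about the diagram's units.

```python
import math

STATIC_OVERHEAD = 44.5  # bytes per key, same figure as estimateKeydirRam


def max_keys_for_budget(ram_budget, avg_key_len):
    """Largest key count whose keydir fits in ram_budget bytes."""
    return math.floor(ram_budget / (STATIC_OVERHEAD + avg_key_len))


budget = 256_000_000  # 256 MB, assuming decimal megabytes
for key_len in (8, 32, 256):  # slider minimum, default, maximum
    print(f"{key_len:>3} B keys -> {max_keys_for_budget(budget, key_len):,} keys")
```

With 32 B keys the budget tops out at roughly 3.3 million keys, whatever the values weigh.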
verdict
the budget check operators run before deploying
def verdict(keyCount, avgKeyLen, ramBudget):
    needed = estimateKeydirRam(keyCount, avgKeyLen)
    if needed > ramBudget:
        return "OOM"    # use an LSM instead
    if needed > 0.75 * ramBudget:
        return "TIGHT"  # one traffic spike from OOM
    return "FITS"
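A short sweep shows where the verdict flips as a dataset grows. The block restates the two functions above so it runs on its own. Returning plain strings and treating the 256 MB budget as 256,000,000 B are assumptions made for this sketch.

```python
def estimateKeydirRam(keyCount, avgKeyLen):
    return keyCount * (44.5 + avgKeyLen)


def verdict(keyCount, avgKeyLen, ramBudget):
    needed = estimateKeydirRam(keyCount, avgKeyLen)
    if needed > ramBudget:
        return "OOM"
    if needed > 0.75 * ramBudget:
        return "TIGHT"
    return "FITS"


RAM_BUDGET = 256_000_000  # 256 MB, assuming decimal megabytes

# Grow the key count at a fixed 32 B key length and record each verdict.
results = {
    n: verdict(n, 32, RAM_BUDGET)
    for n in (1_000_000, 2_000_000, 3_000_000, 4_000_000)
}
for n, v in results.items():
    print(f"{n:>9,} keys: {estimateKeydirRam(n, 32) / 1e6:6.1f} MB -> {v}")
```

At 76.5 B per entry the workload fits at 1M and 2M keys, turns tight at 3M (229.5 MB, above the 192 MB threshold), and runs out of memory at 4M (306 MB). Value size plays no part at any step.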