RAM is paid per key, not per byte

Build a Bitcask-style KV store (9 scenes)

Scene 04 · RAM is paid per key, not per byte

Every live key costs one keydir entry: about 44.5 B of fixed overhead plus the key itself. RAM scales with key count and key length, never with value size: fat values are free, while a flood of tiny tags can OOM.

Previously

Reads were cheap because the keydir lives in RAM. The bill for that is paid in keys, not bytes, and most operators don't realise which side of that asymmetry their workload is on until it OOMs.

Watch

Diagram

Two bars — RAM (keyCount × (44.5 + avgKeyLen)) vs Disk (keyCount × avgValueSize) — under a 256 MB budget line with a fits/tight/OOM verdict.

Default workload: 1M keys, 32 B keys, 4 KB values. Watch the RAM bar settle around 76 MB while the disk bar climbs to ~4 GB: same key count, two very different bars. Only the RAM bar is measured against the 256 MB budget line, and it fits with room to spare.
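The default numbers can be reproduced by hand. The sketch below is only an illustration of the two formulas from the diagram; it assumes decimal units (1 MB = 10**6 B), takes 4 KB as 4096 B, and ignores any per-record header on disk. The constant names are made up for this example.

```python
# Default workload from the scene. Unit conventions are assumptions (see lead-in).
STATIC_OVERHEAD = 44.5      # bytes per keydir entry, before the key itself
KEY_COUNT = 1_000_000
AVG_KEY_LEN = 32            # bytes
AVG_VALUE_SIZE = 4096       # bytes (4 KB)
RAM_BUDGET = 256 * 10**6    # 256 MB budget line

# RAM bar: value size does not appear anywhere in this expression.
ram_bytes = KEY_COUNT * (STATIC_OVERHEAD + AVG_KEY_LEN)

# Disk bar: driven by value size, as in the diagram's formula.
disk_bytes = KEY_COUNT * AVG_VALUE_SIZE

print(f"RAM : {ram_bytes / 10**6:.1f} MB")
print(f"Disk: {disk_bytes / 10**9:.2f} GB")
print("RAM fits budget:", ram_bytes <= RAM_BUDGET)
```

Changing AVG_VALUE_SIZE moves only the disk figure; the RAM figure stays where it is.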

Implementation

KeydirEntry (per live key)

what's actually stored in RAM for each key

struct KeydirEntry {
    file_id: u32,    // which data file holds the value
    value_sz: u32,   // bytes to pread
    value_pos: u64,  // offset of the value within file_id
    tstamp: u32,     // for tombstone / merge ordering
    // 4 + 4 + 8 + 4 = 20 B of fields
    // + ~24.5 B Erlang term + HashMap bookkeeping
    // => 44.5 B static + len(key) per entry
}
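The 20 B payload figure can be checked by packing the four fields back to back. This is a rough sketch using Python's struct module with an explicit little-endian, unpadded layout. That layout and the sample values are assumptions for illustration, not the format Bitcask itself uses in memory.

```python
import struct

# "<" = little-endian, standard sizes, no padding.
# I = u32 file_id, I = u32 value_sz, Q = u64 value_pos, I = u32 tstamp.
ENTRY_FORMAT = "<IIQI"

payload_size = struct.calcsize(ENTRY_FORMAT)

# Hypothetical entry: the value sits in data file 3, is 4096 bytes long,
# and starts at byte offset 1_048_576 of that file.
packed = struct.pack(ENTRY_FORMAT, 3, 4096, 1_048_576, 1_700_000_000)
file_id, value_sz, value_pos, tstamp = struct.unpack(ENTRY_FORMAT, packed)

print(payload_size, len(packed))
```

Nothing in the packed record depends on how large the value is; a 10 B value and a 10 MB value produce the same number of keydir bytes.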

estimateKeydirRam

the one-line sizing rule for a Bitcask deployment

def estimateKeydirRam(keyCount, avgKeyLen):
    static_overhead = 44.5  # Riak capacity calculator figure
    bytes_per_entry = static_overhead + avgKeyLen
    # value_size DOES NOT appear: the keydir only stores
    # a pointer to the value, never the value itself.
    return keyCount * bytes_per_entry
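The same rule can be turned around to ask how many keys a given budget can hold. The helper below, max_keys_for_budget, is a hypothetical name introduced for this sketch; it assumes the 44.5 B static figure from this scene and a budget expressed in bytes.

```python
STATIC_OVERHEAD = 44.5  # bytes per entry, the figure used in this scene

def max_keys_for_budget(ram_budget, avg_key_len):
    """Largest key count whose estimated keydir fits in ram_budget bytes."""
    bytes_per_entry = STATIC_OVERHEAD + avg_key_len
    return int(ram_budget // bytes_per_entry)

budget = 256 * 10**6  # 256 MB, decimal units assumed

# Longer keys shrink the ceiling; value size never enters the calculation.
for key_len in (16, 32, 128):
    print(f"{key_len:>3} B keys -> at most {max_keys_for_budget(budget, key_len):,} keys")
```

This is an upper bound on the keydir alone. A real deployment would also leave headroom for the rest of the process.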

verdict

the budget check operators run before deploying

def verdict(keyCount, avgKeyLen, ramBudget):
    # Classify the estimated keydir size against the RAM budget (both in bytes).
    needed = estimateKeydirRam(keyCount, avgKeyLen)
    if needed > ramBudget:
        return "OOM"    # use an LSM instead
    if needed > 0.75 * ramBudget:
        return "TIGHT"  # one traffic spike from OOM
    return "FITS"
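To see the three outcomes side by side, the check can be run over a few workloads. The sketch below restates the two functions in snake_case so that it runs on its own. The workload names and key counts are invented for illustration, and the 256 MB budget uses decimal units.

```python
STATIC_OVERHEAD = 44.5  # bytes per entry, the figure used in this scene

def estimate_keydir_ram(key_count, avg_key_len):
    return key_count * (STATIC_OVERHEAD + avg_key_len)

def verdict(key_count, avg_key_len, ram_budget):
    needed = estimate_keydir_ram(key_count, avg_key_len)
    if needed > ram_budget:
        return "OOM"
    if needed > 0.75 * ram_budget:
        return "TIGHT"
    return "FITS"

RAM_BUDGET = 256 * 10**6  # bytes

# Hypothetical workloads: (key count, average key length in bytes).
workloads = {
    "default": (1_000_000, 32),
    "growing": (3_000_000, 32),
    "tag-heavy": (5_000_000, 32),
}

results = {name: verdict(count, key_len, RAM_BUDGET)
           for name, (count, key_len) in workloads.items()}

for name, outcome in results.items():
    print(f"{name:<10} {outcome}")
```

Only the key count changes between the three rows, which is enough to move a workload from comfortable to out of memory.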
