Build a Bitcask-style KV store (9 scenes)
Scene 08 · Design your KV — and feel its limits
Capstone: every choice is a trade against the hash-in-RAM aesthetic. The absence of range/prefix iteration is the ceiling that bridges to LSM.
Previously
Every Bitcask choice is a trade against the same hash-in-RAM aesthetic. Sync, file size, merge cadence, sharding, key cardinality — they all push back on the same wall. One trade has no good answer at all.
Scene 08
Design your KV — and feel its limits
Diagram
Free-form canvas. Left: palette of Bitcask building blocks — data file (with max_file_size knob), keydir RAM budget, sync_strategy selector, merge_window selector, frag/dead-bytes triggers, hint-file toggle, vnode count. Center: drop zones for workload SLO cards — photo blobs, event tags, idempotency keys. Right: a verifier pane that names the earlier scene each placement satisfies or violates, and explicitly flags 'use LSM' when keydir RAM exceeds budget.
Sources
One workload card on the canvas: photo blobs, 10M × 2 MB, p99 < 5 ms reads, lose < 1 s on crash. Watch the verifier walk the four checks — RAM math, sync trilemma, merge cadence, hint files — and name the earlier scene behind each.
Implementation
verify_workload(workload, config)
top-level verifier — runs the checks in dependency order
1def verify_workload(workload, config):2 violations = []3 # bitcask-04: RAM ceiling — most decisive.4 violations += keydir_ram_check(workload, config)5 # bitcask-01: hash index has no order.6 violations += range_query_check(workload)7 # bitcask-06a: sync_strategy vs crash-loss SLO.8 violations += sync_strategy_check(workload, config)9 # bitcask-07: delete-then-recreate trap.10 violations += tombstone_check(workload, config)11 return violations
keydir_ram_check(workload, config)
the decisive verdict — bitcask-04 ceiling
1def keydir_ram_check(workload, config):2 per_key = 44.5 + workload.avg_key_len # bitcask-043 total = workload.key_count * per_key4 per_vnode = total / config.vnode_count5 if per_vnode <= RAM_BUDGET_PER_VNODE:6 return [] # fits7 if total / MAX_VNODES > RAM_BUDGET_PER_VNODE:8 return [reject('use LSM (LevelDB/RocksDB):'9 ' keydir blows budget at any'10 ' vnode count')]11 return [warn('add vnodes — per-vnode keydir'12 ' over 256 MB budget')]
range_query_check(workload)
no Bitcask config can satisfy ordered access — bitcask-01
1def range_query_check(workload):2 if not workload.requires_range_or_prefix:3 return []4 # keydir is a hash table; hash tables have no order.5 # list_keys + filter is O(N) — produces the answer,6 # not efficiently. No knob fixes this.7 return [reject('use LSM: ordered access requires'8 ' a sorted index (SSTables /'9 ' B-tree), not a hash')]
recommend_engine(workload, config)
closing function — when to switch storage engines
1def recommend_engine(workload, config):2 if workload.requires_range_or_prefix:3 return 'LSM' # bitcask-01 ceiling4 per_key = 44.5 + workload.avg_key_len5 needed = workload.key_count * per_key6 budget = RAM_BUDGET_PER_VNODE * MAX_VNODES7 if needed > budget:8 return 'LSM' # bitcask-04 ceiling9 if workload.deletes_and_recreates:10 config.delete_mode = 'keep' # bitcask-0711 return 'Bitcask' # honest fit