Design your KV — and feel its limits — Build a Bitcask-style KV store

Build a Bitcask-style KV store (9 scenes)

Scene 08 · Design your KV — and feel its limits

Capstone: every choice is a trade against the hash-in-RAM aesthetic. The absence of range/prefix iteration is the ceiling that bridges to LSM.

Previously

Every Bitcask choice is a trade against the same hash-in-RAM aesthetic. Sync, file size, merge cadence, sharding, key cardinality — they all push back on the same wall. One trade has no good answer at all.

Scene 08

Design your KV — and feel its limits

Watch

Diagram

Free-form canvas. Left: palette of Bitcask building blocks — data file (with max_file_size knob), keydir RAM budget, sync_strategy selector, merge_window selector, frag/dead-bytes triggers, hint-file toggle, vnode count. Center: drop zones for workload SLO cards — photo blobs, event tags, idempotency keys. Right: a verifier pane that names the earlier scene each placement satisfies or violates, and explicitly flags 'use LSM' when keydir RAM exceeds budget.

Sources

One workload card on the canvas: photo blobs, 10M × 2 MB, p99 < 5 ms reads, lose < 1 s on crash. Watch the verifier walk the four checks — RAM math, sync trilemma, merge cadence, hint files — and name the earlier scene behind each.

Implementation

verify_workload(workload, config)

top-level verifier — runs the checks in dependency order

1def verify_workload(workload, config):
2    violations = []
3    # bitcask-04: RAM ceiling — most decisive.
4    violations += keydir_ram_check(workload, config)
5    # bitcask-01: hash index has no order.
6    violations += range_query_check(workload)
7    # bitcask-06a: sync_strategy vs crash-loss SLO.
8    violations += sync_strategy_check(workload, config)
9    # bitcask-07: delete-then-recreate trap.
10    violations += tombstone_check(workload, config)
11    return violations

keydir_ram_check(workload, config)

the decisive verdict — bitcask-04 ceiling

1def keydir_ram_check(workload, config):
2    per_key = 44.5 + workload.avg_key_len   # bitcask-04
3    total = workload.key_count * per_key
4    per_vnode = total / config.vnode_count
5    if per_vnode <= RAM_BUDGET_PER_VNODE:
6        return []                           # fits
7    if total / MAX_VNODES > RAM_BUDGET_PER_VNODE:
8        return [reject('use LSM (LevelDB/RocksDB):'
9                       ' keydir blows budget at any'
10                       ' vnode count')]
11    return [warn('add vnodes — per-vnode keydir'
12                 ' over 256 MB budget')]

range_query_check(workload)

no Bitcask config can satisfy ordered access — bitcask-01

1def range_query_check(workload):
2    if not workload.requires_range_or_prefix:
3        return []
4    # keydir is a hash table; hash tables have no order.
5    # list_keys + filter is O(N) — produces the answer,
6    # not efficiently. No knob fixes this.
7    return [reject('use LSM: ordered access requires'
8                   ' a sorted index (SSTables /'
9                   ' B-tree), not a hash')]

recommend_engine(workload, config)

closing function — when to switch storage engines

1def recommend_engine(workload, config):
2    if workload.requires_range_or_prefix:
3        return 'LSM'           # bitcask-01 ceiling
4    per_key = 44.5 + workload.avg_key_len
5    needed = workload.key_count * per_key
6    budget = RAM_BUDGET_PER_VNODE * MAX_VNODES
7    if needed > budget:
8        return 'LSM'           # bitcask-04 ceiling
9    if workload.deletes_and_recreates:
10        config.delete_mode = 'keep'  # bitcask-07
11    return 'Bitcask'           # honest fit

PreviousTombstones — deletes that come back