Build an LSM-tree storage engine (LevelDB / RocksDB style) (11 scenes)
Scene 05 · Definitely-not-here, in one bitmap
Each SSTable carries a bloom filter that says 'definitely no' or 'maybe' — eliminating disk reads on misses, not on hits.
Previously

Walking every file on every miss is the cliff. The way out is a tiny per-file structure that says 'definitely not here' before we touch disk — and is small enough to keep in RAM for every file.

Scene 05
Definitely-not-here, in one bitmap
Diagram
Each SSTable (immutable sorted file from scene 3) gets a small bitmap badge — that bitmap is the BLOOM FILTER (the new term — a tiny RAM data structure that, given any key, can definitely-rule-out membership but cannot definitely-confirm it; a 'one-sided test'). A query key on the left fans probe arrows to every file's bloom. Verdict colors: GRAY = definitely no (file is skipped, no disk read); YELLOW = 'maybe' / false-positive (file consulted, returns not_found); GREEN = 'maybe' / true hit (file consulted, returns the value). A 'bits per key' readout (how much RAM the bloom uses per key it indexes) drives bitmap density; a live false-positive-rate readout updates with bits/key. 'Disk reads this query' counter at the bottom.
QUERYworkload: mixedGET probes every SSTable's bloom →BLOOM TUNINGBITS / KEY10FALSE-POSITIVE RATE1.0%rule of thumb: 10 b/k ≈ 1%BLOOM000007[ah]idleBLOOM000006[ip]idleBLOOM000005[qz]idleBLOOM000004[az]idleBLOOM000003[by]idleDISK READS THIS QUERY0/ 5 files5 skipped by bloomFive SSTables; each carries a tiny bitmap. Idle.
Five SSTables, each topped with a bloom bitmap. Watch a missing-key GET (most files reply 'definitely no') and a present-key GET (one file replies 'hit'). Disk reads counter shows how many files we ACTUALLY had to read.
Implementation
Bloom-aware get
skip files whose bloom says 'definitely no'
1def get(key):
2 if key in memtable: return memtable[key]
3 for sst in sstables_newest_first():
4 if not sst.bloom.maybe_contains(key):
5 continue # skip — no disk read
6 v = sst.read(key) # 'maybe' → block read
7 if v is FOUND: return v # may be tombstone
8 return not_found
Bits-per-key rule of thumb
linear RAM, exponential FP-rate decay
1# bloom filter: per-file bitmap of m bits set by k hash functions
2# false-positive rate ≈ (1 - e^(-kn/m))^k
3# practical:
4# 1.5 bits/key → ~50% FP (almost useless)
5# 10 bits/key → ~1% FP (canonical default)
6# 15.5 bits/key → ~0.1% FP (diminishing returns)
7# RAM grows linearly with bits/key. Pick 10 unless …