Build an LSM-tree storage engine (LevelDB / RocksDB style) (11 scenes)
Scene 06 · Compaction — the GC of sorted files
A background process merges N immutable SSTables into 1, drops shadowed/deleted records, atomically swaps files. Foreground writes never block.
Previously

Bloom bounded the cost of LOOKING at files. It didn't bound how many there are. Every flush adds another. Sooner or later, we have to remove files — not just skip them.

Scene 06
Compaction — the GC of sorted files
Diagram
Top: memtable (in-RAM sorted table, scene 2) + WAL (write-ahead log, scene 2) keep accepting writes — a 'writes during compaction' counter ticks the whole time. The SSTable shelf shows immutable sorted files (scene 3); a COMPACTOR badge (the new word's worker — a background process that picks several files and merges them) selects N input files (highlighted outline), runs a merge-sort cursor across them (a sorted-merge that emits each key once at its newest value), and emits one new SSTable. Old inputs fade and disappear. The 'atomic file swap' (when compaction's output replaces its inputs in one tick — no half-state where some readers see old and some see new) is the load-bearing safety property.
RAMmemtable (live writes)newKey = ack'd whil…DISKWAL — livePUT newKey=…SSTable shelfL0000007immutableL0000006immutableL0000005immutableL0000004immutableL0000003immutableOPERATIONPUT newKey→ memtable + WAL (uninterrupted by compaction)memtableWALFive SSTables, three of them all hold key k. Compactor wakes up.
Five SSTables on the shelf, three of them holding the same key. Watch the compactor select them, run a merge-sort, and emit a single new file. Foreground writes keep going; the writes-during-compaction counter never stops.
Implementation
lsm.compact
merge-sort N inputs → 1 output, drop shadowed/tombstones, atomic swap
1def compact(inputs: list[SSTable]) -> SSTable:
2 out = open_new_sstable(next_id())
3 cursors = [s.iter_sorted() for s in inputs]
4 while any(cursors):
5 (key, val, seqno) = min_record(cursors)
6 if newer_record_exists(key, val, seqno):
7 continue # shadowed — drop
8 if val.is_tombstone and at_bottom_level:
9 continue # tombstone GC (scene 9)
10 out.append(key, val)
11 out.fsync()
12 atomic_swap_files(remove=inputs, add=[out]) # one tick
13 return out
Foreground writes are unaffected
puts always land in the memtable + WAL
1def put(key, val):
2 # NOTE: no awareness of compaction whatsoever
3 wal.append(key, val); fsync_if_policy(wal)
4 memtable.insert_sorted(key, val)
5 return ok
6# inputs are immutable; outputs land via atomic_swap.
7# readers see EITHER the old file list OR the new — never a mix.