Build an LSM-tree storage engine (LevelDB / RocksDB style) (11 scenes)
Scene 04 · Newest wins — and the reads get slower
A get walks memtable → SSTables newest-first; first hit wins. Cost grows linearly with file count, especially for misses.
Previously

Disk now holds one or more sorted SSTables — but a single get has to know which one to read. The rule is simple, but its cost grows with every flush.

Scene 04
Newest wins — and the reads get slower
Diagram
RAM column on the left holds the memtable (the sorted in-RAM table from scene 2; often empty here). The SSTable shelf on the right (immutable sorted files from scene 3) is sorted newest-at-top — newer files sit above older ones. A READ CURSOR (an animated marker tracking where a get is currently looking) descends memtable → newest SSTable → next → … and stops on the first place that has the key. A 'files consulted' counter (how many places the cursor visited) and a latency readout sit beneath the cursor.
RAMmemtable (empty)emptyDISKSSTable shelf — newest at topL0L0/000007.sst[k … z]immutableL0L0/000006.sst[g … n]immutableL0L0/000005.sst[a … m]immutableOPERATIONidle — no operation in flightThree SSTables on the shelf. Watch a few GETs walk the read path.
Three SSTables on the shelf. Two GETs run: one whose key is in the newest file, one whose key is only in the oldest. Watch the cursor descend; watch the counter climb.
Implementation
lsm.get (newest-first walk)
memtable, then SSTables newest-to-oldest; first hit wins
1def get(key):
2 if key in memtable:
3 return memtable[key]
4 for sst in sstables_newest_first(): # sorted by recency
5 if key not in sst.key_range: continue
6 v = sst.read(key) # block index + 1 block
7 if v is FOUND:
8 return v # may be an absence marker
9 return not_found # missed every file
Why first-hit-wins is correct
newer writes ALWAYS sit above older — cursor stops fresh
1# invariant the flush + WAL pipeline maintains:
2# sst[i].max_seqno < sst[j].max_seqno for newer j
3# so:
4# if a key has a live record in sst[i] AND sst[j] (j newer),
5# the cursor sees sst[j]'s record first and returns IT.
6# older shadowed records are correct to ignore.
7# they are NOT correct to delete — see scene 9.