Build an LSM-tree storage engine (LevelDB / RocksDB style) (11 scenes)
Scene 02 · Sort in RAM, log to disk
Every put fans out: a sorted in-RAM memtable AND an append-only WAL whose only job is to rebuild the memtable on a crash.
Previously
The third panel had a question mark in it. The first piece of the answer is to write to two structures at once: one in RAM that sorts cheap, one on disk that survives a crash.
Scene 02
Sort in RAM, log to disk
Diagram
Top-left: a tall RAM column with a MEMTABLE (the new word — a sorted table of recent writes, kept in RAM; rows are in key order). Top-right: a disk column with the WAL (write-ahead log — an append-only file on disk where every write lands before we ack it) on top, records in arrival order — NOT sorted — and an empty SSTable shelf below (an SSTable is a sorted file of key-value records on disk; the shelf is empty here because the next scene is where memtable contents become SSTables). Bottom: a put/get lane showing the most recent op and its 2-of-2 fan-out arrows — one to the memtable, one to the WAL.
Two new structures get introduced: the MEMTABLE (an in-RAM sorted table) and the WAL (write-ahead log — an append-only file on disk). Three PUTs come in: m=1, then a=2, then z=3. Watch where each one lands — memtable rows arrive in SORTED positions; WAL records arrive in PUT-arrival order.
Implementation
lsm.put
fan out: sorted insert into memtable, sequential append to WAL
1def put(key, value):2 record = encode(tstamp, key, value)3 wal.append(record) # sequential write — cheap4 fsync_if_policy(wal) # durability knob5 memtable.insert_sorted(key, value) # in-RAM sort — cheap6 return ok7 # ack means: WAL has the record AND memtable shows it
lsm.recover (on restart)
the WAL's only job: rebuild the memtable
1def recover():2 memtable = empty_sorted_structure()3 for record in wal.scan_sequentially():4 memtable.insert_sorted(record.key, record.value)5 return memtable6 # WAL goes in; fresh memtable comes out, same final shape