Build an LSM-tree storage engine (LevelDB / RocksDB style) (11 scenes)
Scene 03 · Flush — pay the IOU once
Frozen memtable streams sequentially to disk as a single immutable sorted file (an SSTable); the WAL is then deleted.
Previously
The memtable can't grow forever, and the WAL it dragged along grows with it. Once the memtable crosses a threshold, we have to do something with what we've collected.
Scene 03
Flush — pay the IOU once
Diagram
Top-left: the memtable (in-RAM sorted table from scene 2). A dashed line marks write_buffer_size — the size at which the memtable is full and triggers a flush. Top-right: the WAL (write-ahead log — the disk file that backs up the memtable; struck through during flush) and the SSTABLE shelf below — the new word: an immutable, sorted file of records on disk. During flush, an arrow streams the frozen memtable's rows in sort order to a new file labelled 000005.sst, with a small BLOCK INDEX callout (first key of each block → byte offset; the index is what makes the file binary-searchable).
New term this scene: SSTABLE (Sorted String Table) — an immutable, key-sorted file on disk. Watch the memtable fill, freeze, stream to disk in sort order as one new SSTable, and the WAL get deleted. Four beats.
Implementation
lsm.flush
freeze, sequential sorted write, fsync, delete WAL
1def flush(memtable, wal):2 memtable.freeze() # no more writes accepted3 sst = open_new_sstable(next_id())4 for (k, v) in memtable.iter_sorted():5 sst.append_block(k, v) # sequential write6 sst.write_block_index() # first key of each block7 sst.fsync() # the SSTable is durable8 wal.delete() # IOU paid; WAL no longer needed9 return sst # immutable forever
SSTable on disk
sorted blocks + a tiny index — binary-searchable
1# SSTable layout (logical):2 block_0: [key, value]+ # smallest keys3 block_1: [key, value]+4 ...5 block_n: [key, value]+ # largest keys6 block_index: [(first_key_in_block, offset)]7 footer: [block_index_offset, ...]8# read: load index → binary-search → read 1 block