Build a Prometheus-style time-series database (12 scenes)
Scene 06 · Head chunks, WAL, and flushing
The active chunk lives in RAM (the head); a write-ahead log on disk records every sample, so a crash mid-chunk loses no sample that reached the log.
Previously

A chunk only compresses well *after* it is full — but writes don't wait. We need a place for the half-built chunk that's fast to append to AND survives a crash.

Head chunks, WAL, and the durable ack
Diagram
Three horizontal bands. TOP — RAM: a row of HEAD CHUNKS, one per active series, each shown as an open-bracket tile with a fill bar (samples-in / 120). MIDDLE — WAL on disk: an append-only log strip; every sample lands here too, in arrival order. BOTTOM — sealed mmap'd chunks: closed tiles with locks, immutable once flushed. An incoming sample forks at a junction: one arrow into the WAL tail, one into the matching head.
RAM: head chunks
S1 · http_requests_total · method=GET,status=200 · 30/120
S2 · http_requests_total · method=GET,status=500 · 60/120
S3 · http_requests_total · method=POST,status=200 · 119/120
S4 · node_cpu_seconds_total · cpu=0,mode=user · 90/120
WAL: append-only log on disk
S1 S4 S2 S1 S2 S4 S3 S1 S2 S3 S4 S2 S1 S3
Sealed chunks: mmapped read-only
seal-S1-a · S1 · 120 samples · 165 B · mmap
seal-S2-a · S2 · 120 samples · 158 B · mmap
seal-S4-a · S4 · 120 samples · 172 B · mmap
Caption: RAM (head chunks, one per active series) · disk WAL (every sample, append-only) · disk sealed chunks (mmap, read-only).
A new sample arrives for series S3. Watch it fork: one copy appends to the WAL on disk, one fills S3's head chunk in RAM. When S3 hits 120/120 it seals — the chunk slides down to the mmap'd disk row and a fresh empty head takes its place.
Implementation
TSDB.appendSample
fork: WAL on disk first, then head chunk in RAM, then ack
def appendSample(seriesId, ts, value):
    entry = (seriesId, ts, value)
    if wal_enabled:
        walAppend(entry)                # disk first
    headAppend(seriesId, ts, value)     # RAM second
    return True                         # ack: WAL (if enabled) and head both have it
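The ordering matters because a failed disk write should leave the head untouched and produce no ack. Here is a minimal sketch of that property, assuming two hypothetical stand-in WAL classes (`MemoryWal`, `FailingWal`) and a plain dict of lists in place of the real head chunks:

```python
class MemoryWal:
    """Hypothetical stand-in for the on-disk WAL: keeps entries in a list."""
    def __init__(self):
        self.entries = []

    def append(self, entry):
        self.entries.append(entry)


class FailingWal:
    """Hypothetical WAL whose disk write always fails."""
    def append(self, entry):
        raise OSError("disk full")


def append_sample(wal, heads, series_id, ts, value):
    wal.append((series_id, ts, value))                   # disk first: may raise
    heads.setdefault(series_id, []).append((ts, value))  # RAM second
    return True                                          # ack only after both steps


heads = {}
ok = append_sample(MemoryWal(), heads, 3, 1000, 1.0)

try:
    append_sample(FailingWal(), heads, 3, 2000, 2.0)
    failed = False
except OSError:
    failed = True   # no ack, and the head never saw the sample
```

If the two steps were swapped, the second call would have left a sample in RAM that the log never recorded.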
WAL.walAppend
sequential disk write; fsync per durability policy
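import os

def walAppend(entry):
    record = encode(entry)
    wal_file.write(record)              # append-only, sequential
    if fsync_policy == 'always':
        wal_file.flush()                # drain Python's buffer to the OS
        os.fsync(wal_file.fileno())     # durable before ack
    # else: data reaches disk on the OS's schedule; a crash can lose the newest samples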
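As a concrete version of the append path, the sketch below assumes a hypothetical fixed-width record layout (series id, millisecond timestamp, float value); the walkthrough's `encode` is not specified, so this layout is an illustration only. The `os.fsync` call is the standard-library way to push written bytes to stable storage.

```python
import os
import struct
import tempfile

# Hypothetical record layout: uint64 series id, int64 timestamp (ms), float64 value.
RECORD = struct.Struct("<Qqd")


def wal_append(wal_file, series_id, ts, value, fsync_policy="always"):
    """Append one record; with 'always', force it to disk before returning."""
    wal_file.write(RECORD.pack(series_id, ts, value))
    if fsync_policy == "always":
        wal_file.flush()               # Python buffer -> OS page cache
        os.fsync(wal_file.fileno())    # OS page cache -> disk


wal_path = os.path.join(tempfile.mkdtemp(), "wal.log")
with open(wal_path, "ab") as wal:
    wal_append(wal, 3, 1_700_000_000_000, 42.0)                    # durable on return
    wal_append(wal, 1, 1_700_000_015_000, 7.5, fsync_policy="os")  # durable later

# Read the log back in arrival order.
with open(wal_path, "rb") as f:
    records = list(RECORD.iter_unpack(f.read()))
```

Syncing on every sample is the safest policy and also the slowest; a relaxed policy trades a window of possible loss for throughput.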
Head.headAppend
into the in-RAM chunk; if cap hit, seal and open fresh
def headAppend(seriesId, ts, value):
    head = head_chunks.get(seriesId)            # one per active series
    if head is None:                            # first sample for this series
        head = head_chunks[seriesId] = new_head()
    head.samples.append((ts, value))            # sub-ms, in memory
    if len(head.samples) >= head.capacity:
        sealHeadChunk(seriesId)                 # write file + mmap RO

def sealHeadChunk(seriesId):
    head = head_chunks[seriesId]
    path = write_compressed(head)               # gorilla-encoded bytes
    sealed_chunks.append(mmap_readonly(path))
    head_chunks[seriesId] = new_head()          # fresh, empty
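The seal step can be tried end to end with the standard library. This is a sketch under stated assumptions: `zlib` stands in for the Gorilla encoding (it is not the same algorithm), and the `Head` class, file naming, and 16-byte sample layout are hypothetical.

```python
import mmap
import os
import struct
import tempfile
import zlib

CAPACITY = 120                   # samples per chunk, as in the diagram
SAMPLE = struct.Struct("<qd")    # hypothetical layout: int64 ts, float64 value


class Head:
    """Hypothetical in-RAM head chunk: just a list of (ts, value) pairs."""
    def __init__(self):
        self.samples = []


chunks_dir = tempfile.mkdtemp()
head_chunks = {}
sealed_chunks = []


def seal_head_chunk(series_id):
    head = head_chunks[series_id]
    raw = b"".join(SAMPLE.pack(ts, v) for ts, v in head.samples)
    path = os.path.join(chunks_dir, f"seal-S{series_id}-{len(sealed_chunks)}.chunk")
    with open(path, "wb") as f:
        f.write(zlib.compress(raw))    # zlib as a stand-in for Gorilla encoding
        f.flush()
        os.fsync(f.fileno())
    with open(path, "rb") as f:        # the mapping stays valid after the file closes
        sealed_chunks.append(mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ))
    head_chunks[series_id] = Head()    # fresh, empty head


def head_append(series_id, ts, value):
    head = head_chunks.setdefault(series_id, Head())
    head.samples.append((ts, value))
    if len(head.samples) >= CAPACITY:
        seal_head_chunk(series_id)


# 125 samples for one series: one sealed chunk, five samples left in the head.
for i in range(CAPACITY + 5):
    head_append(3, 1_700_000_000_000 + 15_000 * i, float(i))
```

Mapping the sealed file read-only lets the OS page cache decide which chunks stay resident, rather than holding every sealed chunk on the process heap.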
TSDB.recoverFromCrash
replay WAL into fresh heads; sealed chunks reopened mmap
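def recoverFromCrash():
    head_chunks.clear()                         # RAM was wiped: start with no heads
    sealed_chunks.clear()
    # Reopen sealed chunks first, so chunks sealed during replay are not listed twice.
    for path in chunks_dir.list():
        sealed_chunks.append(mmap_readonly(path))
    # Assumes the WAL holds only samples that are not yet in a sealed chunk.
    for entry in wal.scan_sequentially():
        headAppend(*entry)                      # rebuild fill levels
    # without WAL: partial heads are gone forever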
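A replay can be exercised by writing a log, discarding the in-memory state, and rebuilding it. The sketch below assumes the same hypothetical 24-byte record layout as an illustration and models a crash as a torn (partial) final record, which the scan drops.

```python
import os
import struct
import tempfile

# Hypothetical record layout: uint64 series id, int64 timestamp (ms), float64 value.
RECORD = struct.Struct("<Qqd")


def replay_wal(path):
    """Rebuild heads (series id -> list of (ts, value)) from the log."""
    heads = {}
    with open(path, "rb") as f:
        while True:
            raw = f.read(RECORD.size)
            if len(raw) < RECORD.size:   # end of log, or a torn final write
                break
            series_id, ts, value = RECORD.unpack(raw)
            heads.setdefault(series_id, []).append((ts, value))
    return heads


wal_path = os.path.join(tempfile.mkdtemp(), "wal.log")
heads_before = {}
with open(wal_path, "ab") as wal:
    for i, series_id in enumerate([1, 4, 2, 1, 2, 4, 3]):
        ts, value = 1_700_000_000_000 + i, float(i)
        wal.write(RECORD.pack(series_id, ts, value))
        heads_before.setdefault(series_id, []).append((ts, value))
    wal.flush()
    os.fsync(wal.fileno())

# Simulate a crash in the middle of the next write: only 10 of 24 bytes landed.
with open(wal_path, "ab") as wal:
    wal.write(b"\x00" * 10)

heads_after = replay_wal(wal_path)   # RAM state rebuilt from disk alone
```

A real log would usually carry a checksum per record so that corruption in the middle of the file is also detectable; the length check here only catches a truncated tail.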