Build a Prometheus-style time-series database (12 scenes)
Scene 05 · A chunk: 120 points, packed
Bundle ~120 consecutive points into a single bit-packed blob. Gorilla-style encoding shrinks 16 B/point (8-byte timestamp + 8-byte value) to about 1.37 B/point, roughly 12× smaller.
Previously

We have two encodings, delta-of-delta for timestamps and XOR for values, each almost magical alone. A chunk is where they live together.

Diagram
Top: a horizontal timeline of up to 120 sample dots for one series, with a dashed line at the 120-sample cap. Middle: the same data rendered as a packed bitstring — coloured cells per sample (timestamp bits in one shade, value bits in another), with a 16-byte header block on the left holding series-id, first-ts, first-value. Bottom: side-by-side bars compare naive 16 B/point storage against the compressed ~1.37 B/point packing, with the ratio called out between them.
Initial state: series cpu_usage_seconds_total{job=api,instance=10.0.0.7,mode=user} (S#73a1), 0/120 samples, cap 120 samples or 120 minutes. Header: sid S#73a1, ts0 20:06:40, v0 0.42 (full encoding, once per chunk). Cost bars: naive (uncompressed) at 16 B/point vs compressed (ΔΔ + XOR).
One series, one chunk-to-be. The timeline is empty; the bitstring shows only the 16-byte header.
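The diagram shows two caps, 120 samples or 120 minutes. Which one closes a chunk depends on the scrape interval. The sketch below simulates that with the same comparisons the append listing in this scene uses. The helper name `samples_until_close` is hypothetical, and timestamps are assumed to be in seconds (the listing divides by 60 to get minutes).

```python
SAMPLES_CAP = 120     # max samples per chunk
TIME_CAP_MIN = 120    # max time span per chunk, in minutes

def samples_until_close(cadence_s):
    """Return (samples appended, cap that fired) for a fixed scrape interval."""
    first_ts = 0
    ts = first_ts
    samples = 0
    while True:
        samples += 1
        span_min = (ts - first_ts) // 60   # assumes ts is in seconds
        if samples >= SAMPLES_CAP:
            return samples, "samples"
        if span_min >= TIME_CAP_MIN:
            return samples, "time"
        ts += cadence_s

fast = samples_until_close(15)    # 15 s scrape interval
slow = samples_until_close(300)   # 5 min scrape interval
print(fast, slow)
```

With a 15 s interval the sample cap fires long before two hours have passed; with a 5 min interval the time cap closes the chunk after only 25 samples.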
New term this scene: CHUNK, a bounded bundle of compressed points from one series, capped at 120 samples or 120 minutes, whichever comes first. Watch 120 samples accumulate, the bitstring pack in lockstep, and the chunk close. Then the bars draw and the roughly 12× ratio appears.
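The ratio can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the 1.37 B/point figure is the one quoted in this scene, and the constant and function names are made up for illustration.

```python
NAIVE_BYTES_PER_POINT = 8 + 8     # int64 timestamp + float64 value
PACKED_BYTES_PER_POINT = 1.37     # figure quoted in this scene
SAMPLES_CAP = 120

def chunk_cost(samples, bytes_per_point):
    """Body size in bytes for a chunk holding `samples` points."""
    return samples * bytes_per_point

naive = chunk_cost(SAMPLES_CAP, NAIVE_BYTES_PER_POINT)
packed = chunk_cost(SAMPLES_CAP, PACKED_BYTES_PER_POINT)
ratio = naive / packed

print(f"naive {naive} B, packed {packed:.1f} B, ratio {ratio:.1f}x")
```

A full chunk drops from 1920 B to roughly 164 B, which is where the "about 12×" comes from.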
Implementation
Chunk
one series, one header, one bit-packed body
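class Chunk:
    # --- header (16 B, written once) ---
    series_id: uint64        # which series
    first_ts: int64          # anchor for delta-of-delta
    first_value: float64     # anchor for XOR
    # --- body (bit-packed, grows append-only) ---
    bits: BitBuffer          # (ts_bits, val_bits) pairs
    samples: int = 0         # cap at SAMPLES_CAP (120)
    span_min: int = 0        # cap at TIME_CAP_MIN (120)
    closed: bool = False     # immutable once true
    # --- encoder state (in memory only; read and updated by append) ---
    last_ts: int64           # previous timestamp, starts at first_ts
    last_delta: int64 = 0    # previous timestamp delta
    last_value_bits: uint64  # previous value's raw bits, starts at first_value's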
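The listing treats `BitBuffer` as a given. As a rough idea of what such a type might look like, here is a minimal sketch, not the scene's actual implementation. The two-argument `write(value, nbits)` signature and the `bit_length` and `to_bytes` helpers are assumptions made for this example. Only `flush_byte_aligned` is a name the scene itself uses.

```python
class BitBuffer:
    """Append-only bit writer, most significant bit first (illustrative)."""

    def __init__(self):
        self._bytes = bytearray()
        self._free = 0                      # unused low bits in the last byte

    def write(self, value, nbits):
        """Append the low `nbits` bits of `value`, highest bit first."""
        for shift in range(nbits - 1, -1, -1):
            if self._free == 0:             # last byte is full: start a new one
                self._bytes.append(0)
                self._free = 8
            self._free -= 1
            self._bytes[-1] |= ((value >> shift) & 1) << self._free

    def flush_byte_aligned(self):
        """Pad the last byte with zero bits so the body ends on a byte boundary."""
        self._free = 0

    def bit_length(self):
        return len(self._bytes) * 8 - self._free

    def to_bytes(self):
        return bytes(self._bytes)


buf = BitBuffer()
buf.write(0b1, 1)        # e.g. a one-bit control code
buf.write(0b101, 3)      # a three-bit payload
bits_before_flush = buf.bit_length()
buf.flush_byte_aligned()
buf.write(0xFF, 8)       # lands in a fresh byte after the padding
packed = buf.to_bytes()
```

Writing bit by bit is slow but easy to verify. A production encoder would more likely shift whole words at a time.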
Chunk.append
encode dod + xor; close on whichever cap fires first
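def append(chunk, ts, value):
    if chunk.closed:
        raise ValueError("chunk is closed")    # closed chunks are immutable
    delta = ts - chunk.last_ts
    dod = delta - chunk.last_delta
    chunk.bits.write(encode_dod(dod))          # ~1 bit on cadence
    value_bits = float_bits(value)
    xor = value_bits ^ chunk.last_value_bits
    chunk.bits.write(encode_xor(xor))          # 1 bit if unchanged
    # remember this sample so the next append encodes against it
    chunk.last_ts, chunk.last_delta = ts, delta
    chunk.last_value_bits = value_bits
    chunk.samples += 1
    chunk.span_min = (ts - chunk.first_ts) // 60   # ts in seconds
    if chunk.samples >= SAMPLES_CAP \
            or chunk.span_min >= TIME_CAP_MIN:
        return close_chunk(chunk)              # fresh chunk takes the next sample
    return chunk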
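To see why the comments in the listing promise about one bit per field on a steady series, the sketch below computes delta-of-deltas and value XORs for a short run of samples. `float_bits` is written here with the standard `struct` module as one plausible definition. `dod_bit_cost` uses the bucket sizes described in the Gorilla paper, and the scene's own `encode_dod` may differ. The sample data is invented.

```python
import struct

def float_bits(value):
    """Raw IEEE 754 bit pattern of a float64, as an unsigned 64-bit int."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]

def delta_of_deltas(timestamps):
    """dod[i] = (t[i] - t[i-1]) - (t[i-1] - t[i-2]), from the third sample on."""
    return [
        (timestamps[i] - timestamps[i - 1]) - (timestamps[i - 1] - timestamps[i - 2])
        for i in range(2, len(timestamps))
    ]

def dod_bit_cost(dod):
    """Bits spent on one delta-of-delta, using the Gorilla paper's buckets."""
    if dod == 0:
        return 1          # a single '0' bit: the scrape arrived on cadence
    if -63 <= dod <= 64:
        return 2 + 7      # '10' prefix + 7-bit value
    if -255 <= dod <= 256:
        return 3 + 9      # '110' prefix + 9-bit value
    if -2047 <= dod <= 2048:
        return 4 + 12     # '1110' prefix + 12-bit value
    return 4 + 32         # '1111' prefix + 32-bit value

timestamps = [0, 15, 30, 45, 61, 75]            # one scrape arrives 1 s late
values = [0.42, 0.42, 0.42, 0.43, 0.43, 0.43]   # the value changes once

dods = delta_of_deltas(timestamps)
dod_bits = [dod_bit_cost(d) for d in dods]

raw = [float_bits(v) for v in values]
value_xors = [cur ^ prev for prev, cur in zip(raw, raw[1:])]
unchanged = [x == 0 for x in value_xors]        # XOR of 0 encodes as one bit
```

Only the late scrape and the sample right after it cost more than one timestamp bit, and only the single value change produces a non-zero XOR.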
close_chunk
finalize, mark immutable, hand to mmap
def close_chunk(chunk):
    chunk.bits.flush_byte_aligned()    # body ends on a byte boundary
    chunk.closed = True                # no more appends, ever
    # one chunk = the unit of compression / mmap / I/O
    path = chunks_dir / f"{chunk.series_id}-{chunk.first_ts}"
    path.write_bytes(chunk.header + chunk.bits)   # header bytes, then packed body
    mmap_readonly(path)                # query path reads via mmap
    # head opens a fresh chunk for this series and keeps scraping;
    # its first sample supplies the new first_ts / first_value anchors
    return Chunk(series_id=chunk.series_id)
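The write-then-mmap step can be tried out with Python's standard `mmap` and `struct` modules. This is a sketch under stated assumptions. The on-disk layout simply packs the three header fields declared in `Chunk` as big-endian 8-byte values, and `write_chunk_file` / `read_chunk_file` are hypothetical helpers standing in for the scene's `mmap_readonly`.

```python
import mmap
import struct
import tempfile
from pathlib import Path

# Assumed layout: series_id (uint64), first_ts (int64), first_value (float64).
HEADER = struct.Struct(">Qqd")

def write_chunk_file(path, series_id, first_ts, first_value, body):
    """Write header followed by the already byte-aligned body."""
    path.write_bytes(HEADER.pack(series_id, first_ts, first_value) + body)

def read_chunk_file(path):
    """Map the file read-only and decode the header without copying the file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            series_id, first_ts, first_value = HEADER.unpack_from(view, 0)
            body = bytes(view[HEADER.size:])   # copy out only the body slice
    return series_id, first_ts, first_value, body

with tempfile.TemporaryDirectory() as tmp:
    series_id, first_ts = 0x73A1, 1_700_000_000
    path = Path(tmp) / f"{series_id}-{first_ts}"
    write_chunk_file(path, series_id, first_ts, 0.42, b"\xd0\xff")
    loaded = read_chunk_file(path)
```

Because a closed chunk never changes, a read-only mapping is safe to share between queries, and the operating system decides which pages actually stay in memory.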