Build a Prometheus-style time-series database (12 scenes)
Scene 05 · A chunk: 120 points, packed
Bundle ~120 consecutive points into a single bit-packed blob. With Gorilla encoding, 16 B/point (8 B timestamp + 8 B value) drops to about 1.37 B/point, roughly 12× compression.
Previously
We have two encodings, delta-of-delta for timestamps and XOR for values, each almost magical on its own. A chunk is where they live together.
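A quick recap of both encodings on a toy series. This is a minimal sketch assuming integer-second timestamps. `float_bits` matches the helper that `Chunk.append` uses; `delta_of_deltas` and `value_xors` are hypothetical names used only for illustration.

```python
import struct

def float_bits(x: float) -> int:
    """Reinterpret a float64 as its raw 64-bit integer pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def delta_of_deltas(timestamps):
    """Second differences of the timestamps: 0 whenever the cadence holds."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [d2 - d1 for d1, d2 in zip(deltas, deltas[1:])]

def value_xors(values):
    """XOR of each value's bit pattern with the previous one: 0 when unchanged."""
    bits = [float_bits(v) for v in values]
    return [b ^ a for a, b in zip(bits, bits[1:])]

# A 15 s scrape cadence with one late sample, and a mostly flat gauge.
ts = [1000, 1015, 1030, 1045, 1061, 1075]
vals = [0.5, 0.5, 0.5, 0.75, 0.75, 0.75]
print(delta_of_deltas(ts))                    # [0, 0, 1, -2]
print([x == 0 for x in value_xors(vals)])     # [True, True, False, True, True]
```

On a steady cadence the delta-of-delta is 0, and a repeated value XORs to 0. Those zeros are what the packer turns into single bits.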
Diagram
Top: a horizontal timeline of up to 120 sample dots for one series, with a dashed line at the 120-sample cap. Middle: the same data rendered as a packed bitstring, with coloured cells per sample (timestamp bits in one shade, value bits in another) and a 24-byte header block on the left holding series-id, first-ts and first-value (8 bytes each). Bottom: side-by-side bars compare naive 16 B/point storage against the compressed ~1.37 B/point packing, with the ratio called out between them.
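The header holds three 8-byte fields, so it comes to 24 bytes. Below is a minimal sketch of a fixed header layout using Python's `struct` module. Little-endian byte order with no padding is an assumption, not a documented on-disk format, and `pack_header` / `unpack_header` are hypothetical helpers.

```python
import struct

# Hypothetical header layout: little-endian ("<" also disables padding).
#   Q = uint64 series_id, q = int64 first_ts, d = float64 first_value
HEADER = struct.Struct("<Qqd")

def pack_header(series_id: int, first_ts: int, first_value: float) -> bytes:
    return HEADER.pack(series_id, first_ts, first_value)

def unpack_header(buf: bytes):
    # Read only the first HEADER.size bytes; the bit-packed body follows them.
    return HEADER.unpack_from(buf, 0)

hdr = pack_header(42, 1_700_000_000, 0.5)
print(HEADER.size)          # 24
print(unpack_header(hdr))   # (42, 1700000000, 0.5)
```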
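New term this scene: CHUNK, a bounded bundle of compressed points from one series. It closes at 120 samples or after spanning 120 minutes, whichever comes first, and is immutable from then on. Watch 120 samples accumulate while the bitstring packs in lockstep, then see the chunk close. Finally the bars draw and the 12× ratio appears.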
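The 12× figure is plain arithmetic on the per-point sizes. Here is a quick check for one full 120-sample chunk. It takes the scene's ~1.37 B/point as a given average, so real chunks land above or below it depending on how regular the data is. The 24-byte header is left out of the per-point figures.

```python
NAIVE_BYTES_PER_POINT = 16       # 8 B int64 timestamp + 8 B float64 value
PACKED_BYTES_PER_POINT = 1.37    # average figure quoted in this scene
SAMPLES_CAP = 120

naive_chunk = SAMPLES_CAP * NAIVE_BYTES_PER_POINT
packed_chunk = SAMPLES_CAP * PACKED_BYTES_PER_POINT
ratio = NAIVE_BYTES_PER_POINT / PACKED_BYTES_PER_POINT

print(naive_chunk)           # 1920
print(round(packed_chunk))   # 164
print(round(ratio, 1))       # 11.7
```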
Implementation
Chunk
one series, one header, one bit-packed body
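from dataclasses import dataclass, field

SAMPLES_CAP = 120     # close after this many samples...
TIME_CAP_MIN = 120    # ...or once the chunk spans this many minutes

@dataclass
class Chunk:
    # --- header (24 B: three 8-byte fields, written once) ---
    series_id: int              # uint64: which series
    first_ts: int = 0           # int64 seconds: anchor for delta-of-delta
    first_value: float = 0.0    # float64: anchor for XOR
    # --- body (bit-packed, grows append-only) ---
    bits: "BitBuffer" = field(default_factory=lambda: BitBuffer())  # (ts_bits, val_bits) pairs
    samples: int = 0            # cap at SAMPLES_CAP (120)
    span_min: int = 0           # cap at TIME_CAP_MIN (120)
    closed: bool = False        # immutable once true
    # --- encoder state (in memory only, never written to disk) ---
    last_ts: int = 0            # timestamp of the previous sample
    last_delta: int = 0         # previous timestamp delta
    last_value_bits: int = 0    # previous value's raw float64 bits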
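The `Chunk` body relies on a `BitBuffer` that this scene never defines. Below is a minimal sketch. It assumes bits are written most-significant-first and that the encoders hand over strings such as `"10"`; a production buffer would pack into machine words instead of looping over Python strings. `write` and `flush_byte_aligned` follow the pseudocode's names, while `write_bits`, `__len__` and `__bytes__` are our own hypothetical additions.

```python
class BitBuffer:
    """Append-only bit writer, most significant bit first within each byte."""

    def __init__(self) -> None:
        self._bytes = bytearray()
        self._nbits = 0  # bits written so far (including any padding)

    def write_bits(self, value: int, width: int) -> None:
        """Append the low `width` bits of `value`, most significant bit first."""
        for i in range(width - 1, -1, -1):
            if self._nbits % 8 == 0:
                self._bytes.append(0)  # start a fresh, zeroed byte
            if (value >> i) & 1:
                self._bytes[-1] |= 0x80 >> (self._nbits % 8)
            self._nbits += 1

    def write(self, bits: str) -> None:
        """Append a string of '0'/'1' characters, as the encoders produce."""
        for ch in bits:
            self.write_bits(int(ch), 1)

    def flush_byte_aligned(self) -> None:
        """Pad to the next byte boundary; unused bits in the last byte are already 0."""
        self._nbits = len(self._bytes) * 8

    def __len__(self) -> int:
        return self._nbits

    def __bytes__(self) -> bytes:
        return bytes(self._bytes)


buf = BitBuffer()
buf.write("1")
buf.write("011")
print(len(buf))                      # 4
buf.flush_byte_aligned()
print(len(buf), bytes(buf).hex())    # 8 b0
```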
Chunk.append
encode dod + xor; close on whichever cap fires first
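def append(chunk, ts, value):
    # returns the chunk to append to next (a fresh one if this call closed it)
    assert not chunk.closed, "closed chunks are immutable"
    vbits = float_bits(value)
    if chunk.samples == 0:
        # first sample becomes the header anchors; no body bits needed
        chunk.first_ts, chunk.first_value = ts, value
    else:
        delta = ts - chunk.last_ts
        chunk.bits.write(encode_dod(delta - chunk.last_delta))       # ~1 bit on cadence
        chunk.bits.write(encode_xor(vbits ^ chunk.last_value_bits))  # 1 bit if unchanged
        chunk.last_delta = delta
    chunk.last_ts, chunk.last_value_bits = ts, vbits
    chunk.samples += 1
    chunk.span_min = (ts - chunk.first_ts) // 60   # ts is in seconds
    if chunk.samples >= SAMPLES_CAP \
            or chunk.span_min >= TIME_CAP_MIN:
        return close_chunk(chunk)
    return chunk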
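`encode_dod` and `encode_xor` are the two encoders from earlier scenes. Below is a minimal sketch, assuming they return '0'/'1' strings for a string-based `BitBuffer.write`. The delta-of-delta buckets follow the Gorilla scheme (`0`, `10` + 7 bits, `110` + 9 bits, `1110` + 12 bits, then `1111` + a wide fallback, 64 bits here). The XOR encoder is deliberately simplified: Gorilla can reuse the previous leading/trailing-zero window with a shorter control code, whereas this sketch always writes a fresh window. The decoders are included only so the round trip can be checked.

```python
# Hypothetical Gorilla-style encoders returning bit strings.
DOD_BUCKETS = [("10", 7), ("110", 9), ("1110", 12)]  # (prefix, payload bits)
DOD_FALLBACK = ("1111", 64)

def _bits(x: int, width: int) -> str:
    """Low `width` bits of x (two's complement for negatives) as a bit string."""
    return format(x & ((1 << width) - 1), f"0{width}b")

def encode_dod(dod: int) -> str:
    if dod == 0:
        return "0"                                  # on cadence: a single bit
    for prefix, width in DOD_BUCKETS:
        if -(1 << (width - 1)) + 1 <= dod <= (1 << (width - 1)):
            return prefix + _bits(dod, width)
    prefix, width = DOD_FALLBACK
    return prefix + _bits(dod, width)

def decode_dod(s: str) -> tuple[int, int]:
    """Inverse of encode_dod: returns (dod, bits consumed)."""
    if s[0] == "0":
        return 0, 1
    for prefix, width in DOD_BUCKETS + [DOD_FALLBACK]:
        if s.startswith(prefix):
            raw = int(s[len(prefix):len(prefix) + width], 2)
            if raw > (1 << (width - 1)):
                raw -= 1 << width                   # undo two's complement
            return raw, len(prefix) + width
    raise ValueError("bad delta-of-delta prefix")

def encode_xor(xor: int) -> str:
    """Simplified XOR encoding: always writes a fresh leading/trailing window."""
    if xor == 0:
        return "0"                                  # value unchanged: a single bit
    leading = min(64 - xor.bit_length(), 31)        # 5-bit field caps at 31
    trailing = (xor & -xor).bit_length() - 1
    meaningful = 64 - leading - trailing            # 1..64; 64 wraps to 0 in 6 bits
    return ("11" + _bits(leading, 5) + _bits(meaningful, 6)
            + _bits(xor >> trailing, meaningful))

def decode_xor(s: str) -> tuple[int, int]:
    """Inverse of encode_xor: returns (xor, bits consumed)."""
    if s[0] == "0":
        return 0, 1
    leading = int(s[2:7], 2)
    meaningful = int(s[7:13], 2) or 64
    payload = int(s[13:13 + meaningful], 2)
    return payload << (64 - leading - meaningful), 13 + meaningful
```

For example, a gauge moving from 0.5 to 0.75 flips a single bit, so `encode_xor` spends 14 bits on it (2 control + 5 + 6 + 1 meaningful bit) instead of 64.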
close_chunk
finalize, mark immutable, hand to mmap
import struct

def close_chunk(chunk):
    chunk.bits.flush_byte_aligned()   # body ends on a byte boundary
    chunk.closed = True               # no more appends, ever
    # one chunk = the unit of compression / mmap / I/O
    header = struct.pack("<Qqd", chunk.series_id, chunk.first_ts, chunk.first_value)  # 24 B
    path = chunks_dir / f"{chunk.series_id}-{chunk.first_ts}"   # chunks_dir: a pathlib.Path
    path.write_bytes(header + bytes(chunk.bits))                # header bytes + body bytes
    mmap_readonly(path)               # query path reads via mmap
    # head opens a fresh chunk for this series and keeps scraping
    return Chunk(series_id=chunk.series_id)
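`mmap_readonly` is left undefined in the pseudocode. Below is a minimal sketch built on Python's standard `mmap` module with `ACCESS_READ`, so any write through the mapping raises an error, which enforces the "immutable once closed" rule at the OS level. The helper name comes from the pseudocode; its body, and the file name in the usage lines, are assumptions.

```python
import mmap
import tempfile
from pathlib import Path

def mmap_readonly(path: Path) -> mmap.mmap:
    """Map a closed chunk file read-only; the OS pages it in on first access."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. The mapping stays valid after f is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "42-1700000000"
    p.write_bytes(bytes(24) + b"\xb0")   # 24 B header + 1 B body
    m = mmap_readonly(p)
    print(len(m), m[24])                 # 25 176
    m.close()
```

Because pages are loaded lazily, a query that touches only a few chunks reads only those bytes from disk, which is why the closed chunk is the natural unit of mapping.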