Build your own Prometheus-style time-series database

The simplest database that can absorb a 1M-points-per-second firehose and still answer `sum(rate(http_requests_total{status="500"}[5m]))` in milliseconds — built bit by bit, literally.

01
What a metric point actually is
A point is (metric, label_set, ts, value). The first two parts are the series identity — change one label and you've named a different stream.
~7 min
02
Why one row per point is wrong
Storing each point as a SQL row spends 60+ bytes of metadata to carry an 8-byte float — the labels JSON repeats on every row of the firehose.
~7 min
03
Delta-of-delta crushes timestamps
Scrapers run on a fixed cadence, so the second derivative of timestamps is almost always zero — encoding ~96% of timestamps in a single bit.
~7 min
04
XOR crushes adjacent floats
Adjacent float64s share most of their IEEE-754 bits. XOR them and the leftover bits are tiny — an unchanged value costs 1 bit.
~7 min
05
A chunk: 120 points, packed
Bundle ~120 consecutive points into a single bit-packed blob. Gorilla: 16 B/point → 1.37 B/point — about 12× compression.
~7 min
06
Head chunks, WAL, and flushing
Active chunk lives in RAM (the head); a write-ahead log on disk catches every sample so a crash mid-chunk loses nothing.
~7 min
07
How a query becomes points
A read is four stages — parse, resolve label-selectors to series IDs, decompress the matching chunks, then aggregate. Stage 3 dominates.
~7 min
08
Inverted index — labels to series
For each (label, value) pair the database stores a sorted list of series IDs (a postings list). A multi-label query is the intersection.
~7 min
09
Cardinality is the killer
Each unique label-set is one series with its own head chunk in RAM. Add an unbounded label like user_id and you OOM in minutes.
~7 min
10
Downsampling — a retention pyramid
Aggregate old chunks into 5-minute, then 1-hour buckets, dropping originals as you go. Recording rules materialize — they cost storage.
~7 min
10a
Single-node by design; HA is somebody else's problem
The TSDB itself isn't replicated. HA = two parallel scrapers; durability = ship every sample to a remote-write receiver that dedups.
~7 min
11
Design canvas: pick a workload, ship a config
Capstone: alerting, tracing, or business KPIs — the verifier turns scrape interval, label set, retention, and rules into projected RAM, disk, and a fits/refuses verdict.
~7 min