Build a Prometheus-style time-series database (12 scenes)
Scene 09 · Cardinality is the killer
Each unique label-set is one series with its own head chunk in RAM. Add an unbounded label like user_id and you OOM in minutes.
Previously
The index is fast — until you make it index a million tiny postings lists. Cardinality is where the architecture meets reality.
Scene 09
Cardinality is the killer
Diagram
Left: a grid of label-set combinations. Each cell is one time series the database has to hold a head chunk for. Right: a vertical RAM bar with a red dashed BUDGET line at 8 GB. As dangerous labels (host, user_id) get added, the cell count multiplies and the bar climbs; once the bar pierces the budget line, the head OOMs and the banner reads 'OOM-killed at t+4 min'.
Start with two safe labels — method and status. Four series, head chunk barely visible on the RAM bar. Then add `path` with 7 bounded values: 28 series, still calm. Both labels are *enumerable*, and that is what keeps the curve flat.
Implementation
Series.id
the full label-set IS the series identity
1def series_id(metric_name, labels):2 # canonical: name + sorted (k, v) pairs3 parts = [metric_name]4 for k in sorted(labels.keys()):5 parts.append(k + '=' + labels[k])6 # one distinct value of ANY label7 # mints a brand-new series id8 return hash('|'.join(parts))
Head.onSample
lookup-or-create per scraped sample — create is the OOM path
1def on_sample(metric_name, labels, ts, value):2 sid = series_id(metric_name, labels)3 series = head.index.get(sid)4 if series is None:5 # NEW series: head chunk + index entry +6 # one postings insert per label.7 series = HeadSeries(labels)8 head.index[sid] = series9 for k, v in labels.items():10 postings[(k, v)].append(sid)11 series.append(ts, value)
Head.estimateRamMb
cardinality is the PRODUCT of label cardinalities
1HEAD_CHUNK_BYTES = 12_288 # ~12 KB live chunk2INDEX_ENTRY_BYTES = 4_096 # postings + label strings3BUDGET_MB = 8_000 # head RAM budget45def estimate_ram_mb(dimensions, churn):6 series = 17 for d in dimensions:8 series *= len(d.values) # MULTIPLY, not add9 bytes_per = HEAD_CHUNK_BYTES + INDEX_ENTRY_BYTES10 stale = 1.25 if churn else 1.011 mb = series * bytes_per * stale / (1024 * 1024)12 assert mb < BUDGET_MB, 'OOM-killed'