Build a Prometheus-style time-series database (12 scenes)
Scene 09 · Cardinality is the killer
Each unique label-set is one series with its own head chunk in RAM. Add an unbounded label like user_id and you OOM in minutes.
Previously

The index is fast — until you make it index a million tiny postings lists. Cardinality is where the architecture meets reality.

Diagram
Left: a grid of label-set combinations. Each cell is one time series the database has to hold a head chunk for. Right: a vertical RAM bar with a red dashed BUDGET line at 8 GB. As dangerous labels (host, user_id) get added, the cell count multiplies and the bar climbs; once the bar pierces the budget line, the head OOMs and the banner reads 'OOM-killed at t+4 min'.
[Figure: LABEL-SET EXPLOSION. Every unique combo is a series. Grid: method (GET, POST) × status (200, 500) gives series #1 to #4. HEAD RAM bar: 0 MB / 7.81 GB, with the BUDGET line at 7.81 GB.]
Baseline: 2 methods × 2 statuses = 4 series. The head chunk is tiny. RAM bar is calm.
Start with two safe labels: method and status. That gives four series, and the head chunk is barely visible on the RAM bar. Then add `path` with 7 bounded values: 2 × 2 × 7 = 28 series, still calm. All three labels are *enumerable*, and that is what keeps the curve flat.
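The arithmetic behind those counts is a plain product of per-label value counts. A minimal sketch follows; `series_count` is a hypothetical helper, and the figure of 100,000 distinct `user_id` values is an assumption chosen for illustration, not a number from the diagram.

```python
from math import prod

def series_count(label_cardinalities):
    """Distinct label-sets, assuming every combination actually occurs."""
    return prod(label_cardinalities.values())

safe = {"method": 2, "status": 2}
with_path = {**safe, "path": 7}
# Assumed figure: 100_000 distinct users. Purely illustrative.
with_user_id = {**with_path, "user_id": 100_000}

print(series_count(safe))          # 4
print(series_count(with_path))     # 28
print(series_count(with_user_id))  # 2800000
```

The bounded labels each multiply the total by a small constant. The unbounded one multiplies it by however many users show up.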
Implementation
Series.id
the full label-set IS the series identity
def series_id(metric_name, labels):
    # Canonical form: metric name + (key, value) pairs sorted by key,
    # so the same label-set always produces the same id.
    parts = [metric_name]
    for k in sorted(labels):
        parts.append(k + '=' + labels[k])
    # One distinct value of ANY label mints a brand-new series id.
    return hash('|'.join(parts))
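To see the identity rule in action, the sketch below calls `series_id` on three label-sets. The metric name `http_requests_total` and the value `u1` are made up for the example.

```python
def series_id(metric_name, labels):
    parts = [metric_name]
    for k in sorted(labels):
        parts.append(k + "=" + labels[k])
    return hash("|".join(parts))

# Same labels, different insertion order: sorting makes them one series.
a = series_id("http_requests_total", {"method": "GET", "status": "200"})
b = series_id("http_requests_total", {"status": "200", "method": "GET"})

# One extra label value: a different series.
c = series_id("http_requests_total",
              {"method": "GET", "status": "200", "user_id": "u1"})

same_identity = (a == b)
new_series = (a != c)
print(same_identity, new_series)
```

One caveat for anyone running this: Python salts `hash()` for strings per process by default, so these ids are only stable within a single run. That is fine for an in-memory head, but an id that has to survive a restart would need a stable hash instead.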
Head.onSample
lookup-or-create per scraped sample — create is the OOM path
def on_sample(metric_name, labels, ts, value):
    sid = series_id(metric_name, labels)
    series = head.index.get(sid)
    if series is None:
        # NEW series: head chunk + index entry +
        # one postings insert per label.
        series = HeadSeries(labels)
        head.index[sid] = series
        for k, v in labels.items():
            postings[(k, v)].append(sid)
    # Existing or new, the sample lands in the series' head chunk.
    series.append(ts, value)
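The lookup-or-create path can be exercised end to end with a toy head. This is a sketch under assumptions: `HeadSeries` here is a hypothetical stand-in that keeps samples in a list, the index is a plain dict, and a sorted tuple replaces the hashed id so the example cannot collide.

```python
from collections import defaultdict

class HeadSeries:
    """Hypothetical stand-in for a series with one in-memory head chunk."""
    def __init__(self, labels):
        self.labels = dict(labels)
        self.samples = []

    def append(self, ts, value):
        self.samples.append((ts, value))

index = {}                    # series key -> HeadSeries
postings = defaultdict(list)  # (label, value) -> list of series keys

def on_sample(metric_name, labels, ts, value):
    key = (metric_name, tuple(sorted(labels.items())))
    series = index.get(key)
    if series is None:
        # Create path: this is where memory grows.
        series = HeadSeries(labels)
        index[key] = series
        for pair in labels.items():
            postings[pair].append(key)
    series.append(ts, value)

# 1,000 scrapes of one bounded label-set: still a single series.
for ts in range(1000):
    on_sample("http_requests_total", {"method": "GET"}, ts, 1.0)
bounded = len(index)

# 1,000 scrapes that each carry a fresh user_id: 1,000 more series.
for ts in range(1000):
    on_sample("http_requests_total",
              {"method": "GET", "user_id": f"u{ts}"}, ts, 1.0)
unbounded = len(index) - bounded

print(bounded, unbounded)
```

The sample count is the same in both loops. Only the second loop keeps hitting the create branch, which is why scrape volume alone does not predict head memory.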
Head.estimateRamMb
cardinality is the PRODUCT of label cardinalities
HEAD_CHUNK_BYTES = 12_288   # ~12 KB live chunk
INDEX_ENTRY_BYTES = 4_096   # postings + label strings
BUDGET_MB = 8_000           # head RAM budget (~7.81 GB)

def estimate_ram_mb(dimensions, churn):
    series = 1
    for d in dimensions:
        series *= len(d.values)  # MULTIPLY, not add
    bytes_per = HEAD_CHUNK_BYTES + INDEX_ENTRY_BYTES
    # Churned (stale) series linger in the head: modeled as 25% overhead.
    stale = 1.25 if churn else 1.0
    mb = series * bytes_per * stale / (1024 * 1024)
    assert mb < BUDGET_MB, 'OOM-killed'
    return mb
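Plugging numbers into the same constants shows where the budget line sits. The helpers `ram_mb` and `max_series` are hypothetical names for this sketch, and the 2,800,000-series case assumes the 28 baseline series multiplied by 100,000 distinct `user_id` values, a figure picked only for illustration.

```python
HEAD_CHUNK_BYTES = 12_288
INDEX_ENTRY_BYTES = 4_096
BUDGET_MB = 8_000
BYTES_PER_SERIES = HEAD_CHUNK_BYTES + INDEX_ENTRY_BYTES  # 16 KiB

def ram_mb(series, churn=False):
    """Estimated head RAM in MB for a given series count."""
    stale = 1.25 if churn else 1.0
    return series * BYTES_PER_SERIES * stale / (1024 * 1024)

def max_series(churn=False):
    """Largest series count that still fits under BUDGET_MB."""
    stale = 1.25 if churn else 1.0
    return int(BUDGET_MB * 1024 * 1024 / (BYTES_PER_SERIES * stale))

print(ram_mb(28))              # 0.4375
print(max_series())            # 512000
print(max_series(churn=True))  # 409600
print(ram_mb(2_800_000))       # 43750.0
```

Under these constants each series costs 16 KiB, so the budget holds roughly half a million series, and fewer once churn is counted. The assumed `user_id` case lands at more than five times the budget.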