Build a Prometheus-style time-series database (12 scenes)
Scene 10 · Downsampling — a retention pyramid
Aggregate old chunks into 5-minute, then 1-hour buckets, dropping the originals as each tier's retention expires. Recording rules materialize their results as new series, so they cost storage.
Previously

Cardinality bounds width. Retention bounds depth — and downsampling is how depth stays affordable.

Diagram
Three horizontal tiers stacked into a pyramid: top is raw 15-second points (last 7 days, dense dots), middle is 5-minute rollups (last 90 days, medium density), bottom is 1-hour rollups (last 2 years, sparsest). A query-range strip slides across the timeline; the finest tier that fully covers the range glows and serves the query, with 'points decoded' shown on the side panel along with per-tier disk MB.
RETENTION PYRAMID · raw → downsampled tiers
Raw / 15 s scrape: 15 s · 7d · 40.3 k pts/series · dense · 55.0 MB
5-minute rollup (+ job:http_requests:rate5m): 5 m · 90d · 25.9 k pts/series · medium · 35.0 MB
1-hour rollup (+ job:http_requests:rate1h): 1 h · 2y · 17.5 k pts/series · sparse · 24.0 MB
Total storage: 114.0 MB
Timeline: now (0d) ← 90d ← 1.0y ← 2y ago
Query readout (query: 1d ago → now): points decoded 5.8 k · active tier Raw / 15 s scrape
Query: last 1 day → raw tier serves (15 s resolution, every scrape preserved).
Watch a query range slide across three tiers. A 1-day range is served by the raw tier; a 60-day range falls through to the 5 m rollup; a 1-year range drops to the 1 h rollup. The engine always picks the finest tier whose retention fully covers the range, so it only falls back to a coarser tier when the finer ones no longer reach back far enough.
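The tier choice and the 'points decoded' readout can be reproduced with a few lines of arithmetic. The sketch below is a minimal illustration using the step sizes and retention periods from the diagram; the `Tier` class, `pick_tier`, and `points_decoded` are hypothetical names, and the fallback for ranges longer than every retention is an assumption.

```python
from dataclasses import dataclass

DAY_SECONDS = 86_400


@dataclass(frozen=True)
class Tier:
    name: str
    step_seconds: int    # spacing between stored points
    retention_days: int  # how far back this tier reaches


# Ordered finest -> coarsest, using the figures from the diagram.
TIERS = [
    Tier("raw", 15, 7),
    Tier("rate5m", 300, 90),
    Tier("rate1h", 3600, 730),
]


def pick_tier(range_days: int) -> Tier:
    """Return the finest tier whose retention covers a range ending now."""
    for tier in TIERS:
        if tier.retention_days >= range_days:
            return tier
    # Assumption: a range longer than every retention falls back to the
    # coarsest tier and gets partial data.
    return TIERS[-1]


def points_decoded(range_days: int) -> int:
    """Points per series the chosen tier must decode for the range."""
    tier = pick_tier(range_days)
    return range_days * DAY_SECONDS // tier.step_seconds


for days in (1, 60, 365):
    print(days, pick_tier(days).name, points_decoded(days))
# 1 raw 5760
# 60 rate5m 17280
# 365 rate1h 8760
```

The 1-day case yields 5,760 points per series, which is the 5.8 k shown in the query readout.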
Implementation
RuleEngine.evaluateRecordingRule
every rule.interval, write the result back as a NEW series
def evaluateRecordingRule(rule, window):
    # rule.expr e.g. rate(http_requests_total[5m])
    samples = promql.eval(rule.expr, window)
    ts = now()  # one evaluation timestamp shared by every output sample
    for labels, value in samples:
        series = tsdb.getOrCreateSeries(
            name = rule.name,  # job:http_requests:rate5m
            labels = labels,
        )
        series.headChunk.append(ts, value)
        postings.index(series.id, labels)
        # real bytes, real cardinality, real cost
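To make the storage cost concrete, here is a self-contained sketch of a rollup written back as a new series. It assumes a toy in-memory store (a plain dict) rather than a real TSDB, and the rate it computes is deliberately simplified: (last - first) / elapsed within each bucket, with no counter-reset handling or extrapolation. `series_key` and `evaluate_rollup` are hypothetical helpers.

```python
from collections import defaultdict


def series_key(name, labels):
    """Identify a series by metric name plus its sorted label pairs."""
    return (name, tuple(sorted(labels.items())))


def evaluate_rollup(store, source, target, bucket_seconds):
    """Write one per-second rate per bucket into a new series named `target`.

    `store` maps series keys to time-ordered lists of (timestamp, value).
    """
    # Snapshot the items because new series are added to `store` in the loop.
    for (name, labels), samples in list(store.items()):
        if name != source:
            continue
        buckets = defaultdict(list)
        for ts, value in samples:
            buckets[ts - ts % bucket_seconds].append((ts, value))
        out = store.setdefault((target, labels), [])
        for start in sorted(buckets):
            pts = buckets[start]
            (t0, v0), (t1, v1) = pts[0], pts[-1]
            if t1 > t0:
                # Stamp the rolled-up point at the end of its bucket.
                out.append((start + bucket_seconds, (v1 - v0) / (t1 - t0)))


store = {}
key = series_key("http_requests_total", {"job": "api"})
# One hour of 15-second scrapes of a counter growing by 2 per second.
store[key] = [(t, 2 * t) for t in range(0, 3600, 15)]

evaluate_rollup(store, "http_requests_total", "job:http_requests:rate5m", 300)
rolled = store[series_key("job:http_requests:rate5m", {"job": "api"})]
print(len(store), len(store[key]), len(rolled), rolled[0])
# 2 240 12 (300, 2.0)
```

The store now holds two series instead of one: the rollup is 20 times sparser than the raw data, but it is still an extra series with its own samples and index entries.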
QueryEngine.pickTier
find the finest tier that fully covers query.range
def pickTier(query):
    # tiers ordered finest -> coarsest
    tiers = [raw, rate5m, rate1h]
    for tier in tiers:  # try finest first
        if tier.retentionDays >= query.range.days:
            # finest tier that still reaches back far enough
            return tier.lookupSeries(query.matchers)
    # no tier reaches back that far; serving partial data from the
    # coarsest tier is one option (an assumption, not shown in the diagram)
    return tiers[-1].lookupSeries(query.matchers)
Retention.dropExpiredChunks
per-tier retention — old chunks get unlinked
def dropExpiredChunks(tier):
    cutoff = now() - tier.retentionDays * day
    for series in tier.allSeries():
        # iterate over a copy: removing from a list while looping over it skips elements
        for chunk in list(series.chunks):
            if chunk.maxTime < cutoff:
                series.chunks.remove(chunk)
                disk.unlink(chunk.file)
        postings.maybeGC(series.id)
# raw: 7d | rate5m: 90d | rate1h: 730d
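The retention pass can be exercised without a disk. The sketch below is a minimal in-memory version, assuming chunks carry only a time range; `Chunk`, `Series`, and `drop_expired_chunks` are hypothetical names, and file unlinking and postings garbage collection are left out.

```python
from dataclasses import dataclass, field

DAY = 86_400  # seconds


@dataclass
class Chunk:
    min_time: int
    max_time: int


@dataclass
class Series:
    chunks: list = field(default_factory=list)


def drop_expired_chunks(all_series, retention_days, now):
    """Remove chunks whose newest sample is older than the cutoff.

    Returns the number of chunks dropped. Each series' chunk list is rebuilt
    rather than mutated while it is being iterated.
    """
    cutoff = now - retention_days * DAY
    dropped = 0
    for series in all_series:
        kept = [c for c in series.chunks if c.max_time >= cutoff]
        dropped += len(series.chunks) - len(kept)
        series.chunks = kept
    return dropped


now = 100 * DAY
# One chunk per day for the last 10 days.
raw = Series([Chunk((100 - d - 1) * DAY, (100 - d) * DAY - 1) for d in range(10)])
print(drop_expired_chunks([raw], retention_days=7, now=now), len(raw.chunks))
# 3 7
```

With a 7-day retention, the three oldest daily chunks fall entirely before the cutoff and are dropped; a chunk that straddles the cutoff would be kept whole, because only its newest timestamp is compared.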