Build a Prometheus-style time-series database (12 scenes)
Scene 10 · Downsampling — a retention pyramid
Aggregate old chunks into 5-minute, then 1-hour buckets, dropping originals as you go. Recording rules materialize their results as new series, so they cost storage.
Previously
Cardinality bounds width. Retention bounds depth — and downsampling is how depth stays affordable.
Diagram
Three horizontal tiers stacked into a pyramid: top is raw 15-second points (last 7 days, dense dots), middle is 5-minute rollups (last 90 days, medium density), bottom is 1-hour rollups (last 2 years, sparsest). A query-range strip slides across the timeline; whichever tier fully covers the range glows and serves the query, with 'points decoded' shown on the side panel along with per-tier disk MB.
Watch a query range slide across three tiers. A 1-day range is served by the raw tier; a 60-day range falls through to the 5-minute rollup; a 1-year range drops to the 1-hour rollup. The engine always picks the finest tier whose retention fully covers the range, so it moves to a coarser tier only when the range outgrows a finer one.
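The roll-up step itself can be sketched as follows. This is a minimal illustration, not the scene's actual implementation: the `Bucket` type and the `rollup_raw` and `rollup_buckets` helpers are hypothetical names, and keeping count, sum, min, and max per bucket is an assumed design choice. Storing sum and count rather than a mean is what lets the 1-hour tier be built from the 5-minute tier without rereading raw points.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Aggregates for one fixed-width time bucket (hypothetical layout)."""
    start: int       # bucket start, in seconds
    count: int
    total: float
    minimum: float
    maximum: float

    @property
    def mean(self) -> float:
        return self.total / self.count


def rollup_raw(samples, width):
    """Group (timestamp, value) pairs into buckets of `width` seconds."""
    buckets = {}
    for ts, value in samples:
        start = ts - ts % width
        b = buckets.get(start)
        if b is None:
            buckets[start] = Bucket(start, 1, value, value, value)
        else:
            b.count += 1
            b.total += value
            b.minimum = min(b.minimum, value)
            b.maximum = max(b.maximum, value)
    return [buckets[k] for k in sorted(buckets)]


def rollup_buckets(fine, width):
    """Merge finer buckets into coarser ones; raw samples are not needed."""
    merged = {}
    for b in fine:
        start = b.start - b.start % width
        m = merged.get(start)
        if m is None:
            merged[start] = Bucket(start, b.count, b.total, b.minimum, b.maximum)
        else:
            m.count += b.count
            m.total += b.total
            m.minimum = min(m.minimum, b.minimum)
            m.maximum = max(m.maximum, b.maximum)
    return [merged[k] for k in sorted(merged)]


# One hour of synthetic 15-second samples.
raw = [(t, float(t % 60)) for t in range(0, 3600, 15)]
five_min = rollup_raw(raw, 300)
one_hour = rollup_buckets(five_min, 3600)
print(len(raw), len(five_min), len(one_hour))
```

On this synthetic hour, 240 raw points collapse to 12 five-minute buckets and then to a single one-hour bucket.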
Implementation
RuleEngine.evaluateRecordingRule
every rule.interval, write the result back as a NEW series
def evaluateRecordingRule(rule, window):
    # rule.expr e.g. rate(http_requests_total[5m])
    samples = promql.eval(rule.expr, window)
    for labels, value in samples:
        series = tsdb.getOrCreateSeries(
            name=rule.name,  # job:http_requests:rate5m
            labels=labels,
        )
        series.headChunk.append(now(), value)
        postings.index(series.id, labels)
        # real bytes, real cardinality, real cost
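To make the storage cost concrete, here is a small in-memory sketch of the same write-back loop. `ToyTSDB`, `evaluate_recording_rule`, and `fake_rate` are hypothetical stand-ins, not part of any real library; `fake_rate` simply returns fixed values in place of evaluating `rate(http_requests_total[5m])`.

```python
class ToyTSDB:
    """In-memory stand-in for the TSDB: one sample list per series."""

    def __init__(self):
        # (metric name, sorted label pairs) -> list of (timestamp, value)
        self.series = {}

    def get_or_create_series(self, name, labels):
        key = (name, tuple(sorted(labels.items())))
        return self.series.setdefault(key, [])


def evaluate_recording_rule(tsdb, rule_name, evaluate, ts):
    """Write each result of `evaluate()` back as a sample of a new series."""
    for labels, value in evaluate():
        tsdb.get_or_create_series(rule_name, labels).append((ts, value))


def fake_rate():
    # Stand-in for the query result: one value per output label set.
    return [({"job": "api"}, 12.5), ({"job": "web"}, 3.0)]


db = ToyTSDB()
for tick in range(3):  # three evaluations, one rule interval apart
    evaluate_recording_rule(db, "job:http_requests:rate5m", fake_rate, ts=tick * 60)

series_count = len(db.series)
sample_count = sum(len(samples) for samples in db.series.values())
print(series_count, sample_count)
```

Each distinct output label set becomes its own series, and every evaluation appends one more sample to each of them. Three evaluations over two label sets leave two series holding six samples in total.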
QueryEngine.pickTier
find the finest tier that fully covers query.range
def pickTier(query):
    # tiers ordered finest -> coarsest
    tiers = [raw, rate5m, rate1h]
    covers = tiers[-1]  # assumed fallback when no tier reaches back far enough
    for tier in tiers:  # try finest first
        if tier.retentionDays >= query.range.days:
            covers = tier  # finest tier that still reaches back far enough
            break
    return covers.lookupSeries(query.matchers)
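The selection rule can be checked against the three example ranges with a self-contained sketch. The `Tier` dataclass, `pick_tier`, and `points_decoded` are illustrative names. The step sizes and retention windows come from the diagram description, and falling back to the coarsest tier for ranges longer than any retention window is an assumption.

```python
from dataclasses import dataclass

DAY = 86_400  # seconds per day


@dataclass(frozen=True)
class Tier:
    name: str
    step_seconds: int
    retention_days: int


# Ordered finest -> coarsest.
TIERS = [
    Tier("raw", 15, 7),
    Tier("rate5m", 300, 90),
    Tier("rate1h", 3600, 730),
]


def pick_tier(range_days):
    """Return the finest tier whose retention covers the whole range."""
    for tier in TIERS:
        if tier.retention_days >= range_days:
            return tier
    return TIERS[-1]  # assumed fallback: nothing reaches back that far


def points_decoded(range_days):
    """Points read per series when the chosen tier serves the range."""
    tier = pick_tier(range_days)
    return tier.name, range_days * DAY // tier.step_seconds


for days in (1, 60, 365):
    print(days, *points_decoded(days))
```

Under these assumptions a 1-day range decodes 5,760 raw points per series, a 60-day range decodes 17,280 five-minute points, and a 1-year range decodes 8,760 one-hour points.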
Retention.dropExpiredChunks
per-tier retention — old chunks get unlinked
def dropExpiredChunks(tier):
    cutoff = now() - tier.retentionDays * day
    for series in tier.allSeries():
        for chunk in list(series.chunks):  # iterate over a copy: we remove as we go
            if chunk.maxTime < cutoff:
                series.chunks.remove(chunk)
                disk.unlink(chunk.file)
        postings.maybeGC(series.id)
    # raw: 7d | rate5m: 90d | rate1h: 730d
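A runnable, in-memory version of the retention sweep, followed by the per-tier arithmetic that explains why the pyramid stays affordable. `Chunk`, `Series`, and `drop_expired_chunks` are hypothetical simplifications: no files are unlinked, and the 15-second raw step is taken from the diagram description.

```python
from dataclasses import dataclass, field

DAY = 86_400  # seconds per day


@dataclass
class Chunk:
    min_time: int
    max_time: int


@dataclass
class Series:
    chunks: list = field(default_factory=list)


def drop_expired_chunks(series_list, retention_days, now):
    """Drop chunks that end before the cutoff; return how many were dropped."""
    cutoff = now - retention_days * DAY
    dropped = 0
    for series in series_list:
        keep = [c for c in series.chunks if c.max_time >= cutoff]
        dropped += len(series.chunks) - len(keep)
        series.chunks = keep  # rebuild the list instead of mutating mid-loop
    return dropped


# Ten one-day chunks, swept with the raw tier's 7-day retention.
now = 10 * DAY
series = Series([Chunk(d * DAY, (d + 1) * DAY - 1) for d in range(10)])
dropped = drop_expired_chunks([series], retention_days=7, now=now)
print(dropped, len(series.chunks))

# Points retained per series in each tier: (step seconds, retention days).
TIERS = {"raw": (15, 7), "rate5m": (300, 90), "rate1h": (3600, 730)}
points = {name: days * DAY // step for name, (step, days) in TIERS.items()}
raw_for_two_years = 730 * DAY // 15
print(points, sum(points.values()), raw_for_two_years)
```

With these windows, the three tiers together hold 83,760 points per series, against 4,204,800 points if raw 15-second data were kept for the full 730 days.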