Build a Prometheus-style time-series database (12 scenes)
Scene 11 · Design canvas: pick a workload, ship a config
Capstone: alerting, tracing, or business KPIs — the verifier turns scrape interval, label set, retention, and rules into projected RAM, disk, and a fits/refuses verdict.
Previously

We have all the pieces. Now you build.

Scene 12
Design canvas: pick a workload, ship a config
Diagram
Left: three workload cards — pick the one closest to your real problem. Center: the configuration panel — scrape interval, label set, chunk samples, head retention, recording rules, downsampling tiers. Right: a verifier that turns those choices into projected RAM, disk/day, query latency, and a red/green 'fits' verdict with explicit warnings.
WORKLOAD
• [ACTIVE] 1000-host fleet alerting (cardinality LOW)
  1000 targets × 50 metrics × 15s scrape - bou…
  1000 targets · 50/tgt @ 15s · RAM ≈ 2.9 GB · disk ≈ 6.0 GB/d
• Per-request distributed tracing (cardinality HIGH)
  50 services × 10 metrics × 1s scrape - but t…
  50 targets · 10/tgt @ 1s · RAM ≈ 78.1 GB · disk ≈ 195.3 GB/d
• Business KPIs · 1-week retenti… (cardinality MED)
  12 services × 200 metrics × 60s scrape - slo…
  12 targets · 200/tgt @ 60s · RAM ≈ 500 MB · disk ≈ 200 MB/d
CONFIG
• scrape interval: 15s
• chunk samples: 120
• head retention: 3h
• label set: job, instance, method, status
• recording rules: job:http_request_duration_seconds:rate5m @30s
• downsampling tiers: raw 15s → 15d; 5m rollup → 90d
VERIFIER
• FITS BUDGET: projection within budget
• RAM: 2.9 GB / 8.0 GB
• DISK: 6.0 GB/day
• QUERY: 50 ms typical
• WARNINGS:
  Cardinality ≈ 1000 hosts × 50 metrics × ~4…
  Chunks: 120 samples / 15s = 30 min per chu…
Honest fit for fleet alerting: every knob traces to a scene.
Workload A is on the canvas: 1000-host fleet alerting, 50 metrics per target, 15s scrape. Default config loads — chunk samples 120 (scene 4), head retention 3h (scene 5), one recording rule (scene 9), two downsampling tiers (scene 9). The verifier turns it green: ~3 GB RAM, ~6 GB/day disk, ~50 ms typical query.
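The defaults for workload A can be sanity-checked with a few lines of arithmetic. The sketch below uses only the figures quoted on the canvas; the variable names are illustrative, and the base series count deliberately ignores the extra fan-out from labels such as method and status.

```python
# Workload A defaults, as quoted on the canvas.
SCRAPE_INTERVAL_S = 15
CHUNK_SAMPLES = 120
HEAD_RETENTION_S = 3 * 3600
TARGETS = 1000
METRICS_PER_TARGET = 50

# One chunk holds 120 samples taken 15 s apart.
chunk_span_s = CHUNK_SAMPLES * SCRAPE_INTERVAL_S

# How many chunk spans fit inside the 3 h head retention window.
chunks_in_head = HEAD_RETENTION_S // chunk_span_s

# Series before any per-label fan-out (method, status, ...).
base_series = TARGETS * METRICS_PER_TARGET

# Samples a single series appends per day at this scrape interval.
samples_per_series_per_day = 86_400 // SCRAPE_INTERVAL_S

print(chunk_span_s // 60, "min per chunk")
print(chunks_in_head, "chunk spans per series in the head window")
print(base_series, "base series")
print(samples_per_series_per_day, "samples per series per day")
```

The 30-minute chunk span matches the verifier's warning line, which suggests the chunk figure there is samples multiplied by the scrape interval.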
Implementation
Canvas.verify
the top-level fits-or-doesn't verdict the canvas renders
    def verify(workload, config, queryRange):
        # queryRange: the time span of the query whose latency is projected.
        # Refuse outright when a label is unbounded; no tuning fixes that.
        bomb = detectCardinalityBomb(workload, config)
        if bomb:
            return refuse(bomb.reason)  # wrong tool
        ram = projectRam(workload, config)
        disk = projectDisk(workload, config)
        latency = projectQueryLatency(config, queryRange)
        # The verdict is green only if both RAM and disk stay within budget.
        fits = ram.mb <= ram.budget and disk.ok
        warnings = ram.warnings + disk.warnings
        return Verdict(fits, ram, disk, latency, warnings)
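To make the verdict logic concrete, here is a minimal, self-contained sketch of how the pieces might combine. `Projection`, `combine`, and `DISK_BUDGET_GB_PER_DAY` are hypothetical names introduced for illustration. The RAM and disk figures come from the workload cards, the 8.0 GB RAM budget is the one shown for the fleet workload and is assumed to apply to tracing as well, and the disk budget is an assumption because the canvas does not state one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Projection:
    value: float   # projected usage
    budget: float  # allowed usage, same unit as value
    warnings: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return self.value <= self.budget


@dataclass
class Verdict:
    fits: bool
    warnings: List[str]
    refusal: Optional[str] = None  # set only when a cardinality bomb is found


def combine(bomb_reason: Optional[str], ram: Projection, disk: Projection) -> Verdict:
    # A cardinality bomb short-circuits every other check.
    if bomb_reason is not None:
        return Verdict(fits=False, warnings=[], refusal=bomb_reason)
    return Verdict(fits=ram.ok and disk.ok, warnings=ram.warnings + disk.warnings)


DISK_BUDGET_GB_PER_DAY = 50.0  # hypothetical; not given on the canvas

fleet = combine(None, Projection(2.9, 8.0), Projection(6.0, DISK_BUDGET_GB_PER_DAY))
tracing = combine(
    None,
    Projection(78.1, 8.0, ["RAM over budget"]),
    Projection(195.3, DISK_BUDGET_GB_PER_DAY),
)
print(fleet.fits, tracing.fits)
```

Checking the bomb first keeps the refusal distinct from an ordinary over-budget result: one says "retune the config", the other says "pick a different tool".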
verifier.projectRam
head-block RAM is dominated by series count, not sample rate
    import math

    def projectRam(workload, config):
        # Series count: product of per-label cardinalities times the metric count.
        series_count = math.prod(
            cardinality(label) for label in config.labelSet
        ) * workload.metricsPerTarget
        # Both costs scale with the number of series, not with the sample rate.
        head_bytes = series_count * BYTES_PER_HEAD_CHUNK
        index_bytes = series_count * BYTES_PER_POSTINGS_ENTRY
        ram_mb = (head_bytes + index_bytes) / MB
        if ram_mb > RAM_BUDGET_MB:
            return overshoot(ram_mb, RAM_BUDGET_MB)
        return ok(ram_mb)
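The multiplicative effect of labels is easier to see with numbers. The following is a rough sketch under stated assumptions: the per-series byte costs and the label cardinalities are made-up illustrative values, not figures from the verifier, and `project_ram_mb` is a hypothetical stand-in for the projection.

```python
import math

MB = 1024 * 1024
BYTES_PER_HEAD_CHUNK = 3_000     # hypothetical per-series head cost
BYTES_PER_POSTINGS_ENTRY = 100   # hypothetical per-series index cost


def project_ram_mb(label_cardinality: dict, metrics_per_target: int) -> float:
    # Series count is the product of label cardinalities times the metric count.
    series = math.prod(label_cardinality.values()) * metrics_per_target
    return series * (BYTES_PER_HEAD_CHUNK + BYTES_PER_POSTINGS_ENTRY) / MB


# Illustrative cardinalities only; the real ones depend on the fleet.
base = {"job": 1, "instance": 1000, "method": 3, "status": 5}
with_region = {**base, "region": 10}

ram_base = project_ram_mb(base, 50)
ram_region = project_ram_mb(with_region, 50)
print(f"{ram_base:.0f} MB -> {ram_region:.0f} MB after adding a 10-value label")
```

Note that the scrape interval never appears in the function: under this model, scraping twice as often leaves projected head RAM unchanged, while one extra label with ten values multiplies it by ten.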
verifier.detectCardinalityBomb
scan the proposed label set for unbounded identifiers
    # Label names whose value space grows without bound.
    UNBOUNDED = {
        'user_id', 'request_id', 'trace_id',
        'session_id', 'email', 'ip',
    }

    def detectCardinalityBomb(workload, config):
        # Return a Bomb for the first unbounded label, or None if the set is safe.
        for label in config.labelSet:
            if label in UNBOUNDED:
                return Bomb(
                    reason=f'{label} is unbounded: wrong tool',
                )
        return None
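Exact-name matching is simple but would miss spelling variants such as `traceId` or `User-ID`. As one possible extension (not part of the verifier shown above), the sketch below normalizes label names before comparing them. `normalize` and `find_bomb` are hypothetical helpers, and the tracing label set is an assumed example.

```python
import re
from typing import Iterable, Optional

UNBOUNDED = {"user_id", "request_id", "trace_id", "session_id", "email", "ip"}


def normalize(label: str) -> str:
    # Lowercase and drop separators so userId, user-id and USER_ID compare equal.
    return re.sub(r"[^a-z0-9]", "", label.lower())


UNBOUNDED_NORMALIZED = {normalize(name) for name in UNBOUNDED}


def find_bomb(label_set: Iterable[str]) -> Optional[str]:
    # Return the first label whose normalized name is on the unbounded list.
    for label in label_set:
        if normalize(label) in UNBOUNDED_NORMALIZED:
            return label
    return None


print(find_bomb(["job", "instance", "method", "status"]))  # fleet label set
print(find_bomb(["service", "traceId"]))                   # assumed tracing label set
```

A name-based check remains a heuristic: a label called `customer` can be just as unbounded as `user_id`, so a denylist like this catches the common cases rather than proving a label set safe.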
verifier.projectQueryLatency
downsampling tiers turn long-range queries from O(M) raw samples into O(k) rollup points
    def projectQueryLatency(config, queryRange):
        tier = pickTier(config.downsamplingTiers, queryRange)
        # Coarser resolution means fewer points to decode for the same range.
        points = queryRange.seconds / tier.resolutionSeconds
        return points * DECODE_COST_PER_POINT_MS

    def pickTier(tiers, queryRange):
        # coarsest tier whose retention covers the range
        for t in sorted(tiers, key=lambda t: t.resolutionSeconds, reverse=True):
            if t.retentionDays * DAY >= queryRange.seconds:
                return t
        # fall back to raw, i.e. the finest-resolution tier
        return min(tiers, key=lambda t: t.resolutionSeconds)
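To see what tier selection buys, the sketch below runs the same selection rule over the two tiers from the default config (raw 15s kept 15d, 5m rollup kept 90d) and counts the points a one-week query would decode. `Tier`, `pick_tier`, and `points_to_decode` are illustrative names, and the sketch counts points rather than milliseconds because the per-point decode cost is not given.

```python
from dataclasses import dataclass
from typing import List

DAY = 86_400  # seconds


@dataclass
class Tier:
    name: str
    resolution_s: int
    retention_days: int


# The two tiers from the default config on the canvas.
TIERS = [Tier("raw", 15, 15), Tier("5m rollup", 300, 90)]


def pick_tier(tiers: List[Tier], range_s: int) -> Tier:
    # Coarsest tier whose retention covers the whole range.
    for t in sorted(tiers, key=lambda t: t.resolution_s, reverse=True):
        if t.retention_days * DAY >= range_s:
            return t
    # Nothing covers the range: fall back to the finest tier.
    return min(tiers, key=lambda t: t.resolution_s)


def points_to_decode(tiers: List[Tier], range_s: int) -> int:
    return range_s // pick_tier(tiers, range_s).resolution_s


week = 7 * DAY
print(pick_tier(TIERS, week).name, points_to_decode(TIERS, week))
print("raw would need", week // 15)
```

One consequence of the rule as written: because it always prefers the coarsest covering tier, even a one-hour range lands on the 5m rollup (12 points). A production selector would likely also weigh the requested step or the resolution the query needs.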