Design canvas: pick a workload, ship a config

Build a Prometheus-style time-series database (12 scenes)

Scene 11 · Design canvas: pick a workload, ship a config

Capstone: pick an alerting, tracing, or business-KPI workload, and the verifier turns your scrape interval, label set, retention, and rules into projected RAM, disk, and a fits/refuses verdict.

Previously

We have all the pieces. Now you build.

Diagram

Left: three workload cards; pick the one closest to your real problem. Center: the configuration panel (scrape interval, label set, samples per chunk, head retention, recording rules, downsampling tiers). Right: a verifier that turns those choices into projected RAM, disk per day, query latency, and a red/green 'fits' verdict with explicit warnings.
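The three panels map naturally onto plain data types. Below is a minimal sketch, populated with this scene's Workload A defaults. The field names are hypothetical and were chosen to mirror identifiers the verifier code uses (`labelSet`, `metricsPerTarget`, `downsamplingTiers`). The label names, rule name, and tier sizes are also hypothetical, and the real canvas may structure its state differently.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    """One downsampling tier: one point every `resolutionSeconds`, kept `retentionDays`."""
    resolutionSeconds: int
    retentionDays: int


@dataclass
class Workload:
    """Left panel: the chosen workload card (hypothetical fields)."""
    name: str
    targets: int
    metricsPerTarget: int


@dataclass
class Config:
    """Center panel: every knob the verifier reads."""
    scrapeIntervalSeconds: int
    labelSet: list[str]
    chunkSamples: int
    headRetentionHours: int
    recordingRules: list[str] = field(default_factory=list)
    downsamplingTiers: list[Tier] = field(default_factory=list)


# Workload A. Label names, the rule name, and tier sizes are hypothetical.
workload_a = Workload(name="fleet alerting", targets=1000, metricsPerTarget=50)
config_a = Config(
    scrapeIntervalSeconds=15,
    labelSet=["instance", "job"],
    chunkSamples=120,
    headRetentionHours=3,
    recordingRules=["instance:cpu_usage:rate5m"],
    downsamplingTiers=[Tier(300, 90), Tier(3600, 400)],
)
```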

Workload A is on the canvas: 1000-host fleet alerting, 50 metrics per target, 15 s scrape interval. The default config loads 120 samples per chunk (scene 4), 3 h head retention (scene 5), one recording rule (scene 9), and two downsampling tiers (scene 9). The verifier turns it green: ~3 GB RAM, ~6 GB/day of disk, and ~50 ms for a typical query.
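The workload's raw volumes are easy to check by hand. The sketch below assumes each target exposes exactly one series per metric. The scene does not list Workload A's label set, so any extra label would multiply the series count.

```python
# Workload A: 1000 hosts, 50 metrics per target, 15 s scrape interval.
TARGETS = 1000
METRICS_PER_TARGET = 50
SCRAPE_INTERVAL_S = 15

# Default config: 120 samples per chunk, 3 h of head retention.
CHUNK_SAMPLES = 120
HEAD_RETENTION_S = 3 * 3600
SECONDS_PER_DAY = 86_400

# Assumption: one series per (target, metric) pair.
series = TARGETS * METRICS_PER_TARGET                              # 50,000 series
samples_per_series_per_day = SECONDS_PER_DAY // SCRAPE_INTERVAL_S  # 5,760
samples_per_day = series * samples_per_series_per_day              # 288,000,000
ingest_per_second = series / SCRAPE_INTERVAL_S                     # ~3,333 samples/s

chunk_span_s = CHUNK_SAMPLES * SCRAPE_INTERVAL_S                   # 1,800 s: 30 min per chunk
chunks_per_head = HEAD_RETENTION_S // chunk_span_s                 # 6 full chunks per series
```

These counts are inputs to the verifier, not its outputs. Turning them into the ~3 GB and ~6 GB/day figures depends on per-series and per-sample byte costs that this scene does not state.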

Implementation

Canvas.verify

the top-level fits-or-doesn't verdict the canvas renders

def verify(workload, config, queryRange):
    bomb = detectCardinalityBomb(workload, config)
    if bomb:
        return refuse(bomb.reason)  # wrong tool: refuse before projecting anything
    ram = projectRam(workload, config)
    disk = projectDisk(workload, config)
    latency = projectQueryLatency(config, queryRange)
    # ok() and overshoot() results both carry .ok and .warnings, like disk
    fits = ram.ok and disk.ok
    warnings = ram.warnings + disk.warnings
    return Verdict(fits, ram, disk, latency, warnings)
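The sketch below makes the same decision logic runnable. The scene does not define what `projectRam` and `projectDisk` return, so it assumes a hypothetical `Projection` record with `ok` and `warnings` fields. It also moves the composition into a hypothetical `combine` helper so it can run without the projection functions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Projection:
    """One projected resource: its value, whether it fits the budget, and any warnings."""
    value: float
    ok: bool
    warnings: list[str] = field(default_factory=list)


@dataclass
class Verdict:
    fits: bool
    ram: Optional[Projection] = None
    disk: Optional[Projection] = None
    latency_ms: Optional[float] = None
    warnings: list[str] = field(default_factory=list)
    refused: Optional[str] = None  # set only when the verifier refuses outright


def refuse(reason: str) -> Verdict:
    # A refusal is a red verdict with no projections attached.
    return Verdict(fits=False, refused=reason)


def combine(bomb_reason: Optional[str], ram: Projection, disk: Projection,
            latency_ms: float) -> Verdict:
    """verify()'s decision logic, applied to projections computed elsewhere."""
    if bomb_reason:
        return refuse(bomb_reason)  # wrong tool: the projections are ignored
    fits = ram.ok and disk.ok        # green only if every resource fits
    return Verdict(fits, ram, disk, latency_ms, ram.warnings + disk.warnings)


green = combine(None, Projection(3_000, True), Projection(6, True), 50.0)
red = combine(None, Projection(9_000, False, ["head RAM over budget"]),
              Projection(6, True), 50.0)
refused = combine("trace_id is unbounded: wrong tool",
                  Projection(0, True), Projection(0, True), 0.0)
```

In this design a refusal is not simply a red verdict. It carries a reason and no projections, so the canvas can say "wrong tool" instead of showing misleading numbers.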

verifier.projectRam

head-block RAM is dominated by series count, not sample rate

import math

def projectRam(workload, config):
    # series count: product of per-label cardinalities, times metrics per target
    series_count = math.prod(
        cardinality(label) for label in config.labelSet
    ) * workload.metricsPerTarget
    head_bytes = series_count * BYTES_PER_HEAD_CHUNK
    index_bytes = series_count * BYTES_PER_POSTINGS_ENTRY
    ram_mb = (head_bytes + index_bytes) / MB
    if ram_mb > RAM_BUDGET_MB:
        return overshoot(ram_mb, RAM_BUDGET_MB)
    return ok(ram_mb)
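Here is a self-contained version of the same model. The scene gives no values for `BYTES_PER_HEAD_CHUNK` or `BYTES_PER_POSTINGS_ENTRY`, so the byte costs below are hypothetical. A plain dict of label cardinalities stands in for `cardinality(label)`.

```python
import math

# Hypothetical per-series costs; real values depend on the engine and its version.
BYTES_PER_HEAD_CHUNK = 1_024       # in-memory head chunk per series
BYTES_PER_POSTINGS_ENTRY = 256     # index / postings overhead per series
MB = 1024 * 1024


def project_ram_mb(label_cardinalities: dict[str, int], metrics_per_target: int) -> float:
    """Head RAM from series count alone: there is no scrape-interval parameter."""
    series = math.prod(label_cardinalities.values()) * metrics_per_target
    return series * (BYTES_PER_HEAD_CHUNK + BYTES_PER_POSTINGS_ENTRY) / MB


# 1000 instances, 50 metrics each; then add a hypothetical per-core `cpu` label (8 cores).
base = project_ram_mb({"instance": 1000}, 50)
per_cpu = project_ram_mb({"instance": 1000, "cpu": 8}, 50)
```

Here `per_cpu` is exactly 8 × `base`, while halving the scrape interval would change nothing, which is the scene's point. Multiplying per-label cardinalities treats labels as independent. The result is therefore an upper bound when labels are correlated, for example when each `instance` lives in a single `region`.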

verifier.detectCardinalityBomb

scan the proposed label set for unbounded identifiers

UNBOUNDED = {
    'user_id', 'request_id', 'trace_id',
    'session_id', 'email', 'ip',
}

def detectCardinalityBomb(workload, config):
    for label in config.labelSet:
        if label in UNBOUNDED:
            return Bomb(
                reason=f'{label} is unbounded: wrong tool',
            )
    return None
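A name-based denylist misses unbounded labels with harmless-looking names, such as a `path` label carrying raw URLs. The sketch below also checks an observed distinct-value count. It assumes a hypothetical `MAX_LABEL_VALUES` ceiling and a caller-supplied dict of counts; neither comes from the scene.

```python
from typing import Optional

# Label names that usually carry one value per user, request, or trace.
UNBOUNDED = frozenset({
    "user_id", "request_id", "trace_id",
    "session_id", "email", "ip",
})
MAX_LABEL_VALUES = 10_000  # hypothetical ceiling on distinct values per label


def detect_cardinality_bomb(label_set: list[str],
                            observed_values: Optional[dict[str, int]] = None) -> Optional[str]:
    """Return a refusal reason for the first risky label, or None if the set looks bounded."""
    observed_values = observed_values or {}
    for label in label_set:
        if label in UNBOUNDED:                                 # known-bad name
            return f"{label} is unbounded: wrong tool"
        if observed_values.get(label, 0) > MAX_LABEL_VALUES:   # innocent name, huge fan-out
            return f"{label} has {observed_values[label]} values: wrong tool"
    return None


print(detect_cardinality_bomb(["instance", "job"]))                     # → None
print(detect_cardinality_bomb(["service", "trace_id"]))                 # → trace_id is unbounded: wrong tool
print(detect_cardinality_bomb(["region", "path"], {"path": 250_000}))   # → path has 250000 values: wrong tool
```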

verifier.projectQueryLatency

downsampling tiers turn long-range queries from O(M) raw samples into O(k) downsampled points

def projectQueryLatency(config, queryRange):
    tier = pickTier(config.downsamplingTiers, queryRange)
    points = queryRange.seconds / tier.resolutionSeconds
    return points * DECODE_COST_PER_POINT_MS

def pickTier(tiers, queryRange):
    # coarsest tier whose retention covers the range
    for t in sorted(tiers, key=lambda t: t.resolutionSeconds, reverse=True):
        if t.retentionDays * DAY >= queryRange.seconds:
            return t
    # nothing covers the range: fall back to the finest (raw) tier
    return min(tiers, key=lambda t: t.resolutionSeconds)
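The runnable sketch below puts numbers on the O(M)-to-O(k) effect using the same selection rule. The three tiers and the per-point decode cost are hypothetical: raw 15 s samples kept 15 days, 5 min points kept 90 days, and 1 h points kept 400 days.

```python
from dataclasses import dataclass

DAY = 86_400
DECODE_COST_PER_POINT_MS = 0.001  # hypothetical: 1 microsecond per decoded point


@dataclass
class Tier:
    resolutionSeconds: int
    retentionDays: int


# Hypothetical tiers: raw, 5-minute, and 1-hour.
TIERS = [Tier(15, 15), Tier(300, 90), Tier(3600, 400)]


def pick_tier(tiers: list[Tier], range_seconds: int) -> Tier:
    # Coarsest tier whose retention still covers the whole range.
    for t in sorted(tiers, key=lambda t: t.resolutionSeconds, reverse=True):
        if t.retentionDays * DAY >= range_seconds:
            return t
    # Nothing covers the range: fall back to the finest tier (raw data).
    return min(tiers, key=lambda t: t.resolutionSeconds)


def project_query_latency_ms(tiers: list[Tier], range_seconds: int) -> float:
    tier = pick_tier(tiers, range_seconds)
    points = range_seconds / tier.resolutionSeconds
    return points * DECODE_COST_PER_POINT_MS


thirty_days = 30 * DAY
print(pick_tier(TIERS, thirty_days).resolutionSeconds)   # → 3600
print(thirty_days // 3600, thirty_days // 15)            # → 720 172800
```

For a 30-day query the 1 h tier wins. It decodes 720 points instead of 172,800 raw samples at 15 s, a 240× reduction. The coarsest-first rule also sends a one-hour query to the 1 h tier, which returns a single point. A query planner would typically also bound the resolution by the query step.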
