Build a distributed logging stack (ELK / Loki) (12 scenes)
Scene 11 · Design your logging stack
Capstone: agent + buffer policy + structuring + index strategy + tiers + retention + sampling — the verifier traces every choice back to the scene that earned it.
Previously

Every choice from scenes 02 through 10 is now a knob. Pick a workload and ship a configuration the verifier won't reject.

Diagram
LEFT: a 9-knob palette (agent dlog-02, buffer policy dlog-03, structuring dlog-04, index strategy dlog-05, labels under cardinality budget dlog-06, tier policy dlog-07, retention + PII audit dlog-08, sampling dlog-09, backend topology dlog-10). CENTER: the active workload card — startup-A, fintech-B, or iot-C — with volume, SLO, retention, and compliance posture. RIGHT: a verifier pane that turns each placement into a green/amber/red verdict and cites the scene id behind it.
PALETTE (defaults for workload A)
  Shipping agent (dlog-02): Promtail (with disk spill)
  Buffer policy (dlog-03): spill
  Structuring (dlog-04): JSON (slog) at the app
  Index strategy (dlog-05): Loki labels-only
  Label / fields (dlog-06): app, env, level
  Tier policy (dlog-07): Loki single-tier (S3 from day 0)
  Retention (dlog-08): 90 days, per-tenant
  Sampling (dlog-09): none
  Backend topology (dlog-10): RF=3 · 3 ingesters
WORKLOAD CARD A (small startup)
  50 hosts · 10 GB/day · query latency forgiving · 90-day retention
VERDICTS: OK 9 · WARN 0 · FAIL 0
  Agent (dlog-02): Promtail tails files, ships at-least-once. D…
  Buffer policy (dlog-03): spill chosen: backend hiccups stream to dis…
  Structuring (dlog-04): JSON at the app pushes parse cost to write-t…
  Index strategy (dlog-05): Loki labels-only index: 50 hosts × 10 GB/da…
  Labels (cardinality) (dlog-06): {app, env, level} → ~6 streams. Far below th…
  Tier policy (dlog-07): Loki has effectively one tier (S3); 10 GB/da…
  Retention (dlog-08): 90 days per-tenant, well above the 30d defa…
  Sampling (dlog-09): None: at 10 GB/day there's nothing to sacri…
  Backend topology (dlog-10): RF=3 across 3 ingesters: a single ingester l…
Workload A (small startup, 50 hosts, 10 GB/day): Loki-only stack accepted; every knob defended by a scene.
Workload A is on the canvas — 50 hosts, 10 GB/day, query latency forgiving, 90-day retention. The default stack loads — Promtail with disk spill, JSON at the app, Loki labels-only, low-cardinality labels, S3 from day 0, no sampling, RF=3. Watch the verifier walk each knob and cite the scene that defends it.
Implementation
Verifier.check_workload
rule fires when a knob contradicts the workload — cites a scene
def check_workload(workload, stack):
    issues = []
    # dlog-05a: query SLO picks the index strategy.
    if workload.required_query_p99_ms < 1000:
        if stack.index_strategy != 'elk-inverted':
            issues.append(reject('dlog-05a'))
    # dlog-06: high-card identifiers stay out of labels.
    for label in stack.labels:
        if label in UNBOUNDED_KEYS:  # trace_id, user_id…
            issues.append(reject('dlog-06'))
    # dlog-08: long retention demands per-index template + PII audit.
    if workload.required_retention_y >= 1 and not stack.pii_audit:
        issues.append(warn('dlog-08'))
    # dlog-03: never block the app on the agent buffer.
    if stack.buffer_policy != 'spill':
        issues.append(warn('dlog-03'))
    return issues
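To see the rules fire, here is a minimal runnable sketch of the verifier. The Issue record, the bodies of reject() and warn(), the two-element UNBOUNDED_KEYS set, the 'loki-labels' and 'block' strings, and the use of math.inf for a forgiving query SLO are all assumptions made for illustration; only the rule logic comes from the panel above.

```python
import math
from dataclasses import dataclass
from types import SimpleNamespace

# Assumption: only the two keys named in the panel's comment; the real set is longer.
UNBOUNDED_KEYS = {"trace_id", "user_id"}


@dataclass(frozen=True)
class Issue:
    """Hypothetical verdict record: a severity plus the scene id that defends it."""
    severity: str
    scene: str


def reject(scene):
    return Issue("reject", scene)


def warn(scene):
    return Issue("warn", scene)


def check_workload(workload, stack):
    issues = []
    # dlog-05a: a sub-second query SLO requires the inverted index.
    if workload.required_query_p99_ms < 1000:
        if stack.index_strategy != "elk-inverted":
            issues.append(reject("dlog-05a"))
    # dlog-06: unbounded identifiers must not become labels.
    for label in stack.labels:
        if label in UNBOUNDED_KEYS:
            issues.append(reject("dlog-06"))
    # dlog-08: retention of a year or more needs a PII audit.
    if workload.required_retention_y >= 1 and not stack.pii_audit:
        issues.append(warn("dlog-08"))
    # dlog-03: any buffer policy other than spill draws a warning.
    if stack.buffer_policy != "spill":
        issues.append(warn("dlog-03"))
    return issues


# Workload A with its default stack: no rule fires.
startup_a = SimpleNamespace(required_query_p99_ms=math.inf, required_retention_y=0.25)
default_stack = SimpleNamespace(
    index_strategy="loki-labels",  # hypothetical value for "Loki labels-only"
    labels=["app", "env", "level"],
    pii_audit=False,
    buffer_policy="spill",
)
clean = check_workload(startup_a, default_stack)

# A deliberately bad placement against fintech-B: every rule fires once.
fintech_b = SimpleNamespace(required_query_p99_ms=500, required_retention_y=7)
bad_stack = SimpleNamespace(
    index_strategy="loki-labels",
    labels=["app", "user_id"],
    pii_audit=False,
    buffer_policy="block",  # hypothetical non-spill policy
)
flagged = check_workload(fintech_b, bad_stack)
```

Under these assumptions, clean is an empty list, while flagged carries two rejects (dlog-05a, dlog-06) followed by two warns (dlog-08, dlog-03), in the order the rules are evaluated.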
Stack.serialize_config
build the deployable config from the 9 palette knobs
def serialize_config(stack):
    return {
        'agent': stack.agent,                  # dlog-02
        'buffer_policy': stack.buffer_policy,  # dlog-03
        'structuring': stack.structuring,      # dlog-04
        'index_strategy': stack.index_strategy,  # dlog-05
        'labels': stack.labels,                # dlog-06
        'tier_policy': stack.tier_policy,      # dlog-07
        'retention': stack.retention,          # dlog-08
        'pii_audit': stack.pii_audit,          # dlog-08
        'sampling': stack.sampling,            # dlog-09
        'replication_factor': stack.rf,        # dlog-10
    }
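As a rough illustration of what "deployable" could mean here, the sketch below fills a Stack with workload A's defaults and renders the serialized dict as JSON. The Stack dataclass and every string value in it are assumed encodings of the palette entries, not the tutorial's own types; pii_audit=False is likewise an assumption, since the palette does not state it for workload A.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Stack:
    """Hypothetical container for the palette knobs."""
    agent: str
    buffer_policy: str
    structuring: str
    index_strategy: str
    tier_policy: str
    retention: str
    pii_audit: bool
    sampling: str
    rf: int
    labels: list = field(default_factory=list)


def serialize_config(stack):
    return {
        "agent": stack.agent,                    # dlog-02
        "buffer_policy": stack.buffer_policy,    # dlog-03
        "structuring": stack.structuring,        # dlog-04
        "index_strategy": stack.index_strategy,  # dlog-05
        "labels": stack.labels,                  # dlog-06
        "tier_policy": stack.tier_policy,        # dlog-07
        "retention": stack.retention,            # dlog-08
        "pii_audit": stack.pii_audit,            # dlog-08
        "sampling": stack.sampling,              # dlog-09
        "replication_factor": stack.rf,          # dlog-10
    }


# Workload A defaults, encoded with assumed string values.
stack_a = Stack(
    agent="promtail",
    buffer_policy="spill",
    structuring="json",
    index_strategy="loki-labels",
    tier_policy="s3-single-tier",
    retention="90d",
    pii_audit=False,
    sampling="none",
    rf=3,
    labels=["app", "env", "level"],
)

config = serialize_config(stack_a)
rendered = json.dumps(config, indent=2, sort_keys=True)
```

Note that the nine knobs produce ten keys, because dlog-08 contributes both retention and pii_audit.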
Workload.profile
derive the verifier inputs from the workload card
import math

# Assumption: a card with no hard query SLO is modelled as an unbounded p99
# budget, so numeric comparisons in the verifier still work.
NO_QUERY_SLO = math.inf

def profile(card):
    return Workload(
        volume_per_day=card.gb_per_day,
        required_query_p99_ms=(
            NO_QUERY_SLO if card.id == 'startup-A'  # query latency forgiving
            else 500 if card.id == 'fintech-B'
            else NO_QUERY_SLO  # iot-C: queries are rare
        ),
        required_retention_y=(
            0.25 if card.id == 'startup-A'  # ~90 days
            else 7 if card.id == 'fintech-B'
            else 0.08  # iot-C: ~30 days
        ),
        pii_required=(card.id == 'fintech-B'),  # GDPR
        must_include_trace_id=(card.id == 'iot-C'),
    )
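The same mapping can also be written as a lookup table, which makes the per-card numbers easier to audit. This is a table-driven variant rather than the panel's chained conditionals. The Card and Workload dataclasses, the _PROFILE table, the retention_days helper, and the choice of math.inf for cards without a hard query SLO are all assumptions introduced for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """Hypothetical workload card: only the fields profile() reads."""
    id: str
    gb_per_day: float


@dataclass(frozen=True)
class Workload:
    volume_per_day: float
    required_query_p99_ms: float
    required_retention_y: float
    pii_required: bool
    must_include_trace_id: bool


# card id -> (query p99 budget in ms, retention in years)
_PROFILE = {
    "startup-A": (math.inf, 0.25),  # latency forgiving, ~90 days
    "fintech-B": (500, 7),
    "iot-C": (math.inf, 0.08),      # queries are rare, ~30 days
}


def profile(card):
    p99_ms, retention_y = _PROFILE[card.id]
    return Workload(
        volume_per_day=card.gb_per_day,
        required_query_p99_ms=p99_ms,
        required_retention_y=retention_y,
        pii_required=(card.id == "fintech-B"),
        must_include_trace_id=(card.id == "iot-C"),
    )


def retention_days(workload):
    """Convert the fractional-year retention back to whole days."""
    return round(workload.required_retention_y * 365)


startup = profile(Card("startup-A", 10))
```

Converting back to days is a quick sanity check on the fractional years: 0.25 years rounds to 91 days, close to the card's 90-day retention.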