Sampling — head, tail, and the only error — Build a distributed logging stack (ELK / Loki)

Build a distributed logging stack (ELK / Loki) (12 scenes)

Scene 09 · Sampling — head, tail, and the only error

Head sampling decides at emit (cheap, blind); tail sampling decides at the collector (can keep all errors, costs buffer). A uniform 1% sample drops the only error you needed.

Previously

Bytes die on a schedule and the index has to forget. But even with perfect retention, at scale we cannot keep everything we emit. The honest question is which lines we sacrifice — and the wrong answer is 'the only error of the day'.

Scene 09

Sampling — head, tail, and the only error

Watch

Diagram

An incoming stream of log records sits at the top, colored by level — mostly blue INFO, a few amber WARN, and one rare red ERROR mixed in. The stream forks into TWO funnels. LEFT funnel is HEAD SAMPLING: a `p=0.01` filter at the agent that drops 99% of records uniformly — the rare red ERROR almost never survives. RIGHT funnel is TAIL SAMPLING: a collector buffer holds every record for `decision_wait` seconds, then a rule node fires (`keep if any record at level=ERROR`) — the rare ERROR survives, but the in-flight memory bar visibly grows and a consistent-routing badge appears. A capture-rate comparison meter at the bottom shows head% vs tail% on the same axis. To the right, a SEPARATE lane shows the rate-limit mechanism: per-tenant ingestion quota, with HTTP 429 indicators when ingest exceeds quota — this is NOT sampling, it is admission control.

incoming stream — one rare red ERROR mixed in

Watch the stream at the top. Most records are INFO; one rare red ERROR is mixed in around the middle. The stream forks LEFT into a HEAD SAMPLER (uniform p=0.01 at the agent) and RIGHT into a TAIL SAMPLER (collector buffers for 30s then keeps the trace if any record was an ERROR). The capture-rate meter at the bottom tells the story: head drops the rare red cell; tail keeps it. To the right, a separate lane shows the rate-limit mechanism — per-tenant quota with HTTP 429s — distinct from sampling, same diagram.

Implementation

Agent.head_sample

decision at emit time on the agent — no buffer, blind to outcomes

1# uniform: every record gets the same coin flip
2def head_sample_uniform(line, p=0.01):
3    if random() < p:
4        ship(line)
5    else:
6        drop(line)  # never reaches the backend
7 
8# level-aware: per-level p map (needs structured logs)
9P_BY_LEVEL = {info: 0.01, debug: 0.01,
10              warn: 1.0,  error: 1.0}
11def head_sample_level_aware(line):
12    p = P_BY_LEVEL[line.level]
13    if random() < p: ship(line) else: drop(line)

Collector.tail_sample

decision at the collector after buffering decision_wait

1buffer = {}  # trace_id -> list[span]  (per collector)
2 
3def on_span(span):
4    # consistent routing required: all spans of a trace
5    # MUST land at the same collector or the rule sees
6    # a partial trace.
7    buffer.setdefault(span.trace_id, []).append(span)
8 
9def on_idle(trace_id, decision_wait=30):
10    spans = buffer.pop(trace_id)
11    if any(s.level == ERROR for s in spans):
12        ship(spans)        # keep on error
13    else:                  # drop the whole trace
14        drop(spans)

Backend.rate_limit

per-tenant ingestion_rate_mb — admission control, not sampling

1# Loki: ingestion_rate_mb / ingestion_burst_size_mb
2# ELK:  bulk thread pool fills -> bulk-queue-rejected
3def on_push(tenant, batch):
4    used = tokens_used(tenant)  # bytes/sec window
5    if used + batch.bytes > ingestion_rate_mb:
6        return HTTP_429  # agent retries / spills to disk
7    accept(batch)
8    advance_tokens(tenant, batch.bytes)

PreviousRetention vs deletion — the index has to forget NextDistributors, ingesters, queriers (briefly)