Block, drop, or spill

Build a distributed logging stack (ELK / Loki) (12 scenes)

Scene 03 · Block, drop, or spill

When the backend stalls, the agent must block, drop, or spill to disk — and `when_full=block` plus a synchronous logger is how a logging outage takes the application down with it.

Previously

The agent ships lines at-least-once over a pipe, but ships them where? The backend at the far end can stall, and when it does the agent has exactly three choices. Only one of them keeps the application alive without losing lines.

Diagram

We zoom into the agent's batch compartment from scene 2. The buffer is now a vertical pipe with capacity ticks (3200 events, Filebeat default). Above it sits the app process with a synchronous write() that turns amber when blocked; a thread-pool meter on the app reads healthy/stressed/starved. To the right, a disk-spill overlay (queue.disk / storage.type=filesystem / WAL) lights up under SPILL. Below the buffer, a 3-position policy switch (BLOCK / DROP / SPILL) and a lines-lost counter that only ticks under DROP. Top-right: a backend-health lamp — green/red only, no internals.

Backend healthy. Lines arrive at the buffer, fill climbs, a batch ships, fill drains — and the cycle repeats. The app's thread-pool meter is calm; the disk-spill overlay on the right is dim, unused. Get a feel for the steady state before we break the backend.
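To get a feel for how little slack the in-memory buffer offers once the backend stops draining, the arithmetic can be sketched as below. The 3200-event capacity comes from the diagram; the arrival rate of 800 lines per second and the helper name `seconds_until_full` are assumptions chosen purely for illustration.

```python
def seconds_until_full(capacity_events: int, fill_events: int,
                       arrival_rate_per_s: float) -> float:
    """Time until the buffer is full if nothing drains it (backend stalled)."""
    if arrival_rate_per_s <= 0:
        raise ValueError("arrival rate must be positive")
    free_slots = max(capacity_events - fill_events, 0)
    return free_slots / arrival_rate_per_s


CAPACITY_EVENTS = 3200          # buffer size shown in the diagram
ASSUMED_RATE_PER_S = 800.0      # hypothetical app log rate, not from the source

# Starting from an empty buffer, the stall is absorbed for this many seconds.
print(seconds_until_full(CAPACITY_EVENTS, 0, ASSUMED_RATE_PER_S))  # 4.0
```

At that assumed rate the buffer hides a stall for only a few seconds, which is why the policy switch below the buffer matters.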

Implementation

App.write_log

synchronous logger calls into the agent's enqueue path

def write_log(line):
    # The request-handler thread is the caller.
    if sync_logger:
        agent.enqueue(line)   # parks this thread if the queue is full and when_full=block
        return
    # Async logger: hand the line to an in-process queue and return immediately.
    inproc_queue.put_nowait(line)
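The async branch can be sketched with Python's standard `logging.handlers.QueueHandler`, which hands each record to a queue with `put_nowait`. This is a minimal sketch, not the scene's actual logger: the subclass `DroppingQueueHandler`, the logger name, and the choice to count overflow rather than raise are all assumptions. In a real setup a background consumer such as `logging.handlers.QueueListener` would drain the queue; here `drain` stands in for it so the behaviour is easy to follow.

```python
import logging
import logging.handlers
import queue


class DroppingQueueHandler(logging.handlers.QueueHandler):
    """QueueHandler that counts overflow instead of blocking the caller."""

    def __init__(self, q):
        super().__init__(q)
        self.dropped = 0

    def enqueue(self, record):
        try:
            self.queue.put_nowait(record)   # never blocks the request thread
        except queue.Full:
            self.dropped += 1               # in-process queue is full: drop and count


def build_logger(capacity):
    """Return (logger, handler, queue) wired to a bounded in-process queue."""
    q = queue.Queue(maxsize=capacity)
    handler = DroppingQueueHandler(q)
    logger = logging.getLogger("scene03.async_demo")   # hypothetical name
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    return logger, handler, q


def drain(q):
    """Stand-in for the background shipper: pull everything currently queued."""
    messages = []
    while True:
        try:
            messages.append(q.get_nowait().getMessage())
        except queue.Empty:
            return messages


logger, handler, q = build_logger(capacity=2)
for i in range(5):
    logger.info("line %d", i)    # returns immediately even though nothing drains

print(handler.dropped)           # 3
print(drain(q))                  # ['line 0', 'line 1']
```

The trade-off is visible in the counters: the caller never waits, but once the in-process queue is full the overflow has to go somewhere, and in this sketch it is dropped.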

Agent.enqueue

the when_full policy switch — block, drop, or spill

def enqueue(line):
    if len(queue.mem) < queue.mem.capacity:
        queue.mem.append(line)        # capacity: 3200 events by default
        return
    # Buffer is full: the backend is not draining fast enough.
    if when_full == 'block':
        wait_until_room()             # caller's thread parks here
        queue.mem.append(line)        # room freed up, so accept the line
    elif when_full == 'drop':
        metrics.lines_lost += 1       # line is discarded; the counter is the only trace
    elif when_full == 'spill':
        spill_to_disk(line)           # queue.disk / WAL
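A runnable version of the policy switch, as a minimal sketch under stated assumptions: the class `BoundedBuffer`, its `block_timeout_s` parameter, and the in-memory `spilled` list (a stand-in for the disk queue) are all hypothetical and do not mirror any particular agent's internals. Counting a timed-out blocked write as a lost line is also a choice made here for illustration.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Fixed-capacity buffer with a when_full policy: block, drop, or spill."""

    def __init__(self, capacity, when_full="block", block_timeout_s=None):
        if when_full not in ("block", "drop", "spill"):
            raise ValueError(f"unknown policy: {when_full}")
        self.capacity = capacity
        self.when_full = when_full
        self.block_timeout_s = block_timeout_s  # None means wait forever
        self.mem = deque()
        self.spilled = []        # stand-in for a disk-backed queue
        self.lines_lost = 0
        self._cond = threading.Condition()

    def _has_room(self):
        return len(self.mem) < self.capacity

    def enqueue(self, line):
        """Return True if the line was kept (in memory or spilled)."""
        with self._cond:
            if self._has_room():
                self.mem.append(line)
                return True
            if self.when_full == "drop":
                self.lines_lost += 1
                return False
            if self.when_full == "spill":
                self.spilled.append(line)
                return True
            # block: park the caller until take_batch() frees a slot
            if not self._cond.wait_for(self._has_room, timeout=self.block_timeout_s):
                self.lines_lost += 1     # gave up after the timeout
                return False
            self.mem.append(line)
            return True

    def take_batch(self, max_events):
        """Called by the shipper: remove up to max_events and wake blocked writers."""
        with self._cond:
            count = min(max_events, len(self.mem))
            batch = [self.mem.popleft() for _ in range(count)]
            self._cond.notify_all()
            return batch


drop = BoundedBuffer(capacity=2, when_full="drop")
print([drop.enqueue(x) for x in "abc"], drop.lines_lost)   # [True, True, False] 1

spill = BoundedBuffer(capacity=2, when_full="spill")
print([spill.enqueue(x) for x in "abc"], spill.spilled)    # [True, True, True] ['c']
```

With `when_full="block"` and no timeout, `enqueue` does not return until a shipper calls `take_batch`. If the backend is down, nobody does, and every thread that logs ends up parked there.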

Agent.spill_to_disk

the disk-backed overflow that keeps the app alive

def spill_to_disk(line):
    # queue.disk (10 GB) on Filebeat; storage.type=filesystem on Fluent Bit;
    # WAL on Promtail; disk buffer on Vector
    disk_buffer.append(line)

def replay_loop():           # runs when the backend recovers
    while disk_buffer and backend.healthy():
        batch = disk_buffer.read_batch()
        backend.send(batch)  # at-least-once
        disk_buffer.advance(batch)  # advance only after the send, so a crash replays the batch
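The spill-and-replay idea can be made concrete with a small file-backed spool. This is a minimal sketch, assuming a single writer and a single replayer; the class `DiskSpool`, the file names, and the JSON-per-line encoding are hypothetical and do not describe how Filebeat, Fluent Bit, Promtail, or Vector store their queues. The key property it illustrates is that the acknowledged offset moves only after a successful send, which is what makes delivery at-least-once.

```python
import json
import os
import tempfile


class DiskSpool:
    """Append-only spill file plus a small file holding the acknowledged offset."""

    def __init__(self, directory):
        self.data_path = os.path.join(directory, "spool.log")
        self.offset_path = os.path.join(directory, "spool.offset")
        open(self.data_path, "ab").close()          # create if missing

    def append(self, line):
        with open(self.data_path, "ab") as f:
            f.write(json.dumps(line).encode("utf-8") + b"\n")
            f.flush()
            os.fsync(f.fileno())                    # survive a process crash

    def _acked_offset(self):
        try:
            with open(self.offset_path) as f:
                return int(f.read() or 0)
        except FileNotFoundError:
            return 0

    def read_batch(self, max_events):
        """Return (lines, next_offset) without acknowledging anything."""
        lines, pos = [], self._acked_offset()
        with open(self.data_path, "rb") as f:
            f.seek(pos)
            for _ in range(max_events):
                raw = f.readline()
                if not raw.endswith(b"\n"):         # end of file or torn write
                    break
                lines.append(json.loads(raw))
                pos += len(raw)
        return lines, pos

    def advance(self, next_offset):
        tmp = self.offset_path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(next_offset))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.offset_path)           # atomic swap of the offset file


def replay(spool, send, batch_size=100):
    """Ship spilled lines; send(batch) returns True on success. Returns lines shipped."""
    shipped = 0
    while True:
        batch, next_offset = spool.read_batch(batch_size)
        if not batch or not send(batch):
            return shipped                          # nothing left, or backend still down
        spool.advance(next_offset)                  # ack only after a successful send
        shipped += len(batch)


with tempfile.TemporaryDirectory() as d:
    spool = DiskSpool(d)
    for line in ["a", "b", "c"]:
        spool.append(line)
    print(replay(spool, lambda batch: False, batch_size=2))   # 0, backend down
    print(replay(spool, lambda batch: True, batch_size=2))    # 3, backend recovered
```

If the process dies between `send` and `advance`, the next replay resends that batch, so the backend may see duplicates but never a gap.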
