Build a distributed logging stack (ELK / Loki) (12 scenes)
Scene 02 · An agent on every host
A shipping agent tails each log file from a saved offset, batches lines, and POSTs them at-least-once — duplicates on retry are normal, not a bug.
Previously

Centralising means putting a network between the app and its log file — and that something on each host that has to keep the pipe full has a name. It is the **shipping agent**: Filebeat, Promtail, Fluent Bit, or Vector. They look different on the outside; under the skin they are the same three compartments.

Scene 02
An agent on every host
Diagram
Left zone, blown up to one host: an app process writing into /var/log/app.log, with the file's tail-cursor visible. Below it, a shipping-agent box with three labelled compartments — (1) positions.yaml showing the on-disk byte offset, (2) the in-memory batch buffer (capacity 4 line cells), (3) a retry slot. A dashed HTTP arrow flies from the agent across to the right zone — a generic black-box receiver. An ack-clock on the arrow turns green on success and red when an ack is dropped; positions.yaml only advances after the agent sees an ack.
HAPPY PATH · acks flowingHOST/var/log/app.logoffset=02025-05-08T12:00 app: ...2025-05-08T12:01 app: ...2025-05-08T12:02 app: ...2025-05-08T12:03 app: ...← tail cursorshipping agent1 · POSITIONS.YAML/var/log/app.logoffset: 02 · BATCH BUFFER0/43 · RETRY SLOT(empty)BACKENDreceiver:3100received 0 linesPOST /loki/api/v1/pushidleHappy path. The agent tails app.log, batches lines, POSTs, waits for the ack, then advances positions.yaml — …
The app writes lines into /var/log/app.log. The agent reads each new line into its in-memory batch. When the batch hits 4 lines, it POSTs the batch over HTTP. Only AFTER the backend acks does positions.yaml advance — that on-disk offset is what the agent re-reads from on restart.
Implementation
Agent.tail_loop
reads new lines from app.log into the in-RAM batch
1def tail_loop():
2 f = open('/var/log/app.log')
3 f.seek(self.offset) # offset from positions.yaml
4 while running:
5 line = f.readline()
6 if not line:
7 sleep(poll_interval); continue
8 batch.append(line)
9 if len(batch) >= batch_capacity:
10 send_queue.put(batch)
11 batch = []
Agent.send_loop
POSTs batches; advances positions only after a 2xx ack
1def send_loop():
2 while running:
3 batch = retry_slot or send_queue.get()
4 resp = http.post(backend_url, batch)
5 if 200 <= resp.status < 300:
6 self.offset += sum(len(l) for l in batch)
7 persist(positions_yaml, self.offset)
8 retry_slot = None
9 else: # network drop, 5xx, or no ack
10 retry_slot = batch # resend after backoff
11 sleep(backoff_with_jitter())
Agent.on_restart
the only durable state is positions.yaml — RAM is gone
1def on_restart():
2 # batch and retry_slot lived in RAM — both gone
3 self.offset = read(positions_yaml) or 0
4 # Promtail gotcha: if positions advanced when the line
5 # was READ rather than when the backend ACKED, every
6 # in-flight line between offset and EOF is silently lost.
7 spawn(tail_loop) # re-opens app.log, seeks to offset
8 spawn(send_loop) # starts with empty queue + slot