Build a distributed logging stack (ELK / Loki) (12 scenes)
Scene 02 · An agent on every host
A shipping agent tails each log file from a saved offset, batches lines, and POSTs them at-least-once — duplicates on retry are normal, not a bug.
Previously
Centralising means putting a network between the app and its log file — and that something on each host that has to keep the pipe full has a name. It is the **shipping agent**: Filebeat, Promtail, Fluent Bit, or Vector. They look different on the outside; under the skin they are the same three compartments.
Scene 02
An agent on every host
Diagram
Left zone, blown up to one host: an app process writing into /var/log/app.log, with the file's tail-cursor visible. Below it, a shipping-agent box with three labelled compartments — (1) positions.yaml showing the on-disk byte offset, (2) the in-memory batch buffer (capacity 4 line cells), (3) a retry slot. A dashed HTTP arrow flies from the agent across to the right zone — a generic black-box receiver. An ack-clock on the arrow turns green on success and red when an ack is dropped; positions.yaml only advances after the agent sees an ack.
The app writes lines into /var/log/app.log. The agent reads each new line into its in-memory batch. When the batch hits 4 lines, it POSTs the batch over HTTP. Only AFTER the backend acks does positions.yaml advance — that on-disk offset is what the agent re-reads from on restart.
Implementation
Agent.tail_loop
reads new lines from app.log into the in-RAM batch
1def tail_loop():2 f = open('/var/log/app.log')3 f.seek(self.offset) # offset from positions.yaml4 while running:5 line = f.readline()6 if not line:7 sleep(poll_interval); continue8 batch.append(line)9 if len(batch) >= batch_capacity:10 send_queue.put(batch)11 batch = []
Agent.send_loop
POSTs batches; advances positions only after a 2xx ack
1def send_loop():2 while running:3 batch = retry_slot or send_queue.get()4 resp = http.post(backend_url, batch)5 if 200 <= resp.status < 300:6 self.offset += sum(len(l) for l in batch)7 persist(positions_yaml, self.offset)8 retry_slot = None9 else: # network drop, 5xx, or no ack10 retry_slot = batch # resend after backoff11 sleep(backoff_with_jitter())
Agent.on_restart
the only durable state is positions.yaml — RAM is gone
1def on_restart():2 # batch and retry_slot lived in RAM — both gone3 self.offset = read(positions_yaml) or 04 # Promtail gotcha: if positions advanced when the line5 # was READ rather than when the backend ACKED, every6 # in-flight line between offset and EOF is silently lost.7 spawn(tail_loop) # re-opens app.log, seeks to offset8 spawn(send_loop) # starts with empty queue + slot