Build a distributed logging stack (ELK / Loki) (12 scenes)
Scene 08 · Retention vs deletion — the index has to forget
Retention is the point where the system stops promising you can read a record; deletion is the point where its bytes are physically gone. The gap between the two is where compliance bugs live.
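Here is a minimal sketch of that gap. It assumes a record becomes retention-expired on the first day its age exceeds the window, which is the same `age > window` test that `Retention.expire` uses later in this scene. It also assumes the record's bytes leave disk only at the first compactor run on or after that day. `expiry_day`, `deletion_day`, and the list of compactor run days are hypothetical names used for illustration.

```python
from typing import Optional, Sequence


def expiry_day(ingest_day: int, window: int) -> int:
    """First day on which age = day - ingest_day exceeds the retention window."""
    return ingest_day + window + 1


def deletion_day(ingest_day: int, window: int,
                 compactor_runs: Sequence[int]) -> Optional[int]:
    """First compactor run on or after the expiry day; None if none has run yet.

    Between expiry_day and this day the record is unreadable by promise
    but still physically on disk.
    """
    first = expiry_day(ingest_day, window)
    return next((d for d in sorted(compactor_runs) if d >= first), None)


# Ingested on day 0 with a 30-day window: expires on day 31. If the compactor
# stalls and next runs on day 35, the bytes linger for 4 extra days.
exp = expiry_day(0, 30)
gone = deletion_day(0, 30, [26, 35, 36])
print(exp, gone, gone - exp)  # 31 35 4
```

The difference between the two days is how long a record is unreadable by promise but still recoverable from disk. Each day the compactor stalls adds one more day to that gap.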
Previously

We have a cost ladder for where bytes live as they age, and the ILM policy is what moves data down it. But that policy decides when bytes MOVE, not when they DIE, and dying has a lifecycle of its own.
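One way to keep the two lifecycles apart in code is to give them separate functions with separate inputs. The ladder thresholds and tier names below are illustrative assumptions, not the previous scene's numbers. The point is that the placement function has no "deleted" answer at all.

```python
# Hypothetical cost ladder: (minimum age in days, tier), sorted by age.
# Thresholds and tier names are illustrative, not taken from the previous scene.
LADDER = [(0, "hot"), (7, "warm"), (30, "cold"), (90, "frozen")]


def tier_for_age(age_days: int) -> str:
    """Where bytes of this age should LIVE; note there is no 'deleted' answer."""
    tier = LADDER[0][1]
    for min_age, name in LADDER:  # relies on LADDER being sorted ascending
        if age_days >= min_age:
            tier = name
    return tier


def retention_expired(age_days: int, window: int) -> bool:
    """A separate question on its own clock: has the read promise lapsed?"""
    return age_days > window


# A 45-day-old record under a 30-day window: placed in "cold" AND past retention.
print(tier_for_age(45), retention_expired(45, 30))  # cold True
```

The move schedule places the record in a cheaper tier but says nothing about whether it should still exist. That question belongs to a different clock.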

Diagram
TWO horizontal timelines stacked with a deliberate vertical gap. TOP timeline = data records, each tagged with its ingest day; a retention cut-off line marks where the system stops promising reads, but expired records keep sitting on disk until the compactor cycles past them. BOTTOM timeline = index entries pointing at the records, with their OWN aging clock. The gap between the two timelines is the load-bearing detail. Overlays surface the failure modes: a `logs-legacy-*` template that inherits 7y retention (PII template leak, GDPR audit alarm), and a delete-by-query that writes TOMBSTONES into segments — segments larger than 5 GB don't auto-merge, so the tombstoned bytes stay on disk until a manual force-merge.
[Figure: BASELINE · 30-day retention. Top timeline, DATA RECORDS (on-disk segments): records r26 to r34 for days 26 to 34, with the retention cut-off at day 30. Records r26 to r29 are marked EXPIRED but remain on disk because the compactor last ran on day 26 (lag 4 days). Bottom timeline, INDEX ENTRIES (pointers into records): the same records, aging on their own clock from 8 days (r26) down to 0 days (r34). Legend: the cut-off line marks where the system stops promising reads; the expired cells are still on disk because the compactor hasn't reached them yet.]
Baseline retention: the cluster default is 30 days. The compactor cycles daily and trims expired records, and the index shrinks in lockstep. Retention says "we don't promise reads past day 30"; deletion (bytes physically gone) only happens after the compactor runs.
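The `logs-legacy-*` leak comes from the `index.template.retention or cluster.default_retention` fallback in `Retention.expire`: whichever template matches an index supplies its window. Below is a minimal audit sketch. It assumes templates are matched by glob pattern in list order with the first match winning, whereas Elasticsearch's composable index templates are resolved by `priority`. It also assumes PII indices are capped at a hypothetical 30 days. `TEMPLATES`, `effective_retention`, and `audit` are illustrative names, not an Elasticsearch or Loki API.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional, Set, Tuple

CLUSTER_DEFAULT_DAYS = 30
# Hypothetical template table, checked in order; the first matching pattern wins.
TEMPLATES: List[Tuple[str, Optional[int]]] = [
    ("logs-legacy-*", 7 * 365),  # inherited 7-year window: the PII leak
    ("logs-*", None),            # no override: fall back to the cluster default
]


def effective_retention(index_name: str) -> int:
    """Retention window in days, mirroring `template.retention or default`."""
    for pattern, retention in TEMPLATES:
        if fnmatch(index_name, pattern):
            return retention or CLUSTER_DEFAULT_DAYS
    return CLUSTER_DEFAULT_DAYS


def audit(index_names: Iterable[str], pii_indices: Set[str],
          max_pii_days: int = 30) -> List[str]:
    """PII-bearing indices whose effective window exceeds the allowed cap."""
    return [name for name in index_names
            if name in pii_indices and effective_retention(name) > max_pii_days]


print(audit(["logs-app-1", "logs-legacy-2019"], {"logs-app-1", "logs-legacy-2019"}))
# ['logs-legacy-2019']
```

The audit checks the *effective* window, after template resolution, rather than the cluster default. The cluster default alone would report 30 days everywhere and miss the leak.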
Watch the two timelines. The TOP row is the data on disk — each cell is a day of records. A retention cut-off line sits at day 30: anything to its left is retention-expired but still physically present. The BOTTOM row is the index — it has its own aging clock. The compactor ticks daily; only AFTER it runs do bytes actually leave disk and only AFTER it runs does the index forget. In steady state both shrink in lockstep — but they are NOT the same lifecycle.
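The index-has-to-forget point can be sketched with two plain dictionaries: a record store keyed by id, and a postings map from term to record ids. This is a hypothetical model, not how Lucene or Loki lay out their indexes. It assumes the compactor is the only component allowed to remove records, so it must prune index entries in the same pass. For brevity, the sketch evaluates expiry inline instead of reading a flag.

```python
from typing import Dict, Set


def compact(records: Dict[str, int], postings: Dict[str, Set[str]],
            now: int, window: int) -> None:
    """Drop expired records AND every index entry that points at them, in one pass."""
    gone = {rid for rid, ingest_day in records.items() if now - ingest_day > window}
    for rid in gone:
        del records[rid]                 # bytes leave "disk"
    for term in list(postings):          # copy keys: we delete while iterating
        postings[term] -= gone           # the index forgets the pointers...
        if not postings[term]:
            del postings[term]           # ...and any term left with no records


def dangling(records: Dict[str, int], postings: Dict[str, Set[str]]) -> Set[str]:
    """Record ids the index still points at but that are no longer stored."""
    return {rid for ids in postings.values() for rid in ids if rid not in records}
```

Suppose a compactor dropped records but skipped the postings loop. Then `dangling` would return the ids the index still points at, and the index would still "remember" data that is gone.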
Implementation
Retention.expire
daily ILM pass — flips records to expired (still on disk)
# runs daily as part of the ILM pass
def expire(cluster, now):
    for index in cluster.indices:
        # per-index policy; a matching template can override the cluster default
        window = index.template.retention or cluster.default_retention
        for record in index.records:
            age = now - record.ingest_day
            if age > window:
                record.retention_expired = True  # promise dropped
                # NOTE: bytes still on disk until Compactor.run()
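`Retention.expire` leans on cluster objects it doesn't define. The self-contained version below makes its key property testable: the pass flips flags and never shrinks `records`. It assumes a record is just an ingest day plus a flag, and that a template's override is a plain number of days. `Record`, `Index`, and `template_retention` are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    ingest_day: int
    retention_expired: bool = False


@dataclass
class Index:
    name: str
    template_retention: Optional[int]  # None = no template override
    records: List[Record] = field(default_factory=list)


def expire(indices: List[Index], now: int, default_retention: int) -> int:
    """Flag records past their window; return how many were newly flagged.

    Nothing is removed from index.records: the promise is dropped, the bytes stay.
    """
    flagged = 0
    for index in indices:
        window = index.template_retention or default_retention
        for record in index.records:
            if not record.retention_expired and now - record.ingest_day > window:
                record.retention_expired = True
                flagged += 1
    return flagged
```

Take an index `logs-app` holding records for days 26 to 34. With `now=60`, it flags the four records ingested before day 30. A legacy index with a 2555-day override keeps its day-26 record unflagged. All nine `logs-app` records are still in the list afterwards.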
Compactor.run
periodic — rewrite/segment-merge that actually frees bytes
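max_merged_segment = 5  # GB; mirrors index.merge.policy.max_merged_segment (default 5gb)
# runs daily; this is when bytes physically LEAVE disk
def run(cluster, compactor, today):
    for index in cluster.indices:
        for segment in list(index.segments):  # copy: replace() mutates the list
            if segment.size_gb > max_merged_segment:
                continue  # SKIP: too big to auto-merge, expired bytes stay
            kept = [r for r in segment.records
                    if not r.retention_expired and not r.tombstoned]
            if len(kept) == len(segment.records):
                continue  # nothing to drop, so no rewrite
            new_segment = rewrite(kept)  # bytes finally gone
            replace(index, segment, new_segment)
    compactor.last_run_day = today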
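Here is a runnable miniature of `Compactor.run`. It assumes each record carries its own size in GB and that a segment's size is the sum of its records. `Rec`, `Segment`, and `compactor_run` are hypothetical names. The function returns how many GB the pass freed, which makes the oversized-segment skip visible.

```python
from dataclasses import dataclass, field
from typing import List

MAX_MERGED_SEGMENT_GB = 5.0  # mirrors the 5 GB default used in this scene


@dataclass
class Rec:
    rid: str
    size_gb: float
    retention_expired: bool = False
    tombstoned: bool = False


@dataclass
class Segment:
    records: List[Rec] = field(default_factory=list)

    @property
    def size_gb(self) -> float:
        return sum(r.size_gb for r in self.records)


def compactor_run(segments: List[Segment]) -> float:
    """Rewrite eligible segments without expired/tombstoned records; return GB freed."""
    freed = 0.0
    for i, seg in enumerate(segments):
        if seg.size_gb > MAX_MERGED_SEGMENT_GB:
            continue  # too big to auto-merge: its expired bytes stay on disk
        kept = [r for r in seg.records if not (r.retention_expired or r.tombstoned)]
        if len(kept) == len(seg.records):
            continue  # nothing to drop, so no rewrite
        freed += seg.size_gb - sum(r.size_gb for r in kept)
        segments[i] = Segment(kept)  # new segment replaces the old one
    return freed
```

A 2 GB segment frees its expired 1 GB record. A 6 GB segment holding an expired 4 GB record is skipped, so that record's bytes survive the pass untouched.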
DeleteByQuery.execute
writes tombstones; bytes leave only on merge (or force_merge)
def execute(index, query):
    # segments are immutable: no in-place removal
    matches = index.search(query)
    for doc in matches:
        doc.segment.tombstones.add(doc.id)  # marker, not eviction
    # index now returns 0 hits for query (the promise)
    # but bytes stay until a merge rewrites the segment, and
    # auto-merge SKIPS segments > max_merged_segment (5 GB)
    return {"deleted": len(matches), "bytes_freed": 0}

def force_merge(index):  # operator-invoked; IO-heavy, blocking
    for segment in list(index.segments):
        if segment.size_gb > max_merged_segment:
            continue  # default: still skips >5 GB
        merged = rewrite_without(segment, segment.tombstones)
        replace(index, segment, merged)  # old segment (and its bytes) dropped
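The tombstone behaviour can be modelled the same way, under a few assumptions. A segment is an immutable dict of documents plus a mutable tombstone set. `query` is a Python predicate rather than a query DSL. A merged segment shrinks in proportion to the documents it drops. All names are hypothetical, and in Lucene, merges are chosen by the merge policy rather than by a loop like this.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

MAX_MERGED_SEGMENT_GB = 5.0


@dataclass
class Segment:
    docs: Dict[str, dict]  # doc id -> source; never edited in place
    size_gb: float
    tombstones: Set[str] = field(default_factory=set)


def search(segments: List[Segment], query: Callable[[dict], bool]) -> List[str]:
    """Live hits only: tombstoned docs are filtered out at read time."""
    return [doc_id for seg in segments for doc_id, src in seg.docs.items()
            if doc_id not in seg.tombstones and query(src)]


def delete_by_query(segments: List[Segment], query: Callable[[dict], bool]) -> dict:
    """Tombstone every live match; no bytes are freed here."""
    deleted = 0
    for seg in segments:
        for doc_id, src in seg.docs.items():
            if doc_id not in seg.tombstones and query(src):
                seg.tombstones.add(doc_id)  # marker, not eviction
                deleted += 1
    return {"deleted": deleted, "bytes_freed": 0}


def force_merge(segments: List[Segment]) -> None:
    """Rewrite segments without tombstoned docs; still skips oversized ones."""
    for i, seg in enumerate(segments):
        if seg.size_gb > MAX_MERGED_SEGMENT_GB or not seg.tombstones:
            continue
        live = {k: v for k, v in seg.docs.items() if k not in seg.tombstones}
        # crude size model: shrink in proportion to the docs removed
        new_size = seg.size_gb * len(live) / len(seg.docs)
        segments[i] = Segment(live, new_size)
```

After `delete_by_query`, `search` returns no hits for the query while `bytes_freed` is 0. `force_merge` then shrinks a 2 GB segment with one of two docs tombstoned to 1 GB. It leaves a 6 GB segment, tombstoned document included, exactly as it was.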