Build a distributed logging stack (ELK / Loki) (12 scenes)
Scene 07 · Hot, warm, cold, frozen
Two orders of magnitude in cost between NVMe and Deep Archive force tiering. ELK has four ILM phases; Loki collapses to S3 from day one.
Previously
High-cardinality data has to live in the body, and the body is BIG and most of it is OLD. NVMe is too expensive to hold months of body — economics force tiering.
Diagram
A vertical four-rung ladder labelled HOT / WARM / COLD / FROZEN on the LEFT lane (ELK Index Lifecycle Management). Each rung shows hardware, latency, replica count, and per-GB-month cost. An ILM clock above the ladder ticks days; an index tile is born on HOT and slides DOWN rung by rung as it ages. A 'searchable snapshot' callout pins to COLD/FROZEN: it is the primitive that lets those rungs exist without re-indexing. To the right, a SECOND lane labelled LOKI shows the entire body collapsed onto a single S3-Standard rung from day 0. Cost-per-month meters at the bottom compare the ELK total against the Loki total for the same volume × retention. An INGEST-STALL badge fires on HOT when the policy is misconfigured (HOT-only, with no demotion targets).
Day 0 — HOT (NVMe, ~$0.10/GB-mo)
Watch one index tile age. It is born today on HOT (NVMe, replicated, millisecond latency). When the ILM clock reaches 7 days, it slides to WARM (force-merged, fewer replicas). At 30 days it slides to COLD (a searchable snapshot in S3, fully mounted, ~50% disk savings, no replicas). At 90 days it slides to FROZEN (small NVMe cache, partially mounted from S3, up to 20× warm capacity). To the right, the Loki lane shows the same body parked on S3-Standard from day 0, with no movement, ever.
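To make "most of the body is OLD" concrete, here is a minimal sketch of how much data sits on each rung at steady state. It assumes a constant daily ingest and the 7 / 30 / 90-day boundaries from the walkthrough. The names `BOUNDARIES`, `resident_gb` and `monthly_cost` are illustrative, and prices are left to the caller because the scene quotes a figure only for HOT.

```python
# Phase start days, taken from the walkthrough (7 / 30 / 90).
BOUNDARIES = [("hot", 0), ("warm", 7), ("cold", 30), ("frozen", 90)]


def resident_gb(gb_per_day, retention_days):
    """GB sitting on each rung at steady state, assuming constant daily ingest."""
    resident = {}
    for i, (phase, start) in enumerate(BOUNDARIES):
        # A rung ends where the next one begins; the last rung ends at retention.
        end = BOUNDARIES[i + 1][1] if i + 1 < len(BOUNDARIES) else retention_days
        days = max(0, min(end, retention_days) - start)
        resident[phase] = gb_per_day * days
    return resident


def monthly_cost(resident, price_per_gb_month):
    """Storage-only bill: resident GB times a caller-supplied $/GB-month per rung.

    Replicas, compute and request charges are deliberately ignored.
    """
    return sum(gb * price_per_gb_month[phase] for phase, gb in resident.items())
```

At 100 GB/day with 365 days of retention, `resident_gb(100, 365)` puts 700 GB on HOT and 27,500 GB on FROZEN, so the cheapest rung holds roughly three quarters of the body.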
Implementation
ILM.tick
the daily clock — for each index, evaluate phase actions
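# In this model the clock ticks once per day on the master node.
# (Real ILM polls every indices.lifecycle.poll_interval, 10 minutes by default.)
def ilm_tick():
    for index in cluster.indices:
        age = now() - index.creation_date
        policy = index.ilm_policy
        # Rollover applies only to the index currently taking writes
        # (is_write_index is an assumed attribute of this model).
        if index.is_write_index and age >= policy.hot.max_age:
            rollover(index)
        # Pick the single coldest phase the index qualifies for,
        # instead of re-running every earlier demotion on each tick.
        if age >= policy.frozen.min_age:
            target = 'frozen'
        elif age >= policy.cold.min_age:
            target = 'cold'
        elif age >= policy.warm.min_age:
            target = 'warm'
        else:
            target = 'hot'
        # index.phase (assumed attribute) tracks the rung the tile sits on.
        if target != index.phase:
            demote(index, phase=target)
            index.phase = target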
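The `policy` object read by the tick corresponds roughly to an ILM policy body. The sketch below writes one as a Python dict in the shape accepted by `PUT _ilm/policy/<name>`. The repository name `logs-s3` and the one-day rollover age are placeholders that the scene does not specify; the 7d / 30d / 90d ages mirror the walkthrough. `has_demotion_target` is a hypothetical helper that flags the HOT-only misconfiguration behind the INGEST-STALL badge.

```python
# Assumed values: "logs-s3" and the 1d rollover age are placeholders.
POLICY = {
    "policy": {
        "phases": {
            "hot": {"actions": {"rollover": {"max_age": "1d"}}},
            "warm": {
                "min_age": "7d",
                "actions": {
                    "forcemerge": {"max_num_segments": 1},
                    "allocate": {"number_of_replicas": 0},
                },
            },
            "cold": {
                "min_age": "30d",
                "actions": {"searchable_snapshot": {"snapshot_repository": "logs-s3"}},
            },
            "frozen": {
                "min_age": "90d",
                "actions": {"searchable_snapshot": {"snapshot_repository": "logs-s3"}},
            },
        }
    }
}


def has_demotion_target(policy_body):
    """Return False for a HOT-only policy, where indices can never leave NVMe."""
    phases = policy_body["policy"]["phases"]
    return any(phase in phases for phase in ("warm", "cold", "frozen"))
```

One difference from the simplified tick: in Elasticsearch, `min_age` for an index that has rolled over is measured from the rollover time, not from index creation.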
ILM.demote
the phase transition — force_merge, snapshot, partial mount
def demote(index, phase):
    if phase == 'warm':
        force_merge(index, max_num_segments=1)   # one segment per shard
        set_replicas(index, 0)                   # fewer replicas than HOT
        allocation.require(index, tier='data_warm')
    elif phase == 'cold':
        snap = searchable_snapshot(index, repo=s3)
        mount(snap, type='full_copy')            # fully mounted, ~50% disk savings
        delete_local_index(index)
    elif phase == 'frozen':
        snap = searchable_snapshot(index, repo=s3)
        mount(snap, type='partial')              # small NVMe cache + S3
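The pseudocode's `mount(snap, type=...)` maps onto the `storage` parameter of Elasticsearch's mount searchable snapshot API: `full_copy` for a fully mounted index and `shared_cache` for a partially mounted one. As a rough sketch, the hypothetical helper below only describes the request and sends nothing; the repository, snapshot and index names passed in are the caller's own. ILM issues the equivalent call itself when a `searchable_snapshot` action runs, so this is for illustration rather than something a policy needs.

```python
def mount_request(repo, snapshot, index, partial):
    """Describe the mount call for a searchable snapshot without sending it."""
    return {
        "method": "POST",
        "path": f"/_snapshot/{repo}/{snapshot}/_mount",
        # full_copy = fully mounted (COLD); shared_cache = partially mounted (FROZEN)
        "params": {"storage": "shared_cache" if partial else "full_copy"},
        "body": {"index": index},
    }
```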
Loki.compact_index
Loki's 'one tier' reality — only the index gets compacted
# Chunks were written to S3 the moment they flushed, so the body never moves.
# Only the index is compacted.
def compact_index():
    shards = list_index_shards(object_store)
    for day in shards.by_day():
        merged = merge_boltdb_shards(day)  # many small index files → one per day
        upload(merged, object_store)
        delete(day.original_shards)
    # chunks: untouched, still on S3-Standard from day 0
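The merge step can be tried out in memory. The sketch below assumes each index file is a `(day, entries)` pair and each entry a `(stream_labels, chunk_ref)` tuple. That shape is a stand-in for illustration, not Loki's on-disk format. It folds all files for a day into one deduplicated list and never reads or rewrites a chunk.

```python
from collections import defaultdict


def compact_index(index_files):
    """Merge many small per-day index files into one deduplicated file per day.

    index_files: list of (day, entries) pairs, where entries is a list of
    (stream_labels, chunk_ref) tuples. This layout is an assumption.
    """
    by_day = defaultdict(set)
    for day, entries in index_files:
        by_day[day].update(entries)  # set union drops duplicate entries
    # One sorted entry list per day; chunk objects themselves are never touched.
    return {day: sorted(entries) for day, entries in by_day.items()}
```

Duplicates are worth handling because replicated ingesters can each ship an index entry for the same chunk.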