Build a distributed logging stack (ELK / Loki) (12 scenes)
Scene 07 · Hot, warm, cold, frozen
Two orders of magnitude in cost between NVMe and Deep Archive force tiering. ELK has four ILM phases; Loki collapses to S3 from day one.
Previously

High-cardinality data has to live in the body, and the body is BIG and most of it is OLD. NVMe is too expensive to hold months of body — economics force tiering.

Diagram
A vertical four-rung ladder labelled HOT / WARM / COLD / FROZEN on the LEFT lane (ELK Index Lifecycle Management). Each rung shows hardware, latency, replica count, and per-GB-month cost. An ILM clock above the ladder ticks days; an index tile is born on HOT and slides DOWN rung-by-rung as it ages. A 'searchable snapshot' callout pins to COLD/FROZEN — the primitive that lets those rungs exist without re-indexing. To the right, a SECOND lane labelled LOKI shows the entire body collapsed onto a single S3-Standard rung from day 0. Cost-per-month meters at the bottom compare ELK total vs Loki total for the same volume × retention. INGEST-STALL badge fires on HOT when the policy is misconfigured (HOT-only with no demotion targets).
Figure: Retention economics, ELK ILM ladder vs Loki single S3 tier. ILM clock at day 0 of retention.
ELK · Index Lifecycle Management (index tile d0, logs…):
HOT: $0.100/GB-mo, p99 < 50 ms, ×2 replicas
WARM: $0.040/GB-mo, tens of ms, ×1 replica
COLD: $0.013/GB-mo, hundreds of ms, ×0 replicas, S3 backed
FROZEN: $0.0040/GB-mo, seconds – 10s…, ×0 replicas, partial mou…
Loki · S3 from day 1:
S3 (object storage): $0.023/GB-mo, first byte < 1s · que…, 9000 GB total
Cost per month: ELK total $237 (HOT + WARM + COLD + FROZEN), Loki total $207. Loki ≈ 0.87× ELK · ELK ≈ 1.1× Loki.
Default ILM. Day 0. Tile sits on the HOT rung; ILM demotes on age, never deletes (deletion is scene 8). Two …
Day 0 — HOT (NVMe, ~$0.10/GB-mo)
Watch one index tile age. It's born today on HOT (NVMe, replicated, ms latency). The ILM clock advances 7 days — it slides to WARM (force-merged, fewer replicas). 30 days — it slides to COLD (a searchable snapshot in S3, fully mounted, ~50% disk savings, no replicas). 90 days — it slides to FROZEN (small NVMe cache, partially mounted from S3, up to 20× warm capacity). To the right, the Loki lane shows the same body parked on S3-Standard from day 0 — no movement, ever.
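The walk down the ladder can be sketched as a lookup from index age to phase, plus a rough bill for each lane. This is a minimal sketch, not the scene's cost model: the 7 / 30 / 90-day boundaries and the per-GB-month prices come from the text and diagram, while the workload (100 GB/day kept for 90 days, chosen so the total is 9000 GB) and the function names are assumptions. It counts raw data only and ignores replicas, so it is not expected to reproduce the diagram's ELK meter exactly.

```python
# Phase boundaries in days, oldest phase first (7 / 30 / 90 from the walkthrough).
PHASE_MIN_AGE_DAYS = [("frozen", 90), ("cold", 30), ("warm", 7), ("hot", 0)]

# Per-GB-month prices as printed on the diagram's rungs.
PRICE = {"hot": 0.100, "warm": 0.040, "cold": 0.013, "frozen": 0.004, "s3": 0.023}


def phase_for_age(age_days):
    """Return the oldest phase whose minimum age the index has reached."""
    for phase, min_age in PHASE_MIN_AGE_DAYS:
        if age_days >= min_age:
            return phase
    raise ValueError("age_days must be non-negative")


def gb_per_phase(daily_gb, retention_days):
    """Place one daily index per day of retention on its current rung."""
    totals = {"hot": 0, "warm": 0, "cold": 0, "frozen": 0}
    for age in range(retention_days):
        totals[phase_for_age(age)] += daily_gb
    return totals


# Hypothetical workload: 100 GB/day kept for 90 days (9000 GB in total).
tiers = gb_per_phase(daily_gb=100, retention_days=90)
elk_cost = sum(PRICE[phase] * gb for phase, gb in tiers.items())
loki_cost = PRICE["s3"] * sum(tiers.values())

print(tiers)
print(round(elk_cost, 2), round(loki_cost, 2))
```

With these assumptions the oldest index is 89 days old, so nothing has reached FROZEN yet. Stretching `retention_days` past 90 is where the bottom rung starts to pull the ELK total down.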
Implementation
ILM.tick
the daily clock — for each index, evaluate phase actions
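# Simplified model: one tick per day on the master node.
# (Real ILM polls on indices.lifecycle.poll_interval, 10 minutes by default.)
# index.is_write_index and index.phase are assumed attributes of this sketch.
def ilm_tick():
    for index in cluster.indices:
        age = now() - index.creation_date
        policy = index.ilm_policy
        # Roll over only the index that is still taking writes.
        if index.is_write_index and age >= policy.hot.max_age:
            rollover(index)
        # Pick the single oldest phase the index qualifies for. Checking
        # frozen first keeps a 90-day index from also being sent to warm and cold.
        if age >= policy.frozen.min_age:
            target = 'frozen'
        elif age >= policy.cold.min_age:
            target = 'cold'
        elif age >= policy.warm.min_age:
            target = 'warm'
        else:
            target = 'hot'
        # Demote only on a phase change, not on every tick.
        if target != index.phase:
            demote(index, phase=target)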
ILM.demote
the phase transition — force_merge, snapshot, partial mount
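def demote(index, phase):
    if phase == 'warm':
        # Merge down to one segment, then move to cheaper warm nodes.
        force_merge(index, max_num_segments=1)
        set_replicas(index, 1)  # fewer replicas than HOT (diagram: x2 -> x1)
        allocation.require('data_warm')
    elif phase == 'cold':
        # Snapshot to S3 and mount a full local copy; the snapshot stands in
        # for replicas, hence ~50% disk savings.
        snap = searchable_snapshot(index, repo='s3')
        mount(snap, type='full_copy')
        delete_local_index(index)
    elif phase == 'frozen':
        # Partially mounted: a small NVMe cache in front of the data in S3.
        snap = searchable_snapshot(index, repo='s3')
        mount(snap, type='partial')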
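In Elasticsearch itself these transitions are declared rather than coded: a policy document sent to `PUT _ilm/policy/<name>` lists the phases and their actions. The sketch below builds such a body as a Python dict using the real action names (`rollover`, `forcemerge`, `allocate`, `searchable_snapshot`). The repository name `logs-s3-repo` and the 1-day rollover age are placeholders, not values from this scene, and `has_demotion_targets` is a hypothetical helper that mirrors the HOT-only misconfiguration the diagram flags with the INGEST-STALL badge.

```python
import json

# Placeholder names: not taken from the scene.
SNAPSHOT_REPO = "logs-s3-repo"

ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {"rollover": {"max_age": "1d"}},
            },
            "warm": {
                "min_age": "7d",
                "actions": {
                    "forcemerge": {"max_num_segments": 1},
                    "allocate": {"number_of_replicas": 1},
                },
            },
            "cold": {
                "min_age": "30d",
                "actions": {
                    "searchable_snapshot": {"snapshot_repository": SNAPSHOT_REPO},
                },
            },
            "frozen": {
                "min_age": "90d",
                "actions": {
                    "searchable_snapshot": {"snapshot_repository": SNAPSHOT_REPO},
                },
            },
        }
    }
}


def has_demotion_targets(policy):
    """True if the policy defines at least one phase below hot."""
    phases = policy["policy"]["phases"]
    return any(name in phases for name in ("warm", "cold", "frozen"))


# A HOT-only policy: data has nowhere to go as it ages.
hot_only = {"policy": {"phases": {"hot": ilm_policy["policy"]["phases"]["hot"]}}}

print(json.dumps(ilm_policy, indent=2))
print(has_demotion_targets(ilm_policy), has_demotion_targets(hot_only))
```

One detail the pseudocode glosses over: when a policy uses rollover, Elasticsearch measures each later phase's `min_age` from the rollover time, not from index creation.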
Loki.compact_index
Loki's 'one tier' reality — only the index gets compacted
# Chunks were written to S3 the moment they flushed;
# the body never moves. Only the index is compacted.
def compact_index():
    shards = list_index_shards(object_store)
    for day in shards.by_day():
        merged = merge_boltdb_shards(day)  # → TSDB
        upload(merged, object_store)
        delete(day.original_shards)
        # chunks: untouched, still on S3-Standard from day 0
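The same idea can be run as a toy in memory. This is a sketch under stated assumptions, not Loki's compactor: the object store is a pair of plain dicts, and the key layout `index/<day>/<uploader>` is hypothetical. It shows only the shape of the operation: many small per-day index files become one, while the chunk objects are never read or rewritten.

```python
from collections import defaultdict


def compact_index(index_files):
    """Merge all index files that share a day into one file per day.

    index_files maps an object key "index/<day>/<uploader>" to the set of
    chunk keys that file references. The key layout is hypothetical.
    """
    merged = defaultdict(set)
    for key, chunk_refs in index_files.items():
        _, day, _ = key.split("/")
        merged[day] |= chunk_refs
    return {f"index/{day}/compacted": refs for day, refs in sorted(merged.items())}


# Toy object store: chunks were uploaded at flush time and are never rewritten.
chunks = {"chunks/a1": b"...", "chunks/b2": b"...", "chunks/c3": b"..."}
index_files = {
    "index/19700/ingester-0": {"chunks/a1"},
    "index/19700/ingester-1": {"chunks/b2"},
    "index/19701/ingester-0": {"chunks/c3"},
}

compacted = compact_index(index_files)
print(compacted)
print(sorted(chunks))
```

Three index files go in and two come out, one per day. The `chunks` dict is never passed to the function, which is the point of the single-tier design.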