Build an S3-style distributed object store (12 scenes)
Scene 02 · Disks fail weekly — so what does durable mean?
Millions of disks at an annual failure rate (AFR) of a few percent: one dies every few minutes. Eleven nines, and why copies alone are too costly.
Previously
We cracked the bucket open and asked what 'forever' has to survive; the answer starts at the hardware — and the hardware is millions of disks that are dying continuously.
Diagram
A grid of disk tiles standing in for the whole fleet (green = alive, red = dead). A wall clock ticks and disks flip red on a cadence set by the fleet size and the annual failure rate. One outlined tile is a tracked object that lives on a single disk with no backup; if that disk dies the object is gone. The header readout names the target — 'eleven nines' means about one object lost per 10,000 years per 10 million stored objects, essentially never — and frames it as a question of whether your bytes still EXIST on some disk (durable), which is a separate promise from whether you can REACH them this instant (available).
Eleven nines = the designed-for target: ~1 object lost per 10,000 years per 10 million stored. Essentially never.
Watch the fleet. The clock ticks and disks flip from green to dead-red on a steady beat. One outlined tile holds a tracked object — its only copy. Keep watching that tile.
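To put a rough number on the tracked tile's odds, here is a minimal sketch. It assumes a constant per-disk AFR and independence from year to year, which simplifies real disk behaviour. The 2% figure is only an illustrative value, and `single_copy_survival` and `nines` are hypothetical helpers, not part of the scene's code.

```python
import math


def single_copy_survival(afr: float, years: float) -> float:
    """Probability that one disk, and the only copy on it, survives `years`.

    Assumes a constant annual failure rate and independent years.
    """
    if not 0.0 <= afr < 1.0:
        raise ValueError("afr must be a fraction in [0, 1)")
    return (1.0 - afr) ** years


def nines(durability: float) -> float:
    """Express a durability as a count of nines, e.g. 0.99 -> 2.0."""
    return -math.log10(1.0 - durability)


if __name__ == "__main__":
    afr = 0.02  # illustrative value, not a measured rate
    for years in (1, 5, 10):
        p = single_copy_survival(afr, years)
        print(f"{years:>2} yr: survival {p:.4f}, loss {1 - p:.4f}")
    print(f"single copy, one year: {nines(1 - afr):.2f} nines")
```

Under these assumptions a single copy on a single disk has fewer than two nines of annual durability, roughly nine orders of magnitude short of the eleven-nines target.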
Implementation
Fleet.deathCadence
the arithmetic the two sliders drive
```python
def death_cadence(fleet_size, afr):
    # afr = per-disk annual failure rate as a fraction (0.01..0.04, i.e. 1%..4%)
    deaths_per_year = fleet_size * afr
    seconds_per_year = 365 * 24 * 3600
    # the mean gap between two disk deaths, in seconds
    gap = seconds_per_year / deaths_per_year
    return gap

# one disk: gap is decades, so a death is rare
# millions of disks: gap collapses to minutes
# the per-disk rate never changed; the fleet did
```
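As a usage sketch, the same arithmetic can be evaluated for a few fleet sizes to see the gap shrink. The fleet sizes and the 2% AFR below are illustrative choices, and `humanize` is a hypothetical formatting helper added only for this example.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def death_cadence(fleet_size: int, afr: float) -> float:
    """Mean seconds between disk deaths across the whole fleet."""
    return SECONDS_PER_YEAR / (fleet_size * afr)


def humanize(seconds: float) -> str:
    """Render a duration in the largest unit that fits (hypothetical helper)."""
    units = (
        ("years", SECONDS_PER_YEAR),
        ("days", 86400),
        ("hours", 3600),
        ("minutes", 60),
    )
    for name, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {name}"
    return f"{seconds:.1f} seconds"


if __name__ == "__main__":
    afr = 0.02  # illustrative value inside the 1%..4% slider range
    for fleet in (1, 1_000, 1_000_000, 5_000_000):
        gap = death_cadence(fleet, afr)
        print(f"{fleet:>9,} disks: one death every {humanize(gap)}")
```

At 2% AFR, one disk averages about 50 years between failures, while a million disks average about 26 minutes. The per-disk rate is identical in both cases.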
Durability.target
what 'eleven nines' names — a target, not a guarantee
```python
# a designed-for probabilistic target, not a measured SLA
target = 0.99999999999  # eleven nines / year
expected_lost_fraction = 1 - target  # 1e-11

# stored 10_000_000 objects?
# expect to lose ~1 object every 10_000 years
objects = 10_000_000
years_per_loss = 1 / (objects * expected_lost_fraction)

# this is DURABLE: do the bytes still EXIST.
# separate from AVAILABLE: can you REACH them now.
```
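The same target can be read at other scales. The sketch below is one way to do that, using exact fractions so the results stay round rather than drifting with floating-point subtraction. The object counts are illustrative, and `years_per_loss` as a function is an assumption of this example, not the scene's code.

```python
from fractions import Fraction

# 1 - 0.99999999999, held exactly as 1/10^11
LOSS_FRACTION = Fraction(1, 10**11)


def years_per_loss(objects: int) -> Fraction:
    """Expected years between single-object losses at the eleven-nines target."""
    if objects <= 0:
        raise ValueError("objects must be positive")
    return 1 / (objects * LOSS_FRACTION)


if __name__ == "__main__":
    for objects in (10_000_000, 10_000_000_000, 1_000_000_000_000):
        years = float(years_per_loss(objects))
        print(f"{objects:>17,} objects: one expected loss every {years:,.1f} years")
```

The expected interval shrinks in proportion to the number of stored objects, which is why the target is stated per object count and read as a design figure, not a promise about any single object.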