Build an S3-style distributed object store (12 scenes)
Scene 07 · The payoff: durability is repair winning a race
Eleven nines is a rate equation: scrub finds rot, repair rebuilds lost fragments faster than disks destroy them.
Previously
Now that the index plane tracks every fragment and the data plane holds them, a background process can watch for losses and fix them — which is exactly the answer to the mystery we left open at the start: how do eleven nines survive disks failing every week?
Scene 07
The payoff: durability is repair winning a race
Diagram
One object's 17 data + 3 parity fragments live on 20 distinct disks inside a fleet of millions. A scrub worker (amber dashed) sweeps fragments at rest, recomputing checksums to catch silent bit rot; a repair worker (green glow) reads the survivors, rebuilds any lost fragment, and writes it to a fresh disk — restoring full redundancy. The REDUNDANCY DEFICIT gauge counts how many of the 20 are currently missing or bad.
Disks die on the fleet cadence — every few minutes, somewhere, one fails and takes a fragment with it. Watch the two background workers keep the object whole: the scrub worker sweeps fragments at rest checking they haven't silently gone bad, and the repair worker reads the survivors and rebuilds each lost fragment onto a fresh disk. The deficit gauge keeps drifting back to zero.
Implementation
Scrub.sweep
the background sweep that catches silent rot at rest
1# runs continuously on the background-ops fleet2def sweep():3 for frag in fleet.fragments_at_rest():4 stored = read(frag.location)5 if checksum(stored) != frag.expected_checksum:6 # silent bit rot: a live disk, a bad fragment7 mark_lost(frag) # hand to Repair8 advance_cursor()
Repair.rebuild
reads k survivors, reconstructs the lost fragment
1def rebuild(obj, lost_frag):2 survivors = read_any(obj.fragments, count = k)3 rebuilt = reed_solomon_decode(survivors)4 fresh = pick_disk(distinct_failure_domain)5 write(fresh, rebuilt) # full k+m restored6 index.point(obj, lost_frag, fresh)7 # MTTR = wall-clock this rebuild took
Durability.estimate
why repair time, not fragment count, sets the nines
1def annual_loss_probability(failure_rate, mttr):2 # one fragment down opens a window of length mttr3 window = failure_rate * mttr4 # need MORE THAN m losses inside one window to lose obj5 p_lose = window ** (m + 1)6 # m (code) sets the exponent; mttr scales the base7 return p_lose
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.