Build an S3-style distributed object store (12 scenes)
Scene 05 · Replication vs erasure coding: same safety, a third the cost
Split into k+m fragments — any k rebuild it — tolerating m losses at ~1.4× instead of 3× replication.
Previously

Now that an object is a sealed, never-changing blob, we can store it durably — and the cheapest safe way is to split it, not to copy it.

Scene 05
Replication vs erasure coding: same safety, a third the cost
Diagram
On the left, the sealed immutable object from the previous scene. An encoder in the center splits it two ways: as 3 identical whole copies (replication), or as k data fragments plus m parity fragments (erasure coding). A 'kill fragments' lever greys tiles out one at a time; while at least k survive, a reconstruct animation reads any k of them and rebuilds the original blob. Two readouts move in opposite directions: storage overhead and durability nines.
erasure coding · 17+3any k of the k+m fragments rebuild the objectobject{ }immutable blobencoder17+320 fragments · k=17 data + m=3 parityd0d1d2d3d4d5d6d7d8d9d10d11d12d13d14d15d16p0p1p2dataparitySTORAGE OVERHEAD1.18xDURABILITY~11 ninesFAULT TOLERANCEtolerates 3 losses
The sealed object feeds the encoder and comes out as 17 data fragments plus 3 small extra fragments computed from them. The lever kills three fragments; the survivors still rebuild the original byte-for-byte. Watch the two readouts: storage overhead and durability.
Implementation
Encoder.encode
split the sealed object once into k data + m parity
1def encode(blob, k, m):
2 data = split_into(blob, k) # k data fragments
3 # parity = matrix-multiply over GF(2^8)
4 parity = rs_encode(data, m) # m parity fragments
5 fragments = data + parity # k + m total
6 # immutable: computed once, never re-touched
7 for f in fragments:
8 node = pick_failure_domain(f)
9 node.write(f)
10 index.commit(key, locations(fragments))
Storage.read
the GET path — and the degraded read when fragments are dead
1def read(key):
2 locs = index.lookup(key)
3 alive = [f for f in locs if f.up]
4 if len(alive) < k:
5 raise Unrecoverable # fewer than k survive
6 if all_data_present(alive):
7 return concat(data_fragments(alive))
8 # degraded read: rebuild from ANY k survivors
9 k_subset = alive[:k]
10 return rs_decode(k_subset) # inverse-matrix solve
Layout.cost
the trade every layout makes: bytes stored vs losses tolerated
1def cost(scheme, k, m):
2 if scheme == 'replicate':
3 copies = m + 1
4 overhead = copies # 1 byte stored per copy
5 tolerated = copies - 1 # lose all-but-one
6 else: # erasure
7 overhead = (k + m) / k # thin parity, not copies
8 tolerated = m # lose any m of k+m
9 return overhead, tolerated
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.