Build an S3-style distributed object store
12 scenes · ~84 min · build the primitive
Build your own S3-style distributed object store
Eleven nines of durability over disks that fail weekly. Build the object store from first principles — flat keyspace, immutable objects, erasure coding instead of replication, eventual consistency turned strong, multipart upload, lifecycle and tiering — and feel why every modern data lake sits on top of something shaped exactly like this.
- 01 · Hello, object store: it's just a bucket · Bucket, object, key, PUT, GET: the flat key→bytes model, before we earn the word "forever." · ~7 min
- 02 · Disks fail weekly, so what does durable mean? · Millions of disks at a few % AFR: one dies every few minutes. Eleven nines, and why copies alone are too costly. · ~7 min
- 03 · Folders are a lie: the flat keyspace and its index · Keys are flat strings; the index maps key→location and shards by range, with a per-prefix throughput ceiling. · ~7 min
- 04 · Immutable objects: overwrite is a new version, not an edit · Bytes never change: overwrite = new version + atomic pointer flip; delete drops a marker the old bytes hide behind. · ~7 min
- 05 · Replication vs erasure coding: same safety, a third the cost · Split into k+m fragments, any k of which rebuild the object, tolerating m losses at ~1.4× instead of 3× replication. · ~7 min
- 05a · An m-tolerant code dies if m+1 fragments share a rack · The code's m-fault tolerance is real only if placement spreads fragments across independent failure domains. · ~7 min
- 06 · Two planes: the index says where, storage holds what · Index plane vs data plane; trace a GET (read k, reconstruct, verify, stream) and a PUT's atomic index commit. · ~7 min
- 07 · The payoff: durability is repair winning a race · Eleven nines is a rate equation: scrub finds rot, repair rebuilds lost fragments faster than disks destroy them. · ~7 min
- 08 · Why strong consistency took S3 fourteen years · Eventual-consistency anomalies, the Dec 2020 flip, and the witness read-barrier; strong within a region only. · ~7 min
- 09 · Multipart upload and the ETag-that-isn't-an-MD5 · Parallel resumable parts, a completion manifest, and why the object ETag is a hash-of-hashes ending in -N. · ~7 min
- 10 · Lifecycle and tiering: cheap because bytes are immutable · Storage classes trade retrieval latency for cost; a lifecycle rule slides an object down the ladder as a pointer move. · ~7 min
- 11 · Design canvas: defend your durability number · Assemble code, placement, consistency, and lifecycle for a workload, then survive a correlated-failure objection. · ~7 min
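
The headline numbers in scenes 02 and 05 can be sanity-checked with back-of-envelope arithmetic. The sketch below is illustrative only: the fleet size (5 million disks), the 3% AFR, and the 10+4 code are assumed values picked to match the outline's "a few %" and "~1.4×", not figures taken from the course. It also assumes disk failures are independent and spread evenly over the year.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def mean_seconds_between_failures(disks: int, afr: float) -> float:
    """Average gap between disk failures across the whole fleet.

    afr is the annualized failure rate as a fraction (0.03 == 3%).
    """
    failures_per_year = disks * afr
    return SECONDS_PER_YEAR / failures_per_year

def storage_overhead(k: int, m: int) -> float:
    """Raw bytes written per logical byte for a k+m erasure code."""
    return (k + m) / k

# Assumed example values, not from the course.
gap = mean_seconds_between_failures(disks=5_000_000, afr=0.03)
ec = storage_overhead(k=10, m=4)

print(f"one disk failure every {gap / 60:.1f} min on average")
print(f"10+4 erasure coding stores {ec:.1f}x raw bytes, vs 3.0x for three copies")
```

Under these assumptions the fleet loses a disk roughly every three and a half minutes, and a 10+4 code survives any four lost fragments while writing 1.4 bytes per logical byte.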