Build an S3-style distributed object store
12 scenes · ~84 min · build the primitive


Eleven nines of durability over disks that fail weekly. Build the object store from first principles — flat keyspace, immutable objects, erasure coding instead of replication, eventual consistency turned strong, multipart upload, lifecycle and tiering — and feel why every modern data lake sits on top of something shaped exactly like this.

  1. 01
    Hello, object store: it's just a bucket
    Bucket, object, key, PUT, GET — the flat key→bytes model, before we earn the word "forever."
    ~7 min
  2. 02
    Disks fail weekly — so what does durable mean?
    Millions of disks at a few percent annualized failure rate (AFR) means one dies every few minutes. What eleven nines demands, and why copies alone are too costly.
    ~7 min
  3. 03
    Folders are a lie — the flat keyspace and its index
    Keys are flat strings; the index maps key→location and shards by range, with a per-prefix throughput ceiling.
    ~7 min
  4. 04
    Immutable objects: overwrite is a new version, not an edit
    Bytes never change: an overwrite is a new version plus an atomic pointer flip; a delete writes a marker that hides the older bytes.
    ~7 min
  5. 05
    Replication vs erasure coding: same safety, under half the cost
    Split an object into k+m fragments, any k of which rebuild it, tolerating m losses at ~1.4× storage instead of the 3× of triple replication.
    ~7 min
  6. 05a
    An m-tolerant code dies if m+1 fragments share a rack
    The code's m-fault tolerance is real only if placement spreads fragments across independent failure domains.
    ~7 min
  7. 06
    Two planes: the index says where, storage holds what
    Index plane vs data plane; trace a GET (read k, reconstruct, verify, stream) and a PUT's atomic index commit.
    ~7 min
  8. 07
    The payoff: durability is repair winning a race
    Eleven nines is a rate equation: scrub finds rot, repair rebuilds lost fragments faster than disks destroy them.
    ~7 min
  9. 08
    Why strong consistency took S3 fourteen years
    Eventual-consistency anomalies, the Dec 2020 flip, and the witness read-barrier — strong within a region only.
    ~7 min
  10. 09
    Multipart upload and the ETag-that-isn't-an-MD5
    Parallel resumable parts, a completion manifest, and why the object ETag is a hash-of-hashes ending in -N.
    ~7 min
  11. 10
    Lifecycle and tiering: cheap because bytes are immutable
    Storage classes trade retrieval latency for cost; a lifecycle rule slides an object down the ladder as a pointer move.
    ~7 min
  12. 11
    Design canvas: defend your durability number
    Assemble code, placement, consistency, and lifecycle for a workload — then survive a correlated-failure objection.
    ~7 min
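
A few of the numbers quoted in the scene list can be checked with back-of-envelope arithmetic. The sketch below is illustrative only: the fleet size (5 million disks), the 2% AFR, and the 10+4 code are assumed values chosen to match "every few minutes" and "~1.4×", not figures taken from the course. The ETag function reflects the commonly observed multipart behavior (MD5 over the concatenated binary MD5 digests of the parts, then "-N"), which should be treated as an assumption rather than a documented contract. All function names are hypothetical.

```python
import hashlib

SECONDS_PER_YEAR = 365 * 24 * 3600


def minutes_between_failures(disk_count: int, afr: float) -> float:
    """Mean minutes between disk failures across a fleet.

    Assumes independent failures at a constant annualized failure rate.
    """
    failures_per_year = disk_count * afr
    return SECONDS_PER_YEAR / failures_per_year / 60


def storage_overhead(k: int, m: int) -> float:
    """Raw bytes stored per logical byte for a k+m erasure code."""
    return (k + m) / k


def multipart_etag(parts: list[bytes]) -> str:
    """Hash-of-hashes ETag: MD5 of the concatenated part digests, plus -N.

    Assumed scheme, matching what is commonly observed for multipart uploads.
    """
    part_digests = b"".join(hashlib.md5(p).digest() for p in parts)
    return f"{hashlib.md5(part_digests).hexdigest()}-{len(parts)}"


if __name__ == "__main__":
    # Assumed fleet: 5 million disks at 2% AFR.
    print(f"{minutes_between_failures(5_000_000, 0.02):.1f} min between failures")

    # Assumed 10+4 code versus 3x replication.
    ec = storage_overhead(10, 4)
    print(f"erasure coding: {ec:.1f}x, or {ec / 3:.0%} of triple replication")

    # Two hypothetical parts; the suffix counts the parts.
    print(multipart_etag([b"a" * 5, b"b" * 5]))
```

With these assumed inputs the fleet loses a disk roughly every five minutes, and a 10+4 code stores 1.4 raw bytes per logical byte, a bit under half of what triple replication needs. The ETag of a multipart object is not the MD5 of the object's bytes, which is why tools that compare ETags against a local MD5 fail on multipart uploads.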