Build an S3-style distributed object store (12 scenes)
Scene 03 · Folders are a lie — the flat keyspace and its index
Keys are flat strings; the index maps key→location and shards by range, with a per-prefix throughput ceiling.
Previously

Surviving disk death means the bytes live somewhere other than one disk — so something has to record where; that something is the index, the first plane we crack open.

Scene 03
Folders are a lie — the flat keyspace and its index
Diagram
Left: object keys shown as flat opaque strings — every slash is just a byte in one string, not a folder. Right: the index, a sorted map from key to a (dim) storage location, cut into key-range shards. Each shard carries a request-rate meter with a ceiling line; the console-folder toggle shows the same keys as a fake '/'-split tree.
OBJECT KEYSflat opaque strings — slashes are bytesTHE INDEX — sorted map, sharded by key rangeimg/cat.pngimg/dog.pnglogs/2026/06/03/app.loglogs/2026/06/03/db.loglogs/2026/06/04/app.logone string · no directory to lockresolve["" .. "img/")ceiling3,500 PUT/s · 5,500 GET/s per partition→ loc:••••• (data plane)["img/" .. "logs/")ceiling3,500 PUT/s · 5,500 GET/s per partition→ loc:••••• (data plane)["logs/" .. ∞)ceiling3,500 PUT/s · 5,500 GET/s per partition→ loc:••••• (data plane)1 prefix → aggregate:~5,500 GET/s (one partition)no global request limit — spread the keyspace
A key resolves through the sorted map to a (dim) storage location.
Each object is addressed by a key — and a key is one flat string; the slashes in "logs/2026/06/03/app.log" are just bytes, not folders. The map on the right sorts those strings and shards them by range. Try the console-folder toggle: it splits the keys on '/' into a tree, but there is no tree underneath.
Implementation
Index.resolve
a GET resolves a flat key string to its storage location
1def resolve(key):
2 # key is ONE opaque string; '/' is just a byte.
3 # No tree walk — binary search the sorted map.
4 part = partitionForKey(key)
5 entry = part.lookup(key) # key -> location + metadata
6 return entry.location
7
8def partitionForKey(key):
9 # partitions are contiguous lexicographic key ranges,
10 # so keys sharing a prefix land on ONE partition.
11 return bisect(partitionBoundaries, key)
Partition.serveRequest
each key-range partition meters its own request rate
1# limits are PER partition, not per bucket
2PUT_CEILING = 3500 # req/s
3GET_CEILING = 5500 # req/s
4
5def serveRequest(req):
6 rate = self.observedRate(req.kind)
7 ceiling = (PUT_CEILING if req.kind == PUT
8 else GET_CEILING)
9 if rate > ceiling:
10 return Http503('SlowDown')
11 return serve(req)
Index.maybeSplit
a saturated partition is gradually split into two
1# background loop on the index plane
2def maybeSplit(part):
3 if part.sustainedRate > part.ceiling:
4 mid = part.medianKey()
5 # split the range at mid -> two partitions,
6 # each owning half the keys. Takes minutes,
7 # so a sudden hot prefix throttles first.
8 lo, hi = part.splitAt(mid)
9 partitionBoundaries.insert(mid)
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.