Build an S3-style distributed object store (12 scenes)
Scene 09 · Multipart upload and the ETag-that-isn't-an-MD5
Parallel resumable parts, a completion manifest, and why the object ETag is a hash-of-hashes ending in -N.
Previously

With the index telling the truth and the object stored durably, the last piece of an object's life is the upload itself — and objects too big for one request expose a famous integrity gotcha.

Diagram
A large file on the left is sliced into N parts (each ≥ 5 MiB except the last) that upload in parallel; each returns its own ETag. The MANIFEST collects the (partNumber, ETag) pairs, and on completion the server stitches the parts into one object whose ETag is a hash-of-hashes ending in "-N". This whole split-upload-stitch flow is a MULTIPART UPLOAD; the trailing "-N" is the tell that the ETag is md5(concatenated part md5s), not md5(file).
[Figure: Multipart upload. A 200 MiB source file is sliced at a 64 MiB part size into 4 parts (#1 64 MiB, #2 64 MiB, #3 64 MiB, #4 8 MiB) that upload in parallel and are resumable. The server collects the parts and stitches them on completion; the manifest holds the (partNumber, ETag) pairs. Changing the part size changes the ETag.]
A 200 MiB file is sliced into 4 parts. Each uploads in parallel along its own arrow and returns its own ETag into the manifest. Once all four land, the server runs CompleteMultipartUpload and stitches them into one object — whose ETag renders as a hash-of-hashes ending in "-4".
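The slicing arithmetic in the diagram can be sketched as follows. This is a minimal illustration, not the store's actual code: `plan_parts`, `MIN_PART_SIZE`, and `MAX_PARTS` are hypothetical names, and the limits are taken from the 5 MiB minimum and the 1..10000 part-number range quoted in this scene.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # every part except the last must be at least this big
MAX_PARTS = 10_000        # partNumber runs 1..10000


def plan_parts(file_size, part_size):
    """Return (partNumber, offset, length) tuples that cover the whole file."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part size must be at least 5 MiB")
    if file_size <= 0:
        raise ValueError("file must not be empty")
    count = -(-file_size // part_size)  # ceiling division
    if count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part size")
    return [
        (n, (n - 1) * part_size, min(part_size, file_size - (n - 1) * part_size))
        for n in range(1, count + 1)
    ]


plan = plan_parts(200 * MIB, 64 * MIB)
sizes_mib = [length // MIB for _, _, length in plan]
print(sizes_mib)  # [64, 64, 64, 8]
```

Only the last part is allowed to fall below the minimum, which is why the 8 MiB tail is the one short part in the plan.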
Implementation
Client.multipartUpload
slice the file, upload parts in parallel, then complete
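def multipartUpload(key, file, partSize):
    uploadId = s3.createMultipartUpload(key)
    manifest = []
    # sliceIntoParts (helper, not shown) yields partSize-byte chunks in order
    for n, chunk in enumerate(sliceIntoParts(file, partSize), 1):
        # each part >= 5 MiB except the last, partNumber 1..10000
        # shown sequentially for clarity; parts are independent, so each
        # uploadPart call can be issued in parallel and retried on its own
        etag = s3.uploadPart(uploadId, n, chunk)
        manifest.append((n, etag))  # (partNumber, ETag)
    return s3.completeMultipartUpload(uploadId, manifest)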
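One way the "parallel, resumable" part of the client could look is sketched below. It runs against a hypothetical in-memory `FakeStore` rather than a real client, and `upload_parts_parallel` and its `done` parameter are illustrative names, not part of any real API. The demo uses tiny parts so it runs instantly; a real store would enforce the 5 MiB minimum.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeStore:
    """Hypothetical in-memory stand-in for the s3 client used above."""

    def __init__(self):
        self.parts = {}  # (uploadId, partNumber) -> bytes
        self.lock = threading.Lock()

    def uploadPart(self, uploadId, partNumber, chunk):
        with self.lock:
            self.parts[(uploadId, partNumber)] = chunk
        return hashlib.md5(chunk).hexdigest()  # the part's own ETag


def upload_parts_parallel(store, uploadId, data, part_size, done=None, workers=4):
    """Upload the missing parts concurrently; return the manifest by partNumber."""
    manifest = dict(done or {})  # partNumber -> ETag kept from an earlier attempt
    chunks = {
        n: data[off:off + part_size]
        for n, off in enumerate(range(0, len(data), part_size), 1)
    }
    todo = [n for n in chunks if n not in manifest]  # resume: skip finished parts
    with ThreadPoolExecutor(max_workers=workers) as pool:
        etags = pool.map(lambda n: store.uploadPart(uploadId, n, chunks[n]), todo)
        manifest.update(zip(todo, etags))
    return sorted(manifest.items())


store = FakeStore()
data = bytes(range(256)) * 100  # 25,600 bytes; tiny parts for demonstration only
first = upload_parts_parallel(store, "u1", data, 10_000)
resumed = upload_parts_parallel(store, "u1", data, 10_000, done=dict(first[:2]))
print(len(first), first == resumed)  # 3 True
```

Because the manifest is keyed by part number, a resumed upload only needs to send the parts that are missing, and arrival order never matters.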
Server.completeMultipartUpload
stitch the parts and compute the object ETag
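def completeMultipartUpload(uploadId, manifest):
    parts = sortByPartNumber(manifest)  # partNumber order, not arrival order
    obj = stitchIntoObject(parts)
    if len(parts) == 1 and wasSinglePut(uploadId):
        # contrast case: a plain single-request PUT hashes the bytes directly
        objectEtag = md5(obj.bytes)  # == md5(file)
    else:
        # hash-of-hashes: md5Binary(p) is the raw 16-byte MD5 of part p,
        # i.e. the part's ETag decoded from hex
        joined = concat(md5Binary(p) for p in parts)
        objectEtag = md5(joined) + "-" + str(len(parts))  # hex digest + "-N"
    return objectEtag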
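The hash-of-hashes can be reproduced locally with Python's standard `hashlib`, which is also how a client can check a downloaded object when it knows the part size that was used. This is a sketch under that assumption; `multipart_etag` is a hypothetical helper name, and the tiny part sizes exist only to keep the demo fast.

```python
import hashlib


def multipart_etag(data, part_size):
    """ETag for an object completed from parts of part_size bytes."""
    digests = [
        hashlib.md5(data[off:off + part_size]).digest()  # raw 16-byte digest per part
        for off in range(0, len(data), part_size)
    ]
    # MD5 over the concatenated part digests, then "-N" with the part count
    return hashlib.md5(b"".join(digests)).hexdigest() + "-" + str(len(digests))


data = b"x" * 1000                     # tiny stand-in for a large file
whole = hashlib.md5(data).hexdigest()  # what a single PUT would report
four = multipart_etag(data, 300)       # 4 parts: 300, 300, 300, 100 bytes
two = multipart_etag(data, 500)        # 2 parts: 500, 500 bytes

print(four.endswith("-4"), two.endswith("-2"))   # True True
print(four.split("-")[0] != whole, four != two)  # True True
```

The same bytes give three different ETags here, so comparing a multipart ETag against `md5(file)` fails even when the upload was perfect. A check has to recompute the value with the original part size.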