Build an S3-style distributed object store (12 scenes)
Scene 09 · Multipart upload and the ETag-that-isn't-an-MD5
Parallel resumable parts, a completion manifest, and why the object ETag is a hash-of-hashes ending in -N.
Previously
With the index telling the truth and the object stored durably, the last piece of an object's life is the upload itself — and objects too big for one request expose a famous integrity gotcha.
Diagram
A large file on the left is sliced into N parts (each ≥ 5 MiB except the last) that upload in parallel; each returns its own ETag. The MANIFEST collects the (partNumber, ETag) pairs, and on completion the server stitches the parts into one object whose ETag is a hash-of-hashes ending in "-N". This whole split-upload-stitch flow is a MULTIPART UPLOAD; the trailing "-N" is the tell that the ETag is md5(concatenated part md5s), not md5(file).
A 200 MiB file is sliced into 4 parts. Each uploads in parallel along its own arrow and returns its own ETag into the manifest. Once all four land, the server runs CompleteMultipartUpload and stitches them into one object — whose ETag renders as a hash-of-hashes ending in "-4".
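The slicing in the diagram can be sketched as a small planning step. This is a minimal illustration, not the store's actual code: `plan_parts` and its constants are hypothetical names, and the limits (5 MiB minimum for every part except the last, part numbers 1..10000) are the ones stated in this scene.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # every part except the last must be at least this large
MAX_PARTS = 10_000        # part numbers run 1..10000


def plan_parts(total_size, part_size):
    """Return [(part_number, offset, length), ...] covering total_size bytes."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    count = (total_size + part_size - 1) // part_size  # integer ceiling division
    if count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    plan = []
    for n in range(1, count + 1):
        offset = (n - 1) * part_size
        # only the last part may be shorter than part_size
        length = min(part_size, total_size - offset)
        plan.append((n, offset, length))
    return plan


plan = plan_parts(200 * MIB, 50 * MIB)
print(len(plan))  # 4
```

With a 12 MiB file and 5 MiB parts the same function yields three parts, the last one only 2 MiB long, which is allowed because the minimum does not apply to the final part.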
Implementation
Client.multipartUpload
slice the file, upload parts in parallel, then complete
    def multipartUpload(file, key, partSize):
        uploadId = s3.createMultipartUpload(key)
        manifest = []
        # shown as a sequential loop for clarity; parts are independent and may upload in parallel
        for n, chunk in enumerate(slice(file, partSize), 1):
            # each part >= 5 MiB except the last, partNumber 1..10000
            etag = s3.uploadPart(uploadId, n, chunk)
            manifest.append((n, etag))  # (partNumber, ETag)
        return s3.completeMultipartUpload(uploadId, manifest)
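The parallel part of the client can be sketched with a thread pool. Everything here is an assumption made for illustration: `FakeStore` is a hypothetical in-memory stand-in rather than a real client, `upload_parts_parallel` is an invented helper, and the part size in the usage lines is far below 5 MiB because the fake store does not enforce the minimum.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeStore:
    """Hypothetical in-memory stand-in for the object store (not a real API)."""

    def __init__(self):
        self.parts = {}
        self._lock = threading.Lock()

    def upload_part(self, upload_id, part_number, chunk):
        with self._lock:
            self.parts[(upload_id, part_number)] = chunk
        # assumed here: a part's ETag is the hex MD5 of that part's bytes
        return hashlib.md5(chunk).hexdigest()


def upload_parts_parallel(store, upload_id, data, part_size, workers=4):
    """Upload all parts concurrently and return the (partNumber, ETag) manifest."""
    chunks = [data[i:i + part_size] for i in range(0, len(data), part_size)]

    def upload(numbered):
        n, chunk = numbered
        return n, store.upload_part(upload_id, n, chunk)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the manifest is ordered
        # by part number even if the uploads finish out of order
        return list(pool.map(upload, enumerate(chunks, 1)))


store = FakeStore()
manifest = upload_parts_parallel(store, "upload-1", b"abcdefghij", part_size=4)
print([n for n, _ in manifest])  # [1, 2, 3]
```

Because each part is addressed by (uploadId, partNumber), a failed part could in principle be retried on its own without touching the others, which is what makes the upload resumable.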
Server.completeMultipartUpload
stitch the parts and compute the object ETag
    def completeMultipartUpload(uploadId, manifest):
        parts = sortByPartNumber(manifest)  # order by part number, not arrival
        obj = stitchIntoObject(parts)
        if len(parts) == 1 and wasSinglePut(uploadId):
            objectEtag = md5(obj.bytes)  # == md5(file)
        else:
            # md5Binary(p): raw 16-byte MD5 of part p, not its hex string
            joined = concat(md5Binary(p) for p in parts)
            objectEtag = md5(joined) + "-" + str(len(parts))
        return objectEtag
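The hash-of-hashes rule above can be reproduced with nothing but `hashlib`. This is a sketch under the scene's description of the ETag; `multipart_etag` and `single_put_etag` are hypothetical helper names, and the parts are tiny byte strings so the example runs instantly.

```python
import hashlib


def multipart_etag(parts):
    """parts: list of bytes objects, already sorted by part number."""
    # concatenate the raw 16-byte digests of each part, not their hex strings
    joined = b"".join(hashlib.md5(p).digest() for p in parts)
    return hashlib.md5(joined).hexdigest() + "-" + str(len(parts))


def single_put_etag(data):
    """ETag of an object written in one request: plain md5 of the bytes."""
    return hashlib.md5(data).hexdigest()


parts = [b"a" * 8, b"b" * 8, b"c" * 8, b"d" * 3]
etag = multipart_etag(parts)
whole = single_put_etag(b"".join(parts))

print(etag.endswith("-4"))  # True
print(etag.split("-")[0] == whole)
```

The second comparison is the gotcha: the hex portion of the multipart ETag is a digest of digests, so a client that checks integrity by comparing it with md5(file) will see a mismatch even when every byte arrived intact. To verify such an object locally, the client would need to recompute the ETag using the same part boundaries that were used for the upload.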