#19 YouTube / Netflix Streaming
ABR, CDN, transcoding, hot/cold — the egress, fan-out, and popularity Pareto.
Build a video streaming platform like YouTube or Netflix. Users press play and expect a 4K stream to start within two seconds; creators upload 50 GB mezzanines and expect the full ABR ladder published within minutes. Almost every byte the system serves is read; almost every byte it stores was produced by transcoding. The architectural pressure is *not* QPS — it is **egress** (at peak you are 15% of the US internet), **fan-out** (one source mezzanine becomes 250K encode jobs across rungs × codecs × shots), and the **popularity Pareto** (the top 1% of the catalogue serves 50% of bytes; the bottom 50% serves <0.1%). The reference design below is what a staff SRE team at Netflix-class scale would actually run a five-year incident retro against — not the 45-minute whiteboard sketch. Every node has load-bearing internals; every edge declares wire semantics; every chaos card is grounded in a cited real-world incident. Focus is on the four pillars: **ABR**, **CDN**, **transcoding**, and **hot/cold tiering**. Comments / search / recommendations / ads are explicitly out of scope.
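The transcode fan-out figure above is just multiplication: shot segments × ladder rungs × codec variants per source. A back-of-envelope check, where every count is an illustrative assumption rather than any platform's real configuration:

```python
# Back-of-envelope for the transcode fan-out: one mezzanine becomes
# shots x rungs x codecs encode jobs. All counts below are illustrative
# assumptions, not any platform's actual configuration.
shots_per_title = 3_000   # ~2 h film segmented at shot boundaries (assumed)
ladder_rungs = 14         # resolution x bitrate points in the ABR ladder (assumed)
codec_variants = 6        # e.g. H.264 / HEVC / VP9 / AV1 profiles (assumed)

encode_jobs = shots_per_title * ladder_rungs * codec_variants
print(encode_jobs)
# prints: 252000 -- the same order as the ~250K quoted above
```

Shot-level segmentation is what makes the fan-out this wide, but it is also what lets the farm parallelize a single title across a quarter-million short, independent jobs.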
Reading:
Netflix Tech Blog — Per-Title Encode Optimization (2015)
Netflix Tech Blog — Optimized Shot-Based Encodes (2018)
Netflix Tech Blog — Rebuilding Video Pipeline with Microservices (Cosmos, 2023)
Netflix Tech Blog — Serving 100 Gbps from a single Open Connect Appliance
Netflix Open Connect — Overview whitepaper + Fill Patterns docs
Google / YouTube — Reimagining YouTube Video Infrastructure (Argos VCU, 2021)
Apple — Enabling Low-Latency HLS
Hotstar Engineering — How we scaled to 25M concurrent for IPL
AWS — CloudFront Origin Shield in Multi-CDN Deployments
BBA paper — Stanford SIGCOMM 2014 (Buffer-Based ABR)
Pensieve — MIT SIGCOMM 2017 (RL-based ABR)
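The buffer-based ABR idea from the BBA paper in the reading list fits in a dozen lines: map playback-buffer occupancy linearly onto the rate ladder, with a reservoir that pins the lowest rung and a cushion above which the top rung is safe. A minimal sketch; the ladder bitrates, reservoir, and cushion values are illustrative, not from any production player:

```python
# Minimal sketch of buffer-based ABR in the spirit of the BBA paper
# (Stanford SIGCOMM 2014). Rung bitrates, reservoir, and cushion are
# illustrative assumptions, not a production ladder.
LADDER_KBPS = [235, 560, 1050, 2350, 4300, 8000, 16000]  # hypothetical rungs

def select_rate(buffer_s: float, reservoir_s: float = 10.0,
                cushion_s: float = 120.0) -> int:
    """Map playback-buffer occupancy (seconds) onto the rate ladder."""
    r_min, r_max = LADDER_KBPS[0], LADDER_KBPS[-1]
    if buffer_s <= reservoir_s:              # reservoir: protect against rebuffer
        return r_min
    if buffer_s >= reservoir_s + cushion_s:  # full cushion: top rung is safe
        return r_max
    # Linear map across the cushion, quantized down to a real rung.
    frac = (buffer_s - reservoir_s) / cushion_s
    target = r_min + frac * (r_max - r_min)
    return max(r for r in LADDER_KBPS if r <= target)

print(select_rate(5.0), select_rate(70.0), select_rate(300.0))
# prints: 235 8000 16000
```

The appeal of the buffer-based family over throughput prediction is exactly what the blurb stresses: the buffer is a local, directly observable signal, so the control loop degrades gracefully when CDN throughput is noisy.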
ABR (HLS / DASH / CMAF)
multi-tier CDN
per-title / per-shot encoding
DRM (Widevine / FairPlay / PlayReady)
hot/cold tiering
transcode fan-out
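The popularity Pareto that drives hot/cold tiering ("top 1% of catalogue serves ~50% of bytes") falls out of a Zipf popularity curve. A quick check, with the catalogue size and the Zipf exponent s = 1.0 chosen purely for illustration:

```python
# Quick check that a Zipf popularity curve reproduces the "top 1% of
# catalogue serves ~50% of bytes" Pareto. Catalogue size and the
# exponent s = 1.0 are assumptions chosen for illustration.
N = 10_000                                        # titles in catalogue (assumed)
weights = [1 / rank for rank in range(1, N + 1)]  # Zipf popularity, s = 1.0
total = sum(weights)

top_1pct_share = sum(weights[: N // 100]) / total
print(f"top 1% serves {top_1pct_share:.0%} of demand")
# prints: top 1% serves 53% of demand
```

This is why a small flash tier on the edge appliances can absorb most egress: the head of the curve fits in a few TB, while the long tail stays on cheaper cold storage at the origin.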