#35 Live Viewer Count (YouTube/Twitch)
Millions of viewers on one entity. Approximate by design.
The "1.4M watching now" counter on a live stream is the canonical approximate-counting problem at internet scale. The number doesn't have to be exact; it has to be believable, monotone-feeling, fast enough to feel live, and robust to coordinated viewbot attacks. Production teams at YouTube, Twitch, Meta, JioHotstar, and Discord have converged on a remarkably consistent shape: heartbeat ingest at millions of events per second, two-stage HyperLogLog aggregation against hot keys, per-region sketch merge into a designated primary region, monotone-display smoothing at the publish boundary, and hierarchical WebSocket fanout to push the displayed number back to every subscribed player. Every component has a real production failure mode; every knob has a story.
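To make the per-region sketch merge concrete, here is a minimal HyperLogLog sketch in Python. It is an illustration under stated assumptions, not any of these teams' implementations: the names `HyperLogLog`, `hash64`, and `sketch_region` are hypothetical, the hash is BLAKE2b truncated to 64 bits, and the estimator is the original Flajolet et al. formula with only the small-range (linear counting) correction. The point it shows is that merging is a register-wise max, so a viewer counted in two regions is still counted once.

```python
import hashlib
import math

P = 14                 # precision: 2**14 = 16384 registers
M = 1 << P
TAIL_BITS = 64 - P     # hash bits left over after choosing a register


def hash64(viewer_id: str) -> int:
    """Stable 64-bit hash of a viewer/session id (assumed id format: any string)."""
    digest = hashlib.blake2b(viewer_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class HyperLogLog:
    def __init__(self) -> None:
        self.registers = bytearray(M)

    def add(self, viewer_id: str) -> None:
        h = hash64(viewer_id)
        index = h >> TAIL_BITS                 # top P bits select the register
        tail = h & ((1 << TAIL_BITS) - 1)      # remaining bits feed the rank
        # Rank = 1-based position of the leftmost 1-bit in the tail.
        rank = TAIL_BITS - tail.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def merge(self, other: "HyperLogLog") -> None:
        # Register-wise max: the merged sketch equals the sketch of the union.
        self.registers = bytearray(
            max(a, b) for a, b in zip(self.registers, other.registers)
        )

    def estimate(self) -> float:
        alpha = 0.7213 / (1.0 + 1.079 / M)
        raw = alpha * M * M / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * M and zeros > 0:
            return M * math.log(M / zeros)     # linear counting for small sets
        return raw


def sketch_region(viewer_ids) -> HyperLogLog:
    """Build one region's sketch from the viewer ids seen in its heartbeats."""
    hll = HyperLogLog()
    for viewer_id in viewer_ids:
        hll.add(viewer_id)
    return hll
```

A primary region would start from an empty `HyperLogLog()` and call `merge` once per regional sketch before reading `estimate()`. Because the merge is idempotent and order-independent, a re-sent or late regional sketch does not inflate the count. Redis exposes the same operation as `PFMERGE`.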
Reading: Flajolet, Fusy, Gandouet, Meunier — HyperLogLog (DMTCS 2007) · Heule, Nunkesser, Hall — HyperLogLog in Practice (EDBT 2013) · Cormode, Muthukrishnan — Count-Min Sketch (J.Alg. 2005) · Apache DataSketches — Theta sketch (Yahoo) · Engineering at Meta — Under the hood: Broadcasting live video to millions (2015) · Engineering at Meta — Scaling Live streaming for millions of viewers (2020) · Twitch Engineering — State of Engineering 2023 (Spade, PubSub, Kinesis) · Twitch Engineering — Breaking the Monolith at Twitch (2022) · Twitch Engineering — The QoUX Journey (2025) · Twitch Engineering — How Twitch Uses PostgreSQL (2016) · Twitch Developers — PubSub API (≤10 conns/IP, ≤50 topics/conn) · Discord — How Discord Scaled Elixir to 5,000,000 Concurrent Users (Manifold + Semaphore + FastGlobal) · Discord — Real-time Communication at Scale with Elixir (2020) · Slack Engineering — Real-time Messaging (Presence Servers + Gateway Servers) · Slack Engineering — Migrating Millions of Concurrent WebSockets to Envoy · HasGeek Rootconf — Scaling hotstar.com for 25M concurrent viewers (2019) · Pragmatic Engineer — Live streaming at world-record scale with Ashutosh Agrawal (JioHotstar) · Last9 — Cricket Scale Series #1 (IPL 30M concurrent) · ByteByteGo — How Disney+ Hotstar / JioHotstar scales (NAT-per-subnet, multi-CDN) · Cloudflare — June 21 2022 cross-region routing retro · AWS — Kinesis Data Streams Nov 25 2020 retro (thread-limit cascading failure) · Apache Flink — Stateful Stream Processing + Watermarks + RocksDB checkpoints · Confluent — KIP-429 Cooperative-Sticky Rebalance · Confluent — KIP-794 Strictly Uniform Sticky Partitioner · Kreps — Questioning the Lambda Architecture (Kappa, 2014) · Beyer et al. — SRE Workbook (Managing Load, Addressing Cascading Failures) · YouTube Help — How engagement metrics are counted (the 'we freeze on purpose' rule)
sampled heartbeats vs explicit presence
HyperLogLog (p=14, ±0.81%) and PFMERGE across regions
two-stage aggregation against hot streams
monotone-display + milestone-pinning UX rules
session-bound stream-tokens (viewbot defense at protocol)
WebSocket fanout tree (Twitch PubSub / Discord Manifold pattern)
tier-based sampling (100% / 10% / 1% by viewer count)
ad-spike cadence flip (30 s → 5 s heartbeat)
panic-mode load shedding (server-signaled client degradation)
promote-publish freshness budget (30–90 s industry norm)
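The ±0.81% quoted for HyperLogLog at p=14 in the list above follows from the standard-error formula in the Flajolet et al. paper. A short worked version, using the "1.4M watching now" figure from the introduction as the example count:

```latex
\begin{align*}
\sigma &\approx \frac{1.04}{\sqrt{m}}, \qquad m = 2^{p} = 2^{14} = 16384 \\
\sigma &\approx \frac{1.04}{128} = 0.008125 \approx 0.81\% \\
1.4 \times 10^{6} \times 0.008125 &= 11{,}375 \text{ viewers (one standard error)}
\end{align*}
```

This is a relative standard error, so roughly two thirds of estimates land within ±0.81% of the true count; it is not a hard bound on any single reading.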