Build a CDN (13 scenes)
Scene 11 · Hit ratio — the headline and the diagnostic ladder
Request hit ratio vs byte hit ratio, and the 5-step ladder when it crashes: Vary cardinality → TTL config → purge frequency → bypass rules → cookie key.
Previously
Every lever we've built — TTL, Vary, purge, shield, bypass — moves one number, origin RPS, and the headline that summarizes them all is hit ratio; this is the dashboard you read, and the ladder you walk when it's low.
Diagram
Top: three big metric tiles. **Request hit ratio** is the fraction of requests served from cache; **byte hit ratio** is the fraction of bytes served from cache. The two diverge when small objects cache well but big ones don't. **Origin RPS**, the third tile, is the number every other lever pushes or pulls; when it climbs, the cache is failing.
Middle: a Sankey-style flow where incoming requests split into a fat green HIT branch and a thin red MISS branch. The MISS branch fans into five named buckets: Vary explosion, no-store/no-cache, recent purge, bypass rule, cookie in cache key. The active incident's bucket inflates and throbs.
Right: a 5-step diagnostic ladder. Each step corresponds to one bucket; selecting the step that matches the incident lights up its bucket as the root cause.
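The "one bucket inflates" idea in the Sankey can be sketched as a share check over classified misses. This is only an illustration: the short bucket labels, the `dominant_bucket` helper, and the 0.4 threshold are assumptions, not part of the dashboard described above.

```python
from collections import Counter

# Short labels for the five MISS buckets; the names are illustrative.
BUCKETS = ("vary", "ttl", "purge", "bypass", "cookie")

def dominant_bucket(miss_reasons, threshold=0.4):
    """Return the bucket holding more than `threshold` of all misses, else None.

    The 0.4 threshold is an assumed cutoff for "this bucket dominates".
    """
    counts = Counter(r for r in miss_reasons if r in BUCKETS)
    total = sum(counts.values())
    if total == 0:
        return None
    name, n = counts.most_common(1)[0]
    return name if n / total > threshold else None

# Healthy: misses spread evenly, 20% per bucket.
healthy = ["vary", "ttl", "purge", "bypass", "cookie"] * 20
# Incident: a Vary explosion adds 300 extra misses to one bucket.
incident = healthy + ["vary"] * 300

print(dominant_bucket(healthy))   # None
print(dominant_bucket(incident))  # vary
```

In the healthy case every bucket holds 20% of misses, so nothing crosses the threshold; in the incident case the `vary` bucket holds 320 of 400 misses and is reported.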
Sources
- doc · Cloudflare: Cache analytics (cache status & ratio)
- doc · Fastly: Real-time analytics, hit ratio
- blog · ioRiver: Cache Hit Ratio, definition and diagnostics
- blog · Patrick Meenan / WebPageTest: caching diagnostics in the wild
- code · AEM project archetype #680: Vary: User-Agent in production
- doc · Cloudflare community: Cache purge time / propagation effects on hit ratio
How do you tell if your CDN is earning its money? The first number to check is the fraction of requests it serves from its own cache rather than forwarding to origin. That fraction is the **hit ratio**, and it splits into two: request hit ratio (what users feel) and byte hit ratio (what the CFO pays for). The healthy dashboard shown here reads request hit ratio 92%, byte hit ratio 88%, and low origin RPS; the Sankey is mostly green HIT, and the thin red MISS share is spread evenly across the five buckets, so no single failure mode dominates.
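A small worked example shows how far the two ratios can drift apart, and how hit ratio translates into origin load. The traffic mix, the `LogLine` record shape, and the `origin_rps` helper are made up for illustration; the origin estimate also ignores request coalescing, shielding, and revalidation, so treat it as a rough first-order model.

```python
from collections import namedtuple

# Hypothetical record shape: cache status plus response size in bytes.
LogLine = namedtuple("LogLine", ["cache", "bytes"])

# 900 small assets (20 KB) hit; 100 large video segments (5 MB) miss.
logs = [LogLine("HIT", 20_000)] * 900 + [LogLine("MISS", 5_000_000)] * 100

hits = sum(1 for l in logs if l.cache == "HIT")
request_hit_ratio = hits / len(logs)

hit_bytes = sum(l.bytes for l in logs if l.cache == "HIT")
byte_hit_ratio = hit_bytes / sum(l.bytes for l in logs)

print(f"request hit ratio: {request_hit_ratio:.1%}")  # 90.0%
print(f"byte hit ratio:    {byte_hit_ratio:.1%}")     # 3.5%

def origin_rps(edge_rps, hit_ratio):
    """First-order estimate: every miss becomes one origin request."""
    return edge_rps * (1 - hit_ratio)

# Moving from 92% to 96% halves origin load at the same edge traffic.
print(round(origin_rps(10_000, 0.92)))  # 800
print(round(origin_rps(10_000, 0.96)))  # 400
```

Users see a 90% hit ratio here, yet almost all of the bytes still come from origin, which is why the two numbers are tracked separately.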
Implementation
Dashboard.computeHitRatio
the two headline numbers, computed from edge log lines
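def computeHitRatio(edge_logs):
    """Return (request_hit_ratio, byte_hit_ratio) from parsed edge log lines."""
    edge_logs = list(edge_logs)  # materialize: the logs are scanned several times
    hits = sum(1 for l in edge_logs if l.cache == 'HIT')
    misses = sum(1 for l in edge_logs if l.cache == 'MISS')
    # Guard against an empty log window instead of dividing by zero.
    request_hit_ratio = hits / (hits + misses) if hits + misses else 0.0

    bytes_from_cache = sum(l.bytes for l in edge_logs if l.cache == 'HIT')
    total_bytes = sum(l.bytes for l in edge_logs)
    byte_hit_ratio = bytes_from_cache / total_bytes if total_bytes else 0.0

    # The two ratios diverge when one big-object class misses.
    return request_hit_ratio, byte_hit_ratio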
Operator.diagnose
the 5-step ladder, ordered cheapest-investigation-first
def diagnose(metrics):
    if metrics.vary_cardinality_per_url > 5:
        return 'vary'    # one curl, read Vary header
    if metrics.ttl_avg < 60 or metrics.no_store_pct > 0.1:
        return 'ttl'     # curl asset, read Cache-Control
    if metrics.purge_rate_per_hour > 10:
        return 'purge'   # check CI logs for zone-wide purges
    if metrics.bypass_pct > 0.2:
        return 'bypass'  # audit /api/* style rules
    if metrics.cookie_in_key:
        return 'cookie'  # strip cookies for asset paths
    return 'investigate_origin'
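The ladder's first rung reads `vary_cardinality_per_url`, but the listing does not show where that number comes from. One possible derivation, sketched below, counts distinct cache variants per URL from log records. The `(url, variant_key)` record shape is hypothetical, and reducing the per-URL counts to a single value with `max` is an assumption about how the metric is aggregated.

```python
from collections import defaultdict

def vary_cardinality_per_url(records):
    """Map each URL to the number of distinct cache variants seen for it.

    `records` is an iterable of (url, variant_key) pairs; both fields are
    hypothetical stand-ins for whatever the edge log actually exposes.
    """
    variants = defaultdict(set)
    for url, variant_key in records:
        variants[url].add(variant_key)
    return {url: len(keys) for url, keys in variants.items()}

# /app.js varies only on encoding; /index.html varies on User-Agent.
records = [("/app.js", "gzip"), ("/app.js", "br")]
records += [("/index.html", f"UA-{i}") for i in range(50)]

cardinality = vary_cardinality_per_url(records)
worst = max(cardinality.values())

print(cardinality)                          # {'/app.js': 2, '/index.html': 50}
print("first rung trips:", worst > 5)       # first rung trips: True
```

Two variants for encoding is normal; fifty variants for one page is the Vary explosion the first rung is meant to catch.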