Hit ratio — the headline and the diagnostic ladder

Scene 11 · Hit ratio — the headline and the diagnostic ladder

Request hit ratio vs byte hit ratio, and the 5-step ladder when it crashes: Vary cardinality → TTL config → purge frequency → bypass rules → cookie key.

Previously

Every lever we've built (TTL, Vary, purge, shield, bypass) moves one number: origin RPS. Hit ratio is the headline that summarizes them all. This scene covers the dashboard you read and the ladder you walk when that ratio is low.

Diagram

Top: three big metric tiles. **Request hit ratio** is the fraction of requests served from cache; **byte hit ratio** is the fraction of bytes served from cache; the two diverge when small objects cache well but big ones don't. **Origin RPS** is the number every other lever pushes or pulls, and the one that spikes when caching fails. Middle: a Sankey-style flow where incoming requests split into a fat green HIT branch and a thin red MISS branch; the MISS branch fans into five named buckets: Vary explosion, no-store/no-cache, recent purge, bypass rule, cookie in cache key. The active incident's bucket inflates and throbs. Right: a 5-step diagnostic ladder; selecting the step that matches the incident lights up the corresponding bucket as the root cause.
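The MISS fan-out in the diagram can be approximated in code. The sketch below is illustrative only: it assumes each edge log line is a dict with a `cache` field and a hypothetical `miss_reason` field already labeled with one of the five bucket names, and the 50% "dominant" threshold is an arbitrary choice, not something the scene specifies.

```python
from collections import Counter

# Bucket names follow the diagram; the `miss_reason` log field is hypothetical.
BUCKETS = ("vary", "no_store", "purge", "bypass", "cookie")

def miss_breakdown(edge_logs):
    """Return each bucket's share of the labeled MISS requests."""
    counts = Counter(
        log["miss_reason"]
        for log in edge_logs
        if log["cache"] == "MISS" and log.get("miss_reason") in BUCKETS
    )
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in BUCKETS}

def dominant_bucket(shares, threshold=0.5):
    """Name the bucket holding more than `threshold` of misses, else None."""
    bucket, share = max(shares.items(), key=lambda kv: kv[1])
    return bucket if share > threshold else None

# Hypothetical sample: three Vary-related misses, one purge-related miss.
sample_logs = (
    [{"cache": "HIT"}] * 6
    + [{"cache": "MISS", "miss_reason": "vary"}] * 3
    + [{"cache": "MISS", "miss_reason": "purge"}]
)
shares = miss_breakdown(sample_logs)
print(dominant_bucket(shares))
```

A healthy dashboard would return `None` here, since no single bucket holds most of the misses; an incident shows up as one named bucket.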

How do you tell if your CDN is earning its money? The first number you check is what fraction of requests it serves from its own cache vs forwarding to origin — that fraction is the **hit ratio**, and it splits into two: request hit ratio (what users feel) and byte hit ratio (what the CFO pays for). Healthy dashboard shown: request hit ratio 92%, byte hit ratio 88%, origin RPS low; the Sankey is mostly green-HIT and the thin red MISS share is spread evenly across the five buckets — no single failure mode dominates.
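To see how far the two ratios can drift apart, here is a small worked example. The traffic mix is invented for illustration (it is not the healthy dashboard described above): many small thumbnails that always hit, and a few large video segments that always miss.

```python
# Hypothetical traffic mix: (label, requests, bytes per response, served from cache?)
traffic = [
    ("thumbnails", 9_000, 20_000, True),          # small objects, all hits
    ("video segments", 1_000, 2_000_000, False),  # large objects, all misses
]

def hit_ratios(traffic):
    """Return (request hit ratio, byte hit ratio) for a traffic mix."""
    total_reqs = sum(reqs for _, reqs, _, _ in traffic)
    hit_reqs = sum(reqs for _, reqs, _, hit in traffic if hit)
    total_bytes = sum(reqs * size for _, reqs, size, _ in traffic)
    hit_bytes = sum(reqs * size for _, reqs, size, hit in traffic if hit)
    return hit_reqs / total_reqs, hit_bytes / total_bytes

request_ratio, byte_ratio = hit_ratios(traffic)
print(f"request hit ratio: {request_ratio:.1%}")  # 90.0%
print(f"byte hit ratio:    {byte_ratio:.1%}")     # 8.3%
```

Nine out of ten requests are served from cache, yet more than 90% of the bytes still come from origin, which is why the dashboard shows both numbers side by side.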

Implementation

Dashboard.computeHitRatio

the two headline numbers, computed from edge log lines

def computeHitRatio(edge_logs):
    # each log line exposes .cache ('HIT' or 'MISS') and .bytes
    logs = list(edge_logs)  # materialize once; we scan it several times
    hits = sum(1 for l in logs if l.cache == 'HIT')
    misses = sum(1 for l in logs if l.cache == 'MISS')
    total_requests = hits + misses
    # guard against an empty log window
    request_hit_ratio = hits / total_requests if total_requests else 0.0

    bytes_from_cache = sum(l.bytes for l in logs if l.cache == 'HIT')
    total_bytes = sum(l.bytes for l in logs)
    byte_hit_ratio = bytes_from_cache / total_bytes if total_bytes else 0.0

    # the two ratios diverge when one big-object class misses
    return request_hit_ratio, byte_hit_ratio

Operator.diagnose

the 5-step ladder, ordered cheapest-investigation-first

def diagnose(metrics):
    if metrics.vary_cardinality_per_url > 5:
        return 'vary'        # one curl, read Vary header
    if metrics.ttl_avg < 60 or metrics.no_store_pct > 0.1:
        return 'ttl'         # curl asset, read Cache-Control
    if metrics.purge_rate_per_hour > 10:
        return 'purge'       # check CI logs for zone-wide purges
    if metrics.bypass_pct > 0.2:
        return 'bypass'      # audit /api/* style rules
    if metrics.cookie_in_key:
        return 'cookie'      # strip cookies for asset paths
    return 'investigate_origin'
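One way to exercise the ladder is to express it as data and run it against a sample incident. The sketch below is a table-driven variant, not the scene's own implementation: the thresholds and step names are copied from `diagnose`, while `LADDER`, `walk_ladder`, and the incident values are hypothetical names and numbers introduced here for illustration.

```python
from types import SimpleNamespace

# (step, predicate, first check), in the same order and with the same
# thresholds as diagnose. The check strings paraphrase its comments.
LADDER = [
    ("vary", lambda m: m.vary_cardinality_per_url > 5,
     "curl the URL and read the Vary header"),
    ("ttl", lambda m: m.ttl_avg < 60 or m.no_store_pct > 0.1,
     "curl the asset and read Cache-Control"),
    ("purge", lambda m: m.purge_rate_per_hour > 10,
     "check CI logs for zone-wide purges"),
    ("bypass", lambda m: m.bypass_pct > 0.2,
     "audit /api/* style bypass rules"),
    ("cookie", lambda m: m.cookie_in_key,
     "strip cookies from the cache key for asset paths"),
]

def walk_ladder(metrics):
    """Return (step, first_check) for the first rung that trips."""
    for step, tripped, check in LADDER:
        if tripped(metrics):
            return step, check
    return "investigate_origin", "no rung tripped; look at the origin itself"

# Hypothetical incident: a CI job purging the zone 40 times an hour.
incident = SimpleNamespace(
    vary_cardinality_per_url=2, ttl_avg=3600, no_store_pct=0.01,
    purge_rate_per_hour=40, bypass_pct=0.05, cookie_in_key=False,
)
print(walk_ladder(incident))  # ('purge', 'check CI logs for zone-wide purges')
```

Keeping the rungs in a list makes the ordering explicit and lets each step carry its first investigative action along with its name.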
