Build a Service Mesh (Envoy / Istio style) (13 scenes)
Scene 06 · Timeout and retry budget — bounded patience
Naive multi-hop retries amplify load 243x on a failing backend. A retry budget caps total retries as a fraction of normal traffic so retries can't become the outage.
Previously
Once the proxy has picked a replica and that replica turns out to be slow, the proxy has to decide how long to wait and whether to try again. That decision, scaled across the fleet, is where outages are born.
Diagram
A 5-hop call chain (Client → S1 → S2 → S3 → S4 → S5) where S5 is slow or failing. The clock icon on each arrow represents a **timeout**: the upper bound on how long a proxy waits for a reply before giving up. The route-level timeout covers all retry attempts together; the per-try timeout is the sub-bound on a single attempt. The number above each arrow is the count of requests crossing that hop per logical client request; the italic number below is how many of those were retries. The card at the top shows the **effective load on S5**: with no budget, retries compound geometrically, and at 3 attempts per hop that number climbs to 3^5 = 243. The meter at the bottom is the **retry budget**, a percentage cap on concurrent retries relative to active requests. When it is set, retries that would breach the cap are dropped, so each cluster sees at most roughly budget% extra load from retries. (Footnote: jitter desynchronises retry crowds and is the partner of backoff; idempotency is the precondition for any retry being safe. Both are outside the scope of this scene.)
clock = timeout (the bound that fires the retry)
each hop retries independently → 3^5 = 243
retry budget — cap retries as % of normal traffic
S5 is failing. Each upstream hop makes up to 3 attempts for every request it receives. Watch the per-hop counter climb hop by hop along the chain: the badge above the diagram lands on 243× (3^5), the load S5 actually sees per single client request.
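The per-hop counters can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming 3 attempts per hop (to match the 243× badge) and that every attempt fails, so each hop always uses all of its attempts. The function name `hop_counts` is hypothetical and not part of the scene's code.

```python
def hop_counts(hops: int, attempts: int) -> list[tuple[int, int]]:
    """Return (requests, retries) crossing each arrow, client side first.

    Assumes every attempt fails, so each hop always uses all its attempts.
    """
    counts = []
    incoming = 1  # one logical client request enters the chain
    for _ in range(hops):
        outgoing = incoming * attempts  # each incoming request is tried `attempts` times
        counts.append((outgoing, outgoing - incoming))  # first tries equal `incoming`
        incoming = outgoing
    return counts


if __name__ == "__main__":
    for i, (total, retried) in enumerate(hop_counts(hops=5, attempts=3), start=1):
        print(f"arrow {i}: {total} requests, {retried} retries")
```

Under these assumptions the last arrow (S4 → S5) carries 243 requests, 162 of them retries. If a hop instead makes 3 retries on top of its original attempt, `attempts` is 4 and the tail load becomes 4^5 = 1024.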
Implementation
Proxy.sendWithRetry
the naive per-hop retry every proxy runs by default
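import random
import time

ERROR = object()  # sentinel returned when every attempt fails

def send_with_retry(req, try_send, retries=3, per_try_timeout=5.0, base_backoff=0.025):
    # Naive per-hop retry: up to retries + 1 attempts, with no budget check.
    # try_send is the caller-supplied transport; it returns an object with .ok.
    # base_backoff (seconds) is an illustrative default, not a value from the scene.
    for attempt in range(retries + 1):
        resp = try_send(req, deadline=per_try_timeout)  # one bounded attempt
        if resp.ok:
            return resp
        if attempt < retries:  # no point sleeping after the final attempt
            # exponential backoff with full jitter
            time.sleep(random.uniform(0, base_backoff * 2 ** attempt))
    return ERROR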
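The loop above bounds each attempt, but nothing in it enforces the route-level timeout that is meant to cover all attempts together. One way to add that outer bound is sketched below. The names `send_with_deadline`, `route_timeout`, `clock`, and `sleep` are assumptions introduced for illustration; the injectable clock only exists to make the function easy to test.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class Response:
    ok: bool


def send_with_deadline(req, try_send, retries=3, per_try_timeout=5.0,
                       route_timeout=30.0, base_backoff=0.025,
                       clock=time.monotonic, sleep=time.sleep):
    """Retry like the naive loop, but never past the overall route deadline."""
    deadline = clock() + route_timeout
    resp = None
    for attempt in range(retries + 1):
        remaining = deadline - clock()
        if remaining <= 0:
            break  # the whole-call budget is spent; stop retrying
        # An attempt may use at most what is left of the route timeout.
        resp = try_send(req, deadline=min(per_try_timeout, remaining))
        if resp.ok:
            return resp
        if attempt < retries:
            pause = random.uniform(0, base_backoff * 2 ** attempt)
            sleep(min(pause, max(0.0, deadline - clock())))
    return resp  # last failed response, or None if no attempt was made
```

With a 12 s route timeout and a 5 s per-try timeout, a backend that always times out gets attempts of 5 s, 5 s, and 2 s, and the fourth attempt is never sent.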
// Why naive retry storms
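the compounding that turns 5 hops × 3 attempts into 243x
# Hop 1 makes up to 3 attempts when its call fails.
# Hop 2 receives every attempt hop 1 makes,
# and makes up to 3 attempts for each of them.
# Every hop multiplies; it does not add.
#
# load_on_tail = attempts ^ hops
#              = 3 ^ 5
#              = 243x per single client request
#
# With 3 retries on top of the original attempt (4 attempts per hop),
# the same chain reaches 4 ^ 5 = 1024x.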
Cluster.admitRetry
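Envoy-style retry budget: admit a retry only while concurrent retries are below the cap
ALLOW_RETRY, DROP_RETRY = "allow", "drop"

def admit_retry(cluster, budget_percent=20.0, min_retry_concurrency=3):
    # The cap is a share of in-flight requests, with a floor so that
    # low-traffic clusters can still retry at all.
    budget = budget_percent / 100            # e.g. 0.20
    active = cluster.active_request_count    # requests currently in flight
    retrying = cluster.active_retry_count    # retries currently in flight
    cap = max(active * budget, min_retry_concurrency)
    if retrying >= cap:
        return DROP_RETRY                    # honor the cap
    return ALLOW_RETRY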
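To see what the cap buys, it helps to put numbers on it. The sketch below computes the cap for a busy and a quiet cluster, then estimates the worst-case amplification across a chain when every hop is budgeted. The helper names are hypothetical. The `(1 + budget) ^ hops` bound is an assumption that holds only in steady state: it ignores the `min_retry_concurrency` floor and treats the concurrency share as a traffic share.

```python
def retry_cap(active: int, budget_percent: float = 20.0,
              min_retry_concurrency: int = 3) -> float:
    """Maximum concurrent retries allowed for a cluster with `active` requests."""
    return max(active * budget_percent / 100, min_retry_concurrency)


def budgeted_amplification(hops: int, budget_percent: float = 20.0) -> float:
    """Rough upper bound on tail load when each hop adds at most budget% retries."""
    return (1 + budget_percent / 100) ** hops


if __name__ == "__main__":
    print(retry_cap(100))  # busy cluster: the percentage sets the cap
    print(retry_cap(5))    # quiet cluster: the floor sets the cap
    print(round(budgeted_amplification(5), 3))
```

Under these assumptions a 20% budget on each of 5 hops bounds the tail load near 2.5x, against 243x for the unbudgeted chain.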
route.yaml
the config that wires timeout, per-try, and budget together
route:
  timeout: 30s                 # whole call, incl. retries
  retry_policy:
    retries: 3
    per_try_timeout: 5s        # bound on one attempt
    retry_budget:
      budget_percent: 20.0     # cap retries at 20% of traffic
      min_retry_concurrency: 3
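A quick sanity check on these three numbers is whether every attempt can run to its per-try timeout without exceeding the route timeout. The sketch below does that arithmetic. The function names are hypothetical, and backoff pauses are ignored, so the real worst case is slightly longer.

```python
def worst_case_seconds(retries: int, per_try_timeout_s: float) -> float:
    """Time spent if the original attempt and every retry all time out."""
    return (retries + 1) * per_try_timeout_s


def attempts_within(timeout_s: float, per_try_timeout_s: float, retries: int) -> int:
    """How many full-length attempts fit inside the route timeout."""
    return min(retries + 1, int(timeout_s // per_try_timeout_s))


if __name__ == "__main__":
    # Values taken from the config above: timeout 30s, per_try_timeout 5s, retries 3.
    print(worst_case_seconds(3, 5.0))
    print(attempts_within(30.0, 5.0, 3))
```

With the values above, four attempts take at most 20 s, which leaves 10 s of the 30 s route timeout for backoff. If the route timeout were shorter than `(retries + 1) * per_try_timeout`, later retries would be cut short or never sent.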