Build a Service Mesh (Envoy / Istio style) (13 scenes)
Scene 06 · Timeout and retry budget — bounded patience
Naive multi-hop retries amplify load 243x on a failing backend. A retry budget caps total retries as a fraction of normal traffic so retries can't become the outage.
Previously

Once the proxy has picked a replica and that replica is slow, the proxy has to decide how long to wait and whether to try again. That decision, scaled across the fleet, is where outages are born.

Diagram
A 5-hop call chain (Client → S1 → S2 → S3 → S4 → S5) where S5 is slow or failing. The clock icon on each arrow represents a **timeout**: the upper bound on how long a proxy waits for a reply before giving up. The route-level timeout covers all retry attempts together; the per-try timeout is the sub-bound on a single attempt. The number above each arrow is the number of requests crossing that hop per logical client request; the italic number below is how many of those were retries. The card at the top shows the **effective load on S5**: with no budget, retries compound geometrically and that number climbs to 3^5 = 243. The meter at the bottom is the **retry budget**, a percentage cap on retries relative to active requests. When it is set, retries that would breach the cap are dropped, so retries add at most roughly budget% extra load to the cluster. (Footnote: jitter desynchronises retry crowds and is the partner of backoff; idempotency is the precondition for any retry being safe. Both are outside the scope of this scene.)
[Diagram state: retries/hop = 3 · budget = off]
EFFECTIVE LOAD ON S5: 243x
Nodes: Client, S1, S2, S3, S4 (healthy); S5 (slow / failing)
Requests per hop (of which retries):
Client → S1: 3 (2 retries)
S1 → S2: 9 (6 retries)
S2 → S3: 27 (18 retries)
S3 → S4: 81 (54 retries)
S4 → S5: 243 (162 retries)
RETRY BUDGET · off: no cap, retries can dominate the cluster. 67% of cluster traffic is retries.
No budget: each hop retries independently, multiplying load by 3^5 = 243× on S5.
clock = timeout (the bound that fires the retry)
each hop retries independently → 3^5 = 243
retry budget — cap retries as % of normal traffic
S5 is failing. Each upstream hop sends a failing request up to 3 times (the original attempt plus 2 retries). Watch the per-hop counter climb back up the chain: the badge above the diagram lands on 243× (3^5), the load S5 actually sees per single client request.
Implementation
Proxy.sendWithRetry
the naive per-hop retry every proxy runs by default
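from time import sleep

def send_with_retry(req):
    # retries and per_try_timeout come from the route's retry_policy;
    # try_send, backoff_with_jitter, and ERROR are defined elsewhere in the proxy.
    for attempt in range(retries + 1):  # first try plus `retries` retries
        resp = try_send(req, deadline=per_try_timeout)
        if resp.ok:
            return resp
        if attempt < retries:  # skip the backoff after the final attempt
            sleep(backoff_with_jitter(attempt))
    return ERROR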
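The loop above calls `backoff_with_jitter` without showing it. One common choice is exponential backoff with full jitter, sketched below. The `base` and `cap` values and the optional `rng` parameter are assumptions made for illustration, not values from this scene.

```python
import random

def backoff_with_jitter(attempt, base=0.025, cap=0.25, rng=random):
    """Return a sleep time in seconds for the given 0-based attempt.

    Full jitter: pick uniformly between 0 and an exponentially growing
    ceiling, so clients that failed together do not retry together.
    base and cap are illustrative defaults, not values from the scene.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

Passing a seeded `random.Random` instance as `rng` makes the delays reproducible in tests.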
// Why naive retry storms
the compounding that turns 3 attempts per hop over 5 hops into 3^5 = 243x
# Hop 1 sends each failing request 3 times
# (1 original + 2 retries).
# Hop 2 inherits all of hop 1's load
# and sends each of those 3 times too.
# Every hop multiplies, not adds.
#
# load_on_tail = attempts_per_hop ^ hops
#              = 3 ^ 5
#              = 243x per single client request
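The arithmetic above can be checked with a few lines of Python. This is a sketch that assumes every attempt fails and that each hop makes the same number of attempts; the function names are hypothetical.

```python
def per_hop_requests(attempts_per_hop, hops):
    """Requests crossing each hop per client request when every attempt fails."""
    return [attempts_per_hop ** (i + 1) for i in range(hops)]

def retry_share(attempts_per_hop, hops):
    """Fraction of all hop crossings that are retries rather than first sends."""
    total = sum(per_hop_requests(attempts_per_hop, hops))
    # At each hop, one in every attempts_per_hop sends is an original.
    originals = total / attempts_per_hop
    return (total - originals) / total

print(per_hop_requests(3, 5))        # [3, 9, 27, 81, 243]
print(round(retry_share(3, 5), 2))   # 0.67
```

The last value matches the diagram's meter: with 3 attempts per hop, about two thirds of all traffic in the chain is retries.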
Cluster.admitRetry
Envoy-style budget admission: allow a retry only while active retries are below the cap
def admit_retry(cluster):
    # budget_percent and min_retry_concurrency come from the retry_budget config.
    budget = budget_percent / 100  # e.g. 0.20
    active = cluster.active_request_count
    retrying = cluster.active_retry_count
    # The floor keeps a few retries available when traffic is low.
    cap = max(active * budget, min_retry_concurrency)
    if retrying >= cap:
        return DROP_RETRY  # honor the cap
    return ALLOW_RETRY
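To see how the cap behaves, here is a small worked example that mirrors the `max(...)` expression in `admit_retry`. The helper names are hypothetical. The amplification bound is a rough estimate under the stated assumption that each hop adds at most budget% extra requests.

```python
def retry_cap(active_requests, budget_percent=20.0, min_retry_concurrency=3):
    """Maximum concurrent retries allowed, mirroring the cap in admit_retry."""
    return max(active_requests * budget_percent / 100, min_retry_concurrency)

def budgeted_amplification(budget_percent, hops):
    """Rough upper bound on tail load if each hop adds at most budget% retries.

    Assumption: ignores the min_retry_concurrency floor, which can let
    retries exceed the percentage when traffic is very low.
    """
    return (1 + budget_percent / 100) ** hops

print(retry_cap(100))  # 20.0 -> the percentage applies
print(retry_cap(5))    # 3    -> the floor applies
print(round(budgeted_amplification(20.0, 5), 2))  # 2.49
```

Under these assumptions, a 20% budget at every hop bounds the load on S5 at roughly 2.5x, compared with 243x for the unbudgeted chain.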
route.yaml
the config that wires timeout, per-try, and budget together
route:
  timeout: 30s                # whole call, incl. retries
  retry_policy:
    retries: 3
    per_try_timeout: 5s       # bound on one attempt
    retry_budget:
      budget_percent: 20.0    # cap retries at 20% of traffic
      min_retry_concurrency: 3
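The relationship between the two timeouts can be sketched as follows: each attempt gets the per-try timeout, clipped to whatever is left of the overall timeout. This is an illustration under assumptions, not a description of any particular proxy. Backoff sleeps are ignored and `attempt_deadlines` is a hypothetical helper.

```python
def attempt_deadlines(timeout, per_try_timeout, retries):
    """Per-attempt deadlines in seconds if every attempt times out.

    Assumption: backoff sleeps are ignored, so this is the most attempts
    that can fit inside the overall timeout.
    """
    deadlines = []
    remaining = timeout
    for _ in range(retries + 1):
        if remaining <= 0:
            break  # overall timeout exhausted: no further attempts
        deadline = min(per_try_timeout, remaining)
        deadlines.append(deadline)
        remaining -= deadline
    return deadlines

print(attempt_deadlines(30, 5, 3))  # [5, 5, 5, 5]  all attempts fit in 20s
print(attempt_deadlines(12, 5, 3))  # [5, 5, 2]     third attempt is clipped
```

With the values in the config above, four attempts of 5s each fit comfortably inside the 30s route timeout. A tighter route timeout would cut the retry sequence short.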