Timeout and retry budget — bounded patience — Build a Service Mesh (Envoy / Istio style)

Build a Service Mesh (Envoy / Istio style) (13 scenes)

Scene 06 · Timeout and retry budget — bounded patience

With 3 attempts per hop across a 5-hop chain, naive retries amplify load on a failing backend 243x (3^5). A retry budget caps total retries as a fraction of normal traffic so that retries can't become the outage.

Previously

Once the proxy has picked a replica and that replica turns out to be slow, the proxy has to decide how long to wait and whether to try again. That decision, scaled across the fleet, is where outages are born.

Watch

Diagram

A 5-hop call chain (Client → S1 → S2 → S3 → S4 → S5) where S5 is slow or failing. The clock icon on each arrow represents a **timeout**: the upper bound on how long a proxy waits for a reply before giving up. The route-level timeout covers all retry attempts together; the per-try timeout is the sub-bound on a single attempt. The number above each arrow is the number of requests crossing that hop per logical client request; the italic number below it is how many of those were retries.

The card at the top shows the **effective load on S5**: with no budget and 3 attempts per hop, retries compound geometrically and that number climbs to 3^5 = 243. The meter at the bottom is the **retry budget**, a cap on concurrent retries expressed as a percentage of active requests. When it is set, retries that would breach the cap are dropped, so the cluster sees at most roughly budget% extra load from retries.

(Footnote: jitter desynchronises retry crowds and is the natural partner of backoff; idempotency is the precondition for any retry being safe. Both are outside the scope of this scene.)

clock = timeout (the bound that fires the retry)

each hop retries independently → 3^5 = 243

retry budget — cap retries as % of normal traffic

S5 is failing. Each upstream hop makes up to 3 attempts for every request it receives. Watch the per-hop counter climb hop by hop: the badge above the diagram lands on 243× (3^5), the load S5 actually sees per single client request.

Implementation

Proxy.sendWithRetry

the naive per-hop retry every proxy runs by default

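def send_with_retry(req):
    # Naive: every proxy runs this loop on its own, with no shared budget.
    for attempt in range(retries + 1):  # first try + `retries` retries
        resp = try_send(req, deadline=per_try_timeout)
        if resp.ok:
            return resp
        if attempt < retries:  # no backoff after the final attempt
            sleep(backoff_with_jitter(attempt))
    return ERROR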
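The loop above bounds each attempt with the per-try timeout but never consults the route-level timeout that is supposed to cover all attempts together. The following is a minimal, self-contained sketch of how the two bounds could be combined. The `Response` type, the injected `try_send`, `clock`, and `sleep` parameters, and the backoff constants (25 ms base, 250 ms cap) are assumptions made for illustration, not part of the scene's proxy.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class Response:
    ok: bool
    body: str = ""


def backoff_with_jitter(attempt, base=0.025, cap=0.25):
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def send_with_retry(req, try_send, retries=3, per_try_timeout=5.0,
                    route_timeout=30.0, clock=time.monotonic, sleep=time.sleep):
    """Retry until success, until retries run out, or until the route deadline passes."""
    deadline = clock() + route_timeout
    for attempt in range(retries + 1):
        remaining = deadline - clock()
        if remaining <= 0:
            break  # the whole-call timeout has fired; stop retrying
        # A single attempt may never outlive the whole-call deadline.
        resp = try_send(req, min(per_try_timeout, remaining))
        if resp.ok:
            return resp
        if attempt < retries:
            # Back off, but never sleep past the route deadline.
            pause = min(backoff_with_jitter(attempt), max(0.0, deadline - clock()))
            if pause > 0:
                sleep(pause)
    return Response(ok=False, body="upstream request failed")
```

Passing the clock and sleep functions in as parameters is only a convenience for testing; a real proxy would drive this from its event loop rather than from blocking sleeps.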

// Why naive retries storm

the compounding that turns 5 hops × 3 attempts per hop into 243x

# Each hop makes up to 3 attempts on failure.
# Hop 2 inherits all of hop 1's load,
# and makes up to 3 attempts for each of those requests too.
# Every hop multiplies, not adds.
#
# load_on_tail = attempts ^ hops
#              = 3 ^ 5
#              = 243x  per single client request
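The arithmetic can be checked with a small simulation. This is a sketch under a simplifying assumption: the tail always fails and every hop exhausts all of its attempts. The function names `tail_load` and `simulate_chain` are illustrative only.

```python
def tail_load(attempts_per_hop, hops):
    """Closed form: every hop multiplies the load by its attempt count."""
    return attempts_per_hop ** hops


def simulate_chain(attempts_per_hop, hops):
    """Count the requests that reach the failing tail for one client request."""
    tail_hits = 0

    def call(hop):
        nonlocal tail_hits
        if hop == hops:      # reached the failing tail service
            tail_hits += 1
            return False     # the tail always fails in this model
        # The caller at this hop re-sends to the next hop on every failure.
        for _ in range(attempts_per_hop):
            if call(hop + 1):
                return True
        return False

    call(0)
    return tail_hits


print(tail_load(3, 5))       # 243
print(simulate_chain(3, 5))  # 243
print(simulate_chain(4, 5))  # 1024
```

Note that the count depends on attempts, not retries: 3 attempts per hop (one original plus 2 retries) gives 243, while 3 retries on top of the original (4 attempts) would give 4^5 = 1024 under the same model.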

Cluster.admitRetry

Envoy-style retry budget admission: allow a retry only while concurrent retries are below the cap

def admit_retry(cluster):
    budget = budget_percent / 100  # e.g. 20.0 -> 0.20
    active = cluster.active_request_count
    retrying = cluster.active_retry_count
    # Cap concurrent retries at a fraction of in-flight requests, with a
    # floor so that a low-traffic cluster can still retry at all.
    cap = max(active * budget, min_retry_concurrency)
    if retrying >= cap:
        return DROP_RETRY  # honor the cap
    return ALLOW_RETRY
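To see the cap in action, the admission check can be wrapped in a small object that also tracks the counters. This is a hedged sketch: the `RetryBudget` class and its method names are hypothetical, it is single-threaded, and it simply takes `active_requests` as given rather than modelling how a proxy maintains that counter.

```python
from dataclasses import dataclass


@dataclass
class RetryBudget:
    budget_percent: float = 20.0
    min_retry_concurrency: int = 3
    active_requests: int = 0
    active_retries: int = 0

    def cap(self):
        """Maximum concurrent retries allowed right now."""
        return max(self.active_requests * self.budget_percent / 100.0,
                   self.min_retry_concurrency)

    def try_admit_retry(self):
        """Admit a retry only while concurrent retries are below the cap."""
        if self.active_retries >= self.cap():
            return False          # drop the retry, honor the cap
        self.active_retries += 1
        return True

    def retry_done(self):
        """Release a slot when an admitted retry finishes."""
        self.active_retries -= 1


busy = RetryBudget(active_requests=100)
print(sum(busy.try_admit_retry() for _ in range(100)))  # 20

quiet = RetryBudget(active_requests=5)
print(sum(quiet.try_admit_retry() for _ in range(100)))  # 3
```

With 100 active requests, 100 would-be retries are trimmed to 20, so retry load stays near 20% of normal traffic. With only 5 active requests the 20% share would be 1, and the `min_retry_concurrency` floor lifts it to 3.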

route.yaml

the config that wires the route timeout, per-try timeout, and retry budget together

route:
  timeout: 30s              # whole call, incl. retries
  retry_policy:
    retries: 3
    per_try_timeout: 5s     # bound on one attempt
    retry_budget:
      budget_percent: 20.0  # cap concurrent retries at 20% of active requests
      min_retry_concurrency: 3  # floor, so low traffic can still retry
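One sanity check worth running on a config like this is whether all attempts can fit inside the route timeout. The sketch below mirrors the YAML as a plain Python dict to stay dependency-free; `parse_seconds`, `worst_case_attempt_time`, and `attempts_fit_in_timeout` are hypothetical helpers, and the check ignores backoff sleeps between attempts.

```python
def parse_seconds(value):
    """Parse durations such as '30s' or '250ms' into seconds."""
    if value.endswith("ms"):
        return float(value[:-2]) / 1000.0
    if value.endswith("s"):
        return float(value[:-1])
    raise ValueError(f"unsupported duration: {value!r}")


def worst_case_attempt_time(route):
    """Time spent if every attempt runs into its per-try timeout."""
    policy = route["retry_policy"]
    attempts = policy["retries"] + 1  # the original try plus the retries
    return attempts * parse_seconds(policy["per_try_timeout"])


def attempts_fit_in_timeout(route):
    """True if the route timeout leaves room for every configured attempt."""
    return worst_case_attempt_time(route) <= parse_seconds(route["timeout"])


route = {
    "timeout": "30s",
    "retry_policy": {
        "retries": 3,
        "per_try_timeout": "5s",
        "retry_budget": {"budget_percent": 20.0, "min_retry_concurrency": 3},
    },
}

print(worst_case_attempt_time(route))  # 20.0
print(attempts_fit_in_timeout(route))  # True
```

Here 4 attempts at 5 s each take at most 20 s, which leaves 10 s of the 30 s route timeout for backoff. If the product exceeded the route timeout, the later retries could never run to completion.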
