Build a gRPC-style RPC framework (14 scenes)
Scene 07 · Retries: idempotency and a token budget
Only an idempotent method is safe to auto-retry, and even then a token-bucket budget must cap retries — or a brownout turns into a self-sustaining retry storm.
Previously

We learned to abort doomed work with deadlines and cancellation; now the mirror problem — work that FAILED and might be worth re-sending. But scene 1 warned we often can't tell whether the first attempt already ran, so a retry is a loaded gun.

Diagram
On the left a client calls a single backend (GreeterService) that is in a brownout (slow/failing). The arrows are the original call plus one per retry. The big top meter is the OFFERED LOAD on the backend — 1.0× means exactly its capacity; past ~4× it turns into a storm and the backend flatlines, latching a METASTABLE badge when it stays down after the original trigger clears. Idempotency: a method is idempotent when re-running it has no extra effect (greet) — its retry arrows are green (safe); a non-idempotent method (charge) turns them red because every retry risks doing the work twice. Retry budget (token bucket): the lower-left bucket holds tokens; each failure drains one, each success refills tokenRatio; once tokens drop below half the bucket, retries PAUSE and the offered-load meter caps near 1.0×.
[Diagram, initial state] client calls greet("Ada") (idempotent ✓, 1 call + 3 retries) on GreeterService (flatlined). Offered load on backend: 11× against 1.0× capacity (RETRY STORM). Retry budget (token bucket): OFF, no cap on retries. Caption: Each failure is re-sent; with no budget the load compounds.
The backend is browning out — slow and dropping calls. With no budget and 3 retries per failed call, every client re-sends at once. Watch the OFFERED-LOAD meter on the backend climb. Partway through, the original slowness clears (the trigger turns off) — but the load DOESN'T drop, because the retries are now feeding themselves. That self-sustaining overload, where the service stays down after its own cause is gone, is a *retry storm*: blind retries pile on exactly when a service can least afford it. This scene is about the two things that make retrying safe.
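To put a rough number on how the load compounds, here is a minimal sketch. It assumes every attempt fails independently with the same probability and that each failure is re-sent up to a fixed number of times; the function name offeredLoad and the sample failure rates are illustrative, not part of the scene's code.

```python
def offeredLoad(failureProb: float, retries: int) -> float:
    """Expected attempts per original call when every failed attempt is
    re-sent, up to `retries` extra times, with no budget in place."""
    # Attempt k only happens if the k attempts before it all failed.
    return sum(failureProb ** k for k in range(retries + 1))


for p in (0.1, 0.5, 0.9, 1.0):
    print(f"failure rate {p:.0%}: {offeredLoad(p, retries=3):.2f}x offered load")
```

When every attempt fails, 3 retries turn each call into 4 attempts. The feedback is what makes this dangerous: more attempts push the backend further past capacity, which raises the failure rate, which adds more attempts. This per-call count ignores overlapping clients and timeouts, so it is not meant to reproduce the exact readings on the meter.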
Implementation
Client.callWithRetry
the retry loop wrapping every outbound call
def callWithRetry(method, req):
    # RpcError is a placeholder exception type that carries the failed status.
    for attempt in range(maxAttempts):          # slider: retries + 1
        status = send(method, req)
        if status == OK:
            budget.onSuccess()                  # refill by tokenRatio
            return status
        if status not in retryableStatusCodes:
            raise RpcError(status)              # e.g. anything but UNAVAILABLE
        budget.onFailure()                      # drain one token
        if not budget.allow():                  # bucket below half: retries paused
            raise RpcError(status)
        if attempt + 1 < maxAttempts:           # don't sleep after the last attempt
            sleep(backoffWithJitter(attempt))
    raise RpcError(status)                      # attempts exhausted
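The loop calls backoffWithJitter(attempt) without showing it. One common way to write it is exponential backoff with "full jitter", sketched below. The base delay, the cap, and the rng parameter are assumptions chosen for illustration; the scene does not specify them.

```python
import random


def backoffWithJitter(attempt, base=0.1, cap=5.0, rng=random):
    """Seconds to wait before the next retry.

    attempt is 0 for the first retry. base and cap (both in seconds) are
    illustrative defaults, not values taken from the scene.
    """
    # The ceiling doubles with each attempt, but never exceeds cap.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly in [0, ceiling] so clients that failed
    # together do not all retry at the same instant.
    return rng.uniform(0.0, ceiling)


for attempt in range(4):
    print(f"attempt {attempt}: wait {backoffWithJitter(attempt):.3f}s")
```

The jitter matters as much as the exponent: without it, every client that failed at the same moment would come back at the same moment, recreating the spike one backoff interval later.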
RetryBudget.allow
token bucket capping retries as a fraction of traffic
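class RetryBudget:
    def __init__(self, maxTokens, tokenRatio, enabled=True):
        self.maxTokens = maxTokens
        self.tokenRatio = tokenRatio
        self.enabled = enabled
        self.tokens = maxTokens          # start with a full bucket

    def onFailure(self):
        self.tokens = max(0, self.tokens - 1)

    def onSuccess(self):
        self.tokens = min(self.maxTokens, self.tokens + self.tokenRatio)

    def allow(self):
        if not self.enabled:
            return True                  # no budget: never pause
        return self.tokens >= self.maxTokens / 2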
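To see why this bucket caps retries as a fraction of traffic, the sketch below runs the same drain, refill, and half-bucket rules over a stream of calls. It is a simplified model: each call fails independently at a fixed rate, and retries are only counted, not fed back in as new calls. The function name retriedFraction and the default values are assumptions for illustration.

```python
import random


def retriedFraction(failureRate, maxTokens=10.0, tokenRatio=0.1,
                    calls=10_000, seed=0):
    """Fraction of calls that the budget allows to be retried."""
    rng = random.Random(seed)      # seeded so runs are repeatable
    tokens = maxTokens             # full bucket
    retries = 0
    for _ in range(calls):
        if rng.random() < failureRate:
            tokens = max(0.0, tokens - 1)                 # failure drains one token
            if tokens >= maxTokens / 2:                   # same rule as allow()
                retries += 1
        else:
            tokens = min(maxTokens, tokens + tokenRatio)  # success refills
    return retries / calls


for rate in (0.01, 0.5, 0.9):
    print(f"failure rate {rate:.0%}: {retriedFraction(rate):.2%} of calls retried")
```

Every allowed retry costs a whole token, and tokens only come back tokenRatio at a time on success. Over a long run the budget therefore permits at most about tokenRatio retries per successful call, plus whatever the initial bucket covers. At a low failure rate nearly every failure is retried; in a brownout the retried share collapses instead of growing with the failure rate.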
Method.execute
why a replay is safe only for an idempotent method
# greet is idempotent: re-running returns the same value
def greet(name):
    return 'hello ' + name

# charge is NOT: each call moves money
def charge(name, amount):
    account[name].balance -= amount   # replay double-bills
    return receipt()

# the leak: a retry can't tell if the first attempt ran
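The difference shows up as soon as both methods are replayed. The sketch below is a self-contained stand-in for the panel above: the Account class, the accounts table, the starting balance, and the replay helper are all made up for illustration, and the receipt is just a string.

```python
from dataclasses import dataclass


@dataclass
class Account:
    balance: int


# Hypothetical in-memory state standing in for the real account store.
accounts = {"Ada": Account(balance=100)}


def greet(name):
    return "hello " + name


def charge(name, amount):
    accounts[name].balance -= amount      # side effect happens on every call
    return f"receipt:{name}:{amount}"


def replay(fn, *args, attempts=2):
    """Run fn several times, as a blind retry would after losing the
    response to an attempt that actually succeeded."""
    return [fn(*args) for _ in range(attempts)]


greetResults = replay(greet, "Ada")
chargeResults = replay(charge, "Ada", 30)

print(greetResults[0] == greetResults[1])   # same answer, nothing else changed
print(accounts["Ada"].balance)              # charged twice for one request
```

Replaying greet leaves the world as it was. Replaying charge for a single 30-unit request leaves the balance at 40 rather than 70, and both receipts look identical, so nothing in the replies tells the client it was billed twice.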