Build a Service Mesh (Envoy / Istio style) (13 scenes)
Scene 07 · Circuit breaker — the state machine
Closed → open → half-open. Fast-fail requests to a known-broken dependency and periodically probe for recovery, so callers stop wasting resources on guaranteed failures.
Previously

A retry budget caps how much extra load the fleet generates, but each individual caller is still spending its own time and connections waiting for a backend it could already tell is broken. What's missing is a switch that says 'stop calling this dependency for a while' — and that switch is the circuit breaker.

Diagram
A **circuit breaker** is a three-state machine that fast-fails requests to a known-broken dependency: **closed** (let through), **open** (reject in milliseconds), **half-open** (let exactly one probe through to test recovery). The left panel shows the three labeled circles and the transitions between them; the right panel shows what that state means for live traffic — requests either reach the backend, bounce off the breaker, or one probe leaks through. Note: Envoy implements this pattern per-replica via outlier detection (next scene); Envoy's own `circuit_breakers` config is connection-pool ceilings, a separate concept.
[Figure: CIRCUIT BREAKER · STATE MACHINE. Left panel: states CLOSED (traffic flows), OPEN (fast-fail), HALF-OPEN (one probe), with an error gauge reading 0% / 50%. Transitions: CLOSED → OPEN on sustained errors > threshold; OPEN → HALF-OPEN when cooldown elapsed; HALF-OPEN → CLOSED when probe succeeded; HALF-OPEN → OPEN when probe failed. Right panel: REQUEST STREAM (CLOSED, backend reachable, each request ~42ms), STOPWATCH (~42ms · normal), CALLER EXHAUSTION (10%, caller pool healthy).]
Healthy backend. Breaker CLOSED: every request reaches the backend at normal latency.
Watch the breaker trip. Errors climb past the threshold; CLOSED hands off to OPEN; new requests fast-fail in milliseconds. After a cooldown, HALF-OPEN admits exactly one probe — and the result decides whether the breaker closes or stays open.
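The trip-and-recover sequence described above can be traced with a small transition table. This is a minimal sketch, not the scene's implementation: the event strings are taken from the diagram's arrow labels, while `State`, `TRANSITIONS`, `next_state`, and `replay` are hypothetical names introduced here for illustration.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


# (current state, event) -> next state. Event names mirror the diagram labels.
TRANSITIONS = {
    (State.CLOSED, "errors_over_threshold"): State.OPEN,
    (State.OPEN, "cooldown_elapsed"): State.HALF_OPEN,
    (State.HALF_OPEN, "probe_succeeded"): State.CLOSED,
    (State.HALF_OPEN, "probe_failed"): State.OPEN,
}


def next_state(state, event):
    # Any (state, event) pair missing from the table leaves the state unchanged.
    return TRANSITIONS.get((state, event), state)


def replay(events, start=State.CLOSED):
    """Return every state visited while applying the events in order."""
    path = [start]
    for event in events:
        path.append(next_state(path[-1], event))
    return path


if __name__ == "__main__":
    events = ["errors_over_threshold", "cooldown_elapsed", "probe_failed",
              "cooldown_elapsed", "probe_succeeded"]
    print(" -> ".join(s.value for s in replay(events)))
```

Keeping the transitions in a table makes it easy to check that only four arrows exist: a failed probe goes back to OPEN rather than staying in HALF-OPEN, and nothing moves CLOSED directly to HALF-OPEN.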
Implementation
Breaker.onRequest
the three-state machine: closed, open, half-open
state = CLOSED
error_rate_window = sliding(seconds=60)   # sliding-window error counter
open_started_at = 0

def on_request(req):
    global state, open_started_at
    if state == CLOSED:
        outcome = forward(req, timeout=T)
        error_rate_window.record(outcome)
        if error_rate_window.error_rate() > THRESHOLD:
            state = OPEN                   # trip the breaker
            open_started_at = now()
        return outcome
    if state == OPEN:
        if now() - open_started_at > COOLDOWN:
            state = HALF_OPEN              # fall through to the probe below
        else:
            return fast_fail_503()         # ~1ms, no backend call
    if state == HALF_OPEN:
        # Single PROBE. With concurrent callers, more than one request can
        # reach this branch; a real breaker guards it with a lock or flag.
        outcome = forward(req, timeout=T)
        if outcome.ok:
            state = CLOSED
            error_rate_window.reset()      # drop stale errors so it does not re-trip at once
        else:
            state = OPEN
            open_started_at = now()        # restart the cooldown
        return outcome
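The pseudocode above leaves `forward`, `sliding`, and `now` undefined. The following is a minimal runnable sketch of the same three states under stated assumptions: the backend is any callable returning a truthy value on success, the clock is injectable so the cooldown can be tested without sleeping, and the `min_samples` guard and `probe_in_flight` flag are additions not present in the original. All class, method, and parameter names are hypothetical.

```python
import threading
import time
from collections import deque

CLOSED, OPEN, HALF_OPEN = "closed", "open", "half-open"


class CircuitBreaker:
    def __init__(self, call, threshold=0.5, window_s=60.0, cooldown_s=5.0,
                 min_samples=10, clock=time.monotonic):
        self.call = call                # callable(req) -> truthy on success
        self.threshold = threshold      # error-rate fraction that trips the breaker
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.min_samples = min_samples  # assumption: do not trip on tiny samples
        self.clock = clock
        self.state = CLOSED
        self.opened_at = 0.0
        self.samples = deque()          # (timestamp, ok) pairs inside the window
        self.probe_in_flight = False
        self.lock = threading.Lock()

    def _error_rate(self, now):
        # Evict samples older than the window, then compute the failure fraction.
        while self.samples and now - self.samples[0][0] > self.window_s:
            self.samples.popleft()
        if len(self.samples) < self.min_samples:
            return 0.0
        errors = sum(1 for _, ok in self.samples if not ok)
        return errors / len(self.samples)

    def _admit(self):
        """Decide 'pass', 'probe', or 'reject' for one incoming request."""
        with self.lock:
            now = self.clock()
            if self.state == OPEN and now - self.opened_at > self.cooldown_s:
                self.state = HALF_OPEN
            if self.state == CLOSED:
                return "pass"
            if self.state == HALF_OPEN and not self.probe_in_flight:
                self.probe_in_flight = True   # exactly one probe at a time
                return "probe"
            return "reject"

    def _record(self, decision, ok):
        with self.lock:
            now = self.clock()
            if decision == "probe":
                self.probe_in_flight = False
                self.samples.clear()          # start the window fresh
                self.state = CLOSED if ok else OPEN
                if not ok:
                    self.opened_at = now      # restart the cooldown
                return
            self.samples.append((now, ok))
            if self.state == CLOSED and self._error_rate(now) > self.threshold:
                self.state = OPEN
                self.opened_at = now

    def on_request(self, req):
        """Return 'ok', 'error', or 'rejected' (fast-fail, no backend call)."""
        decision = self._admit()
        if decision == "reject":
            return "rejected"
        try:
            ok = bool(self.call(req))
        except Exception:
            ok = False
        self._record(decision, ok)
        return "ok" if ok else "error"
```

The lock is released while the backend call runs, so a slow probe does not block other callers; they are rejected immediately until the probe reports back.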
Why OPEN protects the caller
the asymmetric payoff: 1ms reject vs full-timeout wait
# Without breaker: every caller waits the full timeout.
# N callers * T seconds = N*T thread-seconds blocked.
# Caller's thread pool fills with stuck requests.

# With breaker OPEN: every caller returns in ~1ms.
# Caller frees the resource and degrades gracefully.
# Traffic to OTHER (healthy) dependencies keeps flowing.
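To put numbers on the N*T comparison above, here is a small worked calculation. The values of `N` and `T` are hypothetical, chosen only for illustration; the ~1ms fast-fail figure comes from the text.

```python
def blocked_thread_seconds(callers, seconds_per_call):
    """Total time callers spend blocked: N callers * seconds each."""
    return callers * seconds_per_call


N = 200            # hypothetical number of concurrent callers
T = 2.0            # hypothetical request timeout, in seconds
FAST_FAIL = 0.001  # ~1ms reject while the breaker is OPEN (from the text)

without_breaker = blocked_thread_seconds(N, T)        # every caller waits T
with_breaker = blocked_thread_seconds(N, FAST_FAIL)   # every caller returns in ~1ms
ratio = without_breaker / with_breaker

if __name__ == "__main__":
    print(f"without breaker: {without_breaker:.1f} thread-seconds blocked")
    print(f"with breaker:    {with_breaker:.1f} thread-seconds blocked")
    print(f"ratio:           about {ratio:.0f}x")
```

Under these assumed numbers the saving is the ratio T / 1ms, so the longer the timeout, the more an open breaker pays off.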
Envoy.cluster.circuit_breakers (footnote)
connection-pool ceilings — NOT the state machine above
# Envoy's `circuit_breakers` config is connection-pool
# ceilings per cluster, not closed/open/half-open.
# Per-replica state-machine semantics live in
# outlier_detection (next scene).
cluster:
  circuit_breakers:
    thresholds:                 # limits are set per routing priority
    - priority: DEFAULT
      max_connections: 1024
      max_pending_requests: 1024
      max_requests: 1024
      max_retries: 3
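To make the contrast with the state machine concrete, the sketch below models a ceiling as a plain counter. It illustrates the idea only and is not Envoy's implementation; `RequestCeiling` and its methods are hypothetical names, and the sketch is single-threaded for brevity.

```python
class RequestCeiling:
    """Admit a request only while the in-flight count is below a fixed limit."""

    def __init__(self, max_requests):
        self.max_requests = max_requests
        self.in_flight = 0

    def try_acquire(self):
        # The decision depends only on current concurrency,
        # never on whether earlier requests failed.
        if self.in_flight >= self.max_requests:
            return False
        self.in_flight += 1
        return True

    def release(self):
        # Call once for every successful try_acquire when the request finishes.
        self.in_flight -= 1


if __name__ == "__main__":
    ceiling = RequestCeiling(max_requests=2)
    print(ceiling.try_acquire(), ceiling.try_acquire(), ceiling.try_acquire())
    ceiling.release()
    print(ceiling.try_acquire())
```

A ceiling has no memory and no cooldown: the moment a slot frees up, the next request is admitted, even if every recent request failed. That is why it cannot stand in for the closed/open/half-open breaker.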