Cluster and load balancing — pick one of many — Build a Service Mesh (Envoy / Istio style)

Build a Service Mesh (Envoy / Istio style) (13 scenes)

Scene 05 · Cluster and load balancing — pick one of many

Behind one destination name is a cluster of replicas; round-robin, least-request, or ring-hash decides who serves each request. Round-robin is the wrong default under heterogeneous latency.

Previously

Picking 'one of many replicas' is its own design decision: the route's destination name resolves to a cluster, and inside that cluster the proxy applies a load-balancing policy.

Watch

Diagram

On the left is the proxy with a stream of incoming requests. On the right is a **cluster** — a named group of replica endpoints (here, `checkout-svc` with R1..R4); R3 is artificially slow. Above each replica is its in-flight queue tower; below each is its service-time badge. The strip under the cluster is the **load-balancing policy** — the rule the cluster uses to pick one replica per request. The top-right gauge is the resulting p99 tail latency.

cluster — the group of replicas behind one destination name

load-balancing policy — the rule the cluster uses to pick a replica
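To make the two terms concrete, here is a minimal sketch of the data a cluster might hold. The field names `replicas`, `cursor`, and `in_flight` follow the implementation panel in this scene; `Replica`, `Cluster`, `name`, and `service_ms` are hypothetical names chosen for illustration, not any proxy's actual types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    name: str
    service_ms: float   # assumed: time this replica needs to serve one request
    in_flight: int = 0  # requests sent to this replica that have not completed


@dataclass
class Cluster:
    name: str                                   # the destination name a route resolves to
    replicas: List[Replica] = field(default_factory=list)
    cursor: int = -1                            # round-robin position; -1 so the first pick is replicas[0]


# The cluster from the diagram: four replicas, R3 artificially slow.
checkout = Cluster(
    "checkout-svc",
    [Replica("R1", 50), Replica("R2", 50), Replica("R3", 800), Replica("R4", 50)],
)
```

A load-balancing policy is then just a function that takes a `Cluster` and returns one of its `Replica` objects.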

Watch the proxy fan requests across the four replicas behind the `checkout-svc` cluster. The policy is round-robin, so every replica gets the same share of requests no matter how quickly it serves them. R3 is slow (800 ms per request vs 50 ms for the others). Watch R3's in-flight tower climb while the others stay flat, and the tail-latency gauge climb with it.
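The behaviour in the animation can be reproduced with a small deterministic simulation. This is a sketch under stated assumptions, not the scene's actual model: requests arrive at a fixed gap (`gap_ms`, a hypothetical parameter), and each replica serves one request at a time in arrival order. The 50 ms and 800 ms service times come from the scene.

```python
def simulate_round_robin(service_ms, n_requests, gap_ms):
    """Send n_requests round-robin and return (latencies, in_flight_at_last_arrival).

    Assumptions: one arrival every gap_ms; each replica is a single FIFO server.
    """
    free_at = [0] * len(service_ms)            # when each replica finishes its current queue
    completions = [[] for _ in service_ms]     # completion times per replica
    latencies = []
    for k in range(n_requests):
        arrival = k * gap_ms
        i = k % len(service_ms)                # round-robin: ignores replica state
        start = max(arrival, free_at[i])       # wait behind earlier requests on that replica
        done = start + service_ms[i]
        free_at[i] = done
        completions[i].append(done)
        latencies.append(done - arrival)
    end = (n_requests - 1) * gap_ms            # time of the last arrival
    in_flight = [sum(1 for d in c if d > end) for c in completions]
    return latencies, in_flight


# Four replicas, R3 slow; one request every 25 ms, so each replica sees one per 100 ms.
latencies, in_flight = simulate_round_robin([50, 50, 800, 50], n_requests=400, gap_ms=25)
print("in-flight per replica:", in_flight)
print("worst latency (ms):", max(latencies))
```

With these numbers the fast replicas finish each request (50 ms) well before their next one arrives (100 ms later), while R3 receives a request every 100 ms but needs 800 ms for each, so its backlog and the worst-case latency keep growing as `n_requests` increases.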

Implementation

Cluster.pick — round_robin

rotate the cursor; no feedback from replica state

def round_robin(cluster):
    i = (cluster.cursor + 1) % len(cluster.replicas)
    cluster.cursor = i
    return cluster.replicas[i]

# fair by request COUNT: never reads in_flight,
# so a replica that drains slower than it fills
# sees its queue grow without bound.
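For contrast, a least-request policy does read `in_flight`. The sketch below uses the "power of two choices" approach: sample a couple of replicas at random and keep the less loaded one. The `choices` and `rng` parameters are assumptions made for this example, sampling is done with replacement for simplicity, and the caller is assumed to increment `in_flight` when a request is sent and decrement it when the request completes.

```python
import random
from types import SimpleNamespace


def least_request(cluster, choices=2, rng=random):
    """Sample `choices` replicas (with replacement) and return the one with the fewest in-flight requests."""
    candidates = [rng.choice(cluster.replicas) for _ in range(choices)]
    return min(candidates, key=lambda r: r.in_flight)


# Stand-in cluster: R3 already has a backlog, the others are idle.
cluster = SimpleNamespace(replicas=[
    SimpleNamespace(name="R1", in_flight=0),
    SimpleNamespace(name="R2", in_flight=0),
    SimpleNamespace(name="R3", in_flight=12),
    SimpleNamespace(name="R4", in_flight=0),
])

picked = least_request(cluster)
print(picked.name)
```

Here R3 is chosen only when every sampled candidate happens to be R3, so a backed-up replica receives far less new traffic than its round-robin share. Scanning all replicas for the minimum would also work, at the cost of an O(n) pass per request.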
