Build a Service Mesh (Envoy / Istio style) (13 scenes)
Scene 05 · Cluster and load balancing — pick one of many
Behind one destination name is a cluster of replicas; round-robin, least-request, or ring-hash decides who serves each request. Round-robin is the wrong default under heterogeneous latency.
Previously
Picking 'one of many replicas' is its own design decision: the route's destination name resolves to a cluster, and inside that cluster the proxy applies a load-balancing policy.
Diagram
On the left is the proxy with a stream of incoming requests. On the right is a **cluster** — a named group of replica endpoints (here, `checkout-svc` with R1..R4); R3 is artificially slow. Above each replica is its in-flight queue tower; below each is its service-time badge. The strip under the cluster is the **load-balancing policy** — the rule the cluster uses to pick one replica per request. The top-right gauge is the resulting p99 tail latency.
cluster — the group of replicas behind one destination name
load-balancing policy — the rule the cluster uses to pick a replica
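As a rough picture of these two terms in code, the sketch below models a cluster as a named list of replicas plus whatever state the policy needs. The class and field names (`Replica`, `Cluster`, `service_ms`, `in_flight`, `cursor`) are assumptions made for illustration, not the data model of any particular proxy; the service times are the ones shown in the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    service_ms: int      # time this replica needs to serve one request
    in_flight: int = 0   # requests assigned to it that have not finished yet


@dataclass
class Cluster:
    name: str                                    # the destination name routes point at
    replicas: list[Replica] = field(default_factory=list)
    cursor: int = -1                             # round-robin state; -1 means nothing picked yet


# The cluster from the diagram: four replicas, with R3 artificially slow.
checkout = Cluster(
    name="checkout-svc",
    replicas=[
        Replica("R1", 50),
        Replica("R2", 50),
        Replica("R3", 800),
        Replica("R4", 50),
    ],
)
```

The load-balancing policy is then just a function that takes a `Cluster` and returns one of its `replicas`.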
Watch the proxy fan requests across the four replicas behind the `checkout-svc` cluster. The policy is round-robin, so every replica receives the same share of requests regardless of how fast it serves them. R3 is slow: its service time is 800 ms, against 50 ms for the others. R3's in-flight tower climbs while the other three stay flat, and the p99 tail-latency gauge climbs with it.
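The same effect can be reproduced with a little arithmetic. The sketch below is a toy model, not the simulator behind the diagram: it assumes one request arrives every 20 ms (an invented rate), that each replica serves one request at a time in FIFO order, and it reuses the 50 ms and 800 ms service times from the scene. `simulate_round_robin` and its parameters are hypothetical names.

```python
def simulate_round_robin(service_ms, arrival_gap_ms, n_requests):
    """Assign requests round-robin; return (in-flight per replica, sorted latencies).

    In-flight is measured at the moment the last request arrives.
    """
    n = len(service_ms)
    free_at = [0] * n                      # when each replica finishes its current backlog
    finish_times = [[] for _ in range(n)]  # completion time of every request, per replica
    latencies = []

    for k in range(n_requests):
        now = k * arrival_gap_ms
        i = k % n                          # round-robin: ignores how busy replica i is
        start = max(now, free_at[i])       # wait behind anything already queued
        done = start + service_ms[i]
        free_at[i] = done
        finish_times[i].append(done)
        latencies.append(done - now)

    end = (n_requests - 1) * arrival_gap_ms
    in_flight = [sum(1 for t in times if t > end) for times in finish_times]
    return in_flight, sorted(latencies)


# R1, R2, R4 take 50 ms per request; R3 takes 800 ms.
in_flight, latencies = simulate_round_robin([50, 50, 800, 50], arrival_gap_ms=20, n_requests=400)
median = latencies[len(latencies) // 2]
p99 = latencies[len(latencies) * 99 // 100]
print("in-flight per replica:", in_flight)
print("median latency (ms):", median, "p99 latency (ms):", p99)
```

Under these assumptions each replica receives a new request every 80 ms. The 50 ms replicas finish before their next request arrives, so they never hold more than one request at a time. R3 needs 800 ms per request, so its backlog keeps growing for as long as traffic continues, and the slowest 1% of requests are all ones that waited in R3's queue.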
Implementation
Cluster.pick — round_robin
rotate the cursor; no feedback from replica state
    def round_robin(cluster):
        i = (cluster.cursor + 1) % len(cluster.replicas)
        cluster.cursor = i
        return cluster.replicas[i]

    # fair by request COUNT: never reads in_flight,
    # so a slow replica drains slower than it fills
    # and its queue grows without bound.
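For contrast, here is a minimal sketch of a least-request policy, one of the alternatives named at the top of the scene. Unlike round-robin, it reads each replica's in-flight count. The `Replica` class, the `least_request` function, and the sample counts are assumptions for illustration; a production proxy may sample a few replicas instead of scanning them all.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    in_flight: int = 0   # requests assigned but not yet completed


def least_request(replicas):
    """Pick the replica with the fewest in-flight requests.

    Full scan; on a tie, min() returns the earliest replica in the list.
    """
    return min(replicas, key=lambda r: r.in_flight)


# Hypothetical snapshot: R3 is slow, so work has piled up on it.
replicas = [Replica("R1", 1), Replica("R2", 0), Replica("R3", 7), Replica("R4", 2)]

chosen = least_request(replicas)
chosen.in_flight += 1    # the caller must decrement this when the response completes
print(chosen.name)
```

Because the choice depends on `in_flight`, a replica that drains slowly is picked less often, which gives the policy the feedback loop that round-robin lacks.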