Build a Service Mesh (Envoy / Istio style) (13 scenes)
Scene 05 · Cluster and load balancing — pick one of many
Behind one destination name is a cluster of replicas; a load-balancing policy such as round-robin, least-request, or ring-hash decides which replica serves each request. Round-robin is the wrong default when replica latencies differ, because it balances request counts rather than work.
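Ring-hash differs from the other two policies because it picks by a request key (for example a user ID), so the same key keeps landing on the same replica. A minimal sketch, assuming a hypothetical `RingHash` class, SHA-256 as the hash, and 100 virtual nodes per replica (all illustrative choices, not the parameters any particular proxy uses): each replica is hashed onto a ring at many points, and a request goes to the first point at or after the hash of its key.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    # 64-bit ring position from the first 8 bytes of SHA-256 (illustrative choice)
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class RingHash:
    """Hypothetical ring-hash picker; each replica owns `vnodes` ring points."""

    def __init__(self, replicas, vnodes=100):
        self.ring = sorted(
            (ring_position(f"{r}#{v}"), r) for r in replicas for v in range(vnodes)
        )
        self.points = [p for p, _ in self.ring]

    def pick(self, request_key: str) -> str:
        # first ring point at or after the key's position, wrapping to the start
        i = bisect.bisect_left(self.points, ring_position(request_key))
        return self.ring[i % len(self.ring)][1]


ring = RingHash(["R1", "R2", "R3", "R4"])
print(ring.pick("user-42"))  # the same key always maps to the same replica
```

Removing a replica only remaps the keys that landed on its points; every other key keeps its replica, which is the usual reason to prefer a ring over `hash(key) % len(replicas)`.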
Previously

Picking 'one of many replicas' is its own design decision: the route's destination name resolves to a cluster, and inside that cluster the proxy applies a load-balancing policy.
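The two-step lookup (destination name to cluster, then cluster to one replica) can be sketched as follows. `Cluster`, `clusters`, and `route` are hypothetical names for illustration, and plain strings stand in for replica endpoints.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Cluster:
    """Hypothetical cluster: a named group of replicas plus its LB policy."""
    name: str
    replicas: List[str]
    policy: Callable[[Cluster], str]
    cursor: int = -1  # round-robin state; -1 so the first pick is replicas[0]


def round_robin(cluster: Cluster) -> str:
    # advance the cursor and wrap around the replica list
    cluster.cursor = (cluster.cursor + 1) % len(cluster.replicas)
    return cluster.replicas[cluster.cursor]


# lookup table: destination name -> cluster
clusters: Dict[str, Cluster] = {
    "checkout-svc": Cluster("checkout-svc", ["R1", "R2", "R3", "R4"], round_robin),
}


def route(destination: str) -> str:
    cluster = clusters[destination]   # step 1: resolve the name to a cluster
    return cluster.policy(cluster)    # step 2: the cluster's policy picks one replica


print([route("checkout-svc") for _ in range(6)])
```

Because the policy is just a function stored on the cluster, swapping round-robin for another policy changes only that field, not the routing step.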

Diagram
On the left is the proxy with a stream of incoming requests. On the right is a **cluster** — a named group of replica endpoints (here, `checkout-svc` with R1..R4); R3 is artificially slow. Above each replica is its in-flight queue tower; below each is its service-time badge. The strip under the cluster is the **load-balancing policy** — the rule the cluster uses to pick one replica per request. The top-right gauge is the resulting p99 tail latency.
[Figure: the proxy sends a REQUEST STREAM through the load balancer into cluster `checkout-svc` (4 replicas: R1 50 ms, R2 50 ms, R3 800 ms marked SLOW, R4 50 ms), with an LB POLICY selector (round-robin, least-request, ring-hash) and a TAIL LATENCY (p99) gauge. Caption: round-robin is fair by request count, not by work.]
cluster — the group of replicas behind one destination name
load-balancing policy — the rule the cluster uses to pick a replica
Watch the proxy fan requests across the four replicas behind the `checkout-svc` cluster. The policy is round-robin, so every replica receives the same share of requests regardless of how quickly it serves them. R3 is slow (800 ms per request versus 50 ms for the others). Watch R3's in-flight tower climb while the others stay flat, and the tail-latency gauge climb with it.
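The climb can be reproduced with a small discrete-time sketch. The service times come from the diagram; everything else is an assumption for illustration: each replica serves one request at a time in FIFO order, a request arrives every 20 ms (50 req/s), and the run lasts 10 s. Under round-robin R3 receives one request every 80 ms but finishes only one every 800 ms, so its backlog grows by about nine requests per 800 ms. `simulate_round_robin` is a hypothetical name.

```python
import itertools

SERVICE_MS = {"R1": 50, "R2": 50, "R3": 800, "R4": 50}  # from the diagram
ARRIVAL_GAP_MS = 20    # assumption: one request every 20 ms (50 req/s)
DURATION_MS = 10_000   # assumption: simulate 10 s of traffic


def simulate_round_robin():
    order = itertools.cycle(SERVICE_MS)        # R1, R2, R3, R4, R1, ...
    busy_until = {r: 0 for r in SERVICE_MS}    # when each replica's queue drains
    finishes = {r: [] for r in SERVICE_MS}
    latencies = []
    for t in range(0, DURATION_MS, ARRIVAL_GAP_MS):
        r = next(order)                        # never looks at busy_until
        start = max(t, busy_until[r])          # FIFO: wait for the requests ahead
        busy_until[r] = start + SERVICE_MS[r]
        finishes[r].append(busy_until[r])
        latencies.append(busy_until[r] - t)
    # in flight at the end of the run = arrived but not yet finished
    in_flight = {r: sum(f > DURATION_MS for f in fs) for r, fs in finishes.items()}
    latencies.sort()
    p99 = latencies[int(0.99 * (len(latencies) - 1))]  # simple index-based p99
    return in_flight, p99


in_flight, p99 = simulate_round_robin()
print(in_flight, p99)
```

With these assumptions R3 ends the run with more than a hundred requests in flight while R1, R2, and R4 hold at most one each, and the p99 latency, driven entirely by R3's queue, is over a minute.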
Implementation
Cluster.pick — round_robin
rotate the cursor; no feedback from replica state
```python
def round_robin(cluster):
    # advance the cursor one slot, wrapping past the last replica
    i = (cluster.cursor + 1) % len(cluster.replicas)
    cluster.cursor = i
    return cluster.replicas[i]

# Fair by request COUNT: it never reads in_flight, so when a slow
# replica's 1/N share of arrivals exceeds its service rate, it drains
# slower than it fills and its queue grows without bound.
```
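A least-request policy reads exactly the state round-robin ignores. The sketch below is a simplification: it scans every replica and takes the one with the fewest in-flight requests, whereas Envoy's least-request balancer, when host weights are equal, samples a small number of random hosts (two by default) and keeps the less loaded one. The service times come from the diagram; the rest is assumed for illustration: FIFO replicas serving one request at a time, one arrival every 20 ms, and a 10 s run. `least_request` and `simulate_least_request` are hypothetical names.

```python
SERVICE_MS = {"R1": 50, "R2": 50, "R3": 800, "R4": 50}  # from the diagram


def least_request(replicas, in_flight):
    # full scan: fewest in-flight requests wins; ties go to the earlier replica
    return min(replicas, key=lambda r: in_flight[r])


def simulate_least_request(duration_ms=10_000, gap_ms=20):
    # assumptions: FIFO single-server replicas, one arrival every gap_ms
    replicas = list(SERVICE_MS)
    busy_until = {r: 0 for r in replicas}
    finishes = {r: [] for r in replicas}
    max_latency = 0
    for t in range(0, duration_ms, gap_ms):
        # in flight as of time t: requests assigned but not yet finished
        in_flight = {r: sum(f > t for f in finishes[r]) for r in replicas}
        r = least_request(replicas, in_flight)
        busy_until[r] = max(t, busy_until[r]) + SERVICE_MS[r]
        finishes[r].append(busy_until[r])
        max_latency = max(max_latency, busy_until[r] - t)
    end_in_flight = {r: sum(f > duration_ms for f in finishes[r]) for r in replicas}
    return end_in_flight, max_latency


print(simulate_least_request())
```

Under these assumptions every arrival finds an idle replica, so no replica ever holds more than one request and R3 gets a new request only after finishing the previous one. The policy bounds R3's queue but does not make R3 faster: a request that lands there still takes its full 800 ms.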