Build a gRPC-style RPC framework (14 scenes)
Scene 09 · The L4 pinning trap: balance requests, not connections
HTTP/2's one long-lived connection means an L4 load balancer pins every RPC to one backend. Client-side balancing discovers all backends and picks one per request.
Previously

The interceptor chain handed our call to the transport — but the very property that made HTTP/2 great, ONE long-lived connection, means a connection-level load balancer pins all our calls to one unlucky backend.

Diagram
A gRPC client on the left, a load balancer in the middle, and four backend pods on the right, each with a vertical load meter. The middle box's policy chip (pick_first vs round_robin) names the balancing mode. CONNECTION PINNING is what you see in L4 mode: the client's one HTTP/2 connection sticks to a single pod, so every greet("Ada") lands there and its meter pegs at 100% while the others flatline. CLIENT-SIDE LOAD BALANCING is the fix: the client itself discovers all four backend endpoints and picks one per call (round_robin), so the load meters balance.
[Diagram state: gRPC client (1 HTTP/2 connection) → L4 LB (ELB/nginx, balances CONNECTIONS, policy pick_first) → 4 backend pods. Pod #3 is PINNED at 100% load; pods #1, #2, and #4 sit at 0%. Caption: One HTTP/2 connection pinned to pod #3; every greet("Ada") lands there.]
A quick refresher: an RPC (a function call dressed up to run on another machine) rides exactly one HTTP/2 stream, and many streams share ONE long-lived TCP connection — that single-connection trick is what made HTTP/2 fast. Now put a load balancer in front. The kind here is an *L4* balancer (it works at the TCP layer — it sees connections, not the individual calls inside them), like a typical cloud ELB or nginx in TCP mode. Watch: the client opens ONE connection, the L4 LB sends that connection to pod #3, and then every greet("Ada") the client fires rides that same pinned connection straight to pod #3. The other three pods stay dark. When one backend gets stuck with all the work because the connection never moves, that's *connection pinning* — defined below.
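To put numbers on the pinning effect, here is a minimal simulation sketch. It is not part of the framework code; the function names `simulate_l4` and `simulate_per_request` are made up for illustration, and the L4 balancer is modeled (as an assumption) as one random pod choice per connection.

```python
from collections import Counter
import random


def simulate_l4(num_pods, num_clients, rpcs_per_client, seed=0):
    """L4 balancing: the LB picks a pod once per CONNECTION."""
    rng = random.Random(seed)
    load = Counter({pod: 0 for pod in range(num_pods)})
    for _ in range(num_clients):
        pinned_pod = rng.randrange(num_pods)   # decided once, at connect time
        load[pinned_pod] += rpcs_per_client    # every RPC follows the connection
    return load


def simulate_per_request(num_pods, num_clients, rpcs_per_client):
    """Client-side round_robin: each client picks a pod once per RPC."""
    load = Counter({pod: 0 for pod in range(num_pods)})
    for _ in range(num_clients):
        for i in range(rpcs_per_client):
            load[i % num_pods] += 1            # rotate on every call
    return load


if __name__ == "__main__":
    print("L4 / pinned:  ", dict(simulate_l4(4, 1, 1000)))
    print("per-request:  ", dict(simulate_per_request(4, 1, 1000)))
```

With one client and 1000 calls, the L4 run puts all 1000 RPCs on a single pod and leaves the other three at zero, while the per-request run gives each of the four pods 250.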
Implementation
Channel.dial
what the channel resolves and connects to at startup
def dial(self, target, policy):
    self.policy = policy  # remembered so pickSubchannel can branch on it
    if policy == "pick_first":
        # one VIP address -> the L4 LB picks the backend
        addrs = [resolve_one(target)]
    else:  # round_robin
        # headless service / xDS returns every backend IP
        addrs = resolve_all(target)
    # one subchannel == one long-lived HTTP/2 connection
    self.subchannels = [Subchannel(a) for a in addrs]
    self.next = 0  # round_robin cursor: index of the next subchannel to use
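The scene leaves `resolve_one` and `resolve_all` undefined. One plausible sketch, assuming plain DNS as the discovery mechanism (not xDS), uses the standard library's `socket.getaddrinfo`. The `(host, port)` signature and the injectable `lookup` parameter are assumptions made here so the example can run without a real cluster; they are not the framework's API.

```python
import socket


def resolve_all(host, port, lookup=socket.getaddrinfo):
    """Return every distinct (ip, port) the name resolves to, in resolver order."""
    addrs = []
    for _family, _type, _proto, _canon, sockaddr in lookup(
        host, port, type=socket.SOCK_STREAM
    ):
        addr = (sockaddr[0], sockaddr[1])
        if addr not in addrs:          # resolvers may repeat an address
            addrs.append(addr)
    return addrs


def resolve_one(host, port, lookup=socket.getaddrinfo):
    """Return only the first address, as a single-VIP setup would see it."""
    return resolve_all(host, port, lookup)[0]
```

Against a DNS name that returns one record per backend (a headless service, in Kubernetes terms), `resolve_all` would hand the channel one address per pod. Against a single VIP, both functions return the same lone address, and that is exactly the case where pinning appears.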
Channel.pickSubchannel
which connection this one call rides
def pickSubchannel(self):
    if self.policy == "pick_first":
        # only one subchannel exists -> always the same
        return self.subchannels[0]
    # round_robin: rotate per call across all subchannels
    sc = self.subchannels[self.next % len(self.subchannels)]
    self.next += 1
    return sc
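The rotation above reads and then bumps a shared counter, which is fine on a single event loop but can race if several threads pick at once. A minimal thread-safe sketch follows; the `RoundRobinPicker` class and its lock are illustrative assumptions, not something the scene's Channel defines.

```python
import threading


class RoundRobinPicker:
    """Hypothetical helper: rotates across subchannels, one pick per call."""

    def __init__(self, subchannels):
        if not subchannels:
            raise ValueError("round_robin needs at least one subchannel")
        self._subchannels = list(subchannels)
        self._next = 0
        self._lock = threading.Lock()

    def pick(self):
        # Read and advance the cursor as one step, so two threads
        # can never be handed the same slot by accident.
        with self._lock:
            sc = self._subchannels[self._next % len(self._subchannels)]
            self._next += 1
        return sc
```

For example, a picker built over `["a", "b", "c"]` returns `a, b, c, a, b, c, a` across seven picks.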
Channel.send
the call opens one HTTP/2 stream on the picked connection
async def send(self, method, request):
    sc = self.pickSubchannel()
    # one RPC == one HTTP/2 stream on sc's connection
    stream = sc.conn.new_stream()
    stream.send_headers(method)
    stream.send_data(encode(request))
    # must be an async def: we suspend here until the response arrives
    return await stream.recv_response()
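To see the three methods work together without a network, here is a self-contained sketch that swaps the real HTTP/2 connection for a hypothetical `FakeSubchannel` that only counts streams. The pod names, the `run` helper, and the snake_case `pick_subchannel` are assumptions for the demo. Under `pick_first`, the single address stands in for whichever pod the L4 LB pinned the connection to.

```python
import asyncio
from collections import Counter


class FakeSubchannel:
    """Hypothetical stand-in for one HTTP/2 connection to one backend."""

    def __init__(self, addr):
        self.addr = addr
        self.streams_opened = 0

    async def call(self, method, request):
        self.streams_opened += 1      # one RPC == one stream
        await asyncio.sleep(0)        # yield, as real network I/O would
        return f"{method}({request!r}) served by {self.addr}"


class Channel:
    def __init__(self, addrs, policy):
        self.policy = policy
        # pick_first connects to a single address only
        chosen = addrs[:1] if policy == "pick_first" else addrs
        self.subchannels = [FakeSubchannel(a) for a in chosen]
        self.next = 0

    def pick_subchannel(self):
        if self.policy == "pick_first":
            return self.subchannels[0]
        sc = self.subchannels[self.next % len(self.subchannels)]
        self.next += 1
        return sc

    async def send(self, method, request):
        return await self.pick_subchannel().call(method, request)


async def run(policy, calls=8):
    channel = Channel(["pod-1", "pod-2", "pod-3", "pod-4"], policy)
    # Fire all calls concurrently, like many streams in flight at once.
    await asyncio.gather(*(channel.send("greet", "Ada") for _ in range(calls)))
    return Counter({sc.addr: sc.streams_opened for sc in channel.subchannels})


if __name__ == "__main__":
    print("pick_first: ", dict(asyncio.run(run("pick_first"))))
    print("round_robin:", dict(asyncio.run(run("round_robin"))))
```

With eight calls, `pick_first` opens all eight streams on its one subchannel, while `round_robin` opens two on each of the four.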