Build a gRPC-style RPC framework (14 scenes)
Scene 09 · The L4 pinning trap: balance requests, not connections
Because an HTTP/2 client multiplexes every RPC over one long-lived connection, an L4 load balancer pins all of that client's RPCs to one backend. Client-side load balancing discovers all backends and picks one per request.
Previously
The interceptor chain handed our call to the transport. But the very property that made HTTP/2 great, ONE long-lived connection, means a connection-level (L4) load balancer pins all our calls to one unlucky backend.
Diagram
A gRPC client on the left, a load balancer in the middle, and four backend pods on the right, each with a vertical load meter. The middle box's policy chip (pick_first vs round_robin) names the balancing mode. CONNECTION PINNING is what you see in L4 mode: the client's one HTTP/2 connection sticks to a single pod, so every greet("Ada") lands there and its meter pegs at 100% while the others flatline. CLIENT-SIDE LOAD BALANCING is the fix: the client itself discovers all four backend endpoints and picks one per call (round_robin), so the load meters balance.
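To make the two meter pictures concrete, here is a toy simulation (no networking) that counts where greet("Ada") calls land under each policy. The pod names and the choice of pod-3 as the pinned backend are illustrative assumptions mirroring the diagram.

```python
from collections import Counter

PODS = ["pod-1", "pod-2", "pod-3", "pod-4"]

def simulate(policy: str, calls: int, pinned_pod: str = "pod-3") -> Counter:
    """Count how many calls land on each pod under the given policy."""
    load = Counter({pod: 0 for pod in PODS})  # start every meter at zero
    for i in range(calls):
        if policy == "pick_first":
            # L4 mode: the one HTTP/2 connection was pinned to a pod when it
            # was opened, so every call follows it there.
            pod = pinned_pod
        elif policy == "round_robin":
            # Client-side LB: the client knows every pod and rotates per call.
            pod = PODS[i % len(PODS)]
        else:
            raise ValueError(f"unknown policy: {policy}")
        load[pod] += 1
    return load

calls = 100
for policy in ("pick_first", "round_robin"):
    load = simulate(policy, calls)
    print(policy, {pod: f"{n * 100 // calls}%" for pod, n in load.items()})
```

Under pick_first one meter reads 100% and the rest read 0%; under round_robin each pod gets a quarter of the calls.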
A quick refresher: an RPC (a function call dressed up to run on another machine) rides exactly one HTTP/2 stream, and many streams share ONE long-lived TCP connection. That single-connection trick is what made HTTP/2 fast. Now put a load balancer in front. The kind here is an *L4* balancer: it works at the TCP layer, so it sees connections, not the individual calls inside them. A TCP-mode cloud load balancer (such as an AWS Network Load Balancer) or nginx proxying plain TCP behaves this way. Watch: the client opens ONE connection, the L4 LB sends that connection to pod #3, and then every greet("Ada") the client fires rides that same pinned connection straight to pod #3. The other three pods stay dark. When one backend gets stuck with all the work because the connection never moves, that's *connection pinning*.
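The key point is that an L4 balancer makes its choice once, when the TCP connection opens. The sketch below models that with a hypothetical `L4Balancer` that rotates new connections across backends; it is a toy, not any real load balancer's algorithm, and which pod a connection lands on is arbitrary.

```python
import itertools

class L4Balancer:
    """Toy L4 balancer: picks a backend per connection, never per RPC."""

    def __init__(self, backends):
        self._rotation = itertools.cycle(backends)
        self.routes = {}  # connection id -> backend chosen at connect time

    def on_connect(self, conn_id):
        # The only decision point: a new TCP connection arrives.
        self.routes[conn_id] = next(self._rotation)
        return self.routes[conn_id]

    def route_stream(self, conn_id):
        # Every stream (RPC) on a connection goes where the connection went.
        return self.routes[conn_id]

def run(num_connections, rpcs):
    lb = L4Balancer(["pod-1", "pod-2", "pod-3", "pod-4"])
    conns = [f"conn-{i}" for i in range(num_connections)]
    for c in conns:
        lb.on_connect(c)
    hits = {}
    for i in range(rpcs):
        # RPCs are spread over whatever connections the client holds.
        backend = lb.route_stream(conns[i % len(conns)])
        hits[backend] = hits.get(backend, 0) + 1
    return hits

print(run(num_connections=1, rpcs=8))  # {'pod-1': 8}
print(run(num_connections=4, rpcs=8))  # {'pod-1': 2, 'pod-2': 2, 'pod-3': 2, 'pod-4': 2}
```

The second run shows the L4 balancer is not broken: it balances exactly what it can see. It only spreads RPCs when they are already spread across several connections, which is what one multiplexed HTTP/2 connection defeats.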
Implementation
Channel.dial
what the channel resolves and connects to at startup
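```python
# resolve_one, resolve_all and Subchannel are assumed to be defined
# elsewhere in the framework.
def dial(self, target, policy):
    self.policy = policy  # remembered so pickSubchannel can branch on it
    if policy == "pick_first":
        # one VIP address -> the L4 LB picks the backend
        addrs = [resolve_one(target)]
    else:  # round_robin
        # headless service / xDS returns every backend IP
        addrs = resolve_all(target)
    # one subchannel == one long-lived HTTP/2 connection
    self.subchannels = [Subchannel(a) for a in addrs]
    self.next = 0  # round-robin cursor
```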
Channel.pickSubchannel
which connection this one call rides
```python
def pickSubchannel(self):
    if self.policy == "pick_first":
        # only one subchannel exists -> always the same
        return self.subchannels[0]
    # round_robin: rotate per call across all subchannels
    sc = self.subchannels[self.next % len(self.subchannels)]
    self.next += 1
    return sc
```
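The panel's rotation assumes every subchannel is usable. A slightly hardened picker might skip backends that are down and start the rotation at a random offset, so that many clients created together do not all send their first call to the same backend. The sketch below assumes a hypothetical `FakeSubchannel` with a boolean `ready` flag standing in for real connectivity state; it is one possible refinement, not the framework's actual picker.

```python
import random
import threading
from dataclasses import dataclass

@dataclass
class FakeSubchannel:
    address: str
    ready: bool = True  # hypothetical health flag standing in for connectivity state

class RoundRobinPicker:
    """Rotate across READY subchannels, starting from a random offset."""

    def __init__(self, subchannels, rng=None):
        self._subchannels = list(subchannels)
        rng = rng or random.Random()
        # Random start so clients created together don't all hit index 0 first.
        self._next = rng.randrange(len(self._subchannels)) if self._subchannels else 0
        self._lock = threading.Lock()  # picks may come from several threads

    def pick(self):
        with self._lock:
            n = len(self._subchannels)
            # Try each subchannel at most once, skipping ones that are not READY.
            for _ in range(n):
                sc = self._subchannels[self._next % n]
                self._next += 1
                if sc.ready:
                    return sc
        raise RuntimeError("no READY subchannel to pick")

subchannels = [FakeSubchannel(f"10.0.0.{i}:50051") for i in range(1, 5)]
subchannels[2].ready = False  # pretend 10.0.0.3 is down
picker = RoundRobinPicker(subchannels, rng=random.Random(0))
print([picker.pick().address for _ in range(6)])
```

Six picks over three healthy backends land on each of them twice, and the down backend never appears.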
Channel.send
the call opens one HTTP/2 stream on the picked connection
```python
async def send(self, method, request):  # async: it awaits the response
    sc = self.pickSubchannel()
    # one RPC == one HTTP/2 stream on sc's connection
    stream = sc.conn.new_stream()
    stream.send_headers(method)
    stream.send_data(encode(request))
    return await stream.recv_response()
```
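Each send opens a new stream on an existing connection rather than a new connection, so concurrent calls pile up as parallel streams on whatever connection was picked. The asyncio sketch below measures that with hypothetical `FakeConnection` and `ToyChannel` classes (no real HTTP/2); pick_first is modeled as a single connection to whichever pod the L4 LB chose, represented here as the first backend.

```python
import asyncio
import itertools

class FakeConnection:
    """Stand-in for one HTTP/2 connection: tracks streams open at once."""

    def __init__(self, backend):
        self.backend = backend
        self.active = 0  # streams currently open on this connection
        self.peak = 0    # most streams that were open at the same time

    async def call(self, method, request):
        self.active += 1  # a new stream on the existing connection
        self.peak = max(self.peak, self.active)
        try:
            await asyncio.sleep(0.01)  # simulated round trip
            return f"{self.backend}: {method}({request!r})"
        finally:
            self.active -= 1  # the stream ends once the response arrives

class ToyChannel:
    def __init__(self, backends, policy):
        self.policy = policy
        # pick_first: one pinned connection; round_robin: one per backend.
        targets = backends[:1] if policy == "pick_first" else backends
        self.conns = [FakeConnection(b) for b in targets]
        self._rotation = itertools.cycle(self.conns)

    async def send(self, method, request):
        conn = self.conns[0] if self.policy == "pick_first" else next(self._rotation)
        return await conn.call(method, request)

async def peak_streams(policy, calls=8):
    ch = ToyChannel(["pod-1", "pod-2", "pod-3", "pod-4"], policy)
    # Fire all calls concurrently; they are multiplexed as parallel streams.
    await asyncio.gather(*(ch.send("greet", "Ada") for _ in range(calls)))
    return {c.backend: c.peak for c in ch.conns}

for policy in ("pick_first", "round_robin"):
    print(policy, asyncio.run(peak_streams(policy)))
```

With eight calls in flight, pick_first stacks all eight streams on the single pinned connection, while round_robin keeps at most two on each of the four.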