The L4 pinning trap: balance requests, not connections

Build a gRPC-style RPC framework (14 scenes)

Scene 09 · The L4 pinning trap: balance requests, not connections

HTTP/2's one long-lived connection means an L4 load balancer pins every RPC to one backend. Client-side balancing discovers all backends and picks one per request.

Previously

The interceptor chain handed our call to the transport — but the very property that made HTTP/2 great, ONE long-lived connection, means a connection-level load balancer pins all our calls to one unlucky backend.

Diagram

A gRPC client on the left, a load balancer in the middle, and four backend pods on the right, each with a vertical load meter. The middle box's policy chip (pick_first vs round_robin) names the balancing mode. CONNECTION PINNING is what you see in L4 mode: the client's one HTTP/2 connection sticks to a single pod, so every greet("Ada") lands there and its meter pegs at 100% while the others flatline. CLIENT-SIDE LOAD BALANCING is the fix: the client itself discovers all four backend endpoints and picks one per call (round_robin), so the load meters balance.

A quick refresher: an RPC (a function call dressed up to run on another machine) rides exactly one HTTP/2 stream, and many streams share ONE long-lived TCP connection. That multiplexing is a large part of what makes HTTP/2 efficient. Now put a load balancer in front. The kind here is an *L4* balancer: it works at the TCP layer, so it sees connections, not the individual calls inside them, like a typical cloud ELB or nginx in TCP mode. Watch: the client opens ONE connection, the L4 LB sends that connection to pod #3, and then every greet("Ada") the client fires rides that same pinned connection straight to pod #3. The other three pods stay dark. That is *connection pinning*: one backend gets stuck with all the work because the connection, once assigned, never moves.
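
The pinning effect can be reproduced in a few lines. The sketch below is a toy simulation, not a real balancer: `L4Balancer`, `run_client`, and the pod names are hypothetical, and the balancer is assumed to hand out backends in rotation, one per new connection.

```python
from collections import Counter
from itertools import cycle

class L4Balancer:
    """Hypothetical connection-level balancer: it chooses a backend once per
    TCP connection and never looks at the streams inside it."""

    def __init__(self, backends):
        self._next_backend = cycle(backends)

    def accept(self):
        # Called once per NEW connection; the choice holds for its lifetime.
        return next(self._next_backend)

def run_client(lb, calls):
    """Open ONE connection, then send every RPC over it."""
    pinned_backend = lb.accept()
    hits = Counter()
    for _ in range(calls):
        # Each RPC is a new stream on the same connection, so same backend.
        hits[pinned_backend] += 1
    return hits

backends = ["pod-1", "pod-2", "pod-3", "pod-4"]
hits = run_client(L4Balancer(backends), calls=1000)
print(dict(hits))  # {'pod-1': 1000}
```

The balancer did its job correctly: it balanced connections. There was only one connection to balance, so the 1000 calls inside it never got a second decision.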

Implementation

Channel.dial

what the channel resolves and connects to at startup

def dial(self, target, policy):
    self.policy = policy  # remembered so pickSubchannel can branch on it
    if policy == "pick_first":
        # one VIP address -> the L4 LB picks the backend
        addrs = [resolve_one(target)]
    else:  # round_robin
        # headless service / xDS returns every backend IP
        addrs = resolve_all(target)
    # one subchannel == one long-lived HTTP/2 connection
    self.subchannels = [Subchannel(a) for a in addrs]
    self.next = 0  # round_robin cursor: index of the next subchannel to use
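
To make the two resolution paths concrete, here is a minimal sketch in which a hypothetical in-memory name table stands in for DNS or xDS. The hostnames, IP addresses, the helper `addresses_for`, and the bodies of `resolve_one` and `resolve_all` are illustrative assumptions, not a real resolver API.

```python
# Hypothetical name table standing in for DNS / xDS (names and IPs invented).
FAKE_NAMES = {
    # A VIP name resolves to the load balancer's single address.
    "greeter.vip.example": ["10.0.0.100"],
    # A headless-service name resolves to every backend pod.
    "greeter.headless.example": ["10.0.1.1", "10.0.1.2", "10.0.1.3", "10.0.1.4"],
}

def resolve_one(target):
    """Return a single address for the target (the first record)."""
    return FAKE_NAMES[target][0]

def resolve_all(target):
    """Return every address registered for the target."""
    return list(FAKE_NAMES[target])

def addresses_for(target, policy):
    """Mirror dial(): one address for pick_first, all of them otherwise."""
    if policy == "pick_first":
        return [resolve_one(target)]
    return resolve_all(target)

pinned = addresses_for("greeter.vip.example", "pick_first")
spread = addresses_for("greeter.headless.example", "round_robin")
print(len(pinned), "subchannel via the VIP;", len(spread), "via the headless name")
```

The number of addresses the channel learns at dial time caps how many backends it can ever reach: one address means one subchannel, whatever the policy does later.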

Channel.pickSubchannel

which connection this one call rides

def pickSubchannel(self):
    if self.policy == "pick_first":  # the policy the channel was dialed with
        # only one subchannel exists -> always the same
        return self.subchannels[0]
    # round_robin: rotate per call across all subchannels
    sc = self.subchannels[self.next % len(self.subchannels)]
    self.next += 1
    return sc
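
The rotation is easy to check in isolation. The following is a small sketch, assuming a hypothetical `RoundRobinPicker` class and string pod names in place of real subchannels. The lock is an added assumption for the case where calls arrive from several threads; the panel above does not show one.

```python
from collections import Counter
import threading

class RoundRobinPicker:
    """Hypothetical picker: rotates through subchannels, one step per call."""

    def __init__(self, subchannels):
        self._subchannels = list(subchannels)
        self._next = 0
        # Assumption: callers may be on different threads, so guard the cursor.
        self._lock = threading.Lock()

    def pick(self):
        with self._lock:
            sc = self._subchannels[self._next % len(self._subchannels)]
            self._next += 1
            return sc

picker = RoundRobinPicker(["pod-1", "pod-2", "pod-3", "pod-4"])
first_six = [picker.pick() for _ in range(6)]
print(first_six)  # ['pod-1', 'pod-2', 'pod-3', 'pod-4', 'pod-1', 'pod-2']

# Over many calls every backend receives the same share.
hits = Counter(picker.pick() for _ in range(1000))
print(dict(hits))
```

Because the decision is made per call rather than per connection, 1000 calls spread as 250 per pod instead of 1000 on one.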

Channel.send

the call opens one HTTP/2 stream on the picked connection

async def send(self, method, request):
    sc = self.pickSubchannel()
    # one RPC == one HTTP/2 stream on sc's connection
    stream = sc.conn.new_stream()
    stream.send_headers(method)
    stream.send_data(encode(request))
    return await stream.recv_response()
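
A runnable sketch of the same send path shows several calls in flight on one connection, each on its own stream. `FakeStream` and `FakeConnection` are hypothetical stand-ins with no real networking, and the reply text is invented. The odd stream numbering follows HTTP/2's rule that client-initiated streams use odd identifiers.

```python
import asyncio
from itertools import count

class FakeStream:
    """Hypothetical stand-in for one HTTP/2 stream; it returns a canned reply."""

    def __init__(self, stream_id, backend):
        self.stream_id = stream_id
        self.backend = backend
        self.method = None
        self.payload = None

    def send_headers(self, method):
        self.method = method

    def send_data(self, payload):
        self.payload = payload

    async def recv_response(self):
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return (f"{self.backend} answered {self.method}({self.payload!r}) "
                f"on stream {self.stream_id}")

class FakeConnection:
    """Hypothetical long-lived connection to one backend."""

    def __init__(self, backend):
        self.backend = backend
        # Client-initiated HTTP/2 streams use odd identifiers: 1, 3, 5, ...
        self._stream_ids = count(1, 2)

    def new_stream(self):
        return FakeStream(next(self._stream_ids), self.backend)

async def send(conn, method, request):
    stream = conn.new_stream()  # one RPC == one stream
    stream.send_headers(method)
    stream.send_data(request)
    return await stream.recv_response()

async def main():
    conn = FakeConnection("pod-3")
    # Three calls in flight at once, all sharing the single connection.
    return await asyncio.gather(*(send(conn, "greet", "Ada") for _ in range(3)))

replies = asyncio.run(main())
for line in replies:
    print(line)
```

All three replies come from pod-3 on streams 1, 3, and 5: opening more streams adds concurrency, but it never changes which backend the connection reaches.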
