Build Redis (10 scenes)
Scene 06 · Replication is async — acked writes can vanish
PSYNC, the replication backlog, and the AP-not-CP gotcha: WAIT doesn't make it go away; min-replicas-to-write only bounds the loss window (at the cost of write availability).
Previously
One node, however well-tuned, is a single point of failure. The first answer is to keep a copy on another node — but that copy lags behind, and that lag is the whole story of this scene.
Diagram
Master on the left, two replicas on the right. Writes flow into the master, get acked to the client immediately, then propagate down a replication stream to each replica with a small lag offset shown above each link. A backlog buffer on the master shows what a reconnecting replica can pull (partial resync) before falling back to a full RDB transfer.
Master accepts writes; the replication backlog ring fills; PSYNC streams keep both replicas' offsets within a tick of the master. This is the steady state — and the trap.
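To make the backlog concrete, here is a minimal sketch of it as a bounded buffer over the replication stream. The class name `ReplicationBacklog`, its methods, and the tiny 16-byte capacity are all hypothetical and chosen for illustration. It also assumes that an "offset" means the next byte the replica needs, which may differ from how a real server counts.

```python
from typing import Optional


class ReplicationBacklog:
    """Toy bounded buffer over the master's replication stream."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buf = bytearray()   # at most `capacity` most recent bytes
        self.end_offset = 0      # stream offset just past the newest byte

    def append(self, data: bytes) -> None:
        self.buf += data
        self.end_offset += len(data)
        if len(self.buf) > self.capacity:
            # Evict the oldest bytes; a real ring would overwrite in place.
            del self.buf[: len(self.buf) - self.capacity]

    @property
    def start_offset(self) -> int:
        # Oldest stream offset the backlog can still serve.
        return self.end_offset - len(self.buf)

    def read_from(self, offset: int) -> Optional[bytes]:
        """Bytes from `offset` onward, or None if the gap is no longer covered."""
        if offset < self.start_offset or offset > self.end_offset:
            return None
        return bytes(self.buf[offset - self.start_offset:])


backlog = ReplicationBacklog(capacity=16)
backlog.append(b"SET a 1\n")  # 8 bytes
backlog.append(b"SET b 2\n")  # 8 bytes, buffer now full
backlog.append(b"SET c 3\n")  # 8 bytes, evicts the first command
```

A replica that reconnects asking for offset 16 can be served from the buffer (partial resync). One asking for offset 0 gets `None`, which is the cue to fall back to a full RDB transfer.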
Implementation
Replica.psync
(replication-id, offset) handshake; partial vs full
def psync(self):
    # Tell the master which history we follow and how far we got.
    repl_id, offset = self.replicationId, self.offset
    resp = master.PSYNC(repl_id, offset)
    if resp == CONTINUE:
        # backlog still covers our gap
        partial_resync()  # stream missed bytes
    else:  # FULLRESYNC <new_id> <new_offset>
        self.replicationId = resp.new_id
        self.offset = resp.new_offset  # restart from the master's offset
        full_resync_via_rdb()  # fork + ship RDB
    stream_replication_link()
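The other half of the handshake lives on the master. A rough sketch of that decision, under simplifying assumptions: `MasterState` and `handle_psync` are hypothetical names, and the model tracks a single replication ID, whereas a real master also has to recognize the previous ID after a failover.

```python
from dataclasses import dataclass


@dataclass
class MasterState:
    repl_id: str
    backlog_start: int  # oldest offset still held in the backlog
    backlog_end: int    # offset just past the newest byte


def handle_psync(master: MasterState, replica_repl_id: str, replica_offset: int) -> str:
    """Decide between a partial and a full resync for one replica."""
    same_history = replica_repl_id == master.repl_id
    covered = master.backlog_start <= replica_offset <= master.backlog_end
    if same_history and covered:
        return "+CONTINUE"  # stream only the missed bytes
    # Unknown history or gap too old: ship a full snapshot.
    return f"+FULLRESYNC {master.repl_id} {master.backlog_end}"


master = MasterState(repl_id="abc123", backlog_start=1000, backlog_end=5000)
print(handle_psync(master, "abc123", 4200))  # short disconnect
print(handle_psync(master, "abc123", 10))    # gap fell out of the backlog
print(handle_psync(master, "?", -1))         # first-ever sync
```

Both conditions must hold for a partial resync: the replica has to be following the same history, and its offset has to still be inside the backlog. Either one failing forces the expensive path.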
Master.handleWrite
append locally, ack the client, then replicate
def handleWrite(self, client, cmd):
    self.applyToDataset(cmd)
    self.offset += len(cmd)
    self.appendToBacklog(cmd)
    replyOK(client)  # ack now, before any replica has the write
    for r in self.replicas:
        r.send(cmd)  # fire-and-forget
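The ack-then-replicate order is what lets an acked write vanish. The toy simulation below shows it, along with a guard in the spirit of min-replicas-to-write. The `Replica` and `Master` classes are hypothetical, and "healthy" is modeled crudely as "inbox empty"; a real server would judge replica health by how recently each replica acknowledged, not by queue depth.

```python
class Replica:
    def __init__(self) -> None:
        self.data = {}
        self.inbox = []  # commands sent but not yet applied (the lag)

    def send(self, cmd) -> None:
        self.inbox.append(cmd)

    def catch_up(self) -> None:
        for key, value in self.inbox:
            self.data[key] = value
        self.inbox.clear()


class Master:
    def __init__(self, replicas, min_replicas_to_write: int = 0) -> None:
        self.data = {}
        self.replicas = replicas
        self.min_replicas_to_write = min_replicas_to_write

    def healthy_replicas(self):
        # Toy health check: an empty inbox counts as caught up.
        return [r for r in self.replicas if not r.inbox]

    def handle_write(self, key, value) -> str:
        if len(self.healthy_replicas()) < self.min_replicas_to_write:
            return "NOREPLICAS"  # refuse: trade availability for safety
        self.data[key] = value
        for r in self.replicas:
            r.send((key, value))  # queued, not applied
        return "OK"  # acked before any replica applied it


# Scenario 1: acked write is lost on failover.
replica = Replica()
master = Master([replica])
ack = master.handle_write("balance", 100)  # client sees OK
replica.inbox.clear()  # master dies; in-flight bytes die with it
lost = "balance" not in replica.data  # the promoted replica never saw it

# Scenario 2: the guard refuses writes while the only replica lags.
lagging = Replica()
lagging.send(("x", 1))
guarded = Master([lagging], min_replicas_to_write=1)
refused = guarded.handle_write("balance", 100)
```

Even in the guarded master, an accepted write is still acked before any replica applies it. The guard limits how stale the replicas are allowed to be when a write is accepted; it does not turn replication into a synchronous commit.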