Build Kafka (13 scenes)
Scene 04 · Replication — ISR is not a quorum
Why a write commits once every replica in the in-sync set has fetched it, not once a majority has.
Previously
Each partition lives on multiple brokers. The question is: when is a write safe enough to acknowledge? Kafka's answer is the In-Sync Replica set — and it's a moving target, not a fixed quorum vote.
Diagram
One partition's leader sits on the left and two followers on the right, each with its own log cells. Followers fetch from the leader, and the High Water Mark (HW) advances only as far as the smallest log end offset (LEO) among the brokers currently in the In-Sync Replica set (ISR). A follower that falls too far behind is evicted from the ISR instead of blocking the leader; the HW is then recomputed against the smaller set, so it can advance again.
Watch the producer append records to the leader. Both followers fetch and their LEOs catch up. The leader's High Water Mark, the offset below which records are readable by consumers, slides up to the LEO of the slowest replica in the ISR, never past it.
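To make the contrast with a majority quorum concrete, here is a minimal sketch that computes the commit point both ways for the same replica state. The function names and the LEO values are hypothetical, chosen only for illustration; the majority rule is included purely for comparison and is not how Kafka commits.

```python
def isr_high_water_mark(leos, isr):
    """HW under the ISR rule: the minimum LEO over replicas currently in the ISR."""
    return min(leos[replica] for replica in isr)


def majority_commit_offset(leos):
    """Commit point under a majority rule (for contrast only): the highest
    offset that more than half of ALL replicas have reached."""
    ranked = sorted(leos.values(), reverse=True)
    majority = len(ranked) // 2 + 1
    return ranked[majority - 1]


# Hypothetical state: broker 1 is the leader, broker 3 is lagging.
leos = {1: 12, 2: 10, 3: 4}

all_in_sync = isr_high_water_mark(leos, isr={1, 2, 3})   # slow follower holds HW at 4
after_eviction = isr_high_water_mark(leos, isr={1, 2})   # HW moves up to 10
leader_only = isr_high_water_mark(leos, isr={1})         # HW reaches the leader's LEO, 12
quorum = majority_commit_offset(leos)                    # always 10 for this state
```

With the same LEOs, the ISR rule can be stricter than a majority (4 while the slow follower is still in the set) or looser (12 once only the leader remains), because the set being waited on changes over time.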
Implementation
Follower.fetch
the fetch loop running on every follower broker
    loop forever:
        resp = leader.fetch(
            partition = p,
            fromOffset = self.leo,
        )
        for record in resp.records:
            self.log.append(record)
            self.leo += 1
        # tells the leader 'I have everything up to leo'
        leader.recordFetchPosition(self.id, self.leo)
        # simplified pacing: real followers long-poll the leader, and
        # replica.fetch.backoff.ms is the sleep applied after a fetch error
        sleep(replica.fetch.backoff.ms)
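The fetch loop above can be run end to end against an in-memory stand-in for the leader. This is a sketch under stated assumptions: the `Leader` and `Follower` classes, the `max_records` batch size, and the method names are hypothetical, and the network, partitions, and error handling are left out.

```python
class Leader:
    """In-memory stand-in for the leader's side of the fetch protocol."""

    def __init__(self):
        self.log = []
        self.fetch_positions = {}  # replica id -> last reported LEO

    def append(self, record):
        self.log.append(record)

    def fetch(self, from_offset, max_records=2):
        # Return at most max_records records starting at from_offset.
        return self.log[from_offset:from_offset + max_records]

    def record_fetch_position(self, replica_id, leo):
        self.fetch_positions[replica_id] = leo


class Follower:
    def __init__(self, replica_id, leader):
        self.id = replica_id
        self.leader = leader
        self.log = []

    @property
    def leo(self):
        # Log end offset: the offset the next appended record will get.
        return len(self.log)

    def fetch_once(self):
        """One iteration of the fetch loop; returns how many records arrived."""
        records = self.leader.fetch(from_offset=self.leo)
        self.log.extend(records)
        self.leader.record_fetch_position(self.id, self.leo)
        return len(records)


leader = Leader()
for record in ["a", "b", "c", "d", "e"]:
    leader.append(record)

follower = Follower(replica_id=2, leader=leader)
rounds = 0
while follower.fetch_once() > 0:  # stop once a fetch comes back empty
    rounds += 1
```

Five records at two per fetch take three non-empty rounds, after which the follower's log matches the leader's and the leader has recorded an LEO of 5 for replica 2.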
Leader.tryAdvanceHW
after every fetch the leader recomputes the high water mark
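    def tryAdvanceHW():
        # ISR is the set of replicas currently caught up.
        # 'all' here means all of THESE, not all replicas in the
        # replication factor (RF).
        inSyncLEOs = [
            leo for replicaId, leo in fetchPositions.items()
            if replicaId in ISR
        ]
        # the leader is always in the ISR, so its own LEO counts too;
        # this also keeps min() defined when the ISR is the leader alone
        inSyncLEOs.append(self.leo)
        newHW = min(inSyncLEOs)
        if newHW > self.hw:
            self.hw = newHW
            notifyConsumers(self.hw)  # records become readable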
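A runnable sketch of the same idea, showing which records consumers can see at each step. It assumes the HW is an exclusive upper bound (records at offsets below it are readable) and that a follower that has not fetched yet counts as LEO 0. The `PartitionLeader` class and its method names are hypothetical.

```python
class PartitionLeader:
    def __init__(self, leader_id, isr):
        self.id = leader_id
        self.isr = set(isr)
        self.log = []
        self.hw = 0
        self.fetch_positions = {}  # follower id -> last reported LEO

    @property
    def leo(self):
        return len(self.log)

    def append(self, record):
        self.log.append(record)
        self.try_advance_hw()

    def record_fetch_position(self, replica_id, leo):
        self.fetch_positions[replica_id] = leo
        self.try_advance_hw()

    def try_advance_hw(self):
        # Minimum LEO over the ISR, counting the leader's own log.
        follower_leos = [
            self.fetch_positions.get(replica, 0)
            for replica in self.isr if replica != self.id
        ]
        new_hw = min([self.leo] + follower_leos)
        if new_hw > self.hw:  # the HW only ever moves forward
            self.hw = new_hw

    def readable(self):
        # Consumers see only records below the HW.
        return self.log[:self.hw]


leader = PartitionLeader(leader_id=1, isr={1, 2, 3})
for record in ["x", "y", "z"]:
    leader.append(record)
visible_before_fetch = leader.readable()    # nothing is committed yet

leader.record_fetch_position(2, 3)          # follower 2 is fully caught up
visible_one_follower = leader.readable()    # still nothing: follower 3 is at 0

leader.record_fetch_position(3, 2)          # follower 3 has the first two records
visible_both = leader.readable()
```

Even with follower 2 fully caught up, nothing becomes readable until follower 3 reports progress, because both are in the ISR.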
Leader.maybeShrinkISR
evicting a slow follower instead of blocking the HW
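    # runs periodically on the leader
    def maybeShrinkISR():
        for replicaId in list(ISR):
            if replicaId == self.id:
                continue  # the leader never evicts itself
            # last time this follower's fetch reached the leader's LEO;
            # a follower that keeps fetching but never catches up still ages out
            lastCaughtUp = caughtUpTimestamps[replicaId]
            if now() - lastCaughtUp > replica.lag.time.max.ms:
                ISR.remove(replicaId)
                controller.notifyISRShrink(
                    partition, ISR,
                )
        # HW recomputed against the smaller set
        tryAdvanceHW()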
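The eviction step can be sketched as a pure function over explicit timestamps, which makes it easy to see the HW unblock. The assumptions here are that lag is measured as time since a follower last caught up to the leader's LEO, and that the threshold is an illustrative 10 seconds rather than a claim about any broker's configured `replica.lag.time.max.ms`. The function names and all values are hypothetical.

```python
MAX_LAG_MS = 10_000  # illustrative threshold, standing in for replica.lag.time.max.ms


def shrink_isr(isr, leader_id, last_caught_up_ms, now_ms, max_lag_ms=MAX_LAG_MS):
    """Return the ISR without followers that have not caught up within
    max_lag_ms. The leader always stays in the set."""
    return {
        replica for replica in isr
        if replica == leader_id
        or now_ms - last_caught_up_ms[replica] <= max_lag_ms
    }


def high_water_mark(leos, isr):
    return min(leos[replica] for replica in isr)


# Hypothetical state: broker 1 leads, broker 3 last caught up 40 s ago.
leos = {1: 8, 2: 8, 3: 3}
last_caught_up_ms = {2: 99_000, 3: 60_000}
isr = {1, 2, 3}

hw_before = high_water_mark(leos, isr)   # pinned at the slow follower's LEO
isr = shrink_isr(isr, leader_id=1,
                 last_caught_up_ms=last_caught_up_ms, now_ms=100_000)
hw_after = high_water_mark(leos, isr)    # recomputed against the smaller set
```

Broker 3 is dropped because 40 seconds exceeds the threshold, and the HW moves from 3 to 8 without the leader ever waiting on it.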