Leader epoch: the monotonic counter that fixes truncation

Scene 06 · Leader epoch: the monotonic counter that fixes truncation

Why HW-based truncation could silently lose acked writes, and how KIP-101 closed the gap.

Previously

ISR explains how a write commits on the happy path. But what happens at failover, when followers must reconcile divergent logs? Before KIP-101, a follower truncated to its local High Water Mark (HW), which could silently discard acknowledged writes. The leader epoch is the fix.

Diagram

A leader and two followers, each log cell stamped with a monotonically increasing 'leader epoch' alongside its offset. When a leader fails and a follower takes over, the new leader bumps the epoch. The other followers then consult an Epoch Cache (a map from each epoch to the first offset written under it) to find the precise truncation point, instead of blindly trimming to the High Water Mark, which could silently lose acked writes.
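The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not Kafka's actual code: the class name LeaderEpochCache and the methods assign and end_offset_for are hypothetical, and the sketch assumes that the cache stores (epoch, start offset) pairs with strictly increasing epochs and that a follower asks for the end offset of the last epoch it has seen.

```python
import bisect


class LeaderEpochCache:
    """Hypothetical in-memory epoch cache: parallel lists sorted by epoch."""

    def __init__(self):
        self._epochs = []   # leader epochs, strictly increasing
        self._starts = []   # first offset written under each epoch

    def assign(self, epoch, start_offset):
        """Record the first offset written under `epoch`."""
        if self._epochs and epoch <= self._epochs[-1]:
            raise ValueError("leader epochs must be strictly increasing")
        self._epochs.append(epoch)
        self._starts.append(start_offset)

    def end_offset_for(self, epoch, log_end_offset):
        """Return the offset just past the last record written under `epoch`.

        That is the start offset of the next larger epoch, or the log end
        offset when `epoch` is the newest epoch in the cache.
        """
        i = bisect.bisect_right(self._epochs, epoch)
        if i == len(self._epochs):
            return log_end_offset
        return self._starts[i]


cache = LeaderEpochCache()
cache.assign(0, 0)   # epoch 0 starts at offset 0
cache.assign(1, 5)   # epoch 1 starts at offset 5
cache.assign(2, 8)   # epoch 2 starts at offset 8

# A follower whose last record is from epoch 1 and whose LEO is 9
# keeps offsets 0..7 and drops offset 8.
follower_leo = 9
truncate_to = min(cache.end_offset_for(1, log_end_offset=10), follower_leo)
```

Here truncate_to is 8: everything the follower wrote under epoch 1 beyond the point where epoch 2 began is removed, and nothing else.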

Leader B is at epoch e1. The producer writes m1 and m2, and follower A fetches them. Watch A's LEO (log end offset) catch up to 2 while its local HW stays at 0: HW updates piggyback on the response to A's next fetch, so the follower's HW trails the leader's by up to one fetch round trip.
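The lag can be reproduced with a toy model. The sketch below is an assumption-laden simplification: the Leader and Follower classes and their methods are hypothetical, there is a single follower in the ISR, and the leader learns the follower's progress only from the offset in each fetch request.

```python
class Leader:
    def __init__(self):
        self.log = []
        self.hw = 0

    def append(self, record):
        self.log.append(record)

    def fetch(self, from_offset):
        # The fetch offset tells the leader how far the follower has
        # replicated, so the HW can only advance on the follower's NEXT fetch.
        self.hw = min(len(self.log), from_offset)
        return self.log[from_offset:], self.hw


class Follower:
    def __init__(self):
        self.log = []
        self.hw = 0

    @property
    def leo(self):
        return len(self.log)

    def fetch_once(self, leader):
        records, leader_hw = leader.fetch(self.leo)
        self.log.extend(records)
        # The leader's HW rides along on the fetch response.
        self.hw = min(leader_hw, self.leo)


b = Leader()
a = Follower()
b.append("m1")
b.append("m2")

a.fetch_once(b)
after_first = (a.leo, a.hw)    # (2, 0): records arrived, HW has not

a.fetch_once(b)
after_second = (a.leo, a.hw)   # (2, 2): HW catches up one fetch later
```

If A restarts between the two fetches, its local HW is still 0 even though it holds both records.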

Implementation

Follower.onBecomeFollower (pre-KIP-101)

the buggy version: truncate to local HW

def onBecomeFollower(self, leader):
    # BUG (pre-KIP-101): the local HW lags the leader's HW by one fetch
    # round trip, and after a restart it can be arbitrarily stale.
    self.log.truncateTo(self.hw)
    self.leo = self.hw
    # Resume fetching from the new leader.
    while True:
        resp = leader.fetch(fromOffset=self.leo)
        self.log.append(resp.records)
        self.leo += len(resp.records)
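For contrast, here is a hedged sketch of how an epoch-based rejoin might differ from the HW-based one. It is not Kafka's implementation: Replica, rejoin_with_hw, rejoin_with_epoch and end_offset_for are hypothetical names, each log entry is modelled as an (epoch, payload) pair, and the leader answers the follower's query by scanning its own log rather than a persisted cache.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # Each entry is (leader_epoch, payload); the list index is the offset.
    log: list = field(default_factory=list)
    hw: int = 0

    def end_offset_for(self, epoch):
        """Leader side: offset just past the last entry whose epoch <= `epoch`."""
        end = 0
        for offset, (entry_epoch, _) in enumerate(self.log):
            if entry_epoch <= epoch:
                end = offset + 1
        return end


def rejoin_with_hw(follower):
    """Pre-KIP-101 behaviour: cut the log back to the possibly stale local HW."""
    del follower.log[follower.hw:]


def rejoin_with_epoch(follower, leader):
    """Epoch-based behaviour: cut only what the leader's history lacks."""
    if not follower.log:
        return
    latest_epoch = follower.log[-1][0]
    end = leader.end_offset_for(latest_epoch)
    del follower.log[min(end, len(follower.log)):]


# Follower A holds m1 and m2, but its local HW is still 0 when it restarts.
leader = Replica(log=[(1, "m1"), (1, "m2")], hw=2)
stale = Replica(log=[(1, "m1"), (1, "m2")], hw=0)
safe = Replica(log=[(1, "m1"), (1, "m2")], hw=0)

rejoin_with_hw(stale)            # drops both records
rejoin_with_epoch(safe, leader)  # keeps both records

# Divergent case: the follower wrote m3 under epoch 1, but the new leader
# started epoch 2 at offset 2, so only m3 is removed.
new_leader = Replica(log=[(1, "m1"), (1, "m2"), (2, "m4")])
diverged = Replica(log=[(1, "m1"), (1, "m2"), (1, "m3")])
rejoin_with_epoch(diverged, new_leader)
```

If the leader failed right after rejoin_with_hw ran, the stale replica would take over with an empty log and the acked writes m1 and m2 would be gone. The epoch-based variant keeps them, and in the divergent case it removes only the record the new leader never had.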
