Build Raft — consensus you can defend (12 scenes)
Scene 09 · Reads — ReadIndex and lease
Naive leader-reads break linearizability under partition. ReadIndex (commit barrier with no-op-on-election precondition) restores it; lease reads buy back the heartbeat round in exchange for a bounded-clock-skew assumption.
Previously

Scene 8 closed the safety arc — committed entries survive crashes, membership changes are safe, and log truncation preserves Log Matching. The READ path looks like it should already be safe (it's the leader answering, after all) — but the next surprise is that a naive read from 'the leader' silently breaks linearizability under a network partition.

Scene 09
Reads — ReadIndex and lease
Diagram
A 3-server cluster (S1 = the partitioned old leader, S2 = the new real leader, S3 = follower) with a partition wall between S1 and {S2, S3}. The slider picks which read protocol is in use. NAIVE: S1 answers from local memory — STALE chip lights up. READINDEX: S2 captures its commitIndex as a barrier, runs a heartbeat round to prove it's still leader, waits for its state machine to catch up, then answers — SAFE chip; gated on the no-op-on-election bar that scene 10 explains. LEASE: S2 holds a green time-bar lease; reads inside the bar cost zero RPCs, but a GC-pause toggle lets the bar overshoot real wall-clock and a stale read sneaks through.
[Diagram. Modes: (A) Naive, (B) ReadIndex, (C) Lease. State shown: S1 stale leader (term 4, commit 100, applied 100); S2 leader (term 5, commit 103, applied 103, no-op committed); S3 follower (term 5, commit 103, applied 103, no-op committed); the client sends READ x to S1. Banner: STALE READ SERVED, INVARIANT VIOLATED. Legend: fresh read (lease green window); stale / clock violated; AppendEntries (heartbeat); partitioned server.]
Here's something that catches even experienced engineers off guard. Suppose you're using a Raft-backed store like etcd to look up a config value, and suppose its leader answered reads straight from its own memory. Sounds safe: it's the leader, right?

Now imagine the network has partitioned the leader (call it S1) from a majority of the cluster, and a new leader (S2) has already been elected on the other side. Our partitioned 'leader' S1 doesn't know it's been deposed yet. A Raft leader steps down only when it sees a higher term (or, in implementations with a check-quorum timer, when that timer fires), and neither has happened yet, so from its own point of view it's still in charge. It happily answers our read with a stale value while a newer write has already been committed by the new leader.

We just got a **stale-leader read**, and we broke **linearizability** (the property that every read sees the result of every write that completed before the read began, as if the whole system were a single machine processing one operation at a time). Watch S1 hand the client `x=1` while S2 has already committed `x=2` on {S2, S3}.
Implementation
Leader.read (naive)
the broken baseline — read from local state, no quorum check
# NAIVE: the leader serves reads from local state with no quorum check.
def on_client_read(key):  # runs on the leader
    return state_machine.get(key)

# BUG: a partitioned old leader still believes it leads.
# It has not yet seen a higher term, so it has no way to
# know a new leader has already been elected and committed
# a newer write. Stale read served. Linearizability broken.
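To make the failure concrete, here is a minimal in-process sketch of the partition scenario described above. It is not a Raft implementation: the `Server` class, `naive_read`, and `build_partitioned_cluster` are hypothetical names invented for this illustration, and the election and replication steps are simply assigned rather than run.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """A toy server: just a term, a role, and an applied key-value state."""
    name: str
    term: int
    role: str                                  # "leader" or "follower"
    state: dict = field(default_factory=dict)

    def naive_read(self, key):
        # No quorum check: any server that believes it leads answers locally.
        if self.role != "leader":
            raise RuntimeError(f"{self.name} is not a leader")
        return self.state.get(key)


def build_partitioned_cluster():
    # Term 4: S1 leads, and x=1 is committed and applied everywhere.
    s1 = Server("S1", term=4, role="leader", state={"x": 1})
    s2 = Server("S2", term=4, role="follower", state={"x": 1})
    s3 = Server("S3", term=4, role="follower", state={"x": 1})

    # Partition: S1 is cut off. S2 wins term 5 with S3's vote and commits
    # x=2 on the majority {S2, S3}. S1 never hears about term 5.
    s2.term, s2.role = 5, "leader"
    s3.term = 5
    s2.state["x"] = 2
    s3.state["x"] = 2
    return s1, s2, s3


s1, s2, s3 = build_partitioned_cluster()
stale = s1.naive_read("x")   # served by the deposed leader
fresh = s2.naive_read("x")   # served by the real leader
print(f"S1 (term {s1.term}) answers x={stale}; S2 (term {s2.term}) answers x={fresh}")
```

Both servers pass the "am I the leader?" check, which is the point: a local role flag cannot distinguish a real leader from a deposed one, so the check has to involve a quorum.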
Leader.read (ReadIndex)
§6.4 commit-barrier read, gated on no-op-on-election
# send_append_entries and wait_until are assumed helpers of the surrounding node.
def on_client_read(key):  # runs on the leader
    # Precondition: the leader must have committed an entry of its
    # CURRENT term (the no-op-on-election). Otherwise commit_index
    # may be stale and ReadIndex would under-report.
    if not has_committed_entry_in_current_term():
        return defer                                # buffered in pendingReadIndexMessages
    read_index = commit_index                       # 1. snapshot barrier
    acks = {self_id}                                # 2. confirm leadership
    for peer in cluster_minus_self:
        send_append_entries(peer, entries=[])       # heartbeat; a success reply adds the peer to acks
    wait_until(lambda: len(acks) >= majority)       # a reply carrying a higher term aborts the read
    wait_until(lambda: last_applied >= read_index)  # 3. apply barrier
    return state_machine.get(key)                   # 4. serve locally
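The four ReadIndex steps can be exercised in a small synchronous simulation. This is a sketch under stated assumptions, not how any production library structures it: `Peer`, `Leader`, `noop_committed`, and the status strings are all hypothetical, heartbeats are plain method calls, and where a real node would wait (for acks, or for the state machine to catch up) the sketch returns a status instead.

```python
class Peer:
    """A follower as seen by the leader (hypothetical, in-process)."""

    def __init__(self, term, reachable=True):
        self.term = term
        self.reachable = reachable

    def on_heartbeat(self, leader_term):
        if not self.reachable:
            return None                        # message lost to the partition
        if leader_term < self.term:
            return (self.term, False)          # sender has been deposed
        self.term = leader_term
        return (self.term, True)


class Leader:
    def __init__(self, term, peers, commit_index, last_applied,
                 noop_committed, state):
        self.term = term
        self.peers = peers
        self.commit_index = commit_index
        self.last_applied = last_applied
        self.noop_committed = noop_committed   # entry of current term committed?
        self.state = state                     # applied state machine

    def read(self, key):
        if not self.noop_committed:
            return ("deferred", None)          # precondition not met yet
        read_index = self.commit_index         # 1. snapshot barrier
        acks = 1                               # 2. confirm leadership (count self)
        for peer in self.peers:
            reply = peer.on_heartbeat(self.term)
            if reply is None:
                continue
            reply_term, ok = reply
            if reply_term > self.term:
                self.term = reply_term         # learned of a newer term: step down
                return ("not_leader", None)
            if ok:
                acks += 1
        if acks <= (len(self.peers) + 1) // 2:
            return ("no_quorum", None)         # cannot prove leadership
        if self.last_applied < read_index:     # 3. apply barrier
            return ("not_applied_yet", None)   # a real node would wait here
        return ("ok", self.state.get(key))     # 4. serve locally


# Partitioned old leader S1 (term 4): neither peer is reachable.
s1 = Leader(4, [Peer(5, reachable=False), Peer(5, reachable=False)],
            100, 100, True, {"x": 1})
# Real leader S2 (term 5): S3 answers, S1 is unreachable.
s2 = Leader(5, [Peer(4, reachable=False), Peer(5)], 103, 103, True, {"x": 2})
# Freshly elected leader whose no-op has not committed yet.
s2_fresh = Leader(5, [Peer(5), Peer(5)], 100, 100, False, {"x": 1})

print(s1.read("x"))        # ('no_quorum', None)
print(s2.read("x"))        # ('ok', 2)
print(s2_fresh.read("x"))  # ('deferred', None)
```

The partitioned S1 now refuses to answer instead of serving `x=1`, which is exactly the behavior the naive read lacked. The cost is one heartbeat round per read (or per batch of reads, if an implementation chooses to batch them).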
Leader.read (lease)
§6.4.1 trade an RPC round for a clock-bound assumption
# Lease refresh: after every heartbeat round acked by a majority.
# Measure from when the round was SENT: followers restart their
# election timers on receipt, which is before the acks arrive.
def on_heartbeat_round_acked(round_sent_at):  # runs on the leader
    global lease_expires_at
    lease_expires_at = (round_sent_at
                        + election_timeout
                        - clock_skew_bound)

def on_client_read(key):  # runs on the leader
    if monotonic_now() < lease_expires_at:
        return state_machine.get(key)   # zero RPCs
    else:
        return read_index_serve(key)    # fall back

# ASSUMPTION: clock skew is bounded. A GC pause, fsync stall,
# or VM steal can freeze the old leader between the lease check
# and the reply, and a drifting clock can make its lease outlast
# the followers' election timeouts in real time. Either way a
# new leader is elected BEFORE the old leader's lease expires
# from its own perspective, and a stale read sneaks through.
# CockroachDB ties leases to Raft leadership specifically to
# bound this risk.
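The lease arithmetic and the pause hazard can be reproduced deterministically with an injected clock. This is an illustrative sketch only: `FakeClock`, `LeaseLeader`, the `pause_ms` parameter, the status strings, and the two constants are hypothetical, times are integer milliseconds, and the lease is anchored at the time the heartbeat round was sent, which is the conservative choice assumed here.

```python
class FakeClock:
    """Stand-in for a monotonic clock so the example is deterministic."""

    def __init__(self):
        self.now_ms = 0

    def advance(self, ms):
        self.now_ms += ms


ELECTION_TIMEOUT_MS = 1000   # assumed value, for illustration
CLOCK_SKEW_BOUND_MS = 100    # assumed value, for illustration


class LeaseLeader:
    def __init__(self, clock, state):
        self.clock = clock
        self.state = state
        self.lease_expires_at_ms = 0

    def on_heartbeat_round_acked(self, round_sent_at_ms):
        # Anchor at send time and shave off the skew bound.
        self.lease_expires_at_ms = (round_sent_at_ms
                                    + ELECTION_TIMEOUT_MS
                                    - CLOCK_SKEW_BOUND_MS)

    def read(self, key, pause_ms=0):
        if self.clock.now_ms >= self.lease_expires_at_ms:
            return ("fallback_to_read_index", None)
        # Simulated GC pause AFTER the lease check but BEFORE the reply.
        self.clock.advance(pause_ms)
        in_lease = self.clock.now_ms < self.lease_expires_at_ms
        status = "ok" if in_lease else "served_after_expiry"
        return (status, self.state.get(key))


clock = FakeClock()
leader = LeaseLeader(clock, {"x": 1})

sent_at = clock.now_ms                    # heartbeat round sent at t=0
clock.advance(50)                         # majority acks arrive at t=50
leader.on_heartbeat_round_acked(sent_at)  # lease runs until t=900

clock.advance(450)                        # t=500
first = leader.read("x")                  # inside the lease, zero RPCs
second = leader.read("x", pause_ms=600)   # check at t=500, reply at t=1100
third = leader.read("x")                  # t=1100, lease expired

print(first, second, third)
```

The second read is the dangerous one: the check passed, yet the reply left the server after the lease had run out, by which time another server could have won an election. The lease check itself cannot catch this, which is why the clock and pause assumptions have to be stated as part of the protocol.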