Build Raft — consensus you can defend
12 scenes · ~84 min · build the primitive
Replicate a deterministic state machine across N servers, with safety as a theorem and liveness under partial synchrony. Build the protocol from term to commit to safety proof to reads, and see why etcd, CockroachDB, and TiKV ship slightly different Rafts.
- 01 Replicated state machines under FLP. Consensus on a replicated log lets N deterministic state machines converge, but FLP forbids any deterministic protocol from being both safe and live in a fully asynchronous network where even one server may crash. Raft fixes safety as a theorem and concedes liveness to partial synchrony. (~7 min)
- 02 Term: the logical clock. A term is a monotonic int that segments time, with at most one leader per term. The universal step-down rule (any server seeing T' > currentTerm becomes a follower at T') is the simplest mechanism in Raft and underpins everything else. (~7 min)
- 03 Leader election: vote, majority, restriction. Election timeout → candidate → RequestVote → majority grants → leader. Voters grant only if the candidate's log is at least as up-to-date as their own, a placeholder predicate refined precisely in scene 6. (~7 min)
- 03a Pre-Vote and CheckQuorum: fixing the disruptive server. A partitioned member keeps timing out and incrementing currentTerm; when the partition heals, its inflated term forces a healthy leader to step down. Pre-Vote probes without bumping the term; CheckQuorum makes a leader depose itself when it can't reach a majority. Together they close both the §9.6 disruption and the partial-omission liveness hole. (~7 min)
- 04 Log replication: AppendEntries and Log Matching. AppendEntries with the prevLogIndex/prevLogTerm consistency check, together with nextIndex backoff, inductively maintains the Log Matching Property: if two logs share an entry at (index, term), they share every preceding entry. (~7 min)
- 05 Commit, and the Figure 8 trap. An entry is committed when it is stored on a majority AND an entry from the leader's CURRENT term, at the same or a later index, is also replicated to a majority. The second clause is the one that prevents Figure 8: without it, a 'committed' entry can be overwritten by a future leader. (~7 min)
- 06 The five invariants: proof and refinement. Election Safety, Leader Append-Only, Log Matching, Leader Completeness, State Machine Safety. Election restriction (sharpened to §5.4.1) plus current-term commit imply Leader Completeness, which reduces State Machine Safety to deterministic apply. (~7 min)
- 07 Membership changes: joint consensus and single-server. A naive configuration swap creates disjoint majorities and risks an Election Safety violation. Single-server changes (the production default) preserve majority overlap because any majority of N intersects any majority of N±1; joint consensus C_old,new requires both majorities during the transition. (~7 min)
- 08 Snapshots: compact without violating consistency. Each replica snapshots independently at its applied index. (lastIncludedIndex, lastIncludedTerm) substitute for the discarded log prefix in the AppendEntries consistency check, so Log Matching survives compaction. InstallSnapshot ships the prefix to far-behind followers. (~7 min)
- 09 Reads: ReadIndex and lease. Naive leader-reads break linearizability under partition. ReadIndex (a commit barrier with a no-op-on-election precondition) restores it; lease reads buy back the heartbeat round in exchange for a bounded-clock-skew assumption. (~7 min)
- 10 Operational reality: pipelining, batching, and the no-op trick. The no-op-on-election rule does triple duty: Figure 8 fix, ReadIndex correctness, single-server change safety. The scene also covers the throughput knobs (pipelining, batching) and the graceful TimeoutNow handoff that separate paper Raft from deployed Raft. (~7 min)
- 11 Design your Raft deployment. Capstone: pick the election timeout, batch window, replication factor, read mode, membership-change strategy, and snapshot cadence for three workloads: etcd metadata, a Cockroach-style transactional KV store with lease reads, and a queue manager. The verifier traces every knob back to the scene that defends it. (~7 min)
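
To make three of the rules named above concrete before you start, here is a minimal Python sketch of the universal step-down rule (scene 02), the at-least-as-up-to-date vote check (scene 03), and commit advancement with the current-term clause (scene 05). It is an illustration under stated assumptions, not the course's reference implementation: the names `Entry`, `Server`, `observe_term`, `candidate_is_up_to_date`, and `advance_commit` are hypothetical, log indices are 1-based, and the vote check uses the comparison from the Raft paper (last term first, then last index), which scene 06 may state more precisely.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    term: int
    command: str = ""


@dataclass
class Server:
    current_term: int = 0
    role: str = "follower"            # "follower" | "candidate" | "leader"
    voted_for: Optional[int] = None
    log: List[Entry] = field(default_factory=list)  # log[0] holds index 1
    commit_index: int = 0             # 0 means nothing committed yet


def observe_term(s: Server, term: int) -> bool:
    """Universal step-down: seeing T' > currentTerm makes s a follower at T'."""
    if term > s.current_term:
        s.current_term = term
        s.role = "follower"
        s.voted_for = None            # the vote is per term, so it resets
        return True
    return False


def candidate_is_up_to_date(cand_last_term: int, cand_last_index: int,
                            voter_log: List[Entry]) -> bool:
    """Vote restriction: compare last terms first, then log lengths."""
    voter_last_index = len(voter_log)
    voter_last_term = voter_log[-1].term if voter_log else 0
    return (cand_last_term, cand_last_index) >= (voter_last_term, voter_last_index)


def advance_commit(leader: Server, match_index: List[int]) -> int:
    """Advance commit_index only through an entry from the leader's current term.

    match_index holds the highest replicated index per server, with the
    leader's own last index included in the list.
    """
    majority = len(match_index) // 2 + 1
    # Highest index that at least `majority` servers have stored.
    n = sorted(match_index, reverse=True)[majority - 1]
    # Terms never decrease along a log, so if entry n is from an older term,
    # every earlier uncommitted entry is too; checking n alone is enough.
    if n > leader.commit_index and leader.log[n - 1].term == leader.current_term:
        leader.commit_index = n
    return leader.commit_index
```

In a Figure 8 style run, a term-4 leader whose term-2 entry at index 2 sits on three of five servers leaves `commit_index` unchanged; once it replicates a term-4 entry at index 3 to a majority, `advance_commit` moves to 3 and the earlier entry is committed along with it.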