Build Raft — consensus you can defend
12 scenes · ~84 min · build the primitive

Replicate a deterministic state machine across N servers, with safety as a theorem and liveness under partial synchrony. Build the protocol from terms to commit to the safety proof to reads, and see why etcd, CockroachDB, and TiKV each ship a slightly different Raft.
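The replicated-state-machine idea can be sketched in a few lines, assuming a hypothetical key-value state machine whose commands are tuples. `KVStateMachine` and `replay` are illustrative names, not part of any library. Any two replicas that apply the same committed prefix, in order and exactly once, end in the same state; that is the entire reason to agree on the log.

```python
from dataclasses import dataclass, field


@dataclass
class KVStateMachine:
    """Hypothetical key-value state machine (illustration only); apply() must be deterministic."""

    data: dict = field(default_factory=dict)
    last_applied: int = 0

    def apply(self, index: int, command: tuple) -> None:
        # Committed entries are applied strictly in log order, exactly once.
        if index != self.last_applied + 1:
            raise ValueError(f"expected index {self.last_applied + 1}, got {index}")
        op, key, *rest = command
        if op == "put":
            self.data[key] = rest[0]
        elif op == "delete":
            self.data.pop(key, None)
        else:
            raise ValueError(f"unknown op {op!r}")
        self.last_applied = index


def replay(log, commit_index):
    """Hypothetical helper: apply the committed prefix (1-based indices 1..commit_index)."""
    sm = KVStateMachine()
    for index, command in enumerate(log[:commit_index], start=1):
        sm.apply(index, command)
    return sm


# Two replicas fed the same committed prefix end in the same state.
log = [("put", "x", 1), ("put", "y", 2), ("delete", "x")]
a, b = replay(log, 3), replay(log, 3)
print(a.data == b.data, a.data)  # True {'y': 2}
```

Rejecting gaps and replays in `apply` is a design choice: it turns a bug in the log layer into a loud error instead of silent divergence.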

  1. 01
    Replicated state machines under FLP
    Consensus on a replicated log lets N deterministic state machines converge. But FLP rules out any deterministic protocol that guarantees both safety and termination in a fully asynchronous network where even one process may crash. Raft keeps safety unconditional (a theorem) and relies on partial synchrony only for liveness.
    ~7 min
  2. 02
    Term — the logical clock
    A term is a monotonically increasing integer that divides time into epochs, with at most one leader per term. The universal step-down rule (any server that sees a term T' > currentTerm adopts T' and becomes a follower) is the simplest mechanism in Raft, and everything else depends on it.
    ~7 min
  3. 03
    Leader election — vote, majority, restriction
    Election timeout → candidate → RequestVote → majority grants → leader. Each voter grants at most one vote per term, and only if the candidate's log is at least as up-to-date as its own; this scene uses a placeholder definition of "up-to-date" that scene 06 makes precise.
    ~7 min
  4. 03a
    Pre-Vote and CheckQuorum — fixing the disruptive server
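    A partitioned member keeps timing out and incrementing currentTerm; when the partition heals, its inflated term forces a healthy leader to step down. Pre-Vote first asks peers whether they would grant a vote, without incrementing the term, and starts a real election only if a majority would; CheckQuorum makes a leader step down on its own when it cannot reach a majority. Together they close the §9.6 disruption and the liveness hole under partial omission, where only some links drop messages.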
    ~7 min
  5. 04
    Log replication — AppendEntries and Log Matching
    The prevLogIndex/prevLogTerm consistency check in AppendEntries, together with nextIndex backoff, inductively maintains the Log Matching Property: if two logs contain an entry with the same index and term, they are identical in every entry up through that index.
    ~7 min
  6. 05
    Commit, and the Figure 8 trap
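    A leader advances commitIndex only by counting replicas of an entry from its own current term; once such an entry is stored on a majority, it and every entry before it are committed. Counting replicas of an older-term entry is not enough, and that restriction is what prevents Figure 8: without it, an entry stored on a majority and treated as committed can still be overwritten by a future leader.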
    ~7 min
  7. 06
    The five invariants — proof and refinement
    Election Safety, Leader Append-Only, Log Matching, Leader Completeness, State Machine Safety. The election restriction (sharpened to the §5.4.1 rule: compare last log terms first, then log lengths) plus the current-term commit rule imply Leader Completeness, which reduces State Machine Safety to applying committed entries deterministically and in order.
    ~7 min
  8. 07
    Membership changes — joint consensus and single-server
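    A naive direct switch from C_old to C_new can yield two disjoint majorities, and hence two leaders in one term, violating Election Safety. Single-server changes (the production default) keep majorities overlapping because any majority of N servers intersects any majority of N±1; joint consensus (C_old,new) requires separate majorities of both C_old and C_new during the transition.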
    ~7 min
  9. 08
    Snapshots — compact without violating consistency
    Each replica snapshots independently at its applied index. The pair (lastIncludedIndex, lastIncludedTerm) stands in for the discarded log prefix in the AppendEntries consistency check, so Log Matching survives compaction. InstallSnapshot ships the snapshot to followers that have fallen behind the leader's first retained entry.
    ~7 min
  10. 09
    Reads — ReadIndex and lease
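    Naive leader reads break linearizability under partition: a deposed leader that has not yet learned of its successor can serve stale data. ReadIndex (record commitIndex, confirm leadership with a heartbeat round, then serve once the state machine has applied that far; it requires the election no-op to be committed first) restores linearizability. Lease reads skip the heartbeat round in exchange for assuming bounded clock drift.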
    ~7 min
  11. 10
    Operational reality — pipelining, batching, and the no-op trick
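    The no-op-on-election rule does triple duty: it lets a new leader commit earlier-term entries promptly under the Figure 8 rule, makes ReadIndex correct, and makes single-server membership changes safe. This scene also covers the throughput knobs (pipelining, batching) and the graceful TimeoutNow leadership handoff that separate paper Raft from deployed Raft.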
    ~7 min
  12. 11
    Design your Raft deployment
    Capstone: choose the election timeout, batch window, replication factor (RF), read mode, membership-change strategy, and snapshot cadence for three workloads: etcd metadata, Cockroach-style transactional KV with lease reads, and a queue manager. The verifier traces every knob back to the scene that defends it.
    ~7 min
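Scenes 02, 03, and 06 combine into one small RequestVote handler. This is a minimal sketch assuming an in-memory `Server` (a hypothetical class) that stores only the term of each log entry; persisting `current_term` and `voted_for` before replying, and resetting the election timer on a grant, are deliberately left out. The up-to-date check uses the §5.4.1 ordering: the higher last log term wins, and equal last terms compare log length.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Server:
    """Hypothetical in-memory server state for illustration; not a real library class."""

    node_id: int
    current_term: int = 0
    voted_for: Optional[int] = None
    role: str = "follower"
    log_terms: list = field(default_factory=list)  # log_terms[i] = term of entry at index i+1

    def last_log(self):
        """(lastLogIndex, lastLogTerm); (0, 0) for an empty log."""
        if not self.log_terms:
            return 0, 0
        return len(self.log_terms), self.log_terms[-1]

    def observe_term(self, term: int) -> None:
        # Universal step-down rule: a higher term makes us a follower at that term.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
            self.role = "follower"

    def handle_request_vote(self, term, candidate_id, last_log_index, last_log_term) -> bool:
        self.observe_term(term)
        if term < self.current_term:
            return False  # stale candidate
        if self.voted_for not in (None, candidate_id):
            return False  # at most one vote per term
        my_index, my_term = self.last_log()
        # §5.4.1: compare last log terms first, then last log indices.
        if (last_log_term, last_log_index) < (my_term, my_index):
            return False
        self.voted_for = candidate_id
        return True
```

Tuple comparison encodes the §5.4.1 rule directly: a candidate with a shorter log still wins if its last entry carries a higher term.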
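Scene 03a's two fixes can be sketched as pure functions. This is a simplified sketch assuming wall-clock timestamps in seconds and a hypothetical `ELECTION_TIMEOUT`; `Node`, `handle_pre_vote`, and `check_quorum` are illustrative names. Deployed implementations fold these checks into their tick and message loops and differ in details, for example in exactly which pre-vote requests they accept.

```python
from dataclasses import dataclass

ELECTION_TIMEOUT = 1.0  # seconds; hypothetical value for illustration


@dataclass
class Node:
    """Hypothetical follower state; last_log is (lastLogTerm, lastLogIndex)."""

    current_term: int
    last_log: tuple
    last_leader_contact: float = float("-inf")  # time we last heard from a valid leader

    def handle_pre_vote(self, now, proposed_term, cand_last_log) -> bool:
        """Answer a Pre-Vote probe. Never mutates current_term or voted-for state."""
        leader_recently_alive = now - self.last_leader_contact < ELECTION_TIMEOUT
        return (
            proposed_term > self.current_term
            and not leader_recently_alive
            and cand_last_log >= self.last_log  # (term, index) ordering from §5.4.1
        )


def check_quorum(now, cluster_size, last_ack) -> bool:
    """Leader side: True if the leader may stay leader.

    last_ack maps peer id -> time of that peer's last response; the leader counts itself.
    """
    recent = 1 + sum(1 for t in last_ack.values() if now - t < ELECTION_TIMEOUT)
    return recent > cluster_size // 2
```

Because a pre-vote refusal leaves `current_term` untouched, a rejoining node with a runaway term can no longer depose a leader that its peers still hear from.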
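Scene 04's two halves, the follower's consistency check and the leader's nextIndex backoff, fit in a few lines when the log is modeled as a Python list of terms with commands omitted. `append_entries` and `replicate` are hypothetical helpers; the one-step-at-a-time backoff is the simplest version, and production systems usually skip back faster using conflict hints from the follower.

```python
def append_entries(log, prev_log_index, prev_log_term, entries, leader_commit, commit_index):
    """Hypothetical follower-side AppendEntries on an in-memory log of terms.

    log[i] is the term of the entry at 1-based index i+1; entries is a list of terms.
    Returns (success, new_commit_index) and mutates log in place.
    """
    # Consistency check: we must hold an entry at prev_log_index with term prev_log_term.
    if prev_log_index > len(log):
        return False, commit_index
    if prev_log_index > 0 and log[prev_log_index - 1] != prev_log_term:
        return False, commit_index
    # Truncate only at the first conflict; a stale, shorter RPC must not erase a matching suffix.
    for offset, term in enumerate(entries):
        pos = prev_log_index + offset  # 0-based position of this entry
        if pos < len(log):
            if log[pos] == term:
                continue
            del log[pos:]
        log.append(term)
    last_new = prev_log_index + len(entries)
    if leader_commit > commit_index:
        commit_index = min(leader_commit, last_new)
    return True, commit_index


def replicate(leader_log, follower_log):
    """Hypothetical leader loop: back off nextIndex until the check passes; returns RPC count."""
    next_index = len(leader_log) + 1
    rpcs = 0
    while True:
        prev = next_index - 1
        prev_term = leader_log[prev - 1] if prev > 0 else 0
        ok, _ = append_entries(follower_log, prev, prev_term, leader_log[prev:], 0, 0)
        rpcs += 1
        if ok:
            return rpcs  # prev == 0 always passes, so the loop terminates
        next_index -= 1


leader = [1, 1, 1, 4, 4, 5, 5, 6, 6, 6]
follower = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3]  # extra uncommitted entries from old terms
replicate(leader, follower)
print(follower == leader)  # True
```

The follower's conflicting suffix from terms 2 and 3 is discarded only once the backoff reaches index 3, the last point where both logs agree.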
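Scene 05's rule can be sketched as the leader-side loop below. `advance_commit` is a hypothetical helper, and `match_index` includes the leader's own last index for simplicity. The first call models a Figure 8-style situation: the term-2 entry at index 2 sits on three of five servers but is still not committed, because no term-4 entry has reached a majority yet.

```python
def advance_commit(log_terms, match_index, current_term, commit_index):
    """Hypothetical leader-side commit rule.

    log_terms[i] is the term of the entry at 1-based index i+1.
    match_index lists the highest replicated index on every server, leader included.
    Returns the new commitIndex.
    """
    majority = len(match_index) // 2 + 1
    for n in range(len(log_terms), commit_index, -1):
        replicated = sum(1 for m in match_index if m >= n)
        # Count replicas only for an entry from the leader's own term; earlier entries
        # then become committed indirectly through Log Matching.
        if replicated >= majority and log_terms[n - 1] == current_term:
            return n
    return commit_index


leader_log = [1, 2, 4]  # leader of term 4
print(advance_commit(leader_log, [3, 2, 2, 1, 1], current_term=4, commit_index=1))  # 1
print(advance_commit(leader_log, [3, 3, 3, 1, 1], current_term=4, commit_index=1))  # 3
```

Dropping the `log_terms[n - 1] == current_term` clause would make the first call return 2, which is exactly the unsafe "committed" state that a later leader can overwrite.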
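The overlap argument in scene 07 is small enough to check by brute force for small clusters. This is an illustrative sketch, not a proof: `majorities`, `always_overlap`, and `joint_quorum` are hypothetical helpers. Checking only minimal majorities suffices, since any larger quorum contains one.

```python
from itertools import combinations


def majorities(members):
    """Hypothetical helper: all minimal majorities (size floor(n/2)+1) of a configuration."""
    k = len(members) // 2 + 1
    return [set(c) for c in combinations(sorted(members), k)]


def always_overlap(c_a, c_b):
    """True if every majority of c_a intersects every majority of c_b."""
    return all(qa & qb for qa in majorities(c_a) for qb in majorities(c_b))


def joint_quorum(votes, c_old, c_new):
    """Joint consensus: a decision needs separate majorities of C_old and C_new."""
    return (len(votes & c_old) > len(c_old) // 2
            and len(votes & c_new) > len(c_new) // 2)


old = {1, 2, 3}
print(always_overlap(old, old | {4}))           # True: single-server add
print(always_overlap(old, {1, 2, 3, 4, 5}))     # False: {1,2} and {3,4,5} are disjoint
print(joint_quorum({3, 4, 5}, old, {1, 2, 3, 4, 5}))  # False: no C_old majority
```

The failing naive jump is the one scene 07 warns about: servers 1 and 2 can elect a leader under C_old while 3, 4, and 5 elect another under C_new in the same term.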
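Scene 08's substitution can be sketched with a log that keeps global 1-based indices and remembers only the snapshot boundary. `CompactedLog` is a hypothetical class storing terms only; snapshot contents and the InstallSnapshot transfer are out of scope.

```python
class CompactedLog:
    """Hypothetical log of terms with a snapshot prefix; indices are 1-based and global."""

    def __init__(self):
        self.last_included_index = 0
        self.last_included_term = 0
        self.terms = []  # terms of entries last_included_index+1 .. last_index()

    def last_index(self):
        return self.last_included_index + len(self.terms)

    def term_at(self, index):
        """Term at a global index, or None if compacted away or beyond the end."""
        if index == self.last_included_index:
            return self.last_included_term  # snapshot metadata stands in for the dropped entry
        if index < self.last_included_index or index > self.last_index():
            return None
        return self.terms[index - self.last_included_index - 1]

    def append(self, term):
        self.terms.append(term)

    def compact(self, applied_index):
        """Discard entries up to applied_index, which the caller has already applied."""
        if applied_index <= self.last_included_index:
            return
        if applied_index > self.last_index():
            raise ValueError("cannot snapshot past the end of the log")
        term = self.term_at(applied_index)
        self.terms = self.terms[applied_index - self.last_included_index:]
        self.last_included_index, self.last_included_term = applied_index, term

    def matches(self, prev_log_index, prev_log_term):
        """AppendEntries consistency check that still works after compaction."""
        if prev_log_index < self.last_included_index:
            # Snapshotted entries are committed, so by Leader Completeness they already match.
            return True
        return self.term_at(prev_log_index) == prev_log_term
```

The boundary case is the interesting one: after `compact(3)`, a leader probing with prevLogIndex 3 is answered from `last_included_term` even though entry 3 itself is gone.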
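Scene 09's ReadIndex steps can be sketched as below, assuming a simplified `LeaderView` of leader state and a caller that has already sent one heartbeat round after the read arrived and counted the acknowledgements. `LeaderView`, `read_index`, and `can_serve` are hypothetical names, and lease reads are not shown. Because terms never decrease along a log, checking the term of the entry at `commit_index` is enough to know whether the leader has committed something from its own term.

```python
from dataclasses import dataclass


@dataclass
class LeaderView:
    """Hypothetical snapshot of leader state for illustration."""

    current_term: int
    commit_index: int
    last_applied: int
    log_terms: list  # log_terms[i] = term of entry at 1-based index i+1


def read_index(leader, heartbeat_acks, cluster_size):
    """Return the index a linearizable read must wait for, or None to retry later.

    heartbeat_acks: peers (excluding the leader) that acknowledged a heartbeat sent
    after this read arrived.
    """
    # Precondition: an entry from the leader's own term (the election no-op) is committed,
    # so commit_index covers everything any earlier leader committed.
    if leader.commit_index == 0 or leader.log_terms[leader.commit_index - 1] != leader.current_term:
        return None
    # Confirm leadership: a majority, counting ourselves, answered after the read arrived.
    if 1 + heartbeat_acks <= cluster_size // 2:
        return None
    return leader.commit_index


def can_serve(leader, idx):
    """The read may be answered once the state machine has applied through idx."""
    return idx is not None and leader.last_applied >= idx
```

A fresh leader whose no-op is still uncommitted gets `None` here and must retry, which is why the no-op-on-election rule appears as a ReadIndex precondition.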