Build Raft — consensus you can defend (12 scenes)
Scene 01 · Replicated state machines under FLP
Consensus on a replicated log lets N deterministic state machines converge — but FLP forbids any deterministic protocol from being both safe and live in a fully asynchronous network. Raft fixes safety as a theorem, concedes liveness to partial synchrony.
Scene 01
Replicated state machines under FLP
Diagram
You're looking at three servers — SM_A, SM_B, SM_C — each running the exact same program (a deterministic function: same inputs in the same order always produce the same output). In the middle, a shared to-do list of commands they all execute: [SET x=1, SET x=2, INC x]. The systems-design term for this whole picture is a **replicated state machine**. Arrows show the network delivering commands; when the network-condition slider stalls one arrow, a banner appears explaining why the other servers cannot tell whether SM_C is slow or has crashed. On the right, each server shows its current value as a small key:value table.
Sources
Suppose you run a service that has to stay up even if one or two of its servers crash. You can't trust a single machine — machines die, disks fail, networks drop packets — so you keep several copies (servers) and you need all the surviving copies to give the client the same answer. The tricky part is that the network between them is not reliable: messages get delayed, reordered, or lost, and a server that has gone silent could be either crashed or just slow.
The systems-design name for the picture you're about to watch is a **replicated state machine** — three identical copies of a deterministic program (one whose output depends only on its inputs, never on a clock or a random number), each fed the SAME list of commands in the SAME order. That list is called a **log** in the systems sense: an append-only, totally ordered sequence of commands. When the same deterministic program processes the same log, every copy must end up in the same state. Watch the three commands [SET x=1, SET x=2, INC x] apply on all three servers — they converge to x = 3.
Implementation
Replica.applyLoop() — the replicated state machine model
every replica runs THIS loop; determinism + same log ⇒ same state
1# Each replica holds:2# log — append-only, totally ordered commands3# stateMachine — DETERMINISTIC: (state, cmd) -> state'4# lastApplied — index of the last command fed to stateMachine56def applyLoop():7 while True:8 # block until log[lastApplied + 1] is committed9 entry = waitForCommitted(log, lastApplied + 1)10 stateMachine.apply(entry.command)11 lastApplied += 112 # invariant: same log prefix, same lastApplied,13 # same stateMachine state — on every replica.
Replica.onTimeout() — why FLP forbids a deterministic decision
a slow replica is indistinguishable from a crashed one
1# FLP (Fischer, Lynch, Paterson 1985):2# no deterministic protocol can guarantee BOTH safety AND3# liveness in a fully asynchronous network, even with one4# crash failure. Below is the shape of WHY.56on receive heartbeat from leader:7 reset election_timeout89on election_timeout fires:10 # The protocol cannot distinguish:11 # leader is SLOW (timeout was too aggressive — wait)12 # leader is CRASHED (timeout was correct — elect new)13 # Any deterministic resolution in finite time can be14 # made WRONG by an adversarial scheduler.15 start_election()