Build Raft — consensus you can defend (12 scenes)
Scene 01 · Replicated state machines under FLP
Consensus on a replicated log lets N deterministic state machines converge — but FLP forbids any deterministic protocol from being both safe and live in a fully asynchronous network. Raft fixes safety as a theorem, concedes liveness to partial synchrony.
Scene 01
Replicated state machines under FLP
Diagram
You're looking at three servers — SM_A, SM_B, SM_C — each running the exact same program (a deterministic function: same inputs in the same order always produce the same output). In the middle, a shared to-do list of commands they all execute: [SET x=1, SET x=2, INC x]. The systems-design term for this whole picture is a **replicated state machine**. Arrows show the network delivering commands; when the network-condition slider stalls one arrow, a banner appears explaining why the other servers cannot tell whether SM_C is slow or has crashed. On the right, each server shows its current value as a small key:value table.
STATE MACHINESDELIVERY CHANNELSHARED COMMAND LOGSET x=1SET x=2INC x3/3 SHOWNSM_A✓ LIVEAPPLIED LOG12(no commands applied yet)STATE(empty)SM_B✓ LIVEAPPLIED LOG12(no commands applied yet)STATE(empty)SM_C✓ LIVEAPPLIED LOG12(no commands applied yet)STATE(empty)Async network: OFF·Partial synchrony: OFFThree servers, one shared to-do list of commands. Each server runs the same commands in the same order, so each server ends up in…
Suppose you run a service that has to stay up even if one or two of its servers crash. You can't trust a single machine — machines die, disks fail, networks drop packets — so you keep several copies (servers) and you need all the surviving copies to give the client the same answer. The tricky part is that the network between them is not reliable: messages get delayed, reordered, or lost, and a server that has gone silent could be either crashed or just slow. The systems-design name for the picture you're about to watch is a **replicated state machine** — three identical copies of a deterministic program (one whose output depends only on its inputs, never on a clock or a random number), each fed the SAME list of commands in the SAME order. That list is called a **log** in the systems sense: an append-only, totally ordered sequence of commands. When the same deterministic program processes the same log, every copy must end up in the same state. Watch the three commands [SET x=1, SET x=2, INC x] apply on all three servers — they converge to x = 3.
Implementation
Replica.applyLoop() — the replicated state machine model
every replica runs THIS loop; determinism + same log ⇒ same state
1# Each replica holds:
2# log — append-only, totally ordered commands
3# stateMachine — DETERMINISTIC: (state, cmd) -> state'
4# lastApplied — index of the last command fed to stateMachine
5
6def applyLoop():
7 while True:
8 # block until log[lastApplied + 1] is committed
9 entry = waitForCommitted(log, lastApplied + 1)
10 stateMachine.apply(entry.command)
11 lastApplied += 1
12 # invariant: same log prefix, same lastApplied,
13 # same stateMachine state — on every replica.
Replica.onTimeout() — why FLP forbids a deterministic decision
a slow replica is indistinguishable from a crashed one
1# FLP (Fischer, Lynch, Paterson 1985):
2# no deterministic protocol can guarantee BOTH safety AND
3# liveness in a fully asynchronous network, even with one
4# crash failure. Below is the shape of WHY.
5
6on receive heartbeat from leader:
7 reset election_timeout
8
9on election_timeout fires:
10 # The protocol cannot distinguish:
11 # leader is SLOW (timeout was too aggressive — wait)
12 # leader is CRASHED (timeout was correct — elect new)
13 # Any deterministic resolution in finite time can be
14 # made WRONG by an adversarial scheduler.
15 start_election()