Build Kafka (13 scenes)
Scene 05 · Durability is four knobs, not one
acks, min.insync.replicas, RF, unclean — and how 'all' silently means one.
Previously

ISR decides commit, the controller decides leadership. Now the four knobs that decide what 'durable' actually means — and how a wrong combination turns 'acks=all' into 'acks=one' without any error.

Scene 05
Durability is four knobs, not one
Diagram
A producer on the left sending writes to a partition that has N replicas across brokers. Four knobs are exposed as side controls — acks (0/1/all), min.insync.replicas, RF (replication factor), unclean.leader.election. The diagram lights up the failure path that opens when the chosen combination silently degrades — for example, ISR shrinking to {leader} when min.insync.replicas=1.
Producersend(record)Brokeracks=all · MIR=2 · RF=3 · unclean=offPARTITIONS · 3LeaderLEADER · in ISRFollower 1follower · in ISRFollower 2follower · in ISRDOWNSTREAM CONSUMERwhat would survive a leader crash right nowConsumerreads up to HW (safe)
Default config: RF=3, min.insync.replicas=2, acks=all, unclean.leader.election=off. Producer sends, all three replicas append, leader acks. The downstream reader sees a fully replicated log — a leader crash here loses zero acked writes.
Implementation
Producer.send(record, acks=all)
the producer-side wait loop — block until the broker acks
1def send(record):
2 leader = metadata.leaderFor(record.partition)
3 resp = leader.produce(record, acks=cfg.acks)
4 if cfg.acks == 0:
5 return # fire-and-forget, no wait
6 if cfg.acks == 1:
7 return resp # leader-only ack
8 # acks == 'all': leader replies only after the
9 # broker-side commit gate has accepted the record.
10 return resp
Broker.onProduce(record)
the commit gate — reject before silently degrading
1def onProduce(record):
2 self.log.append(record) # leader appends first
3 if cfg.acks != 'all':
4 return Ack() # no ISR check on acks=0/1
5 if len(ISR) < cfg.min_insync_replicas:
6 raise NotEnoughReplicas(
7 isr=len(ISR), required=cfg.min_insync_replicas,
8 )
9 waitForFetch(record.offset, replicas=ISR)
10 self.hw = min(leo[r] for r in ISR)
11 return Ack()
Controller.maybeUncleanElect(partition)
the gate that lets a stale replica win an empty-ISR election
1def maybeUncleanElect(partition):
2 if len(ISR) > 0:
3 return promote(pickFromISR())
4 if not cfg.unclean_leader_election_enable:
5 partition.state = UNAVAILABLE
6 return None # KIP-106 default
7 candidate = pickAnyLiveReplica()
8 candidate.epoch += 1 # bumps leader epoch
9 return promote(candidate)
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.