Build a wide-column store (Cassandra / DynamoDB family) (13 scenes)
Scene 10 · Hold the write until it wakes up
When a replica is briefly unreachable, the coordinator stashes the write and replays it on return.
Previously

Partitions are dramatic; the everyday case is a single replica blipping for a few seconds — for that the cluster has a softer trick.

Scene 10
Hold the write until it wakes up
Diagram
A 5-node consistent-hash ring with replica **E** marked DOWN. The featured key's writes normally fan to RF=3 successors, but one of those successors is E. The healthy coordinator sprouts an orange **hinted handoff** envelope — a coordinator-side stash of writes destined for a temporarily unreachable replica, replayed when the replica returns. Inside the envelope, two meters: the **hints disk** (fills as the outage drags on; coordinator pays in disk pressure) and the **hint TTL** age bar — the maximum time a hint will be retained, default 3h in Cassandra. Past it, hints are discarded — replica returns 'stale forever' for those writes.
mode: with-hintABCDE (do…DEADuser-42R1R2R3HINT BUFFERcoordinator stash for EHINTS DISK0% fullAGE / TTL0.0h / 3.0h→ replay on replica returnbest-effort — coordinator-only.coordinator stashes a hint per missed write; replays on E's return.
Replica E just went DOWN. Watch the simulated outage clock tick — for every missed write, the coordinator stashes a hint envelope. The age meter creeps toward the 3h TTL. Within TTL, the envelope replays when E returns; past it, the envelope dissolves.
Implementation
Coordinator.put
the write path with the dead-replica hint branch
1def put(key, value):
2 replicas = ring.replicas_for(key)
3 for r in replicas:
4 if r.alive:
5 send_async(r, Write(key, value))
6 else:
7 # stash on local disk for later replay
8 hint_store.append(
9 Hint(target=r, write=(key, value),
10 created_at=now()),
11 )
12 return Ack # to W replicas (best-effort hints)
HintStore.replay_on_return
fires when the down replica gossips back as alive
1def replay_on_return(replica):
2 for hint in hint_store.for(replica):
3 age = now() - hint.created_at
4 if age <= max_hint_window_in_ms:
5 send(replica, hint.write)
6 hint_store.remove(hint)
7 else:
8 # past TTL — replica returns 'stale forever'
9 hint_store.remove(hint)
HintStore.sweep
background TTL sweep — discard hints past max_hint_window
1# runs continuously on the coordinator
2def sweep():
3 for hint in list(hint_store):
4 age = now() - hint.created_at
5 if age > max_hint_window_in_ms:
6 hint_store.remove(hint) # silent loss
7 sleep(hints_flush_period_ms)