Build a wide-column store (Cassandra / DynamoDB family) (13 scenes)
Scene 11 · Heal on read, heal on a schedule
When a read sees disagreement, push the winner to stale replicas; on a cron, compare every key.
Previously
Hints expire and replicas drift cold; the cluster needs a way to find and fix divergence on its own — one mechanism for hot data on every read, one for cold data on a schedule.
Scene 11
Heal on read, heal on a schedule
Diagram
The ring at RF=3, with one featured key replicated to three successive nodes. **read repair**: when a coordinator sees replicas disagree on a read, it picks the LWW winner and asynchronously updates the stale replica — but only for the keys actually read. **anti-entropy**: a scheduled background process diffs replicas using Merkle trees (the data structure used to diff two replicas in O(log n) — each subtree carries a hash of its leaves, so mismatched roots descend straight to the differing rows) and bulk-heals every divergent key, regardless of read activity.
RF=3 — three replicas of user-42, but they don't agree
read repair → coordinator pushes LWW winner to stale replica (only this key)
Merkle tree: hash of subtree → spot mismatches in O(log n)
Watch the read on user-42. The three replicas reply with different values; the coordinator picks the latest by timestamp and pushes the winner back to the stale replica. The Merkle inset on the right shows where the two replicas disagreed — one glowing leaf.
Implementation
Coordinator.read_repair
heal the just-read key, then return to the client
1def read_repair(key, replicas):2 responses = [r.read(key) for r in replicas]3 winner = max(responses, key=lambda x: x.ts)4 # only the keys actually read get healed5 for resp in responses:6 if resp.ts < winner.ts:7 send_async(resp.replica,8 Write(key, winner.value, winner.ts))9 return winner.value
AntiEntropy.merkle_repair
diff two replicas' trees; bulk-heal every mismatched leaf
1def merkle_repair(a, b, range):2 tree_a = a.build_merkle(range)3 tree_b = b.build_merkle(range)4 diffs = diff_subtree(tree_a.root, tree_b.root)5 # diffs is the set of mismatched leaf ranges6 for leaf_range in diffs:7 rows_a = a.fetch(leaf_range)8 rows_b = b.fetch(leaf_range)9 merged = lww_merge(rows_a, rows_b)10 a.bulk_write(merged)11 b.bulk_write(merged)
AntiEntropy.diff_subtree
recursive descent — skip whole subtrees on hash match
1def diff_subtree(node_a, node_b):2 if node_a.hash == node_b.hash:3 return [] # entire subtree matches; skip4 if node_a.is_leaf:5 return [node_a.range]6 out = []7 out += diff_subtree(node_a.left, node_b.left)8 out += diff_subtree(node_a.right, node_b.right)9 return out