The dual-write trap

Build a CDC pipeline (Debezium + outbox) (12 scenes)

Scene 01 · The dual-write trap

Service writes to its DB and publishes to Kafka — and any crash between those two writes is permanent inconsistency. Four scenarios, four divergences, one structural fix.

Scene 01

The dual-write trap

Watch

Diagram

A Service node sits center-left. Two arrows leave it — one down to a DB cylinder, one right to a Kafka topic strip — and a dotted box around the service is the only transaction it has; the two arrows visibly escape that box. **Dual-write** — two writes to two systems (the DB and Kafka here) with no shared commit, so any crash between them produces a permanent skew. **Atomicity boundary** — the region inside which a set of writes either all commit or all roll back together; the diagram shows it covering ONLY the DB write, not the publish. A vertical timeline above the arrows carries four crash markers (T1..T4); the side panel reads out 'DB says X, Kafka says Y' for the selected scenario.

Sources

Watch the healthy baseline. The service writes user.balance=100 to its DB, then publishes BalanceUpdated to Kafka — two separate writes to two separate systems. That pair-without-a-shared-transaction is the dual-write problem. As long as nothing crashes, both downstream views agree.

Implementation

Service.handleSignup(event)

the dual-write — two writes, no shared transaction

1def handleSignup(event):
2    tx = db.begin()
3    tx.execute('UPDATE users SET balance=100 ...')
4    tx.commit()                       # leg 1: DB
5    kafka.publish('BalanceUpdated',   # leg 2: Kafka
6                  {balance: 100})
7    return ok

Service.handleSignup_with_retry(event)

the 'obvious' fix — wrap publish in retry, walk into T3

1def handleSignup(event):
2    tx = db.begin()
3    tx.execute('UPDATE users SET balance=100 ...')
4    tx.commit()
5    for attempt in range(MAX_RETRIES):
6        try:
7            kafka.publish('BalanceUpdated',
8                          {balance: 100})
9            return ok
10        except AckTimeout:
11            continue              # broker may have it already
12    raise PublishFailed

// Four crash windows, one root cause

no atomicity boundary spans both legs

1# T1: commit OK, publish dies   -> event lost
2# T2: publish OK, commit fails  -> phantom event
3# T3: both OK, ack lost, retry  -> duplicate event
4# T4: two writers race          -> DB and Kafka
5#                                  disagree on order
6#
7# All four are the same bug: the service has one
8# transaction (the DB tx); the publish escapes it.

NextPolling CDC — the lossy fix