Build a CDC pipeline (Debezium + outbox) (12 scenes)
Scene 11 · At-least-once is the ceiling
Debezium delivers at-least-once and Kafka EOS only covers Debezium↔Kafka — end-to-end correctness depends on the sink being idempotent, typically by deduplicating on (table, pk, lsn) or eventId.
Previously
Within-key ordering is preserved end-to-end — but Debezium's delivery guarantee is at-least-once, so the consumer will see the same ordered event more than once on retries and restarts unless it is built to deduplicate.
Scene 11
At-least-once is the ceiling
Diagram
Debezium tails the database WAL and emits change events into one Kafka partition; two sinks read the same partition. **At-least-once** — Debezium's delivery guarantee: every event is delivered at least once, but may be delivered more than once on retries or restarts. **Idempotent consumer** — a sink built to handle duplicates, typically by UPSERT-ing on a stable dedup key (eventId, or (table, pk, lsn)) so applying the same event twice has the same effect as applying it once. **Change event** — one structured message per row change, carrying op (c/u/d), before, after, and source (table, lsn, ts). **LSN** — Log Sequence Number, a monotonic byte offset into the WAL that uniquely identifies a record's position.
Debezium tails the ledger table's WAL and emits one change event per row change. Both sinks agree on the running balance — they're seeing each event exactly once, in order. The interesting question (next phase) is what happens on a connector restart.
Implementation
SinkA.applyEvent # idempotent
UPSERT keyed on a stable dedup key — replays no-op.
1def apply_event(event):2 # dedup key chosen at sink config time3 if dedup_mode == 'eventId':4 key = event.eventId # outbox: SMT-assigned UUID5 else:6 key = (event.table, event.pk, event.lsn) # raw CDC78 db.execute('''9 INSERT INTO accounts(account_id, balance)10 VALUES (%s, %s)11 ON CONFLICT (account_id) DO UPDATE12 SET balance = EXCLUDED.balance13 WHERE accounts.dedup_key IS DISTINCT FROM %s14 ''', [event.pk, event.after_balance, key])
SinkB.applyEvent # non-idempotent
balance += amount — every delivery moves state.
1def apply_event(event):2 # no dedup key, no UPSERT, no fingerprint check3 db.execute('''4 UPDATE accounts5 SET balance = balance + %s6 WHERE account_id = %s7 ''', [event.amount, event.pk])8 # replay drifts balance by replayed sum
enableEosOnDebezium # the misleading switch
EOS scope ends at Kafka — does not extend to the sink.
1def configure_connector():2 # Kafka 3.3+ source-connector EOS3 connector.set('exactly.once.support', 'required')4 # Fences Debezium ↔ Kafka writes inside a transaction.56 # NOT covered by this switch:7 # - the sink's reads from Kafka8 # - the sink's writes to its own store9 # The sink is outside the transaction; it must dedup itself.