Build a CDC pipeline (Debezium + outbox) (12 scenes)
Scene 11 · At-least-once is the ceiling
Debezium delivers at-least-once and Kafka EOS only covers Debezium↔Kafka — end-to-end correctness depends on the sink being idempotent, typically by deduplicating on (table, pk, lsn) or eventId.
Previously

Within-key ordering is preserved end-to-end — but Debezium's delivery guarantee is at-least-once, so the consumer will see the same ordered event more than once on retries and restarts unless it is built to deduplicate.

Scene 11
At-least-once is the ceiling
Diagram
Debezium tails the database WAL and emits change events into one Kafka partition; two sinks read the same partition. **At-least-once** — Debezium's delivery guarantee: every event is delivered at least once, but may be delivered more than once on retries or restarts. **Idempotent consumer** — a sink built to handle duplicates, typically by UPSERT-ing on a stable dedup key (eventId, or (table, pk, lsn)) so applying the same event twice has the same effect as applying it once. **Change event** — one structured message per row change, carrying op (c/u/d), before, after, and source (table, lsn, ts). **LSN** — Log Sequence Number, a monotonic byte offset into the WAL that uniquely identifies a record's position.
Debezium c…tails WAL ·…topic: ledger.eventsat-least-once deliveryPARTITIONS · 1P0→ ledger groupCONSUMER GROUP · LEDGERtwo sinks reading the same partitionSink A · UPSERTidempotent · dedup by eventId · balance = 0Sink B · balance += amountnon-idempotent · balance = 0
Debezium tails the ledger table's WAL and emits one change event per row change. Both sinks agree on the running balance — they're seeing each event exactly once, in order. The interesting question (next phase) is what happens on a connector restart.
Implementation
SinkA.applyEvent # idempotent
UPSERT keyed on a stable dedup key — replays no-op.
1def apply_event(event):
2 # dedup key chosen at sink config time
3 if dedup_mode == 'eventId':
4 key = event.eventId # outbox: SMT-assigned UUID
5 else:
6 key = (event.table, event.pk, event.lsn) # raw CDC
7
8 db.execute('''
9 INSERT INTO accounts(account_id, balance)
10 VALUES (%s, %s)
11 ON CONFLICT (account_id) DO UPDATE
12 SET balance = EXCLUDED.balance
13 WHERE accounts.dedup_key IS DISTINCT FROM %s
14 ''', [event.pk, event.after_balance, key])
SinkB.applyEvent # non-idempotent
balance += amount — every delivery moves state.
1def apply_event(event):
2 # no dedup key, no UPSERT, no fingerprint check
3 db.execute('''
4 UPDATE accounts
5 SET balance = balance + %s
6 WHERE account_id = %s
7 ''', [event.amount, event.pk])
8 # replay drifts balance by replayed sum
enableEosOnDebezium # the misleading switch
EOS scope ends at Kafka — does not extend to the sink.
1def configure_connector():
2 # Kafka 3.3+ source-connector EOS
3 connector.set('exactly.once.support', 'required')
4 # Fences Debezium ↔ Kafka writes inside a transaction.
5
6 # NOT covered by this switch:
7 # - the sink's reads from Kafka
8 # - the sink's writes to its own store
9 # The sink is outside the transaction; it must dedup itself.