Build a CDC pipeline (Debezium + outbox) (12 scenes)
Scene 02 · Polling CDC — the lossy fix
SELECT WHERE updated_at > last_seen is technically Change Data Capture but cannot see deletes, collapses intra-interval flips, and trades latency against DB load.
Previously

Dual-write is broken because there is no commit boundary covering both the DB row and the Kafka event — so the natural reach is for some way to drive the event from the DB itself. Polling the table for changes is the obvious first attempt — does it work?

Scene 02
Polling CDC — the lossy fix
Diagram
A clock on the left ticks every poll interval; the SELECT pulls rows whose updated_at exceeds last_seen, and each surviving change becomes one Kafka message. **Change Data Capture (CDC)** — any technique that turns DB writes into a stream of change events; this scene shows polling, the simplest implementation. **Polling CDC** — the SELECT-based variant: cheap to deploy, structurally lossy. **Dual-write** (from scene 1) — committing to the DB and publishing to Kafka in two separate steps with no shared atomicity boundary.
poll every 5slast_seen = t-0sSELECT WHERE updated_at > :last_seenorders tablehistory per row →r1pendingr2pendingr3pendingr4pendingemitKafka topic(waiting for poll)Polled every 5s · last_seen=0s · 0 events downstream
↓ Change Data Capture — turning DB writes into a stream of change events
We need a way to observe every row change as a stream — a single source feeding many consumers. The first attempt is the path of least resistance: a job that polls the table on an interval and ships whatever has changed.
Implementation
PollingJob.pollOnce(last_seen)
the SELECT loop — emits whatever the WHERE clause returns
1def pollOnce(last_seen):
2 rows = db.query(
3 'SELECT id, status, updated_at FROM orders'
4 ' WHERE updated_at > :last_seen',
5 last_seen=last_seen,
6 )
7 for row in rows:
8 topic.emit(rowId=row.id, status=row.status)
9 return max(r.updated_at for r in rows) or last_seen
PollingJob.detectDeletes() # impossible
the function that cannot be written with a SELECT
1def detectDeletes():
2 # A hard-deleted row has no row.
3 # SELECT ... WHERE updated_at > :last_seen
4 # cannot return what is not there.
5 # There is no DELETE-marker in the table.
6 # There is no updated_at on a non-existent row.
7 raise NotImplementedError(
8 'polling CDC cannot observe deletes',
9 )
Three structural defects
what polling CDC misses, by construction
1# 1. missing-deletes:
2# no row -> no SELECT match -> silent loss
3# 2. collapsed-flips:
4# A -> B -> A between polls -> 0 events
5# A -> B -> C between polls -> 1 event (C only)
6# 3. latency-floor:
7# end_to_end_latency >= poll_interval
8# shrinking it costs DB load on every poll