12 scenes · ~84 min · build the primitive
Build your own CDC pipeline (Debezium + outbox)
Your service writes to its DB and publishes to Kafka — and any crash between those two writes is permanent inconsistency. Build a Change Data Capture pipeline (modeled on Debezium + the outbox pattern) that closes the gap by making the database itself the event source.
- 01 · The dual-write trap · Your service writes to its DB and publishes to Kafka, and any crash between those two writes leaves a permanent inconsistency. Four scenarios, four divergences, one structural fix. · ~7 min
- 02 · Polling CDC: the lossy fix · SELECT WHERE updated_at > last_seen is technically Change Data Capture but cannot see deletes, collapses intra-interval flips, and trades latency against DB load. · ~7 min
- 03 · The DB already has a log · Postgres WAL, MySQL binlog: the database already keeps a durable, ordered log of every change for replication. CDC reads this log instead of the tables. · ~7 min
- 04 · Debezium tails the log · Debezium is a connector that registers as a replica, decodes each WAL/binlog record into a structured change event, and emits one Kafka message per row change. · ~7 min
- 05 · Replication slot: the bookmark that fills disks · A replication slot is a named server-side cursor that tracks an LSN and stops Postgres from recycling WAL the connector hasn't read. An inactive slot is the #1 Debezium production failure. · ~7 min
- 06 · Snapshot, then stream · Debezium runs a consistent snapshot (op=r events), records the LSN at snapshot time, then switches to streaming from that LSN, so history and live updates stitch at one seam. · ~7 min
- 07 · Schema evolution leaks the table · Raw CDC inherits the source DDL: an ALTER TABLE RENAME COLUMN silently breaks downstream consumers unless a Schema Registry enforces a compatibility mode that rejects the change at registration. · ~7 min
- 08 · The outbox is a contract, not a table · Write the event into a dedicated outbox table inside the same transaction as the business write. The DB transaction makes both atomic, and CDC tailing the outbox emits domain events decoupled from the business tables. · ~7 min
- 09 · Outbox cleanup: pick your poison · Hard-delete after emit, tombstone + log compaction, partition-by-date drop: three options that trade off WAL noise, race risk, and operational complexity. INSERT+DELETE in the same transaction collapses cleanup into the write. · ~7 min
- 10 · Ordering is per key, never global · Kafka guarantees order within a partition; partitioning by aggregate_id keeps per-aggregate events ordered while accepting that cross-aggregate order is never preserved. · ~7 min
- 11 · At-least-once is the ceiling · Debezium delivers at-least-once and Kafka EOS only covers Debezium↔Kafka, so end-to-end correctness depends on the sink being idempotent, typically by deduplicating on (table, pk, lsn) or eventId. · ~7 min
- 12 · Design canvas: compose your CDC pipeline · Capstone: pick a workload (search index / audit log / microservice events / read model), configure the six slots, fire failures. The verifier traces each absorbed or broken outcome back to the scene that introduced the responsible component. · ~7 min
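
Two of the ideas in the scene list, the transactional outbox write and the idempotent sink, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the course's implementation: it uses SQLite from the Python standard library in place of Postgres, the table layout and the names `place_order` and `IdempotentSink` are hypothetical, and reading the outbox with a plain SELECT stands in for Debezium tailing the log.

```python
import json
import sqlite3
import uuid


def open_db():
    # isolation_level=None puts sqlite3 in autocommit mode, so the
    # BEGIN / COMMIT / ROLLBACK statements below are fully explicit.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.executescript("""
        CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE outbox (
            event_id     TEXT PRIMARY KEY,
            aggregate_id TEXT NOT NULL,
            event_type   TEXT NOT NULL,
            payload      TEXT NOT NULL
        );
    """)
    return db


def place_order(db, order_id, fail_before_commit=False):
    """Write the business row and its event in one transaction (hypothetical helper)."""
    db.execute("BEGIN")
    try:
        db.execute("INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,))
        db.execute(
            "INSERT INTO outbox (event_id, aggregate_id, event_type, payload) "
            "VALUES (?, ?, ?, ?)",
            (str(uuid.uuid4()), order_id, "OrderPlaced", json.dumps({"order_id": order_id})),
        )
        if fail_before_commit:
            raise RuntimeError("simulated crash before commit")
        db.execute("COMMIT")
    except Exception:
        # Both rows disappear together; there is no state where only one exists.
        db.execute("ROLLBACK")
        raise


class IdempotentSink:
    """Applies each event at most once by remembering event ids (in memory only)."""

    def __init__(self):
        self.seen = set()
        self.applied = []

    def handle(self, event_id, payload):
        if event_id in self.seen:
            return False  # duplicate delivery, ignored
        self.seen.add(event_id)
        self.applied.append(payload)
        return True


def demo():
    db = open_db()
    place_order(db, "o-1")
    try:
        place_order(db, "o-2", fail_before_commit=True)
    except RuntimeError:
        pass
    events = db.execute("SELECT event_id, payload FROM outbox").fetchall()
    sink = IdempotentSink()
    for event_id, payload in events + events:  # second pass simulates redelivery
        sink.handle(event_id, payload)
    order_count = db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    return order_count, len(events), len(sink.applied)
```

Calling `demo()` commits one order with its event, rolls back the second attempt so neither row survives, and then delivers every event twice to show that the sink applies it once. A real sink would persist the seen ids (or use an upsert keyed on them) rather than keep them in memory.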