Build a CDC pipeline (Debezium + outbox)
12 scenes · ~84 min · build the primitive

Your service writes to its DB and publishes to Kafka, and a crash between those two writes leaves the two permanently inconsistent. Build a Change Data Capture pipeline (modeled on Debezium + the outbox pattern) that closes the gap by making the database itself the event source.

  1. 01
    The dual-write trap
    A service writes to its DB and publishes to Kafka, and a crash between those two writes leaves the two permanently inconsistent. Four scenarios, four divergences, one structural fix.
    ~7 min
  2. 02
    Polling CDC — the lossy fix
    Polling with SELECT ... WHERE updated_at > last_seen is technically Change Data Capture, but it cannot see deletes, collapses multiple changes to a row within one polling interval into the final state, and trades latency against DB load.
    ~7 min
  3. 03
    The DB already has a log
    Postgres WAL, MySQL binlog — the database already keeps a durable, ordered log of every change for replication. CDC reads this log instead of the tables.
    ~7 min
  4. 04
    Debezium tails the log
    Debezium is a connector that attaches to the database as a replication client, decodes each WAL/binlog record into a structured change event, and emits one Kafka message per row change.
    ~7 min
  5. 05
    Replication slot — the bookmark that fills disks
    A replication slot is a named server-side cursor that tracks an LSN and stops Postgres from recycling WAL the connector hasn't confirmed reading. An inactive slot keeps retaining WAL until the disk fills, which makes it the #1 Debezium production failure.
    ~7 min
  6. 06
    Snapshot, then stream
    Debezium runs a consistent snapshot (op=r events), records the LSN at snapshot time, then switches to streaming from that LSN — so history and live updates stitch at one seam.
    ~7 min
  7. 07
    Schema evolution leaks the table
    Raw CDC inherits the source DDL — an ALTER TABLE RENAME COLUMN silently breaks downstream unless a Schema Registry enforces a compatibility mode that rejects the change at registration.
    ~7 min
  8. 08
    The outbox is a contract, not a table
    Write the event into a dedicated outbox table inside the same transaction as the business write — the DB transaction makes both atomic, and CDC tailing the outbox emits domain events decoupled from the business tables.
    ~7 min
  9. 09
    Outbox cleanup — pick your poison
    Hard-delete after emit, tombstone + log compaction, or partition-by-date drop: three trade-offs across WAL noise, race risk, and operational complexity. Inserting and deleting the outbox row in the same transaction collapses cleanup into the write.
    ~7 min
  10. 10
    Ordering is per key, never global
    Kafka guarantees order within a partition; partition by aggregate_id keeps per-aggregate events ordered while accepting that cross-aggregate order is never preserved.
    ~7 min
  11. 11
    At-least-once is the ceiling
    Debezium delivers at-least-once, and Kafka exactly-once semantics (EOS) only cover the Debezium-to-Kafka hop. End-to-end correctness depends on the sink being idempotent, typically by deduplicating on (table, pk, lsn) or eventId.
    ~7 min
  12. 12
    Design canvas — compose your CDC pipeline
    Capstone: pick a workload (search index / audit log / microservice events / read model), configure the six slots, fire failures. The verifier traces each absorbed or broken outcome back to the scene that introduced the responsible component.
    ~7 min
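
A rough sketch of the ideas in scenes 08 and 11, assuming Python with an in-memory SQLite database standing in for the service DB. The table layout (orders, outbox), the helpers place_order and read_outbox, and the IdempotentSink class are all hypothetical names chosen for illustration; they are not part of Debezium or of the course material. The business row and the outbox row are written in one transaction, so a crash before commit leaves neither behind, and the sink deduplicates on the event id so a redelivered event is applied only once.

```python
import json
import sqlite3
import uuid


def create_schema(conn):
    # Hypothetical business table plus a dedicated outbox table.
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        " event_id TEXT PRIMARY KEY,"
        " aggregate_id TEXT NOT NULL,"
        " event_type TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )


def place_order(conn, order_id, fail_before_commit=False):
    """Write the business row and its outbox event in one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, ?)",
            (order_id, "PLACED"),
        )
        conn.execute(
            "INSERT INTO outbox (event_id, aggregate_id, event_type, payload)"
            " VALUES (?, ?, ?, ?)",
            (
                str(uuid.uuid4()),
                order_id,
                "OrderPlaced",
                json.dumps({"order_id": order_id}),
            ),
        )
        if fail_before_commit:
            # Stand-in for a crash between the writes and the commit.
            raise RuntimeError("simulated crash before commit")


def read_outbox(conn):
    """Stand-in for the CDC reader: return outbox rows in insertion order."""
    rows = conn.execute(
        "SELECT event_id, aggregate_id, event_type, payload"
        " FROM outbox ORDER BY rowid"
    ).fetchall()
    return [
        {
            "event_id": r[0],
            "aggregate_id": r[1],
            "event_type": r[2],
            "payload": json.loads(r[3]),
        }
        for r in rows
    ]


class IdempotentSink:
    """Applies each event at most once by remembering event ids."""

    def __init__(self):
        self.seen = set()
        self.applied = []

    def handle(self, event):
        if event["event_id"] in self.seen:
            return False  # duplicate delivery, ignored
        self.seen.add(event["event_id"])
        self.applied.append(event)
        return True
```

In a real pipeline the reader would be a log-based connector rather than a SELECT, and the sink's set of seen ids would live in durable storage alongside the data it protects; the in-memory versions here only show the shape of the contract.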