Build a CDC pipeline (Debezium + outbox) (12 scenes)
Scene 08 · The outbox is a contract, not a table
Write the event into a dedicated outbox table inside the same transaction as the business write — the DB transaction makes both atomic, and CDC tailing the outbox emits domain events decoupled from the business tables.
Previously
CDC of business tables couples consumers to internal table shape; the way out is to stop treating tables as the contract and start emitting domain events from a table that exists only to be published. That table is the outbox.
Diagram
Center: a single BEGIN/COMMIT box wrapping two stacked INSERTs, one into the business table (`orders`) and one into the **outbox** table. Below the box, the WAL strip records both inserts under one xid, so either both are visible to the connector or neither is. The Debezium connector is filtered to the outbox table; an **SMT (Single Message Transform)**, the Outbox Event Router, reshapes each raw change event into a domain event before it lands on the topic.

**Outbox**: a dedicated table whose rows exist only to be published. CDC tails it like any other table, but its semantics are 'this row IS the event.'

Recall: a **dual-write** is the fragmented commit pattern from scene 1 (DB then Kafka, no shared atomicity); a **change event** is Debezium's structured envelope per row change (scene 4); **schema evolution** is the practice of changing source DDL safely (scene 7).
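The connector wiring in the diagram could be written down roughly as follows. This is a minimal sketch, assuming a PostgreSQL source (the diagram mentions the WAL), an outbox table at `public.outbox`, and the column names used in this scene's code (`id`, `aggregate_id`, `event_type`, `payload`). The connector name is hypothetical, connection settings are omitted, and routing every event type to the single topic `orders.events` is an assumption; the property names should be checked against the Debezium version in use.

```python
import json

# Hypothetical connector name and table location; connection settings
# (hostname, credentials, topic prefix) are deliberately left out.
outbox_connector = {
    "name": "orders-outbox-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        # Capture only the outbox table, not the business tables.
        "table.include.list": "public.outbox",
        # Re-shape each raw change event with the Outbox Event Router SMT.
        "transforms": "outbox",
        "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
        # Map this scene's column names onto the fields the router expects.
        "transforms.outbox.table.field.event.id": "id",
        "transforms.outbox.table.field.event.key": "aggregate_id",
        "transforms.outbox.table.field.event.payload": "payload",
        "transforms.outbox.route.by.field": "event_type",
        # Assumption: send every event type to one fixed topic.
        "transforms.outbox.route.topic.replacement": "orders.events",
    },
}


def to_connect_request(connector: dict) -> str:
    """Serialize the config as a JSON body for Kafka Connect's REST API."""
    return json.dumps(connector, indent=2, sort_keys=True)
```

The resulting JSON is the kind of body one would POST to a Kafka Connect worker's `/connectors` endpoint; nothing in the business tables appears in it, which is the point of filtering to the outbox.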
A user clicks 'place order'. We need the OrderPlaced event to commit atomically with the orders row — anything less reopens the dual-write gap from scene 1. The DB already gives us atomicity for two rows in the same transaction. So write both: the business row, and an event row. That second table — the outbox — exists for one purpose: to be published. Watch the transaction commit, then watch the connector emit one domain event.
Implementation
Service.placeOrder(input)
one transaction, two INSERTs — atomic by construction
def placeOrder(input):
    tx = db.begin()  # BEGIN
    tx.execute(
        "INSERT INTO orders(id, status, amount)"
        " VALUES (?, 'pending', ?)",
        input.id, input.amount,
    )
    tx.execute(
        "INSERT INTO outbox"
        " (aggregate_id, event_type, payload)"
        " VALUES (?, 'OrderPlaced', ?)",
        input.id, domainEvent(input),
    )
    tx.commit()  # both rows or neither
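To see the "both rows or neither" property in action, here is a runnable sketch using Python's built-in `sqlite3` module in place of the production database. The table definitions, the `event_id` parameter, and the payload shape are assumptions made for illustration; the scene does not specify the outbox DDL or what `domainEvent(input)` returns.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the two tables from this scene."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, amount REAL)"
    )
    conn.execute(
        "CREATE TABLE outbox (id TEXT PRIMARY KEY, aggregate_id INTEGER NOT NULL,"
        " event_type TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def place_order(conn: sqlite3.Connection, order_id: int, amount: float,
                event_id: str) -> None:
    """Insert the business row and the event row in one transaction."""
    # Hypothetical event shape standing in for domainEvent(input).
    payload = json.dumps({"orderId": order_id, "total": amount})
    with conn:  # commits on success, rolls back if either INSERT raises
        conn.execute(
            "INSERT INTO orders (id, status, amount) VALUES (?, 'pending', ?)",
            (order_id, amount),
        )
        conn.execute(
            "INSERT INTO outbox (id, aggregate_id, event_type, payload)"
            " VALUES (?, ?, 'OrderPlaced', ?)",
            (event_id, order_id, payload),
        )


def counts(conn: sqlite3.Connection) -> tuple:
    """Return (number of orders rows, number of outbox rows)."""
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    outbox = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
    return orders, outbox
```

Calling `place_order` twice with the same `event_id` makes the second outbox INSERT fail after its orders INSERT has already run; the rollback removes that orders row too, so the two tables never drift apart.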
OutboxRouter.transform(rawEvent)
Single Message Transform — outbox row to domain event
def transform(rawEvent):
    # rawEvent is a Debezium 'c' (insert) on the outbox table.
    row = rawEvent.after
    return KafkaRecord(
        topic   = topicFor(row.event_type),  # hypothetical lookup: OrderPlaced → orders.events
        key     = row.aggregate_id,          # routes by aggregate
        value   = row.payload,               # domain event, not row delta
        headers = { 'eventId': row.id },
    )
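The same re-shaping step can be sketched as a plain Python function over a dict-shaped change event. This is an illustration of the idea, not Debezium's implementation: `KafkaRecord`, the `TOPIC_BY_EVENT_TYPE` table, and the choice to drop every operation other than an insert are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class KafkaRecord:
    """Hypothetical record type; not a real Kafka client class."""
    topic: str
    key: str
    value: Any
    headers: Dict[str, str] = field(default_factory=dict)


# Assumption: an explicit event-type-to-topic table, so that
# OrderPlaced lands on orders.events as in this scene.
TOPIC_BY_EVENT_TYPE = {"OrderPlaced": "orders.events"}


def transform(raw_event: Dict[str, Any]) -> Optional[KafkaRecord]:
    """Turn an insert on the outbox table into a domain-event record."""
    # Only inserts ('c') carry a new event; this sketch drops the rest.
    if raw_event.get("op") != "c":
        return None
    row = raw_event["after"]
    return KafkaRecord(
        topic=TOPIC_BY_EVENT_TYPE[row["event_type"]],
        key=str(row["aggregate_id"]),         # same aggregate, same partition
        value=row["payload"],                 # the domain event itself
        headers={"eventId": str(row["id"])},  # lets consumers identify the event
    )
```

Keying by `aggregate_id` keeps all events for one order in a single partition, which preserves their relative order for consumers.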
compareTo_directCdc()
what reaches the topic — row delta vs. domain event
# CDC of the orders table (no outbox):
#   topic = db.public.orders
#   key   = pk(id=42)
#   value = { op: 'c', after: {
#             id, customer_id, status, amount, ... } }
#   ↑ every column on the wire; rename leaks straight through.

# CDC of the outbox table (+ Outbox Event Router SMT):
#   topic = orders.events
#   key   = aggregate_id
#   value = OrderPlaced{ orderId, total, currency }
#   ↑ domain-shaped; business-table DDL is invisible.
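The difference between the two contracts shows up as soon as a column is renamed. The sketch below uses hypothetical rows (the `currency` column and the rename of `amount` to `total_amount` are assumptions for illustration) to compare what a consumer would see in each case.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def direct_cdc_value(row: Row) -> Dict[str, Any]:
    """Direct CDC: the message value mirrors the row, column for column."""
    return {"op": "c", "after": dict(row)}


def order_placed_before_rename(row: Row) -> Dict[str, Any]:
    """Service-side mapping while the column is still called `amount`."""
    return {"orderId": row["id"], "total": row["amount"],
            "currency": row["currency"]}


def order_placed_after_rename(row: Row) -> Dict[str, Any]:
    """Same event shape after `amount` is renamed to `total_amount`."""
    return {"orderId": row["id"], "total": row["total_amount"],
            "currency": row["currency"]}


# Hypothetical rows: the same order before and after the column rename.
old_row = {"id": 42, "status": "pending", "amount": 19.99, "currency": "EUR"}
new_row = {"id": 42, "status": "pending", "total_amount": 19.99, "currency": "EUR"}
```

With direct CDC, `direct_cdc_value(new_row)` no longer contains an `amount` key, so every consumer reading that field breaks. With the outbox, the service updates its own mapping in the same change as the DDL, and the published `OrderPlaced` event is identical before and after.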
Service.placeOrder_dualWrite(input) # anti-pattern
the broken version from scene 1 — for contrast only
def placeOrder_dualWrite(input):
    db.execute(
        "INSERT INTO orders(id, status, amount)"
        " VALUES (?, 'pending', ?)",
        input.id, input.amount,
    )
    # ⚠ no shared atomicity past this point
    kafka.send('orders.events',
               key=input.id,
               value=domainEvent(input))
    # crash here → row committed, event lost (or vice versa)
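The failure mode can be reproduced without a real broker. In this sketch, `FakeProducer` and `BrokerDown` are hypothetical stand-ins rather than a Kafka client API, and `sqlite3` stands in for the production database; the only thing being demonstrated is that the committed row survives a failed publish.

```python
import sqlite3


class BrokerDown(Exception):
    """Raised by the fake producer to simulate an unreachable broker."""


class FakeProducer:
    """Stand-in for a Kafka producer; not a real client API."""

    def __init__(self, fail: bool = False):
        self.fail = fail
        self.sent = []

    def send(self, topic, key, value):
        if self.fail:
            raise BrokerDown(topic)
        self.sent.append((topic, key, value))


def open_orders_db() -> sqlite3.Connection:
    """Create an in-memory database holding only the orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, amount REAL)"
    )
    return conn


def place_order_dual_write(conn, producer, order_id, amount):
    """Anti-pattern: commit the row, then publish, with nothing tying them."""
    with conn:  # the transaction covers the INSERT only
        conn.execute(
            "INSERT INTO orders (id, status, amount) VALUES (?, 'pending', ?)",
            (order_id, amount),
        )
    # The row is already committed; a failure from here on cannot undo it.
    producer.send("orders.events", key=order_id, value={"orderId": order_id})
```

Running `place_order_dual_write` with `FakeProducer(fail=True)` leaves one row in `orders` and nothing in `producer.sent`: the order exists, but no consumer will ever hear about it.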