Build a workflow engine (Temporal / Airflow / Cadence style) (13 scenes)
Scene 09 · The saga: compensate partial failure in reverse
You can't wrap steps across services in one transaction; a saga gives each step an undo and runs the compensators for completed steps in reverse — a refund, not a rollback.
Previously

The saga safely backs ORDER #1001 out of partial work, refunding and releasing in reverse. But we've kept ORDER #1001 as ONE giant workflow with one growing history — and real orders split across warehouses, run for months, and bill forever. A single history can't grow without bound, and one monolithic workflow is hard to reason about. How do we compose and how do we run forever?

Diagram
ORDER #1001's steps stack as a tower. A 'saga' is a sequence of steps, each paired with a compensating action that semantically undoes it (RefundCard undoes ChargeCard; ReleaseInventory undoes ReserveInventory). When a step fails partway, the tower un-builds in REVERSE order, running the compensators for the steps that already completed. 'Compensation (semantic undo)' is the key point: a refund is not a rollback — the history shows both the charge AND the refund; it restores an acceptable approximation, never the exact prior byte-state. Compensators can fail too, so they must be retryable.
[Diagram: SAGA, ORDER #1001. Failure point selector: none ("tower builds clean, no failure"), at Ship, at Reserve. Forward steps: ChargeCard ($42), ReserveInventory (1 × Widget), ShipPackage (warehouse → door), SendConfirmationEma… (to customer). Saga compensators: RefundCard undoes ChargeCard; ReleaseInventory undoes ReserveInvent…. Event history (append-only, nothing erased): #1 WorkflowStarted(Order 1001). Caption: A refund is a new event, not an eraser: the history keeps the charge AND the refund.]
ORDER #1001 has four steps that live on four different services: ChargeCard ($42), ReserveInventory (one Widget), ShipPackage, then SendConfirmationEmail. Watch them stack into a tower as each one completes — first the charge clears, then the Widget is reserved. Then ShipPackage can't complete: the warehouse is out of stock, permanently. Now you're stuck in the worst place: you've already taken $42 and reserved a Widget, but you can't ship — and there is no single database transaction wrapping these four separate services that you could simply roll back. Notice the ghost block sitting next to each completed step — RefundCard beside ChargeCard, ReleaseInventory beside ReserveInventory. That's a sequence of steps where every step carries its own undo. The systems-design name for modeling a long operation this way — local steps, each with a compensating action — is a **saga** (Garcia-Molina & Salem, 1987): the practical substitute for a distributed transaction when you can't get one.
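The pairing of each step with its "ghost block" can be sketched as plain data. This is an illustrative model, not the engine's actual types: the `Step` class and the `compensation_plan` helper are hypothetical names, while the step and compensator names come from the scene.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    name: str
    compensator: Optional[str] = None  # name of the undo activity, if one exists


ORDER_SAGA = [
    Step("ChargeCard", compensator="RefundCard"),
    Step("ReserveInventory", compensator="ReleaseInventory"),
    Step("ShipPackage"),            # no clean undo
    Step("SendConfirmationEmail"),  # no clean undo
]


def compensation_plan(steps, failed_at):
    """Compensators to run, newest first, when steps[failed_at] fails."""
    completed = steps[:failed_at]  # the failing step itself never completed
    return [s.compensator for s in reversed(completed) if s.compensator]


print(compensation_plan(ORDER_SAGA, failed_at=2))  # ShipPackage fails
print(compensation_plan(ORDER_SAGA, failed_at=1))  # ReserveInventory fails
```

With a failure at ShipPackage the plan is ReleaseInventory then RefundCard; with a failure at ReserveInventory only RefundCard runs, because the reservation never completed.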
Implementation
OrderWorkflow.run
run the steps forward, compensate on partial failure
def run(order):
    done = []  # completed steps, in completion order
    for step in [Charge, Reserve, Ship, Email]:
        try:
            step.do(order)  # an activity on its own service
            done.append(step)  # record only after the step succeeds
        except StepFailed:
            compensate(done)  # undo what completed, newest first
            raise  # the workflow still fails after compensating
Saga.compensate
undo the completed steps, newest first
def compensate(done):
    for step in reversed(done):  # backward recovery
        if step.compensator is None:
            continue  # no clean undo (Ship/Email)
        compensateStep(step.compensator)
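The forward loop and the reverse compensation can be exercised end to end with in-memory stand-ins for the four services. Everything below is a sketch under stated assumptions: the `Step` dataclass, the `order` dictionary, and the helper functions are hypothetical, and ShipPackage is hard-wired to fail so the compensation path runs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class StepFailed(Exception):
    pass


@dataclass
class Step:
    name: str
    do: Callable[[dict], None]
    undo_name: Optional[str] = None
    undo: Optional[Callable[[dict], None]] = None


def charge(order):
    order["charged"] += 42

def refund(order):
    order["charged"] -= 42

def reserve(order):
    order["reserved"] += 1

def release(order):
    order["reserved"] -= 1

def ship(order):
    raise StepFailed("warehouse is out of stock")

def email(order):
    order["emailed"] = True


STEPS = [
    Step("ChargeCard", charge, "RefundCard", refund),
    Step("ReserveInventory", reserve, "ReleaseInventory", release),
    Step("ShipPackage", ship),
    Step("SendConfirmationEmail", email),
]


def run(order, log):
    done = []
    for step in STEPS:
        try:
            step.do(order)
            log.append(step.name)
            done.append(step)
        except StepFailed:
            # Compensate only what completed, newest first, then re-raise.
            for finished in reversed(done):
                if finished.undo is not None:
                    finished.undo(order)
                    log.append(finished.undo_name)
            raise


order = {"charged": 0, "reserved": 0, "emailed": False}
log = []
try:
    run(order, log)
except StepFailed as exc:
    print("saga failed:", exc)
print(log)
print(order)
```

The log reads ChargeCard, ReserveInventory, ReleaseInventory, RefundCard: the undo order mirrors the completion order, and the order dictionary ends with nothing charged and nothing reserved.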
Saga.compensateStep
a compensator is just another retryable activity
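import time

def compensateStep(comp, max_delay=60.0):
    delay = 1.0  # initialInterval, in seconds
    while True:
        try:
            # Semantic undo. The key must be the same on every retry so the
            # service can deduplicate; comp.idempotency_key is assumed here.
            comp.do(comp.idempotency_key)
            return  # append Compensated(comp)
        except ActivityFailed:
            time.sleep(delay)
            delay = min(delay * 2.0, max_delay)  # backoffCoefficient, capped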
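Retrying a compensator is only safe if the compensator is idempotent. The sketch below simulates the awkward case where the refund is applied but the reply is lost, so the caller retries. `RefundService`, `retry_with_backoff`, and the key `"order-1001-refund"` are hypothetical names for illustration, and the sleep function is injectable so the backoff schedule can be observed without waiting.

```python
import time


class ActivityFailed(Exception):
    pass


def retry_with_backoff(action, initial=1.0, coefficient=2.0,
                       max_attempts=5, sleep=time.sleep):
    """Call action() until it succeeds, multiplying the wait after each failure."""
    delay = initial
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except ActivityFailed:
            if attempt == max_attempts:
                raise  # give up and surface the failure
            sleep(delay)
            delay *= coefficient


class RefundService:
    def __init__(self, lost_replies):
        self.refunded = {}  # idempotency key -> amount refunded
        self._lost_replies = lost_replies

    def refund(self, key, amount):
        if key not in self.refunded:
            self.refunded[key] = amount  # applied at most once per key
        if self._lost_replies > 0:
            self._lost_replies -= 1
            raise ActivityFailed("refund applied, reply lost")
        return self.refunded[key]


service = RefundService(lost_replies=2)
waits = []
result = retry_with_backoff(
    lambda: service.refund("order-1001-refund", 42),
    sleep=waits.append,  # record the delays instead of sleeping
)
print(result, waits, sum(service.refunded.values()))
```

Three calls reach the service, but because every call carries the same key the customer is refunded $42 once, not three times. The recorded waits show the backoff doubling from 1.0 to 2.0 seconds.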