Build your own workflow engine (Temporal / Airflow / Cadence style)

A function that survives crashes, restarts, and re-deploys — and still finishes. Build a durable execution engine where workflow code is replayed deterministically from an event history, activities retry with exponential backoff, sagas compensate on failure, and the same workflow definition runs identically a year later. Internalize why 'just retry the cron job' breaks at the second step.

01
The crash that charges you twice
A plain function keeps its progress in RAM, so a crash after step one erases it — and a naive cron retry re-runs from the top and charges the card a second time.
~7 min
02
The event history is the source of truth
Stop trusting RAM: append every step's result to a durable, append-only event history, so a crash that wipes memory leaves the record of what already happened intact.
~7 min
03
Replay: re-run the code against the history
To resume, the engine re-runs your code from the top and hands back the recorded results instead of redoing them — which only works if the code is deterministic.
~7 min
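A minimal sketch of the replay idea, assuming a plain in-memory list stands in for the durable history. The `Context` class and its `step` method are hypothetical names for illustration, not any real engine's API.

```python
class Context:
    """Hypothetical replay context: returns recorded results before running anything new."""

    def __init__(self, history):
        self.history = history  # durable storage in a real engine; a plain list here
        self.position = 0       # index of the next step the workflow will ask for

    def step(self, name, fn):
        if self.position < len(self.history):
            # Replaying: hand back the recorded result and do not call fn again.
            recorded_name, result = self.history[self.position]
            if recorded_name != name:
                # Non-deterministic code asks for a different step than the one recorded.
                raise RuntimeError(
                    f"non-deterministic replay: expected {recorded_name!r}, got {name!r}"
                )
        else:
            # New territory: run the step for real and record its result.
            result = fn()
            self.history.append((name, result))
        self.position += 1
        return result


calls = []  # tracks which steps really executed


def reserve():
    calls.append("reserve")
    return "reservation-1"


def charge():
    calls.append("charge")
    return "receipt-1"


def workflow(ctx):
    reservation = ctx.step("reserve", reserve)
    receipt = ctx.step("charge", charge)
    return reservation, receipt


history = []
first = workflow(Context(history))
# Simulate a restart: memory is gone, so build a fresh context over the same history.
second = workflow(Context(history))
```

The second run returns the same values, but `calls` still holds only one `reserve` and one `charge`: the code re-ran from the top while the side effects did not.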
04
Activities: quarantine for side effects
Replay re-runs workflow code, so every side effect must move into an activity whose result is recorded — replay hands the result back instead of charging again.
~7 min
05
Task queues: workers pull, so redeploys are safe
The engine never pushes work; stateless workers pull tasks from a queue, so a redeploy is just 'no worker for a moment' and the task simply waits to be picked up.
~7 min
06
Retries and exponential backoff
The engine retries a failed activity on its own, widening the gap between attempts so a sick downstream can recover instead of being pinned down by a retry storm.
~7 min
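A minimal sketch of retrying with exponential backoff. The function names, the default values (1 second initial delay, factor 2, 60 second cap, 5 attempts), and the injectable `sleep` parameter are all assumptions chosen for illustration, not any engine's defaults.

```python
import time


def backoff_delays(initial=1.0, factor=2.0, cap=60.0, attempts=5):
    """Gaps between attempts: initial * factor**n, never longer than cap."""
    return [min(cap, initial * factor ** n) for n in range(attempts - 1)]


def run_with_retries(activity, attempts=5, sleep=time.sleep, **backoff):
    """Call activity until it succeeds or the attempts run out."""
    delays = backoff_delays(attempts=attempts, **backoff)
    for attempt in range(attempts):
        try:
            return activity()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last failure
            sleep(delays[attempt])  # widen the gap before the next try


# Usage: an activity that fails twice, then succeeds.
slept = []
attempts_seen = []


def flaky():
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise ConnectionError("downstream unavailable")
    return "ok"


# Record the delays instead of really sleeping.
result = run_with_retries(flaky, attempts=5, sleep=slept.append)
```

Passing `sleep` as a parameter keeps the example instant to run; a real engine would schedule the next attempt durably rather than block a thread.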
06a
Idempotency keys: the last hole in the double-charge
An activity can run twice if it succeeds and the worker then crashes before the result is recorded; a stable idempotency key lets the downstream recognize the repeat and refuse the second charge.
~7 min
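A minimal sketch of deduplication by idempotency key, assuming a hypothetical in-memory `PaymentService` as the downstream and a key built from the workflow ID and activity name. Both the class and the key format are illustrative assumptions.

```python
class PaymentService:
    """Hypothetical downstream that remembers which keys it has already charged."""

    def __init__(self):
        self.charges = {}  # idempotency key -> receipt

    def charge(self, idempotency_key, amount_cents):
        if idempotency_key in self.charges:
            # Repeat of a request already served: return the original receipt.
            return self.charges[idempotency_key]
        receipt = {
            "receipt_id": f"r-{len(self.charges) + 1}",
            "amount_cents": amount_cents,
        }
        self.charges[idempotency_key] = receipt
        return receipt


def idempotency_key(workflow_id, activity_name):
    # Stable across retries: built from identifiers that do not change between
    # attempts, never from random values or timestamps.
    return f"{workflow_id}:{activity_name}"


payments = PaymentService()
key = idempotency_key("order-42", "charge-card")

first_receipt = payments.charge(key, 1999)
# The worker crashed before recording the result, so the engine runs the activity again.
second_receipt = payments.charge(key, 1999)
```

Both calls return the same receipt and the service holds a single charge, so the retry is harmless.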
07
Durable timers: sleep 30 days on zero compute
A thread that sleeps for a month dies on the first crash; a durable timer records the wait as an event, so the workflow goes dormant until the engine fires the wake-up.
~7 min
08
Signals and queries: the workflow as an actor
A running workflow is an addressable actor: a signal delivers external input durably and can change its path; a query reads its state without mutating it.
~7 min
09
The saga: compensate partial failure in reverse
You can't wrap steps across services in one transaction; a saga gives each step an undo and runs the compensators for completed steps in reverse — a refund, not a rollback.
~7 min
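A minimal sketch of the saga pattern, assuming each step is a pair of plain callables (an action and its compensator). The `run_saga` helper and the travel-booking steps are hypothetical, and a real engine would also record each step durably.

```python
def run_saga(steps):
    """Run actions in order; on failure, undo the completed ones in reverse."""
    completed = []  # compensators for the steps that finished
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            raise  # the saga still failed; the caller needs to know
        completed.append(compensate)


log = []


def charge_card():
    raise RuntimeError("card declined")


steps = [
    (lambda: log.append("book flight"), lambda: log.append("cancel flight")),
    (lambda: log.append("book hotel"), lambda: log.append("cancel hotel")),
    (charge_card, lambda: log.append("refund card")),
]

try:
    run_saga(steps)
    saga_failed = False
except RuntimeError:
    saga_failed = True
```

The failed step never completed, so its compensator does not run; only the hotel and the flight are undone, newest first.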
10
Child workflows and ContinueAsNew
Child workflows isolate sub-units with their own histories, and ContinueAsNew restarts an endless workflow with a fresh history but the same ID before its history hits the size limit.
~7 min
11
Versioning: the year-later replay
A year-old in-flight execution can't be hot-fixed; a version gate routes old executions down the old path and new ones down the new path, so both replay deterministically.
~7 min
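A simplified sketch of a version gate, assuming the engine stores a per-execution marker the first time new code passes the gate. The `get_version` helper, the `markers` dict, and the `replaying` flag are hypothetical stand-ins for what a real engine tracks in its event history.

```python
DEFAULT_VERSION = 0  # meaning: this execution started before the change existed


def get_version(markers, change_id, latest, replaying):
    """Hypothetical gate: return the version this execution is pinned to."""
    if change_id in markers:
        return markers[change_id]  # pinned by an earlier run of this execution
    if replaying:
        # The history has no marker, so the original run used the old code path.
        return DEFAULT_VERSION
    markers[change_id] = latest    # first pass on new code: pin the new version
    return latest


def fee_cents(markers, replaying):
    version = get_version(markers, "fee-change", latest=1, replaying=replaying)
    if version == DEFAULT_VERSION:
        return 100  # old path, kept so old executions replay identically
    return 150      # new path


# An execution started a year ago: its history has no marker.
old_markers = {}
old_fee = fee_cents(old_markers, replaying=True)

# An execution started today, and a later replay of that same execution.
new_markers = {}
new_fee = fee_cents(new_markers, replaying=False)
new_fee_on_replay = fee_cents(new_markers, replaying=True)
```

The old execution keeps taking the old branch and the new one stays pinned to the new branch on every replay, so neither history diverges from its code path.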
12
Design canvas: choose the right engine
Match each workload to its model — code-as-workflow replay, a DAG scheduler, or a state machine — and defend every choice with the scene that taught the requirement.
~7 min