Build a workflow engine (Temporal / Airflow / Cadence style)
13 scenes · ~91 min · build the primitive

A function that survives crashes, restarts, and re-deploys — and still finishes. Build a durable execution engine where workflow code is replayed deterministically from an event history, activities retry with exponential backoff, sagas compensate on failure, and the same workflow definition runs identically a year later. Internalize why 'just retry the cron job' breaks at the second step.

  1. 01
    The crash that charges you twice
    A plain function keeps its progress in RAM, so a crash after step one erases it — and a naive cron retry re-runs from the top and charges the card a second time.
    ~7 min
  2. 02
    The event history is the source of truth
    Stop trusting RAM: append every step's result to a durable, append-only event history, so a crash that wipes memory leaves the record of what already happened intact.
    ~7 min
  3. 03
    Replay: re-run the code against the history
    To resume, the engine re-runs your code from the top and hands back the recorded results instead of redoing them — which only works if the code is deterministic.
    ~7 min
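A minimal sketch of the replay idea, assuming an in-memory list stands in for the durable event history. `ReplayContext`, `step`, and `order_workflow` are hypothetical names chosen for illustration, not the API of Temporal, Cadence, or Airflow.

```python
from typing import Any, Callable, List, Tuple


class ReplayContext:
    """Hypothetical context: hands back recorded results, runs only new steps."""

    def __init__(self, history: List[Tuple[str, Any]]):
        self.history = history  # append-only list of (step_name, result)
        self.cursor = 0         # index of the next step to replay

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if self.cursor < len(self.history):
            recorded_name, result = self.history[self.cursor]
            if recorded_name != name:
                # The code took a different path than the one recorded.
                raise RuntimeError(
                    f"non-deterministic replay: expected {recorded_name!r}, got {name!r}"
                )
            self.cursor += 1
            return result  # replayed, fn is NOT called again
        result = fn()      # first execution of this step
        self.history.append((name, result))
        self.cursor += 1
        return result


calls: List[str] = []  # records which side effects really ran


def reserve() -> str:
    calls.append("reserve")
    return "res-1"


def charge() -> str:
    calls.append("charge")
    return "chg-1"


def order_workflow(ctx: ReplayContext) -> Tuple[str, str]:
    r = ctx.step("reserve", reserve)
    c = ctx.step("charge", charge)
    return (r, c)


history: List[Tuple[str, Any]] = []

# First attempt: only step one completes before the simulated crash.
ReplayContext(history).step("reserve", reserve)

# Resume: re-run the workflow from the top against the surviving history.
result = order_workflow(ReplayContext(history))
```

On resume, `reserve` is answered from the history and only `charge` actually runs. If the workflow code asked for a different step name than the one recorded, the sketch raises, which is the determinism requirement in miniature.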
  4. 04
    Activities: quarantine for side effects
    Replay re-runs workflow code, so every side effect must move into an activity whose result is recorded — replay hands the result back instead of charging again.
    ~7 min
  5. 05
    Task queues: workers pull, so redeploys are safe
    The engine never pushes work; stateless workers pull tasks from a queue, so a redeploy is just 'no worker for a moment' and the task simply waits to be picked up.
    ~7 min
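The pull model can be sketched as below, assuming an in-process deque in place of a durable queue. `TaskQueue` and `Worker` are hypothetical names, and a real engine would also need leases or timeouts for tasks that a worker picked up and then abandoned.

```python
from collections import deque
from typing import Deque, Optional


class TaskQueue:
    """Hypothetical queue: tasks wait here until some worker asks for one."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()

    def enqueue(self, task: str) -> None:
        self._pending.append(task)

    def poll(self) -> Optional[str]:
        # Workers call this; the queue never pushes work to anyone.
        return self._pending.popleft() if self._pending else None

    def __len__(self) -> int:
        return len(self._pending)


class Worker:
    """Stateless worker: everything it needs arrives with the task."""

    def __init__(self, name: str, queue: TaskQueue) -> None:
        self.name = name
        self.queue = queue

    def run_once(self) -> Optional[str]:
        task = self.queue.poll()
        if task is None:
            return None
        return f"{self.name} ran {task}"


queue = TaskQueue()
queue.enqueue("charge-order-42")   # enqueued while no worker is running (mid-redeploy)
waiting = len(queue)               # the task simply waits
new_worker = Worker("worker-v2", queue)
outcome = new_worker.run_once()    # the redeployed worker picks it up
```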
  6. 06
    Retries and exponential backoff
    The engine retries a failed activity on its own, widening the gap between attempts so a sick downstream can recover instead of being pinned down by a retry storm.
    ~7 min
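One way to sketch the widening gap, assuming a delay of `initial * factor**n` capped at `cap`. The function names and default values are illustrative assumptions, and production retry policies usually add random jitter, which is omitted here to keep the delays predictable.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(initial: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0, attempts: int = 5) -> List[float]:
    """Delay before each retry: initial * factor**n, never above cap."""
    return [min(cap, initial * factor ** n) for n in range(attempts - 1)]


def run_with_retries(activity: Callable[[], T], *, attempts: int = 5,
                     initial: float = 1.0, factor: float = 2.0,
                     cap: float = 60.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call activity until it succeeds or attempts are exhausted."""
    delays = backoff_delays(initial, factor, cap, attempts)
    for attempt in range(attempts):
        try:
            return activity()
        except Exception:
            if attempt == attempts - 1:
                raise              # out of attempts: surface the failure
            sleep(delays[attempt])  # wait longer after each failure
    raise RuntimeError("attempts must be at least 1")
```

Passing `sleep` as a parameter is a design choice for the sketch: tests can capture the delays instead of actually waiting.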
  7. 06a
    Idempotency keys: the last hole in the double-charge
    An activity can run twice if it succeeds and the worker then crashes before the result is recorded; a stable idempotency key lets the downstream recognize the repeat and refuse the second charge.
    ~7 min
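A small sketch of the idea, assuming the key is built from a workflow ID and an activity ID so it stays the same across attempts. `FakePaymentGateway` is a hypothetical stand-in for the downstream service, not a real payment API.

```python
from typing import Dict


def idempotency_key(workflow_id: str, activity_id: str) -> str:
    """Stable across retries: derived from identity, never from time or randomness."""
    return f"{workflow_id}:{activity_id}"


class FakePaymentGateway:
    """Hypothetical downstream that remembers which keys it has already served."""

    def __init__(self) -> None:
        self.charges: Dict[str, str] = {}  # idempotency key -> charge id
        self.total_charged = 0

    def charge(self, key: str, amount_cents: int) -> str:
        if key in self.charges:
            # Repeat of a request already applied: return the original result.
            return self.charges[key]
        self.total_charged += amount_cents
        charge_id = f"chg-{len(self.charges) + 1}"
        self.charges[key] = charge_id
        return charge_id


gateway = FakePaymentGateway()
key = idempotency_key("order-42", "charge-card")

first = gateway.charge(key, 500)   # succeeds, but the worker crashes before recording
second = gateway.charge(key, 500)  # the retry reuses the same key
```

Both calls return the same charge ID and the card is debited once, because the gateway treats the second request as a repeat rather than a new charge.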
  8. 07
    Durable timers: sleep 30 days on zero compute
    A thread that sleeps for a month dies on the first crash; a durable timer records the wait as an event, so the workflow goes dormant until the engine fires the wake-up.
    ~7 min
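A rough sketch of a durable timer, assuming timers are stored as events and a periodic engine poll decides which are due. `TimerStore`, the event names, and the use of plain seconds for time are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

DAY = 86_400  # seconds


@dataclass
class TimerStore:
    """Hypothetical store: the wait lives in recorded events, not in a sleeping thread."""

    events: List[Dict] = field(default_factory=list)

    def start_timer(self, workflow_id: str, timer_id: str, fire_at: int) -> None:
        self.events.append({"type": "TimerStarted", "workflow_id": workflow_id,
                            "timer_id": timer_id, "fire_at": fire_at})

    def fire_due(self, now: int) -> List[Tuple[str, str]]:
        """Record TimerFired for every due, unfired timer; return who to wake."""
        fired = {(e["workflow_id"], e["timer_id"])
                 for e in self.events if e["type"] == "TimerFired"}
        due = [(e["workflow_id"], e["timer_id"])
               for e in self.events
               if e["type"] == "TimerStarted"
               and e["fire_at"] <= now
               and (e["workflow_id"], e["timer_id"]) not in fired]
        for workflow_id, timer_id in due:
            self.events.append({"type": "TimerFired",
                                "workflow_id": workflow_id, "timer_id": timer_id})
        return due


store = TimerStore()
store.start_timer("wf-1", "trial-end", fire_at=30 * DAY)

early = store.fire_due(now=1 * DAY)     # nothing is due, no compute is held
on_time = store.fire_due(now=30 * DAY)  # the engine wakes the workflow
repeat = store.fire_due(now=31 * DAY)   # already fired, so it is not woken twice
```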
  9. 08
    Signals and queries: the workflow as an actor
    A running workflow is an addressable actor: a signal delivers external input durably and can change its path; a query reads its state without mutating it.
    ~7 min
  10. 09
    The saga: compensate partial failure in reverse
    You can't wrap steps across services in one transaction; a saga gives each step an undo and runs the compensators for completed steps in reverse — a refund, not a rollback.
    ~7 min
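The reverse-order compensation can be sketched as follows, assuming each step is given as an `(action, compensate)` pair. `run_saga` is a hypothetical helper, and a durable engine would also record each step so that compensation itself survives a crash.

```python
from typing import Callable, List, Tuple

Step = Tuple[Callable[[], None], Callable[[], None]]  # (action, compensate)


def run_saga(steps: List[Step]) -> None:
    """Run actions in order; on failure, undo the completed ones in reverse."""
    completed: List[Callable[[], None]] = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)  # only completed steps earn an undo
    except Exception:
        for compensate in reversed(completed):
            compensate()
        raise  # the saga still failed; callers must see that


log: List[str] = []


def ship() -> None:
    raise RuntimeError("shipping service unavailable")


steps: List[Step] = [
    (lambda: log.append("reserve"), lambda: log.append("release")),
    (lambda: log.append("charge"), lambda: log.append("refund")),
    (ship, lambda: log.append("cancel-shipment")),
]

try:
    run_saga(steps)
except RuntimeError:
    pass  # log now shows the charge refunded before the reservation is released
```

The failed step's own compensator never runs because that step never completed, and the refund happens before the release, mirroring the original order in reverse.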
  11. 10
    Child workflows and ContinueAsNew
    Child workflows isolate sub-units with their own histories, and ContinueAsNew restarts an endless workflow with a fresh history but the same ID before that history hits its size limit.
    ~7 min
  12. 11
    Versioning: the year-later replay
    A year-old in-flight execution can't be hot-fixed; a version gate routes old executions down the old path and new ones down the new path, so both replay deterministically.
    ~7 min
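A simplified sketch of a version gate, assuming the engine records a version marker in the history the first time new code reaches the gate. `VersionedContext`, `get_version`, and `DEFAULT_VERSION` are hypothetical names; real engines offer their own versioning calls with different signatures.

```python
from typing import Any, Callable, List, Tuple

DEFAULT_VERSION = 0  # assumption: the code path that existed before the change
Event = Tuple[str, str, Any]  # (kind, name, value)


class VersionedContext:
    def __init__(self, history: List[Event]):
        self.history = history
        self.cursor = 0

    def get_version(self, change_id: str, max_version: int) -> int:
        if self.cursor < len(self.history):
            kind, name, value = self.history[self.cursor]
            if kind == "version" and name == change_id:
                self.cursor += 1
                return value        # replay the version pinned earlier
            return DEFAULT_VERSION  # history predates the gate: old path
        # First time any execution reaches the gate: pin the newest version.
        self.history.append(("version", change_id, max_version))
        self.cursor += 1
        return max_version

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if self.cursor < len(self.history):
            kind, recorded, value = self.history[self.cursor]
            if kind != "step" or recorded != name:
                raise RuntimeError(f"non-deterministic replay at {name!r}")
            self.cursor += 1
            return value
        value = fn()
        self.history.append(("step", name, value))
        self.cursor += 1
        return value


def billing_workflow(ctx: VersionedContext) -> str:
    # The fraud check was added a year after the old executions started.
    if ctx.get_version("add-fraud-check", 1) >= 1:
        ctx.step("fraud_check", lambda: "ok")
    return ctx.step("charge", lambda: "chg-new")


old_history: List[Event] = [("step", "charge", "chg-old")]  # recorded by the old code
old_result = billing_workflow(VersionedContext(old_history))

new_history: List[Event] = []
new_result = billing_workflow(VersionedContext(new_history))
```

The old history replays down the old path without a determinism error, while the new execution records the marker and takes the new path; replaying `new_history` later follows the same branch because the pinned version is read back from the history.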
  13. 12
    Design canvas: choose the right engine
    Match each workload to its model — code-as-workflow replay, a DAG scheduler, or a state machine — and defend every choice with the scene that taught the requirement.
    ~7 min