Build a workflow engine (Temporal / Airflow / Cadence style) (13 scenes)
Scene 07 · Durable timers: sleep 30 days on zero compute
A thread that sleeps for a month dies on the first crash; a durable timer records the wait as an event, so the workflow goes dormant until the engine fires the wake-up.
Previously
We've now survived crashes at every step of a FAST order. But real orders aren't fast — ORDER #1001 must wait 7 days for the 'rate this product' email. A thread that sleeps for a week dies on the first crash in that week. So the engine records the wait as a durable event in the history and the workflow goes completely dormant, using zero compute, until the engine fires the wake-up event. Time becomes a first-class durable object: the timer.
Diagram
TOP: a thread-based sleep(7d) that holds a real worker — a crash during the week erases the in-RAM countdown and the order stalls. BOTTOM: a durable timer — the wait is recorded as a TimerStarted event in ORDER #1001's history, the engine owns the countdown, and the workflow box goes dark: dormant, using essentially zero compute. When the deadline arrives the engine fires TimerFired and wakes the workflow via the task queue. The 'crashes during the wait' slider shows the thread model losing the timer while the durable model survives every restart.
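A toy day-by-day simulation reproduces the slider's outcome. It is a sketch under simplifying assumptions, not engine code. Time advances in whole days, each crash happens at the start of a day with an immediate restart, and a plain Python list stands in for the durable history. `thread_sleep`, `durable_timer`, and `pending` are hypothetical names.

```python
def thread_sleep(wait_days: int, crash_days: set[int], horizon: int = 60):
    """Held-thread model: return the day the email is sent, or None if stalled."""
    countdown = wait_days  # lives only in the sleeping worker's RAM
    for day in range(1, horizon + 1):
        if day in crash_days:
            countdown = None  # process died: the thread and its countdown are gone
        if countdown is not None:
            countdown -= 1
            if countdown == 0:
                return day
    return None  # nothing remembers that ORDER #1001 was waiting


def pending(history: list[tuple[str, int]]) -> list[int]:
    """Deadlines that have a TimerStarted event but no matching TimerFired."""
    started = {at for kind, at in history if kind == "TimerStarted"}
    fired = {at for kind, at in history if kind == "TimerFired"}
    return sorted(started - fired)


def durable_timer(wait_days: int, crash_days: set[int], horizon: int = 60):
    """Durable model: the deadline is an event in history, not a thread."""
    history = [("TimerStarted", wait_days)]  # recorded on day 0; survives crashes
    schedule = pending(history)              # engine's in-RAM index of deadlines
    for day in range(1, horizon + 1):
        if day in crash_days:
            schedule = pending(history)  # restart: RAM was wiped, rebuild from history
        for fire_at in list(schedule):
            if fire_at <= day:
                history.append(("TimerFired", fire_at))
                schedule.remove(fire_at)
                return day
    return None


print(thread_sleep(7, set()), thread_sleep(7, {3}))          # 7 None
print(durable_timer(7, set()), durable_timer(7, {1, 3, 6}))  # 7 7
```

The two models differ only in where the deadline lives. The thread model keeps it in a variable that a crash destroys. The durable model can always rebuild its in-memory schedule from TimerStarted events that have no matching TimerFired.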
ORDER #1001 has shipped. Now it must wait 7 days before sending the "rate this product" email. The obvious way is to call sleep(7 days), but that holds a real worker process for the whole week, which is what the TOP model shows.

The engine has a better way. When your workflow code asks to wait, the engine doesn't block a thread. It appends a **TimerStarted** event to ORDER #1001's history, and the workflow box goes completely dark. The workflow is now **dormant**: it holds no worker and burns essentially zero compute, because only the engine is still tracking the deadline. When the countdown hits zero, the engine appends a **TimerFired** event and wakes the workflow via the task queue.

This whole mechanism is a **durable timer**: a wait recorded as a TimerStarted/TimerFired event pair that the engine owns. Time is stored as an event in history, not held in a sleeping thread. Watch the durable model start the wait and go dormant, then fire on schedule.
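The wait lives in history, not in a thread, so you can read the workflow's state straight off the event log. This is a minimal sketch. It assumes events are plain dataclasses named after the ones in the text. It also assumes each event carries a `timer_id` (an assumption; the text doesn't say how timers are keyed) and that `fire_at` is an absolute timestamp in seconds. `status` and its "dormant"/"runnable" labels are hypothetical, not an engine API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimerStarted:
    timer_id: int   # hypothetical key pairing a start with its fire
    fire_at: float  # absolute deadline, so it means the same thing after a restart


@dataclass(frozen=True)
class TimerFired:
    timer_id: int


def status(history: list) -> str:
    """Derive the workflow's state purely from its event history."""
    open_timers = set()
    for event in history:
        if isinstance(event, TimerStarted):
            open_timers.add(event.timer_id)
        elif isinstance(event, TimerFired):
            open_timers.discard(event.timer_id)
    # An open timer means the workflow is parked: no worker, no thread, just events.
    return "dormant" if open_timers else "runnable"


history = [TimerStarted(timer_id=1, fire_at=1_700_000_000 + 7 * 86400)]
print(status(history))  # dormant
history.append(TimerFired(timer_id=1))
print(status(history))  # runnable
```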
Implementation
Workflow.waitForRating
two ways to wait — one dies on crash, one records an event
```python
async def waitForRating(order):
    ship(order)
    # BROKEN: a held thread; the countdown lives in RAM
    # sleep(days=7)                # dies on the first crash
    # DURABLE: yields to the engine, which records a TimerStarted event
    await workflow.sleep(days=7)   # returns control; the workflow goes dormant
    send_rating_email(order)
```
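Plain Python has no `workflow.sleep`. One way to see what "returns control" means is to write the workflow as a generator that *yields* a wait command instead of blocking. This is a sketch under that assumption. `Sleep` is a hypothetical command object, and `ship`/`send_rating_email` are stubs that only log. The `await` form in the listing hands control to the engine in a similar way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sleep:
    days: int  # hypothetical command object: "wake me in N days"


log = []  # records which steps actually ran


def ship(order):
    log.append(("shipped", order))


def send_rating_email(order):
    log.append(("emailed", order))


def wait_for_rating(order):
    ship(order)
    yield Sleep(days=7)        # hand control back to the engine; nothing blocks
    send_rating_email(order)   # runs only when the engine resumes the workflow


wf = wait_for_rating("ORDER #1001")
command = next(wf)             # runs up to the first yield and returns at once
print(command)                 # Sleep(days=7)
print(log)                     # [('shipped', 'ORDER #1001')]

# ... days later, after TimerFired, the engine resumes the workflow:
try:
    wf.send(None)
except StopIteration:
    pass                       # the workflow ran to completion
print(log)                     # [('shipped', 'ORDER #1001'), ('emailed', 'ORDER #1001')]
```

The key point is that `next(wf)` returns as soon as the workflow reaches the wait. The caller (the engine) gets back a description of the wait rather than a blocked thread, and it is free to record the event and unload the workflow.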
Engine.onTimerStarted
persist the deadline to history, then unload the workflow
```python
def onTimerStarted(wf_id, duration):
    fire_at = now() + duration                     # absolute deadline, computed once
    history.append(wf_id, TimerStarted(fire_at))   # durable record of the wait
    schedule.add(wf_id, fire_at)                   # engine owns the countdown
    unload(wf_id)                                  # dormant: no worker held
```
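Storing `fire_at = now() + duration` (an absolute deadline) rather than the raw duration is what keeps the timer on schedule across restarts. The sketch below uses a controllable clock to show this. `FakeClock`, `on_timer_started`, and `rearm_from_history` are hypothetical names, and the list stands in for the durable history. After a crash on day 3, re-arming from the stored deadline leaves 4 days. Naively re-arming the original 7-day duration would push the email out to day 10.

```python
DAY = 86_400  # seconds


class FakeClock:
    """Hypothetical controllable clock so the example is deterministic."""

    def __init__(self, t: float = 0.0):
        self.t = t

    def now(self) -> float:
        return self.t


clock = FakeClock()
history: list[tuple[str, float]] = []  # stand-in for the durable event log


def on_timer_started(duration: float) -> None:
    fire_at = clock.now() + duration  # absolute deadline, computed once
    history.append(("TimerStarted", fire_at))


def rearm_from_history() -> float:
    """After a restart: seconds left until the recorded deadline (0 if overdue)."""
    fire_at = history[-1][1]
    return max(0.0, fire_at - clock.now())


on_timer_started(7 * DAY)              # day 0: start a 7-day wait
clock.t = 3 * DAY                      # crash on day 3; the engine restarts
left = rearm_from_history()
print(left / DAY)                      # 4.0, so it still fires on day 7
print((clock.now() + 7 * DAY) / DAY)   # 10.0: re-arming the raw duration slips to day 10
```

Clamping at zero means a deadline that passed while the engine was down fires immediately on recovery instead of being lost.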
Engine.fireTimers
deadline (or recovery) appends TimerFired and re-queues
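```python
def fireTimers():  # also runs on recovery after a crash
    for wf_id, fire_at in schedule.due(now()):  # every timer whose deadline has passed
        history.append(wf_id, TimerFired())     # record the wake-up durably first
        schedule.remove(wf_id, fire_at)         # fire each timer only once
        task_queue.put(wf_id)                   # wake it: a worker picks up the task
        # that worker then runs replay(wf_id), which resumes right after the sleep
```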
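"Resumes right after the sleep" works because the worker replays the workflow from the top against its history. Every wait whose TimerFired event is already recorded counts as elapsed. This is a minimal sketch under assumptions not taken from the text. The workflow is a generator, `Sleep` is a hypothetical command, and history is simplified to a list of event names. In a real engine, activities such as `ship` would return their recorded results during replay rather than running again. Here, `effects` only logs what one replay pass executes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sleep:
    days: int  # hypothetical command: start a durable timer


def wait_for_rating(order, effects):
    effects.append(("ship", order))   # a real engine would reuse the recorded result
    yield Sleep(days=7)               # durable wait
    effects.append(("email", order))


def replay(workflow, history):
    """Run a generator workflow from the top against its recorded history.

    Returns the command the workflow is parked on, or None once it completes.
    """
    fired = history.count("TimerFired")  # timers whose deadline already passed
    timers = 0
    try:
        command = next(workflow)
        while True:
            if isinstance(command, Sleep):
                timers += 1
                if timers > fired:
                    return command         # no TimerFired yet: go dormant again
            command = workflow.send(None)  # TimerFired is in history: skip the wait
    except StopIteration:
        return None                        # workflow finished


effects = []
print(replay(wait_for_rating("ORDER #1001", effects), ["TimerStarted"]))
# Sleep(days=7)
effects = []
print(replay(wait_for_rating("ORDER #1001", effects), ["TimerStarted", "TimerFired"]))
# None
print(effects)  # [('ship', 'ORDER #1001'), ('email', 'ORDER #1001')]
```

Because replay is driven only by history, it doesn't matter whether the worker that resumes ORDER #1001 is the one that started the wait or a fresh one after a crash. Both see the same events and reach the same point in the code.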