Build a workflow engine (Temporal / Airflow / Cadence style) (13 scenes)
Scene 05 · Task queues: workers pull, so redeploys are safe
The engine never pushes work; stateless workers pull tasks from a queue, so a redeploy is just 'no worker for a moment' and the task simply waits to be picked up.
Previously

Stateless workers pulling tasks made redeploys safe: kill them all and the task just waits. But surviving the worker dying is different from surviving the WORK failing — the payment API returns a 503, the warehouse is briefly unreachable. We can't fail ORDER #1001 over a transient blip, so how does the engine re-attempt a failed activity without re-running the whole workflow?

Scene 05
Task queues: workers pull, so redeploys are safe
Diagram
LEFT: the engine and its history DB — this is where ORDER #1001's state actually lives. MIDDLE: the task queue, where the engine parks the next unit of work. RIGHT: a pool of workers that long-poll the queue, pull a task, replay the workflow to current state, run the next step, and report the result back as a history event. Workers are stateless and fungible — kill them, scale them, redeploy them. The 'live workers' slider at 0 is a redeploy: the task simply waits on the queue, the workflow is paused, not lost.
ORDER #1001 — state lives in the engine, work flows through a queuelive workers: 1 — long-polling the queueENGINE + HISTORY DBthis is where the order's state actually livesevent history (append-only) · order #1001ChargeCard ok $42replayedReserveInventory okreplayedShipPackageliveEmailReceiptfutureenqueue next ta…TASK QUEUEnext unit of work waits hereTASKShipPackagetask-step3being drainedpull taskreport resultWORKERS (long-poll the queue)stateless · fungible · freely killableworker 11. pulled task2. replay history3. run ShipPacka…holds NO state of its ownA worker pulls the task, replays ORDER #1001 to step 3, runs ShipPackage, and reports the result back.
task queue: the engine parks the next step here — it waits, it's never lost
Here's the surprise that makes everything else possible: the engine never reaches into your process to run your code. It can't — your workers might be redeploying, scaled to zero, or on fire. So instead the engine does something humbler. It writes down the next unit of work — "run ShipPackage for ORDER #1001" — and drops it somewhere safe to wait. That somewhere is the **task queue**: a queue where the engine parks the next step of a workflow until something is ready to do it. Notice nothing has run yet, and nothing is lost. On the right sit your **workers**: plain, stateless processes that sit and *long-poll* the queue — "anything for me? anything for me?" — and when a task appears, one of them claims it, replays ORDER #1001's history to the current step, runs ShipPackage for real, and reports the result back as a new history event. Watch it happen. The load-bearing detail: when the worker is done it keeps NO state of its own. Everything that matters — the charge, the reservation — already lives in the engine's history.
Implementation
Engine.dispatchNextStep
parks the next unit of work — never pushes into a worker
1def dispatchNextStep(workflow_id):
2 history = db.load(workflow_id) # source of truth
3 next_step = fold(history).next # ShipPackage
4 if next_step is None:
5 return # workflow done
6 # place a task; the engine holds NO worker connection
7 task_queue.put(Task(workflow_id, next_step))
8 # task now waits here until SOME worker polls
Worker.pollLoop
stateless: long-poll, replay history, run the step, report back
1def pollLoop(task_queue):
2 while True:
3 task = task_queue.poll() # long-poll; blocks if empty
4 history = engine.getHistory(task.workflow_id)
5 state = replay(history) # hands back recorded results
6 result = run(task.step) # ShipPackage, for real
7 engine.report(task, result) # becomes a history event
8 # worker keeps nothing — loops back to poll
TaskQueue.poll / requeue
a claimed task stays invisible until report — or a crash timeout
1def poll():
2 task = take_visible() # claim it
3 task.invisible_until = now() + visibility_timeout
4 return task # held, not yet acked
5
6def sweep(): # runs continuously
7 for t in claimed:
8 if now() > t.invisible_until: # worker died before report
9 make_visible(t) # a peer will grab it
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.