Build a workflow engine (Temporal / Airflow / Cadence style) (13 scenes)
Scene 05 · Task queues: workers pull, so redeploys are safe
The engine never pushes work; stateless workers pull tasks from a queue, so a redeploy is just 'no worker for a moment' and the task simply waits to be picked up.
Previously
Stateless workers pulling tasks made redeploys safe: kill them all and the task just waits. But surviving the worker dying is different from surviving the WORK failing — the payment API returns a 503, the warehouse is briefly unreachable. We can't fail ORDER #1001 over a transient blip, so how does the engine re-attempt a failed activity without re-running the whole workflow?
Scene 05
Task queues: workers pull, so redeploys are safe
Diagram
LEFT: the engine and its history DB — this is where ORDER #1001's state actually lives. MIDDLE: the task queue, where the engine parks the next unit of work. RIGHT: a pool of workers that long-poll the queue, pull a task, replay the workflow to current state, run the next step, and report the result back as a history event. Workers are stateless and fungible — kill them, scale them, redeploy them. The 'live workers' slider at 0 is a redeploy: the task simply waits on the queue, the workflow is paused, not lost.
task queue: the engine parks the next step here — it waits, it's never lost
Here's the surprise that makes everything else possible: the engine never reaches into your process to run your code. It can't — your workers might be redeploying, scaled to zero, or on fire. So instead the engine does something humbler. It writes down the next unit of work — "run ShipPackage for ORDER #1001" — and drops it somewhere safe to wait. That somewhere is the **task queue**: a queue where the engine parks the next step of a workflow until something is ready to do it. Notice nothing has run yet, and nothing is lost. On the right sit your **workers**: plain, stateless processes that sit and *long-poll* the queue — "anything for me? anything for me?" — and when a task appears, one of them claims it, replays ORDER #1001's history to the current step, runs ShipPackage for real, and reports the result back as a new history event. Watch it happen. The load-bearing detail: when the worker is done it keeps NO state of its own. Everything that matters — the charge, the reservation — already lives in the engine's history.
Implementation
Engine.dispatchNextStep
parks the next unit of work — never pushes into a worker
1def dispatchNextStep(workflow_id):2 history = db.load(workflow_id) # source of truth3 next_step = fold(history).next # ShipPackage4 if next_step is None:5 return # workflow done6 # place a task; the engine holds NO worker connection7 task_queue.put(Task(workflow_id, next_step))8 # task now waits here until SOME worker polls
Worker.pollLoop
stateless: long-poll, replay history, run the step, report back
1def pollLoop(task_queue):2 while True:3 task = task_queue.poll() # long-poll; blocks if empty4 history = engine.getHistory(task.workflow_id)5 state = replay(history) # hands back recorded results6 result = run(task.step) # ShipPackage, for real7 engine.report(task, result) # becomes a history event8 # worker keeps nothing — loops back to poll
TaskQueue.poll / requeue
a claimed task stays invisible until report — or a crash timeout
1def poll():2 task = take_visible() # claim it3 task.invisible_until = now() + visibility_timeout4 return task # held, not yet acked56def sweep(): # runs continuously7 for t in claimed:8 if now() > t.invisible_until: # worker died before report9 make_visible(t) # a peer will grab it
Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.