Task queues: workers pull, so redeploys are safe

Build a workflow engine (Temporal / Airflow / Cadence style) (13 scenes)

Scene 05 · Task queues: workers pull, so redeploys are safe

The engine never pushes work; stateless workers pull tasks from a queue, so a redeploy is just 'no worker for a moment' and the task simply waits to be picked up.

Previously

Stateless workers pulling tasks made redeploys safe: kill them all and the task just waits. But surviving the worker dying is different from surviving the WORK failing — the payment API returns a 503, the warehouse is briefly unreachable. We can't fail ORDER #1001 over a transient blip, so how does the engine re-attempt a failed activity without re-running the whole workflow?

Scene 05

Task queues: workers pull, so redeploys are safe

Watch

Diagram

LEFT: the engine and its history DB — this is where ORDER #1001's state actually lives. MIDDLE: the task queue, where the engine parks the next unit of work. RIGHT: a pool of workers that long-poll the queue, pull a task, replay the workflow to current state, run the next step, and report the result back as a history event. Workers are stateless and fungible — kill them, scale them, redeploy them. The 'live workers' slider at 0 is a redeploy: the task simply waits on the queue, the workflow is paused, not lost.

Sources

docTemporal — Temporal Service & Durable Execution

task queue: the engine parks the next step here — it waits, it's never lost

Here's the surprise that makes everything else possible: the engine never reaches into your process to run your code. It can't — your workers might be redeploying, scaled to zero, or on fire. So instead the engine does something humbler. It writes down the next unit of work — "run ShipPackage for ORDER #1001" — and drops it somewhere safe to wait. That somewhere is the **task queue**: a queue where the engine parks the next step of a workflow until something is ready to do it. Notice nothing has run yet, and nothing is lost. On the right sit your **workers**: plain, stateless processes that sit and *long-poll* the queue — "anything for me? anything for me?" — and when a task appears, one of them claims it, replays ORDER #1001's history to the current step, runs ShipPackage for real, and reports the result back as a new history event. Watch it happen. The load-bearing detail: when the worker is done it keeps NO state of its own. Everything that matters — the charge, the reservation — already lives in the engine's history.

Implementation

Engine.dispatchNextStep

parks the next unit of work — never pushes into a worker

1def dispatchNextStep(workflow_id):
2    history = db.load(workflow_id)   # source of truth
3    next_step = fold(history).next   # ShipPackage
4    if next_step is None:
5        return                        # workflow done
6    # place a task; the engine holds NO worker connection
7    task_queue.put(Task(workflow_id, next_step))
8    # task now waits here until SOME worker polls

Worker.pollLoop

stateless: long-poll, replay history, run the step, report back

1def pollLoop(task_queue):
2    while True:
3        task = task_queue.poll()      # long-poll; blocks if empty
4        history = engine.getHistory(task.workflow_id)
5        state = replay(history)       # hands back recorded results
6        result = run(task.step)       # ShipPackage, for real
7        engine.report(task, result)   # becomes a history event
8        # worker keeps nothing — loops back to poll

TaskQueue.poll / requeue

a claimed task stays invisible until report — or a crash timeout

1def poll():
2    task = take_visible()             # claim it
3    task.invisible_until = now() + visibility_timeout
4    return task                       # held, not yet acked
5 
6def sweep():                          # runs continuously
7    for t in claimed:
8        if now() > t.invisible_until: # worker died before report
9            make_visible(t)           # a peer will grab it

Not sure what to ask? Tap a question — the staff engineer answers in the chat panel.

PreviousActivities: quarantine for side effects NextRetries and exponential backoff