Versioning: the year-later replay

Build a workflow engine (Temporal / Airflow / Cadence style) (13 scenes)

Scene 11 · Versioning: the year-later replay

A year-old in-flight execution can't be hot-fixed; a version gate routes old executions down the old path and new ones down the new path, so both replay deterministically.

Previously

A version gate lets ORDER #1001, started a year ago under v1, wake and replay safely alongside v2 executions. You now hold the full toolkit: history, replay, activities, workers, retries, idempotency, timers, signals, sagas, children, versioning. The last question isn't a new mechanism — it's judgment: for a real workload, is this code-as-workflow replay model even the right tool, or is a DAG scheduler or a state machine the better fit?


Watch

Diagram

Two executions of the order workflow replay through ONE deployed codebase (v2 inserts a FraudCheck between ChargeCard and ReserveInventory). ORDER #1001 is in-flight: it started under v1, is still waiting on its timer, and its history has no FraudCheck. Without a version gate, #1001 replays the v2 path, expects a FraudCheck that isn't in its history, diverges, and throws the non-determinism error from scene 3. The getVersion gate reads the version recorded in each execution's OWN history (a history with no marker means the run predates the gate) and routes #1001 down the v1 branch and #1002 down the v2 branch, so both replay deterministically.

In-flight execution: a run started under older code that is still alive and carries frozen expectations in its history.

Version gate (getVersion/patching): the if-branch that reads each execution's recorded version and sends old runs down the old path, new runs down the new.

Editing live workflow code is a breaking change, not a hot-fix; gates accumulate as cruft until old executions drain out of retention.
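Deciding when a gate has stopped being needed can be sketched as a simple scan over retained runs. This is a minimal illustration, not an engine API: `RetainedRun`, `runs_blocking_removal`, and the `default=1` convention (a run with no marker counts as version 1) are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class RetainedRun:
    """Hypothetical view of one execution still inside the retention window."""
    run_id: str
    markers: dict = field(default_factory=dict)   # change_id -> recorded version

def runs_blocking_removal(runs, change_id, new_min, default=1):
    """List runs that would break if branches below new_min were deleted.

    A run with no marker for change_id is assumed to be on `default`,
    i.e. it predates the gate.
    """
    return [r.run_id for r in runs if r.markers.get(change_id, default) < new_min]

runs = [
    RetainedRun("ORDER-1001"),                   # started under v1, no marker
    RetainedRun("ORDER-1002", {"fraud": 2}),     # started under v2
]
blocking = runs_blocking_removal(runs, "fraud", new_min=2)
print(blocking)
```

While the list is non-empty, the v1 branch has to stay in the code; once it is empty, the gate's minimum can be raised and the old branch deleted.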

Sources

Temporal documentation: Versioning / patching


It's a year later. ORDER #1001 charged the card $42, then went to sleep on its 30-day timer, and it's STILL alive, waiting to wake. Today you ship v2 of the order workflow: it inserts a FraudCheck between ChargeCard and ReserveInventory. Here's the trap you can't see in a normal service. There is only ONE deployed codebase now, v2, and BOTH executions replay through it. But #1001's recorded history was written under v1: it has no FraudCheck event, because that step didn't exist when it ran. A run like #1001, started under older code and still alive, carrying expectations frozen into its history, is an **in-flight execution**.

#1002, started today, ran against v2 from its first step, so its history already carries FraudCheck. Watch both executions appear on the strip: same code ahead of them, but two different histories behind them.
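The trap can be made concrete with a small comparison of what ungated v2 code would issue against what each history recorded. This is only a sketch: `first_divergence` is a hypothetical helper, and the history contents are illustrative (the timer is left out, and #1001 is assumed to have already recorded ReserveInventory).

```python
def first_divergence(commands, history):
    """Return (index, command, recorded) for the first mismatch, or None."""
    for i, (cmd, recorded) in enumerate(zip(commands, history)):
        if cmd != recorded:
            return i, cmd, recorded
    return None

# What ungated v2 code issues, in order (names follow the scene).
V2_COMMANDS = ["ChargeCard", "FraudCheck", "ReserveInventory", "Ship", "SendEmail"]

HISTORY_1001 = ["ChargeCard", "ReserveInventory"]   # written under v1 (illustrative)
HISTORY_1002 = ["ChargeCard", "FraudCheck"]         # written under v2 (illustrative)

print(first_divergence(V2_COMMANDS, HISTORY_1001))
print(first_divergence(V2_COMMANDS, HISTORY_1002))
```

For #1001 the mismatch shows up at the second step, where the code asks for FraudCheck and the history holds ReserveInventory; for #1002 there is no mismatch.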

Implementation

Worker.replay

re-run the code, match each command against history

def replay(execution):
    history = execution.history   # frozen, append-only
    cursor = Cursor(history)
    for cmd in run_workflow(execution):
        recorded = cursor.next_event()
        if recorded is None:
            return execute_live(cmd)   # caught up: history is exhausted
        if cmd.kind != recorded.kind:
            raise NonDeterminismError(cmd, recorded)
        feed_back(cmd, recorded.result)  # reuse the recorded result; don't re-run
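The replay loop above leaves `Cursor`, `run_workflow`, and `feed_back` undefined. One way to make the idea runnable is to model the workflow as a Python generator that yields command names and receives recorded results. This is a sketch under that assumption, not how any particular engine implements replay; `order_v1` and the `(kind, result)` event shape are invented for the example.

```python
class NonDeterminismError(Exception):
    pass

def replay(workflow, history):
    """Replay `workflow` against recorded (kind, result) events.

    Returns the first command not covered by history (the one that must
    run live), or None if the workflow finished during replay.
    """
    gen = workflow()
    events = iter(history)
    try:
        cmd = next(gen)                      # first command the code issues
        while True:
            recorded = next(events, None)
            if recorded is None:
                return cmd                   # caught up: history is exhausted
            kind, result = recorded
            if cmd != kind:
                raise NonDeterminismError(f"code issued {cmd!r}, history has {kind!r}")
            cmd = gen.send(result)           # feed the recorded result back; nothing re-runs
    except StopIteration:
        return None

def order_v1():
    receipt = yield "ChargeCard"             # receives the recorded result on replay
    yield "ReserveInventory"
    yield "Ship"

next_cmd = replay(order_v1, [("ChargeCard", "receipt-42")])
print(next_cmd)
```

With one recorded event, the card is not charged again: the recorded receipt is handed back to the code, and the first uncovered command, ReserveInventory, is returned as the next thing to run live.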

Workflow.orderV2

the one deployed codebase both runs replay

def order_workflow(order):
    charge_card(order, 42)
    v = get_version('fraud', min=1, max=2)
    if v >= 2:
        fraud_check(order)   # v2 only
    reserve_inventory(order)
    ship(order)
    send_email(order)

Engine.getVersion

read the version from THIS run's own history

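def get_version(change_id, min, max):
    marker = history.find(change_id)
    if marker is not None:
        return marker.version   # what this run committed to
    if is_replaying():          # assumed engine helper: still consuming recorded events
        return min              # no marker in old history: the run predates this gate
    # first live pass: record max so future replays agree
    record(VersionMarker(change_id, max))
    return max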
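Putting the gate and the two histories together, the routing can be sketched end to end. The `Run` class below is a toy stand-in for an engine, with assumptions called out in the comments: activities and version markers are stored separately, a run with no marker is treated as the minimum version while it is still replaying, and #1001's history is assumed to already hold ReserveInventory.

```python
class NonDeterminismError(Exception):
    pass

class Run:
    """Toy execution: recorded activities, recorded version markers, a cursor."""

    def __init__(self, activities=(), markers=None):
        self.activities = list(activities)   # recorded activity names, in order
        self.markers = dict(markers or {})   # change_id -> version this run committed to
        self.cursor = 0                      # next recorded activity to match
        self.path = []                       # activities the code issued on this pass

    def replaying(self):
        return self.cursor < len(self.activities)

    def activity(self, name):
        if self.replaying():
            recorded = self.activities[self.cursor]
            if recorded != name:
                raise NonDeterminismError(f"code issued {name}, history has {recorded}")
        else:
            self.activities.append(name)     # live: record the new event
        self.cursor += 1
        self.path.append(name)

    def get_version(self, change_id, min_v, max_v):
        if change_id in self.markers:
            return self.markers[change_id]
        if self.replaying():
            return min_v                     # old history, no marker: stay on the old path
        self.markers[change_id] = max_v      # live: commit this run to the newest version
        return max_v

def order_workflow(run):
    run.activity("ChargeCard")
    if run.get_version("fraud", 1, 2) >= 2:
        run.activity("FraudCheck")
    run.activity("ReserveInventory")
    run.activity("Ship")
    run.activity("SendEmail")

old = Run(activities=["ChargeCard", "ReserveInventory"])   # ORDER #1001, written under v1
new = Run()                                                # ORDER #1002, starts on v2
order_workflow(old)
order_workflow(new)
print(old.path)
print(new.path)

# Replaying #1002 from its own recorded history takes the same branch again.
again = Run(new.activities, new.markers)
order_workflow(again)
print(again.path == new.path)
```

#1001 skips FraudCheck and continues on the v1 path, #1002 records version 2 on its first pass and runs FraudCheck, and a later replay of #1002 reads that marker and repeats the same path.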
