What the simulator models (and what it doesn't)

Short version: it is a deterministic, request-level trace engine over the diagram you drew — not a queueing simulator and not a load tester. Here is exactly where every number comes from.

What happens when you hit Run

The engine walks your graph. A trace is the cheapest directed path from a start kind (client, web, mobile) to an end kind (a datastore). Path choice is biased by the signals you put on the diagram, strongest first: the edge's declared op (read/write/publish/…), its intents, node labels, then node kinds. Trace templates can require or forbid kinds: a durable write is never traced through a cache.
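As a rough illustration of the path selection described above, here is a minimal sketch of a cheapest-path search that honors a template's forbidden kinds. The type names, the edgeCost weights (1 for a matching op, 3 otherwise), and the template shape are all assumptions made for this example; the engine's real data model and weighting are not shown on this page.

```typescript
// Hypothetical types for illustration only, not the engine's actual schema.
type NodeKind = "client" | "service" | "cache" | "sql";
type Op = "read" | "write";
interface GNode { id: string; kind: NodeKind }
interface GEdge { from: string; to: string; op?: Op }
interface Template { op: Op; startKinds: NodeKind[]; endKinds: NodeKind[]; forbid: NodeKind[] }

// Assumed bias: an edge whose declared op matches the template is cheaper.
function edgeCost(e: GEdge, t: Template): number {
  return e.op === t.op ? 1 : 3;
}

// Dijkstra with a linear scan for the next node; fine for diagram-sized graphs.
function cheapestTrace(nodes: GNode[], edges: GEdge[], t: Template): string[] | null {
  const byId = new Map(nodes.map(n => [n.id, n] as const));
  const allowed = (id: string): boolean => {
    const n = byId.get(id);
    return n !== undefined && !t.forbid.includes(n.kind);
  };
  const dist = new Map<string, number>();
  const prev = new Map<string, string>();
  const done = new Set<string>();
  for (const n of nodes) {
    if (t.startKinds.includes(n.kind) && allowed(n.id)) dist.set(n.id, 0);
  }
  while (true) {
    // Pick the unfinished node with the smallest known distance.
    let cur: string | null = null;
    for (const [id, d] of dist) {
      if (!done.has(id) && (cur === null || d < dist.get(cur)!)) cur = id;
    }
    if (cur === null) return null; // no reachable end kind
    if (t.endKinds.includes(byId.get(cur)!.kind)) {
      // Rebuild the path by following predecessors back to the start node.
      const path = [cur];
      while (prev.has(path[0])) path.unshift(prev.get(path[0])!);
      return path;
    }
    done.add(cur);
    for (const e of edges) {
      if (e.from !== cur || done.has(e.to) || !allowed(e.to)) continue;
      const d = dist.get(cur)! + edgeCost(e, t);
      if (d < (dist.get(e.to) ?? Infinity)) {
        dist.set(e.to, d);
        prev.set(e.to, cur);
      }
    }
  }
}

// Example diagram: reads may go through the cache, writes go straight to SQL.
const demoNodes: GNode[] = [
  { id: "client", kind: "client" },
  { id: "api", kind: "service" },
  { id: "cache", kind: "cache" },
  { id: "db", kind: "sql" },
];
const demoEdges: GEdge[] = [
  { from: "client", to: "api" },
  { from: "api", to: "cache", op: "read" },
  { from: "cache", to: "db", op: "read" },
  { from: "api", to: "db", op: "write" },
];
const readTrace = cheapestTrace(demoNodes, demoEdges,
  { op: "read", startKinds: ["client"], endKinds: ["sql"], forbid: [] });
const writeTrace = cheapestTrace(demoNodes, demoEdges,
  { op: "write", startKinds: ["client"], endKinds: ["sql"], forbid: ["cache"] });
console.log(readTrace, writeTrace);
```

In this sketch the read trace passes through the cache while the write trace skips it, because the write template forbids the cache kind outright rather than merely making it expensive.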

Where the milliseconds come from

A fixed per-kind cost table (load balancer ≈ 2ms, service ≈ 10ms, cache ≈ 1ms, SQL primary ≈ 5ms server-side, …). These are editorial constants chosen to be realistic, not measurements of your system.
A Dapper-style request / server / response split per hop, derived from the edge protocol.
Latencies you declare on an edge (requestLatencyMs / responseLatencyMs — a fat S3 upload, a slow unindexed query). The homepage demo's 90ms "SELECT by slug — no index" is exactly this: a declared number, shown on the edge, honored by the engine.
Sub-hops derived from each node's configured internals: leader WAL write, sync replication that blocks on follower acks (RPO = 0) vs async replication that doesn't, quorum reads/writes, cross-shard 2PC, shard routing.
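To make the arithmetic concrete, here is a minimal sketch of how a trace total could be assembled from the sources listed above. The per-kind values are the ones quoted in the cost table; everything else is an assumption for illustration: the Hop shape, the DEFAULT_WIRE_MS placeholder used when an edge declares nothing, and the rule that a declared latency replaces the default rather than adding to it. Sub-hops from node internals are left out.

```typescript
// Per-kind server-side cost, using the example values quoted in the text.
type CostKind = "loadBalancer" | "service" | "cache" | "sqlPrimary";
const SERVER_MS: Record<CostKind, number> = {
  loadBalancer: 2,
  service: 10,
  cache: 1,
  sqlPrimary: 5,
};

// Hypothetical hop shape: the kind being called plus optional declared latencies.
interface Hop {
  toKind: CostKind;
  requestLatencyMs?: number;
  responseLatencyMs?: number;
}

// Assumed placeholder for the protocol-derived wire cost per direction.
const DEFAULT_WIRE_MS = 1;

// Request / server / response split for one hop.
function hopMs(h: Hop) {
  const request = h.requestLatencyMs ?? DEFAULT_WIRE_MS;
  const server = SERVER_MS[h.toKind];
  const response = h.responseLatencyMs ?? DEFAULT_WIRE_MS;
  return { request, server, response, total: request + server + response };
}

// A trace is sequential, so its latency is the plain sum of its hops.
function traceMs(hops: Hop[]): number {
  return hops.reduce((sum, h) => sum + hopMs(h).total, 0);
}

// Illustrative trace: the last edge declares a slow 90ms response.
const demoTrace: Hop[] = [
  { toKind: "loadBalancer" },
  { toKind: "service" },
  { toKind: "sqlPrimary", responseLatencyMs: 90 },
];
console.log(demoTrace.map(hopMs), traceMs(demoTrace));
```

Under these assumptions the three hops cost 4ms, 12ms, and 96ms, for a 112ms trace. The point of the sketch is that the total is a sum of constants and declared numbers, with nothing sampled or measured.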

Failure injection

Chaos scenarios mutate the graph structurally, then re-run the same trace templates as probes. The available scenarios: kill a node; kill the leader of a leader-follower store (writes are blocked for the configured failover RTO, and async replication loses up to RPO seconds of writes); drop a quorum; spike replication lag; partition the network; kill a 2PC coordinator. The before/after diff and the violation sentences ("writes UNAVAILABLE for ~15s") are computed from your declared internals, not generated by a language model.
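The leader-kill case can be sketched as a pure function of the store's declared internals. The field names (failoverRtoSec, rpoSec), the Violation shape, and the exact sentence wording below are assumptions for this example; only the rules themselves (writes blocked for the failover RTO, sync replication implying RPO = 0, async losing up to RPO seconds) come from the description above.

```typescript
// Hypothetical shape for a leader-follower store's declared internals.
interface LeaderFollowerStore {
  id: string;
  replication: "sync" | "async";
  failoverRtoSec: number; // how long writes are blocked while a follower is promoted
  rpoSec: number;         // declared replication lag window for async replication
}

interface Violation {
  nodeId: string;
  writesUnavailableSec: number;
  maxWriteLossSec: number;
  sentence: string;
}

// Derive the violation for "kill the leader" from declared numbers alone.
function killLeader(s: LeaderFollowerStore): Violation {
  // Sync replication blocks on follower acks, so no acknowledged write is lost.
  const maxWriteLossSec = s.replication === "sync" ? 0 : s.rpoSec;
  const loss = maxWriteLossSec === 0
    ? "no acknowledged writes lost"
    : `up to ${maxWriteLossSec}s of writes lost`;
  return {
    nodeId: s.id,
    writesUnavailableSec: s.failoverRtoSec,
    maxWriteLossSec,
    sentence: `writes UNAVAILABLE for ~${s.failoverRtoSec}s; ${loss}`,
  };
}

const asyncResult = killLeader({ id: "orders-db", replication: "async", failoverRtoSec: 15, rpoSec: 5 });
const syncResult = killLeader({ id: "orders-db", replication: "sync", failoverRtoSec: 15, rpoSec: 5 });
console.log(asyncResult.sentence);
console.log(syncResult.sentence);
```

Because the function has no randomness and no model call, the same diagram always produces the same sentence.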

What it does NOT model

No queueing theory and no load. There is no RPS input; latency does not change with traffic volume. "Fails at 10k RPS" is a claim this engine cannot make, so we don't make it.
No percentile distributions. A trace is one deterministic request. The p50/p95 figures used to flag a hop as "slow" come from a reference band table, not from a simulated distribution.
No retries, timeouts, or backpressure dynamics — retry policy is a label on the edge, not a simulated behavior.
No true parallel fan-out. Branches (cache hit vs miss, fallback paths) are alternate sequential traces, not a parallel DAG with overlap semantics.
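For the percentile point above, flagging a hop against a reference band could look like the sketch below. The band values, the kind names, and the threshold rule (above p95 is "slow", above p50 is "elevated") are invented for illustration; the engine's actual band table and flagging rule are not published on this page.

```typescript
// Hypothetical reference bands per node kind; the numbers are placeholders.
interface Band { p50Ms: number; p95Ms: number }
const BANDS: Record<string, Band> = {
  cache: { p50Ms: 1, p95Ms: 5 },
  sqlPrimary: { p50Ms: 5, p95Ms: 50 },
};

type Flag = "ok" | "elevated" | "slow" | "unknown";

// Compare one deterministic hop latency against the static band for its kind.
// No distribution is simulated: the band is a lookup, the hop is a single number.
function flagHop(kind: string, ms: number): Flag {
  const band: Band | undefined = BANDS[kind];
  if (band === undefined) return "unknown";
  if (ms > band.p95Ms) return "slow";
  if (ms > band.p50Ms) return "elevated";
  return "ok";
}

console.log(flagHop("sqlPrimary", 90), flagHop("cache", 1), flagHop("queue", 3));
```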

Where the AI sits

The staff-engineer critique is an LLM reading the engine's output. It narrates and argues; it never writes a number on the trace. Every millisecond, RTO, and RPO on screen is engine-derived.

The engine is ~4,000 lines of plain TypeScript and runs in your browser. If you want to argue with a constant, run the homepage design and watch which hops it flags.