Single-node by design; HA is somebody else's problem

Build a Prometheus-style time-series database (12 scenes)

Scene 10.5 · Single-node by design; HA is somebody else's problem

The TSDB itself isn't replicated. HA = two parallel scrapers; durability = ship every sample to a remote-write receiver that dedups.

Previously

The single-node TSDB is a complete unit. Scaling out is a *separate* tier with separate trade-offs.

Scene 11

Diagram

Top: a stream of scrape events from 1000 targets. Two parallel Prometheus instances (A and B) read independent cursors over the same stream — that's how HA works without replication. Bottom: a remote-write receiver consumes from both and dedups by `__replica__` label, becoming the durable long-range store. Thanos / Cortex / Mimir / VictoriaMetrics are all flavors of that receiver.

The TSDB itself doesn't replicate, so how do you survive a node dying? Run two Prometheus instances that scrape the same 1000 targets; that is the entire HA story. Each instance tags its samples with its own `__replica__` value and ships every one of them to a separate downstream cluster. That shipping is called **remote write**. The receiver therefore sees each sample twice (once with `__replica__=A`, once with `__replica__=B`), stores the first copy, and drops the second.
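The failover argument can be checked with a small simulation. This is a minimal sketch, not the scene's implementation: `Receiver`, `scrape_stream`, and `run_replica` are hypothetical names, and timestamps are assumed to be identical across replicas, as in the diagram where both instances read the same event stream. Replica A is killed partway through to see whether the receiver still ends up with every sample.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical remote-write receiver that dedups on (series, timestamp)."""
    seen: set = field(default_factory=set)
    stored: list = field(default_factory=list)
    dedup_drops: int = 0

    def handle(self, series, ts, value, replica):
        key = (series, ts)              # the replica id is deliberately not part of the key
        if key in self.seen:
            self.dedup_drops += 1       # the peer already delivered this sample
            return
        self.seen.add(key)
        self.stored.append((series, ts, value))


def scrape_stream(n_targets, n_rounds, interval=15):
    """Yield (series, ts, value) events; both replicas read this same stream."""
    for rnd in range(n_rounds):
        for target in range(n_targets):
            yield (f'up{{instance="t{target}"}}', rnd * interval, 1.0)


def run_replica(replica_id, receiver, events, dies_after_ts=None):
    """Ship every event to the receiver until the replica 'dies'."""
    for series, ts, value in events:
        if dies_after_ts is not None and ts > dies_after_ts:
            break                       # simulated node failure
        receiver.handle(series, ts, value, replica_id)


receiver = Receiver()
# Replica A dies after the scrape at t=30s; replica B keeps running.
run_replica("A", receiver, scrape_stream(3, 5), dies_after_ts=30)
run_replica("B", receiver, scrape_stream(3, 5))

print(len(receiver.stored), receiver.dedup_drops)   # 15 9
```

With 3 targets and 5 scrape rounds there are 15 distinct samples. A delivers the first 9 before it dies, B delivers all 15, and the receiver ends up with 15 stored samples and 9 dropped duplicates. Losing A costs nothing because B was never depending on it.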

Implementation

Prometheus.run

every interval: scrape, append locally, queue remote-write

import time

def run(self):
    while self.alive:
        time.sleep(self.scrape_interval)    # 15s here (upstream Prometheus defaults to 1m)
        for target in self.targets:         # SAME targets as peer
            samples = target.scrape()
            for s in samples:
                # Tag each sample so the receiver can tell the replicas apart.
                s.labels['__replica__'] = self.replica_id
                self.tsdb.append(s)         # local single-node store
                self.remote_write_queue.put(s)
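The listing only enqueues samples; something has to drain `remote_write_queue` and send its contents downstream. The sketch below shows one plausible shape for that consumer, assuming the queue is a standard `queue.Queue` and that samples are sent in batches. The names `drain_batch` and `ship`, the `send` callback, and the batch size of 500 are all illustrative assumptions, not part of the scene.

```python
import queue


def drain_batch(q, max_batch=500):
    """Pull up to max_batch samples off the queue without blocking."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


def ship(q, send, max_batch=500):
    """Drain the queue batch by batch, handing each batch to `send`."""
    shipped = 0
    while True:
        batch = drain_batch(q, max_batch)
        if not batch:
            return shipped
        send(batch)                     # stand-in for the network call to the receiver
        shipped += len(batch)


remote_write_queue = queue.Queue()
for i in range(1200):
    remote_write_queue.put(("up", i, 1.0))

batch_sizes = []
total = ship(remote_write_queue, lambda b: batch_sizes.append(len(b)))
print(total, batch_sizes)               # 1200 [500, 500, 200]
```

Decoupling the scrape loop from the sender through a queue means a slow or unreachable receiver delays shipping without blocking local appends. A real sender would also need retries and a bound on queue size, which this sketch leaves out.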

RemoteWriteReceiver.handle

key = (series, ts); if seen with different __replica__, drop

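def handle(self, sample):
    # Series identity must ignore the replica label, otherwise A and B never collide.
    fp = fingerprint(
        sample.metric, sample.labels_without_replica,
    )
    key = (fp, sample.timestamp)
    prev = self.seen.get(key)
    if prev is None:
        # First copy of this (series, timestamp): remember who sent it and store it.
        self.seen[key] = sample.labels['__replica__']
        self.store.append(sample)           # durable cluster store
        return
    if prev != sample.labels['__replica__']:
        self.dedup_drops += 1               # duplicate from peer
    # Same replica again (e.g. a retried send): also not stored, but not counted.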
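`fingerprint` is called in `handle` but not defined there. One way to build it, as a sketch under assumptions: hash the metric name together with the sorted label pairs, skipping `__replica__`, so that both replicas' copies of a series collapse to the same key. The name `series_fingerprint` and the use of SHA-256 are illustrative choices, not what any particular receiver does.

```python
import hashlib

REPLICA_LABEL = "__replica__"


def series_fingerprint(metric, labels):
    """Hash a series identity, ignoring the replica label.

    Labels are sorted so that dict insertion order does not change the result.
    """
    h = hashlib.sha256()
    h.update(metric.encode())
    for name, value in sorted(labels.items()):
        if name == REPLICA_LABEL:
            continue
        # NUL separators keep ("ab", "c") distinct from ("a", "bc").
        h.update(b"\x00" + name.encode() + b"\x00" + value.encode())
    return h.hexdigest()


from_a = series_fingerprint("http_requests_total",
                            {"job": "api", "instance": "t1", "__replica__": "A"})
from_b = series_fingerprint("http_requests_total",
                            {"instance": "t1", "job": "api", "__replica__": "B"})
other = series_fingerprint("http_requests_total",
                           {"job": "api", "instance": "t2", "__replica__": "A"})

print(from_a == from_b, from_a == other)    # True False
```

The two replicas' copies of the same series hash identically even though their `__replica__` values and label order differ, while a different `instance` produces a different fingerprint. Combined with the timestamp, that gives the `(series, ts)` key the receiver dedups on.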
