Why one row per point is wrong

Build a Prometheus-style time-series database (12 scenes)

Scene 02 · Why one row per point is wrong

Storing each point as a SQL row spends 60+ bytes of metadata to carry an 8-byte float — the labels JSON repeats on every row of the firehose.

Previously

We have a clean four-part tuple — now we try to store it the obvious way and watch the bytes explode.

Scene 02

Why one row per point is wrong

Watch

Diagram

Left: a SQL-style table with columns `metric | labels | ts | value`. Right: each row's bytes broken down into a stack of colored cells — gray for the row header, blue for the metric name, red for the labels JSON, green for ts, yellow for value. A running-total bar at the bottom accumulates bytes as rows arrive, and a projection panel shows the per-series-per-day cost extrapolated to a 1000-target fleet.

Six scrapes of the same series stream in one at a time. Watch the labels JSON column (red) take more bytes than the row header, metric, ts, and value combined — every single row.

Implementation

Storage.naiveInsert

one SQL row per point — labels JSON repeated every row

1def naiveInsert(point):
2    # rowHeader 8 + metric 22 + labels 60 + ts 8 + value 8
3    #   = ~106 B per row to carry an 8-byte float
4    db.execute(
5        'INSERT INTO points',
6        '  (metric, labels, ts, value)',
7        '  VALUES (?, ?, ?, ?)',
8        point.metric,        # 'http_requests_total'
9        json(point.labels),  # full label set, every row
10        point.ts,
11        point.value,
12    )
13

Storage.dedupeLabels

store the label set once; each row keeps a 2-byte ref

1labelDict = {}  # labelSet -> seriesId (16-bit)
2 
3def dedupeInsert(point):
4    key = canonical(point.labels)
5    if key not in labelDict:
6        labelDict[key] = nextSeriesId()
7    sid = labelDict[key]    # 2-byte ref
8    db.execute(
9        'INSERT INTO points (sid, ts, value)',
10        '  VALUES (?, ?, ?)',
11        sid, point.ts, point.value,
12    )
13    # row shrinks to ~28 B: header + 2B ref + ts + value

Storage.projectPerSeriesPerDay

multiply the per-row cost by the scrape count

1def projectStorage(
2    seriesCount, labelBytes, scrapesPerDay,
3):
4    rowHeader, metric, ts, value = 8, 22, 8, 8
5    perRow = rowHeader + metric + labelBytes + ts + value
6    perSeriesPerDay = perRow * scrapesPerDay
7    fleet = perSeriesPerDay * seriesCount
8    return perSeriesPerDay, fleet
9 
10# perRow is paid on every scrape — it doesn't amortize.
11# fleet scales the per-series number by seriesCount.

PreviousWhat a metric point actually is NextDelta-of-delta crushes timestamps