Build a Prometheus-style time-series database (12 scenes)
Scene 02 · Why one row per point is wrong
Storing each point as a SQL row spends about 106 bytes per row to carry one 8-byte float. The labels JSON alone accounts for 60 of those bytes, and it repeats on every row of the firehose.
Previously

We have a clean four-part tuple (metric, labels, ts, value). Now we try to store it the obvious way and watch the bytes explode.

Diagram
Left: a SQL-style table with columns `metric | labels | ts | value`. Right: each row's bytes broken down into a stack of colored cells — gray for the row header, blue for the metric name, red for the labels JSON, green for ts, yellow for value. A running-total bar at the bottom accumulates bytes as rows arrive, and a projection panel shows the per-series-per-day cost extrapolated to a 1000-target fleet.
[Figure: SAMPLES TABLE · per-row bytes (six rows of `http_request…` with labels `{method=GET,status=200,path…}`, values 12473 to 12558) | BYTE STACK · coloured by category (each of rows r1 to r6: header 8 B + metric 22 B + labels 60 B + ts 8 B + value 8 B = 106 B) | RUNNING TOTAL (0 to 636 B) | PROJECTION (106 B/sample × 5,760 scrapes/day × 1,000 targets = 610,560,000 B ≈ 0.6 GB/day)]
Six scrapes of the same series stream in one at a time. Watch the labels JSON column (red) take more bytes than the row header, metric, ts, and value combined (60 B against 8 + 22 + 8 + 8 = 46 B). This happens on every single row.
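The byte stack's arithmetic can be checked directly. The following is a minimal sketch that uses the component sizes shown in the figure. These are the tutorial's modelled sizes, not measured SQLite on-disk sizes.

```python
# Modelled byte cost of one naive row, one entry per colour in the byte stack.
ROW_BYTES = {
    "header": 8,
    "metric": 22,
    "labels": 60,  # labels JSON, repeated on every row
    "ts": 8,
    "value": 8,    # the only payload we actually wanted
}

per_row = sum(ROW_BYTES.values())
labels = ROW_BYTES["labels"]
everything_else = per_row - labels

# Running total as six scrapes of the same series arrive.
running = [per_row * n for n in range(1, 7)]

print(per_row, labels, everything_else)  # 106 60 46
print(running)  # [106, 212, 318, 424, 530, 636]
print(f"overhead per 8-byte value: {per_row - ROW_BYTES['value']} B")  # 98 B
```

The last entry of `running` matches the 636 B at the top of the figure's running-total bar.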
Implementation
Storage.naiveInsert
one SQL row per point — labels JSON repeated every row
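import json
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE points (metric TEXT, labels TEXT, ts INTEGER, value REAL)')

def naiveInsert(point):
    # rowHeader 8 + metric 22 + labels 60 + ts 8 + value 8
    # = ~106 B per row to carry an 8-byte float
    db.execute(
        # adjacent string literals join into one SQL statement
        'INSERT INTO points'
        ' (metric, labels, ts, value)'
        ' VALUES (?, ?, ?, ?)',
        (
            point.metric,              # 'http_requests_total'
            json.dumps(point.labels),  # full label set, every row
            point.ts,
            point.value,
        ),
    )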
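The repetition can be made concrete with Python's built-in `sqlite3` module and an in-memory database. The sketch below inserts six scrapes of one series and then asks SQLite how much the `labels` column holds in total. The `Point` namedtuple, the label values, and the timestamps are illustrative assumptions. Only the six sample values come from the figure. `LENGTH` counts characters, which equals bytes for this ASCII JSON.

```python
import json
import sqlite3
from collections import namedtuple

# Hypothetical point shape: the four fields naiveInsert reads.
Point = namedtuple("Point", "metric labels ts value")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE points (metric TEXT, labels TEXT, ts INTEGER, value REAL)")

def naiveInsert(point):
    db.execute(
        "INSERT INTO points (metric, labels, ts, value) VALUES (?, ?, ?, ?)",
        (point.metric, json.dumps(point.labels), point.ts, point.value),
    )

# Illustrative label set and start time (assumptions, not from the tutorial).
labels = {"method": "GET", "status": "200", "path": "/example"}
t0 = 1_700_000_000
for i, value in enumerate([12473, 12490, 12507, 12524, 12541, 12558]):
    naiveInsert(Point("http_requests_total", labels, t0 + 15 * i, value))

rows, label_chars = db.execute(
    "SELECT COUNT(*), SUM(LENGTH(labels)) FROM points"
).fetchone()
distinct = db.execute("SELECT COUNT(DISTINCT labels) FROM points").fetchone()[0]

# Six rows and one distinct label set, yet the JSON text is stored six times.
print(rows, distinct, label_chars)
```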
Storage.dedupeLabels
store the label set once; each row keeps a 2-byte ref
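import json
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE points (sid INTEGER, ts INTEGER, value REAL)')

labelDict = {}  # labelSet -> seriesId (16-bit)

def canonical(metric, labels):
    # metric + labels with sorted keys, so label order can't split a series
    return json.dumps([metric, labels], sort_keys=True)

def dedupeInsert(point):
    key = canonical(point.metric, point.labels)
    if key not in labelDict:
        labelDict[key] = len(labelDict)  # next seriesId; must stay < 2**16
    sid = labelDict[key]  # 2-byte ref
    db.execute(
        'INSERT INTO points (sid, ts, value)'
        ' VALUES (?, ?, ?)',
        (sid, point.ts, point.value),
    )
    # row shrinks to ~26 B: header 8 + ref 2 + ts 8 + value 8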
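The following minimal sketch checks two properties of deduplication. First, reordering labels must not mint a new series. Second, the per-day byte model should shrink. `canonical` and `seriesId` are hypothetical helpers. The row sizes come from the component sizes named in the listings (header 8, metric 22, labels 60, ts 8, value 8 for the naive row; header 8, ref 2, ts 8, value 8 for the deduplicated one), ignoring any padding. The 60-byte label set is counted once per series, and in-memory dictionary overhead is ignored.

```python
import json

def canonical(metric, labels):
    # sort_keys applies to nested dicts, so label order doesn't change the key
    return json.dumps([metric, labels], sort_keys=True)

labelDict = {}

def seriesId(metric, labels):
    key = canonical(metric, labels)
    if key not in labelDict:
        labelDict[key] = len(labelDict)  # next free id
    return labelDict[key]

a = seriesId("http_requests_total", {"method": "GET", "status": "200"})
b = seriesId("http_requests_total", {"status": "200", "method": "GET"})
c = seriesId("http_requests_total", {"method": "POST", "status": "200"})
print(a, b, c)  # 0 0 1

naive_row = 8 + 22 + 60 + 8 + 8  # header + metric + labels + ts + value
dedupe_row = 8 + 2 + 8 + 8       # header + ref + ts + value
scrapes = 5760                   # scrapes/day used in the figure's projection

naive_day = naive_row * scrapes
dedupe_day = dedupe_row * scrapes + 60  # label set paid once per series
print(naive_day, dedupe_day)  # 610560 149820
```

The deduplicated series costs roughly a quarter of the naive one per day. Every row still pays the header, timestamp, and value.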
Storage.projectPerSeriesPerDay
multiply the per-row cost by the scrape count
def projectStorage(
    seriesCount, labelBytes, scrapesPerDay,
):
    # modelled bytes per row; labelBytes is the variable part
    rowHeader, metric, ts, value = 8, 22, 8, 8
    perRow = rowHeader + metric + labelBytes + ts + value
    perSeriesPerDay = perRow * scrapesPerDay
    fleet = perSeriesPerDay * seriesCount
    return perSeriesPerDay, fleet

# perRow is paid on every scrape: nothing amortizes across rows.
# fleet scales the per-series number by seriesCount.
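The projection's 5,760 scrapes/day corresponds to one scrape every 15 seconds (86,400 s / 15 s). The following sketch assumes that interval, one series per target, and the same byte model as `projectStorage`. It reproduces the figure's ≈ 0.6 GB/day for 1,000 targets and shows how the scrape interval moves the number. `scrapes_per_day` and `project` are hypothetical helpers.

```python
SECONDS_PER_DAY = 86_400

def scrapes_per_day(interval_s):
    # one row per scrape, evenly spaced across the day
    return SECONDS_PER_DAY // interval_s

def project(series_count, label_bytes, interval_s):
    # same byte model as projectStorage: header 8 + metric 22 + labels + ts 8 + value 8
    per_row = 8 + 22 + label_bytes + 8 + 8
    per_series_per_day = per_row * scrapes_per_day(interval_s)
    return per_series_per_day, per_series_per_day * series_count

for interval in (15, 30, 60):
    per_series, fleet = project(1_000, 60, interval)
    print(f"{interval:>2}s: {scrapes_per_day(interval):>5} scrapes/day, "
          f"{per_series:,} B/series/day, {fleet / 1e9:.2f} GB/day")
```

Halving the scrape rate halves the bill, but the ratio of labels to payload stays the same. Only removing the per-row label copy changes that ratio.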