Build a Prometheus-style time-series database (12 scenes)
Scene 02 · Why one row per point is wrong
Storing each point as a SQL row spends about 98 bytes of metadata (60 of them the labels JSON) to carry an 8-byte float, and the labels JSON repeats on every row of the firehose.
Previously
We have a clean four-part tuple — now we try to store it the obvious way and watch the bytes explode.
Diagram
Left: a SQL-style table with columns `metric | labels | ts | value`. Right: each row's bytes broken down into a stack of colored cells — gray for the row header, blue for the metric name, red for the labels JSON, green for ts, yellow for value. A running-total bar at the bottom accumulates bytes as rows arrive, and a projection panel shows the per-series-per-day cost extrapolated to a 1000-target fleet.
Six scrapes of the same series stream in one at a time. Watch the labels JSON column (red) take more bytes than the row header, metric, ts, and value combined — every single row.
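The byte stack in the diagram can be tallied by hand. The sketch below uses the per-field sizes quoted in this scene (8, 22, 60, 8, 8); the dictionary and helper names are made up for illustration.

```python
# Per-row byte budget, using the field sizes quoted in this scene.
ROW_BYTES = {
    "row_header": 8,
    "metric": 22,
    "labels_json": 60,
    "ts": 8,
    "value": 8,
}


def row_total(parts):
    """Total bytes one row occupies."""
    return sum(parts.values())


def share(parts, name):
    """Fraction of the row taken by a single field."""
    return parts[name] / row_total(parts)


total = row_total(ROW_BYTES)
everything_else = total - ROW_BYTES["labels_json"]
labels_share = round(share(ROW_BYTES, "labels_json"), 3)
value_share = round(share(ROW_BYTES, "value"), 3)

print(total, everything_else, labels_share, value_share)
# → 106 46 0.566 0.075
```

Under these sizes the labels JSON (60 B) outweighs the other four fields together (46 B), and the float we actually care about is under 8% of the row.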
Implementation
Storage.naiveInsert
one SQL row per point — labels JSON repeated every row
    import json

    # db is assumed to be a DB-API-style connection (for example sqlite3)

    def naiveInsert(point):
        # rowHeader 8 + metric 22 + labels 60 + ts 8 + value 8
        # = ~106 B per row to carry an 8-byte float
        db.execute(
            # adjacent string literals join into one SQL statement
            'INSERT INTO points'
            ' (metric, labels, ts, value)'
            ' VALUES (?, ?, ?, ?)',
            (
                point.metric,              # 'http_requests_total'
                json.dumps(point.labels),  # full label set, every row
                point.ts,
                point.value,
            ),
        )
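To see the repetition rather than take it on faith, here is a minimal runnable sketch of the naive layout against an in-memory SQLite database. The label set, timestamps, and 15-second spacing are invented for the demo, and SQLite's real on-disk row size will differ from the scene's byte budget.

```python
import json
import sqlite3

# Hypothetical label set; any dict of string labels would do.
labels = {"job": "api", "instance": "10.0.0.1:9090", "method": "GET"}
labels_json = json.dumps(labels, sort_keys=True)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE points (metric TEXT, labels TEXT, ts INTEGER, value REAL)"
)

# Six scrapes of one series, 15 s apart (the interval is an assumption).
for i in range(6):
    conn.execute(
        "INSERT INTO points (metric, labels, ts, value) VALUES (?, ?, ?, ?)",
        ("http_requests_total", labels_json, 1_700_000_000 + 15 * i, 100.0 + i),
    )

# One distinct label set, yet its text is stored once per row.
rows, distinct_label_sets, label_bytes = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT labels), SUM(LENGTH(labels)) FROM points"
).fetchone()

print(rows, distinct_label_sets, label_bytes, len(labels_json))
```

The query reports six rows and a single distinct label set, with the total label bytes equal to six times the length of one JSON string.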
Storage.dedupeLabels
store the label set once; each row keeps a 2-byte ref
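    labelDict = {}  # labelSet -> seriesId (16-bit)

    # canonical() and nextSeriesId() are assumed helpers, not shown here;
    # db is assumed to be a DB-API-style connection (for example sqlite3)

    def dedupeInsert(point):
        key = canonical(point.labels)
        if key not in labelDict:
            labelDict[key] = nextSeriesId()
        sid = labelDict[key]  # 2-byte ref
        db.execute(
            # adjacent string literals join into one SQL statement
            'INSERT INTO points (sid, ts, value)'
            ' VALUES (?, ?, ?)',
            (sid, point.ts, point.value),
        )
        # row shrinks to ~26 B: header 8 + ref 2 + ts 8 + value 8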
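The dictionary only deduplicates if equal label sets always produce the same key. One possible way to fill in the two helpers is sketched below; the sorted-tuple key, the `itertools.count` id source, and the overflow check are our assumptions, not something this scene specifies.

```python
import itertools

MAX_SERIES = 2 ** 16  # a 2-byte ref can address at most 65,536 label sets

label_dict = {}              # canonical label set -> series id
_next_id = itertools.count() # hands out 0, 1, 2, ...


def canonical(labels):
    """Order-independent, hashable key for a label dict."""
    return tuple(sorted(labels.items()))


def series_id(labels):
    """Return the existing id for this label set, or allocate a new one."""
    key = canonical(labels)
    if key not in label_dict:
        if len(label_dict) >= MAX_SERIES:
            raise OverflowError("16-bit series id space exhausted")
        label_dict[key] = next(_next_id)
    return label_dict[key]


a = series_id({"job": "api", "method": "GET"})
b = series_id({"method": "GET", "job": "api"})   # same labels, different order
c = series_id({"job": "api", "method": "POST"})  # a genuinely different series

print(a, b, c, len(label_dict))
# → 0 0 1 2
```

Sorting the items means insertion order of the labels cannot create a second id for the same series. The explicit limit makes the cost of a 2-byte ref visible: once 65,536 label sets exist, the ref has to widen.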
Storage.projectPerSeriesPerDay
multiply the per-row cost by the scrape count
    def projectStorage(
        seriesCount, labelBytes, scrapesPerDay,
    ):
        rowHeader, metric, ts, value = 8, 22, 8, 8
        perRow = rowHeader + metric + labelBytes + ts + value
        perSeriesPerDay = perRow * scrapesPerDay
        fleet = perSeriesPerDay * seriesCount
        return perSeriesPerDay, fleet

    # perRow is paid on every scrape; it doesn't amortize.
    # fleet scales the per-series number by seriesCount.
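Plugging numbers into the projection makes the scale concrete. The sketch below restates the function in snake_case so it runs on its own; the 15-second scrape interval and the 1,000 series per target are assumptions chosen for illustration, while the 60-byte label size and 1000-target fleet come from this scene.

```python
SECONDS_PER_DAY = 86_400


def project_storage(series_count, label_bytes, scrapes_per_day):
    """Bytes per series per day, and bytes per day for the whole fleet."""
    row_header, metric, ts, value = 8, 22, 8, 8
    per_row = row_header + metric + label_bytes + ts + value
    per_series_per_day = per_row * scrapes_per_day
    fleet = per_series_per_day * series_count
    return per_series_per_day, fleet


scrape_interval_s = 15                               # assumption
scrapes_per_day = SECONDS_PER_DAY // scrape_interval_s
targets = 1000                                       # fleet size from the scene
series_per_target = 1000                             # assumption
series_count = targets * series_per_target

per_series, fleet = project_storage(series_count, 60, scrapes_per_day)
labels_only = 60 * scrapes_per_day  # bytes/day spent repeating one label set

print(scrapes_per_day, per_series, labels_only, fleet / 1e9)
# → 5760 610560 345600 610.56
```

With these inputs a single series costs about 0.61 MB per day, of which roughly 0.35 MB is the same label string written 5,760 times, and the fleet lands near 610 GB per day.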