Build a Prometheus-style time-series database (12 scenes)
Scene 07 · How a query becomes points
A read is four stages: parse, resolve label selectors to series IDs, decompress the matching chunks, then aggregate. Stage 3 (decompress) dominates the time spent.
Previously
Writes are durable. Now flip the system: a query says `{method="POST", status="500"}` and we need to walk from those labels to actual bytes on disk.
Diagram
Four stages from left to right. A PromQL query enters on the left. Parse splits the text into selectors and a time range. Resolve hits a black-box (still-opaque) index that emits a small set of series IDs. Decompress unpacks the matching chunks for the [5m] window into raw points. Aggregate folds the points into a single number. A timing bar at the bottom shows the milliseconds spent per stage.
Implementation
TSDB.query
the read path: four sequential stages across three calls (decompress and aggregate share one)
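def query(text, t0, t1, fn):
    # Stage 1: split the query text into label selectors and a range window.
    # The window is parsed but not used here; the caller passes t0 and t1.
    selectors, window = parseQuery(text)
    # Stage 2: selectors -> series IDs via the inverted index.
    series_ids = resolveSeries(selectors)
    # Stages 3 and 4: decompress the matching chunks, then fold with fn.
    result = decompressAndAggregate(
        series_ids, t0, t1, fn,
    )
    return result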
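The timing bar in the diagram can be approximated by wrapping each stage in a timer. The sketch below is a simplification, not the scene's implementation: `timed_query` and the stand-in stages are hypothetical, and each stage simply receives the previous stage's output (the real `query` also threads `t0`, `t1`, and `fn` through).

```python
import time


def timed_query(stages, text):
    """Run stages in order, feeding each one the previous output.

    `stages` is a list of (name, callable) pairs.
    Returns (final_value, {name: elapsed_milliseconds}).
    """
    timings_ms = {}
    value = text
    for name, stage in stages:
        start = time.perf_counter()
        value = stage(value)
        timings_ms[name] = (time.perf_counter() - start) * 1000.0
    return value, timings_ms


# Hypothetical stand-in stages; they only mimic the shape of the pipeline.
demo_stages = [
    ("parse", lambda text: text.split(",")),
    ("resolve", lambda selectors: [len(s) for s in selectors]),
    ("decompress", lambda ids: [float(i) for i in ids for _ in range(1000)]),
    ("aggregate", sum),
]

result, timings_ms = timed_query(demo_stages, "method=POST,status=500")
for name, ms in timings_ms.items():
    print(f"{name:<10} {ms:.3f} ms")
```

Using `time.perf_counter` rather than `time.time` gives a monotonic, high-resolution clock, which matters when a stage finishes in well under a millisecond.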
parseQuery
stage 1 — tokenise text into selectors and a range
def parseQuery(text):
    ast = promql.parse(text)
    selectors = []
    for matcher in ast.label_matchers:
        selectors.append(
            (matcher.name, matcher.value),
        )
    # Named window so it does not shadow the built-in range().
    window = ast.range  # e.g. [5m], [1h], [30d]
    return selectors, window
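To make stage 1 concrete without a full PromQL parser, here is a minimal sketch that handles only one shape of query: an optional metric name, equality matchers in braces, and an optional range such as `[5m]`. The function `parse_selector` and its regexes are hypothetical. They ignore regex and negative matchers, and they split matchers on commas, so label values containing a comma are not supported.

```python
import re

# Optional metric name, optional {matchers}, optional [window].
_SELECTOR = re.compile(
    r'^\s*(?P<metric>[A-Za-z_:][A-Za-z0-9_:]*)?\s*'
    r'(?:\{(?P<matchers>[^}]*)\})?\s*'
    r'(?:\[(?P<window>\d+[smhd])\])?\s*$'
)
_MATCHER = re.compile(r'\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*"([^"]*)"\s*')
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_selector(text):
    """Return ([(label, value), ...], window_seconds or None)."""
    m = _SELECTOR.match(text)
    if m is None:
        raise ValueError(f"unsupported query: {text!r}")

    selectors = []
    if m.group("metric"):
        # Prometheus exposes the metric name as the __name__ label.
        selectors.append(("__name__", m.group("metric")))

    body = m.group("matchers") or ""
    for part in (p.strip() for p in body.split(",")):
        if not part:
            continue
        mm = _MATCHER.fullmatch(part)
        if mm is None:
            raise ValueError(f"unsupported matcher: {part!r}")
        selectors.append((mm.group(1), mm.group(2)))

    window = m.group("window")
    window_s = int(window[:-1]) * _UNIT_SECONDS[window[-1]] if window else None
    return selectors, window_s


selectors, window_s = parse_selector(
    'http_requests_total{method="POST", status="500"}[5m]'
)
print(selectors, window_s)
```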
resolveSeries
stage 2 — selectors hit the inverted index (black box)
def resolveSeries(selectors):
    # postings list per (label, value)
    # cost depends on cardinality, NOT on range
    postings = [
        index.postings(label, value)
        for (label, value) in selectors
    ]
    return intersect(postings)  # → {S3, S7}
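The index stays a black box in this scene, but the intersection step can be sketched on its own. The example below assumes each postings list is a sorted list of integer series IDs held in a plain dict; the dict `index`, the helper `_intersect_two`, and the sample IDs are illustrative, not the scene's real index.

```python
def _intersect_two(a, b):
    """Merge-style intersection of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def intersect(postings):
    """Intersect several sorted postings lists, shortest first."""
    if not postings:
        return []
    ordered = sorted(postings, key=len)
    result = ordered[0]
    for other in ordered[1:]:
        result = _intersect_two(result, other)
        if not result:
            break  # nothing can match any more
    return result


# Hypothetical in-memory index: (label, value) -> sorted series IDs.
index = {
    ("method", "POST"): [1, 3, 7, 9],
    ("method", "GET"): [2, 4, 5],
    ("status", "500"): [3, 4, 7],
}


def resolve_series(selectors):
    return intersect([index.get(sel, []) for sel in selectors])


matched = resolve_series([("method", "POST"), ("status", "500")])
print(matched)  # → [3, 7]
```

Starting from the shortest list keeps the running result small, which is why the cost tracks label cardinality rather than the queried time range.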
decompressAndAggregate
stages 3 & 4 — unpack chunks in [t0,t1], then fold
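def decompressAndAggregate(ids, t0, t1, fn):
    points = []
    for sid in ids:
        for chunk in chunksFor(sid):
            if chunk.overlaps(t0, t1):
                # delta-of-delta + XOR decode
                for t, v in chunk.decompress():
                    # An overlapping chunk can still hold points outside
                    # [t0, t1]; assumes points are (timestamp, value) pairs.
                    if t0 <= t <= t1:
                        points.append((t, v))
    return fn(points)  # rate / sum / avg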
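The "delta-of-delta + XOR" comment can be illustrated with a simplified round trip. This sketch keeps the encoded values as plain Python integers instead of bit-packing them, and it seeds the previous timestamp, delta, and value bits with zero rather than storing a separate header, so it shows the arithmetic only, not an on-disk format. The names `encode` and `decode` are hypothetical.

```python
import struct


def _bits(x):
    """IEEE 754 bit pattern of a float as an unsigned 64-bit int."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def _float(b):
    """Inverse of _bits."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def encode(points):
    """points: list of (timestamp:int, value:float), sorted by time."""
    ts_dod, val_xor = [], []
    prev_t = prev_delta = prev_bits = 0
    for t, v in points:
        delta = t - prev_t
        ts_dod.append(delta - prev_delta)  # delta of delta
        prev_t, prev_delta = t, delta
        bits = _bits(v)
        val_xor.append(bits ^ prev_bits)   # XOR with previous value
        prev_bits = bits
    return ts_dod, val_xor


def decode(ts_dod, val_xor):
    points = []
    t = delta = bits = 0
    for dod, x in zip(ts_dod, val_xor):
        delta += dod
        t += delta
        bits ^= x
        points.append((t, _float(bits)))
    return points


samples = [(1000, 1.5), (1015, 1.5), (1030, 2.0), (1045, 2.0)]
ts_dod, val_xor = encode(samples)
print(ts_dod)  # → [1000, -985, 0, 0]
```

A steady scrape interval turns most timestamp entries into 0, and a repeated value XORs to 0, which is what lets a bit-packed encoder spend very few bits per point.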