Build a Prometheus-style time-series database (12 scenes)
Scene 08 · Inverted index — labels to series
For each (label, value) pair the database stores a sorted list of series IDs (a postings list). A multi-label query is the intersection.
Previously

Stage 2 is a search engine in disguise — postings lists and intersections, just like Lucene.

Scene 08
Inverted index: labels to series
Diagram
Left: a column of (label = value) keys, each with its postings list — a sorted row of series-ID tiles, with a size badge. Right: an intersection lane that takes the selected lists and animates the merge-walk; the smallest list is highlighted as the driver of cost. Bottom: a result chip with the surviving series IDs.
POSTINGS · label = value → series idsINTERSECTION LANEmethod=GET12468n=5method=POST357n=3status=200134678n=6status=50025n=2path=/api12345n=5path=/health678n=3drv25L2357MERGE WALKO(2) · 0 cmpRESULT · 1 series5Query {method=POST, status=500} → select two postings lists. status=500 is smaller (2 IDs) — it drives the walk.
How does the database go from `{method=POST, status=500}` to series IDs without scanning every series? It keeps a tiny sorted list of series IDs for every (label, value) pair — a postings list. The query becomes an intersection of two such lists. Watch status=500 (the smaller list) drive the walk; each of its 2 IDs is probed against POST.
Implementation
Index.addSeries
every label of every series fans out into one postings list
1def addSeries(seriesId, labels):
2 for (k, v) in labels:
3 # postings: (label, value) -> sorted [seriesId, ...]
4 postings[(k, v)].sortedInsert(seriesId)
5 # special list for the no-selector query {}
6 postings[allPostingsKey].sortedInsert(seriesId)
Index.resolveSeries
selectors -> postings lists, smallest first
1def resolveSeries(selectors):
2 if len(selectors) == 0:
3 return postings[allPostingsKey]
4 lists = [postings[(k, v)] for (k, v) in selectors]
5 # planner's only heuristic: drive the walk by the
6 # smallest list — the corpus size never appears.
7 lists.sort(key = len)
8 if len(lists) == 1:
9 return lists[0]
10 return intersect(lists)
Index.intersect
walk the smallest list, probe the others
1def intersect(lists):
2 driver = lists[0] # smallest after sort
3 others = lists[1:]
4 out = []
5 for candidate in driver:
6 # gallop / binary search — never a linear scan
7 if all(other.contains(candidate)
8 for other in others):
9 out.append(candidate)
10 return out # cost = O(|driver|)