Build a Prometheus-style time-series database (12 scenes)
Scene 08 · Inverted index — labels to series
For each (label, value) pair the database stores a sorted list of series IDs (a postings list). A multi-label query is the intersection.
Previously
Stage 2 is a search engine in disguise — postings lists and intersections, just like Lucene.
Scene 08
Inverted index: labels to series
Diagram
Left: a column of (label = value) keys, each with its postings list — a sorted row of series-ID tiles, with a size badge. Right: an intersection lane that takes the selected lists and animates the merge-walk; the smallest list is highlighted as the driver of cost. Bottom: a result chip with the surviving series IDs.
How does the database go from `{method=POST, status=500}` to series IDs without scanning every series? It keeps a tiny sorted list of series IDs for every (label, value) pair — a postings list. The query becomes an intersection of two such lists. Watch status=500 (the smaller list) drive the walk; each of its 2 IDs is probed against POST.
Implementation
Index.addSeries
every label of every series fans out into one postings list
1def addSeries(seriesId, labels):2 for (k, v) in labels:3 # postings: (label, value) -> sorted [seriesId, ...]4 postings[(k, v)].sortedInsert(seriesId)5 # special list for the no-selector query {}6 postings[allPostingsKey].sortedInsert(seriesId)
Index.resolveSeries
selectors -> postings lists, smallest first
1def resolveSeries(selectors):2 if len(selectors) == 0:3 return postings[allPostingsKey]4 lists = [postings[(k, v)] for (k, v) in selectors]5 # planner's only heuristic: drive the walk by the6 # smallest list — the corpus size never appears.7 lists.sort(key = len)8 if len(lists) == 1:9 return lists[0]10 return intersect(lists)
Index.intersect
walk the smallest list, probe the others
1def intersect(lists):2 driver = lists[0] # smallest after sort3 others = lists[1:]4 out = []5 for candidate in driver:6 # gallop / binary search — never a linear scan7 if all(other.contains(candidate)8 for other in others):9 out.append(candidate)10 return out # cost = O(|driver|)