Same query, two execution plans

Build a distributed logging stack (ELK / Loki) (12 scenes)

Scene 5.5 · Same query, two execution plans

ELK answers via posting-list intersection — milliseconds; Loki resolves labels to chunks, fetches them from S3, and greps in-process — seconds to minutes. Opposite ends of the same trade-off curve.

Previously

Segments-with-inverted-index on one side, chunks-plus-tiny-index on the other. Same input, different disk shape — now run the same question through both and watch the **execution plans** diverge.

Watch

Diagram

Two horizontal swim-lanes share a single query bubble on the far left: `service=api AND level=ERROR` over the selected time window.

The TOP lane is ELK's execution plan, four boxes left-to-right: parse KQL, resolve the time-range index pattern, scatter to shards and INTERSECT two posting lists (service=api ∩ level=ERROR, animated as a sorted-merge), fetch the top-N source docs. Per-stage timings are in milliseconds; the total meter lands in tens of ms.

The BOTTOM lane is Loki's: parse LogQL, resolve label matchers to a list of chunk_refs via the labels-only index, FETCH those chunks from S3 across a slow arrow, then DECOMPRESS block-by-block and run any line filter IN the querier process. Per-stage timings are in milliseconds too, but the totals land in seconds-to-minutes.

One query: `service=api AND level=ERROR` over the last hour. Both systems return the SAME result set — the difference is the work each one does to get there. We call that ordered set of stages an **execution plan** (or **read path**). Watch ELK's plan run across the top, then Loki's plan across the bottom. Note where the time goes in each lane.
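
One way to make "an ordered set of stages" concrete is to model a plan as a list of named functions and time each one as it runs. The sketch below is illustrative only: `run_plan`, `toy_plan`, and the two toy stages are hypothetical names, not part of ELK or Loki.

```python
import time
from typing import Any, Callable, List, Tuple

# A stage is a name plus a function that transforms the previous stage's output.
Stage = Tuple[str, Callable[[Any], Any]]


def run_plan(stages: List[Stage], query: Any):
    """Run stages in order, feeding each stage's output to the next.

    Returns the final value and a list of (stage_name, elapsed_seconds).
    """
    timings = []
    value = query
    for name, fn in stages:
        start = time.perf_counter()
        value = fn(value)
        timings.append((name, time.perf_counter() - start))
    return value, timings


# Hypothetical two-stage plan, just to exercise the harness.
toy_plan = [
    ("parse", lambda q: q.split(" AND ")),
    ("count_terms", len),
]
result, timings = run_plan(toy_plan, "service=api AND level=ERROR")
```

Swapping in real stages for either system would give the per-stage breakdown the two lanes show, with the total being the sum of the recorded timings.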

Implementation

ELK.search

coordinating node fans out, posting lists do the filtering

def search(kql, timeWindow, n):
    ast = parseKQL(kql)
    # only indices that overlap the time window are searched
    indices = resolveIndexPattern(
        alias='logs-*', window=timeWindow,
    )
    shards = scatter(ast, indices)
    perShard = []
    for shard in shards:
        # sorted-merge intersect across query terms
        docIds = intersect_postings(shard, ast.terms)
        # each shard returns only its local top n
        topN = score_bm25(shard, docIds)[:n]
        perShard.append(topN)
    # the coordinating node reduces the per-shard lists to a global top n
    merged = mergeTopN(perShard, n)
    return fetchSourceDocs(merged)
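
The gather half of the scatter/gather step can be sketched in a few runnable lines. This is a minimal illustration, assuming each shard hands back a list of `(score, doc_id)` pairs; `merge_top_n` and the sample hits are hypothetical and not Elasticsearch's actual reduce code.

```python
import heapq
from typing import List, Tuple

Hit = Tuple[float, str]  # (score, doc_id); an assumed shape for this sketch


def merge_top_n(per_shard: List[List[Hit]], n: int) -> List[Hit]:
    """Merge per-shard top-N lists into one global top-N, best score first."""
    all_hits = (hit for shard_hits in per_shard for hit in shard_hits)
    return heapq.nlargest(n, all_hits, key=lambda hit: hit[0])


# Made-up per-shard results for three shards.
per_shard = [
    [(2.5, "a1"), (1.0, "a2")],
    [(3.1, "b1"), (0.4, "b2")],
    [(1.7, "c1")],
]
top3 = merge_top_n(per_shard, 3)
```

Each shard only needs to return its local top N because any document in the global top N must also be in the top N of its own shard, so the coordinating node never sees more than N hits per shard.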

Loki.query

labels resolve to chunk_refs; the regex runs in the querier

def query(logql, timeWindow):
    ast = parseLogQL(logql)
    # labels-only index: returns chunk references, never log lines
    chunkRefs = index.lookup(
        labelMatchers=ast.matchers,
        window=timeWindow,
    )
    results = []
    for ref in chunkRefs:
        chunk = objectStore.get(ref)  # S3
        for block in chunk.decompress():
            for line in block:
                # without a line filter, every line in the chunk is returned
                if ast.lineFilter is None or ast.lineFilter.match(line):
                    results.append(line)
    return results
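
The same read path can be run end to end against in-memory stand-ins. In this sketch everything is an assumption made for illustration: a dict plays the object store, gzip stands in for chunk compression, the index is a dict keyed by label set, the line filter is a plain substring match, and the time window is left out for brevity.

```python
import gzip
from typing import Dict, List, Tuple

LabelSet = Tuple[Tuple[str, str], ...]


def make_chunk(lines: List[str]) -> bytes:
    """Compress a batch of log lines into one opaque chunk."""
    return gzip.compress("\n".join(lines).encode("utf-8"))


# Hypothetical object store: chunk key -> compressed bytes.
object_store: Dict[str, bytes] = {
    "chunk-1": make_chunk(["GET /users 200", "timeout calling db"]),
    "chunk-2": make_chunk(["POST /orders 500 timeout", "retry ok"]),
    "chunk-3": make_chunk(["timeout in web tier"]),
}

# Hypothetical labels-only index: label set -> chunk keys.
index: Dict[LabelSet, List[str]] = {
    (("level", "ERROR"), ("service", "api")): ["chunk-1", "chunk-2"],
    (("level", "ERROR"), ("service", "web")): ["chunk-3"],
}


def query(matchers: Dict[str, str], line_filter: str = "") -> List[str]:
    # Stage 1: resolve label matchers to chunk keys. No log content is read.
    chunk_keys = [
        key
        for labels, keys in index.items()
        if all(dict(labels).get(k) == v for k, v in matchers.items())
        for key in keys
    ]
    # Stages 2 and 3: fetch each chunk, decompress it, scan every line.
    results = []
    for key in chunk_keys:
        raw = gzip.decompress(object_store[key]).decode("utf-8")
        for line in raw.splitlines():
            if line_filter in line:  # an empty filter matches every line
                results.append(line)
    return results


hits = query({"service": "api", "level": "ERROR"}, line_filter="timeout")
```

The index narrows the search to two chunks, but every line inside those chunks is still decompressed and scanned. That per-line work, plus the object-store round trip, is where the bottom lane spends its time.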

Lucene.intersect_postings

galloping sorted-merge, smallest list drives

def intersect_postings(shard, terms):
    lists = [shard.postings(t) for t in terms]
    if not lists:
        return []  # no terms, nothing to intersect
    lists.sort(key=len)  # smallest drives
    out = []
    for candidate in lists[0]:
        keep = True
        for other in lists[1:]:
            # gallop forward to the first doc id >= candidate;
            # advanceTo is True only when it lands exactly on candidate
            if not other.advanceTo(candidate):
                keep = False
                break
        if keep:
            out.append(candidate)
    return out
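
For readers who want to run the intersection, here is a self-contained sketch over plain sorted lists of doc ids. It is not Lucene's code: `gallop_to` is a hypothetical stand-in for `advanceTo`, implemented as exponential probing followed by a binary search, and per-list cursors replace the iterator state.

```python
from bisect import bisect_left
from typing import List


def gallop_to(postings: List[int], start: int, target: int) -> int:
    """Return the first index >= start whose value is >= target.

    Doubles the step until it overshoots, then binary-searches the
    bracketed range. Returns len(postings) if no such value exists.
    """
    n = len(postings)
    if start >= n or postings[start] >= target:
        return start
    step = 1
    lo = start          # invariant: postings[lo] < target
    hi = start + step
    while hi < n and postings[hi] < target:
        lo = hi
        step *= 2
        hi = lo + step
    return bisect_left(postings, target, lo + 1, min(hi + 1, n))


def intersect_postings(lists: List[List[int]]) -> List[int]:
    """Intersect sorted doc-id lists; the shortest list drives."""
    if not lists:
        return []
    ordered = sorted(lists, key=len)
    driver, others = ordered[0], ordered[1:]
    cursors = [0] * len(others)
    out = []
    for candidate in driver:
        keep = True
        for i, other in enumerate(others):
            cursors[i] = gallop_to(other, cursors[i], candidate)
            if cursors[i] >= len(other):
                return out  # list exhausted: no later candidate can match
            if other[cursors[i]] != candidate:
                keep = False
                break
        if keep:
            out.append(candidate)
    return out


# Made-up posting lists for the two query terms.
service_api = [2, 5, 8, 13, 21, 34]
level_error = [1, 2, 3, 5, 13, 34, 55, 89]
matches = intersect_postings([service_api, level_error])
```

Because the cursors only move forward and the shortest list supplies the candidates, the amount of work tracks the rarest term rather than the most common one, which is why the intersection stays cheap even when one posting list is very long.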
