Build a columnar OLAP store (ClickHouse / Druid style) (13 scenes)
Scene 08 · ORDER BY — filtering becomes range-scan
Rows inside a part are sorted by ORDER BY; WHERE on a leftmost-prefix column is a contiguous range, anything else is a full scan.
Previously
Merge keeps the part count bounded, but what a query does inside each part — scan everything, or jump to a range — depends on how the rows were sorted when the part was written.
Scene 08
ORDER BY: filtering becomes range-scan
Diagram
One part shown as a horizontal strip of 60 row tiles in physical on-disk order, colored by the leading sort-key column. A WHERE clause at the top of the strip either draws a contiguous green range bracket (the engine reads one slice) or scattered red arrows (the engine has to scan every tile). Below the strip, a badge reports the cardinality of the leading sort-key column — the lever that decides whether ORDER BY can prune at all.
Watch the part fill up with 60 row tiles in physical on-disk order — all the api rows first, then web, then cron, because the part was written with ORDER BY (service, ts). Once the strip lands, WHERE service='api' lights up the first 20 tiles as a contiguous range — the engine reads that slice and nothing else.
Implementation
Planner.plan_query(part, where)
walk ORDER BY left-to-right, match WHERE, pick range vs scan
1def plan_query(part, where):2 sort_cols = part.order_by # e.g. ('service', 'ts')3 # Find the longest prefix of sort_cols that WHERE4 # constrains with an equality / range predicate.5 prefix = []6 for col in sort_cols:7 if where.constrains(col):8 prefix.append(col)9 else:10 break # gap kills everything to the right11 if not prefix:12 return FullPartScan() # non-prefix WHERE13 return RangeScanBySortKey(prefix, where)
RangeScanBySortKey.execute(part, where)
on (service, ts): seek to service='api', read until it flips
1def range_scan_by_sort_key(part, where):2 # Binary-search the in-memory primary.idx to slice3 # the granule range where the prefix could live.4 lo = bisect_left(part.sort_columns, where.prefix_lo)5 hi = bisect_right(part.sort_columns, where.prefix_hi)6 # Seek to first row where service = 'api',7 # then sequential read until service != 'api'.8 for row in part.rows[lo:hi]:9 if where.matches(row):10 yield row
# Leftmost-prefix rule
the schema-design law the planner is enforcing
1# ORDER BY (A, B):2# WHERE A = ... -> range scan (fast)3# WHERE A = .. AND B = .. -> range scan (fastest)4# WHERE B = ... -> full scan (B's range5# restarts inside every A)6#7# ORDER BY (user_id, ts) with user_id unique per row:8# primary.idx has one entry per granule, each a9# different user_id -> no granule can be skipped,10# every WHERE degenerates to a full part scan.11# This is the canonical ClickHouse anti-pattern.