The same query: 30 minutes vs 200 ms

Build a columnar OLAP store (ClickHouse / Druid style) (13 scenes)

Scene 01 · The same query: 30 minutes vs 200 ms

Same SELECT, same rows. ClickHouse runs 9000× faster than Postgres because the column store reads only the columns the query touches.

Scene 01

Watch

Diagram

One query bar fires against two engines side by side. Left is a **row store** (Postgres): each row's columns are stored together on disk. Right is a **column store** (ClickHouse): each column lives in its own file. Both panels show bytes pulled off disk and wall-clock time for the same SELECT.

row store — every column of every row sits together on disk

column store — one file per column, read only what the query asks for

this 9000x gap is what 'OLAP' is named after — Online Analytical Processing

Same query, two engines. The row store pulls 360 GB off disk and spins to 30 minutes. The column store pulls 28 GB and finishes at 200 ms. Same hardware, same row count — only the on-disk shape differs.

Implementation

RowStore.scan(query)

every page holds all N columns — pays the full row width

1def scan(query):                       # Postgres-shaped
2    bytes_read = 0
3    for page in heap.pages():          # 8 KB pages, row-major
4        for row in page.rows():        # all N cols interleaved
5            bytes_read += row.width    # = sum(width(c) for c in N)
6            tuple = decode(row)        # 30 fields materialized
7            project = [tuple[c] for c in query.columns]
8            emit(project)              # K-of-N kept, N-K discarded
9    return bytes_read                  # ~= table_size_bytes

ColumnStore.scan(query)

open one file per queried column — skip the rest entirely

1def scan(query):                       # ClickHouse-shaped
2    bytes_read = 0
3    files = [open(f'{c}.bin') for c in query.columns]  # K files
4    # the other (N - K) column files are never opened
5    for granule in zipGranules(files): # vector of K cols
6        bytes_read += sum(len(b) for b in granule)
7        emit(granule)                  # already projected
8    return bytes_read                  # ~= (K / N) * table_size

compare(query, table)

the K/N ratio that the slider is sweeping

1def compare(query, table):
2    N = table.totalColumns             # 30 or 100
3    K = len(query.columns)             # 3, 30, or 1
4    row_bytes = table.size_bytes       # always full table
5    col_bytes = table.size_bytes * (K / N)
6    ratio = row_bytes / col_bytes      # = N / K
7    return (row_bytes, col_bytes, ratio)

NextSame table, two on-disk shapes