Build a columnar OLAP store (ClickHouse / Druid style) (13 scenes)
Scene 12 · Joins — denormalize or pay
Column-store joins are RAM-bound or shuffle-bound; the canonical OLAP answer is to denormalize at write time — storage is cheap, latency is the constraint.
Previously

Materialized views collapse one big scan into a tiny one. But the reader's other OLTP instinct — JOIN across normalized tables — hits a different kind of wall on a column store, and the answer feels wrong.

Scene 12
Joins: denormalize or pay
Diagram
Three panels for the same logical query `SELECT country, tier, count() FROM events`. **Star schema · hash join** — build a hash table on the users dim, probe per event row; memory tracks the right side. **Denormalized wide table** — collapse the JOIN at write time by baking dim columns (user_tier, country_code, signup_cohort, device) into every event row. **Dictionary · dictGet()** — load the small side into RAM as a key→value map and replace JOIN with a pointwise lookup. Each panel shows a query-latency meter plus the side cost it pays (RAM, disk multiplier, or refresh).
Star schema · hash joinDenormalized wide tableDictionary lookup · dictGet()events (1B, narrow)users (1M, dim)HASH TABLE (RAM)200 MBbuild from usersprobe per event rowQUERY LATENCY4.0 sRAM: 200 MBright: 1M rowsruntime join: build cost paid once per query, probe paid once per fact rowHash table built on the RIGHT side (users); probed once per event row — RAM-bound by the dim, not by the SQL.
↑ hash table built on the RIGHT side (users) — memory tracks dim size
→ probe per event row — ~4 s wall-clock
The star-schema panel runs the OLTP-instinct query: hash table builds from the 1M-row users dim (200 MB in RAM), then events probes it row by row. Wall-clock pins near 4 s — the hash join is RAM-bound by the right side, not by how cleverly you wrote the SQL.
Implementation
hash_join_star_schema()
build hash table on RIGHT (users), probe per LEFT (event) row
1def hash_join_star_schema():
2 # algorithm = 'hash' (ClickHouse default)
3 ht = {} # hash table in RAM
4 for u in scan(users): # build on RIGHT
5 ht[u.id] = u.tier # ~1M rows -> ~200 MB
6 out = Counter()
7 for e in scan(events): # probe per LEFT row
8 tier = ht.get(e.user_id) # one lookup per event
9 out[(e.country, tier)] += 1
10 return out # ~4 s; RAM-bound on |users|
denormalized_query()
no hash table; only column reads off a wide fact table
1def denormalized_query():
2 # tier was baked into the row at write time
3 out = Counter()
4 for e in scan(events_wide, # only the columns we need
5 cols=['country', 'tier']):
6 out[(e.country, e.tier)] += 1
7 return out # ~0.2 s; one wide scan
8
9# update cost: a user upgrading tier => rewrite every
10# event row that user ever produced (parts are immutable).
dictionary_lookup_query()
in-RAM dict pre-loaded once; dictGet per row replaces the JOIN
1def dictionary_lookup_query():
2 # users_dict pre-loaded once into RAM (~30 MB)
3 # algorithm = 'direct' (no hash-build at query time)
4 out = Counter()
5 for e in scan(events, cols=['country', 'user_id']):
6 tier = dictGet('users_dict', 'tier', e.user_id)
7 out[(e.country, tier)] += 1
8 return out # ~0.4 s
9
10# refresh: LIFETIME(MIN 300 MAX 600) reloads from source
11# every ~5-10 min — no event-row rewrites on tier change.