Build a columnar OLAP store (ClickHouse / Druid style) (13 scenes)
Scene 11 · Materialized views are INSERT triggers
A column-store MV is not a refresh — it's a trigger over the incoming block. Direct INSERT into the target silently bypasses it and drifts.
Previously
Skip indexes prune at read time. The bigger lever is to PRE-COMPUTE the answer at write time — turn a billion-row scan into a two-hundred-row scan by materializing the aggregation as data lands.
Scene 11
Materialized views are INSERT triggers
Diagram
LEFT is the source table `events` — wide parts of raw rows. MIDDLE is the **materialized view** (a stored SELECT + a per-block trigger): it fires on every INSERT into the source and runs its SELECT against the INCOMING BLOCK only, not the whole table. RIGHT is the target rollup `events_by_country_hourly` — narrow pre-aggregated parts that the dashboard reads. The dashed bypass arrow at the bottom appears when something writes straight to the target without the MV running; the warn banner appears when the MV was attached AFTER data already existed in the source.
An INSERT lands in `events`. The MV's SELECT runs against the just-arrived 1M-row block and appends ~5 rollup rows to `events_by_country_hourly`. The dashboard reads the rollup in 2 ms instead of scanning the source for 30 s.
Implementation
Source.on_insert_to_source
INSERT into source fires every attached MV against the incoming block
1def on_insert_to_source(source, block):2 writePart(source, block) # immutable part on source3 for mv in source.attached_mvs: # post-INSERT trigger4 rollup = mv.select.run(block) # block-only, not whole table5 writePart(mv.target, rollup) # appends to target rollup
Backfill.direct_insert_to_target
writes straight into the target — the trigger never fires
1def direct_insert_to_target(target, rows):2 # writes a part on the TARGET table3 writePart(target, rows)4 # nothing inspects target.attached_mvs — MVs trigger off SOURCE5 # rollup now contains rows the source has no record of6 return # silent drift: target > source
MaterializedView.create
trigger attaches NOW — only future inserts pass through it
1def create_mv(source, select_stmt, target, populate=False):2 mv = Mv(select_stmt, target)3 source.attached_mvs.append(mv) # active from this moment on4 if populate: # opt-in one-shot backfill5 mv.select.run(scan(source)) >> writePart(target)6 # rows already in `source` were inserted BEFORE mv existed —7 # they never ran through select_stmt. Backfill by hand:8 # INSERT INTO target SELECT ... FROM source WHERE ts < cutoff9 return mv