Design canvas — pick a workload, ship a schema — Build a columnar OLAP store (ClickHouse / Druid style)

Build a columnar OLAP store (ClickHouse / Druid style) (13 scenes)

Scene 13 · Design canvas — pick a workload, ship a schema

Every knob — sort key, partition, skip index, MV, denormalization — is a workload-shaped choice. The right observability schema is wrong for BI.

Previously

We've now seen every lever — layout, encoding, execution, parts, merge, sort key, sparse index, skip index, MVs, denormalization. The last move is to face a real workload and pick the configuration end-to-end.

Scene 13

Design canvas: pick a workload, ship a schema

Watch

Diagram

Top: WORKLOAD ARCHETYPE cards — a workload archetype is the recurring shape of writes + queries that dictates which knobs matter (observability / BI / event analytics). Middle: the engine config you'd ship — **sort key** (Scene 8), **partition** (Scene 6), **skip indexes** (Scene 10), **materialized view** rollups (Scene 11), **denormalization** vs dictGet (Scene 12), shards × replicas. Right: live simulation — insert rate, **parts** vs threshold (Scene 6), **query p99** (Scene 8/9), merge queue depth, MV drift. Bottom: status banner — green / yellow / red — with a chip naming the scene whose insight diagnoses the failure (and 'wrong tool' when the archetype doesn't fit at all).

Sources

Workload A — Observability. Defaults are pre-loaded: ORDER BY (service, ts), partition monthly. The verifier walks green and the chips trace every knob back to an earlier scene. This is what twelve scenes of column-store thinking cash out to.

Implementation

Stop A — Observability schema

(service, ts) prefix · monthly partition · skip indexes on host/trace_id

1CREATE TABLE events (
2  service   LowCardinality(String),
3  host      LowCardinality(String),
4  ts        DateTime CODEC(Delta, ZSTD),
5  trace_id  String,
6  metric    Float64,
7  INDEX idx_host  host     TYPE set(100)       GRANULARITY 4,
8  INDEX idx_trace trace_id TYPE bloom_filter   GRANULARITY 4
9) ENGINE = MergeTree
10ORDER BY (service, ts)
11PARTITION BY toYYYYMM(ts)
12SETTINGS index_granularity = 8192;

Stop B — BI dashboards schema

fact table + AggregatingMergeTree MV · LowCardinality dim columns

1CREATE TABLE events (
2  service     LowCardinality(String),
3  region      LowCardinality(String),
4  ts          DateTime CODEC(Delta, ZSTD),
5  customer_id UInt64,
6  revenue     Decimal(18, 2)
7) ENGINE = MergeTree
8ORDER BY (service, ts)
9PARTITION BY toYYYYMM(ts);
10 
11CREATE MATERIALIZED VIEW daily_revenue_mv
12ENGINE = AggregatingMergeTree
13ORDER BY (toDate(ts), region) AS
14SELECT toDate(ts) AS d, region,
15       sumState(revenue) AS revenue_state
16FROM events GROUP BY d, region;

Stop D — Anti-pattern: per-row UPDATE

verifier REFUSES — every UPDATE rewrites a whole part

1CREATE TABLE inventory (
2  sku_id   UInt64,
3  qty      Int32,
4  price    Decimal(18, 2),
5  updated  DateTime
6) ENGINE = MergeTree
7ORDER BY (sku_id)
8PARTITION BY toYYYYMM(updated);
9 
10-- Per-row UPDATE arrives from the OLTP app:
11ALTER TABLE inventory UPDATE qty = qty - 1
12  WHERE sku_id = 42;   -- rewrites every part touching `qty`
13 
14-- Verifier REFUSES: parts are immutable; one new part per write.
15-- Merges cannot keep up → 'too many parts' fires in minutes.
16-- Right answer: use Postgres, or stage in PG and CDC to ClickHouse.

Wrong knobs — yellow / red on stops A & B

leading with ts → useless sparse index · hourly → too many parts

1CREATE TABLE events_bad (
2  service LowCardinality(String),
3  ts      DateTime,
4  metric  Float64
5) ENGINE = MergeTree
6ORDER BY (ts)
7-- Leading with ts: every WHERE service=... scans every
8--   granule; sparse index prunes nothing.
9PARTITION BY toStartOfHour(ts);
10-- Hourly × shards → tiny-part explosion within minutes;
11--   INSERTs throw 'too many parts', merges never catch up.

PreviousJoins — denormalize or pay