Build a columnar OLAP store (ClickHouse / Druid style) (13 scenes)
Scene 07 · Merge — and the 'too many parts' crash
A background worker fuses small parts into bigger ones. When the write rate stays above the merge rate, the active part count climbs past parts_to_throw_insert and INSERTs are rejected.
Previously

Many small parts pile up fast. If the count grows unbounded, every query has to interrogate every part — there must be a background process that fuses them, or the engine collapses.

Diagram
A write firehose on the left drops new parts into a stack. A background MERGE worker fuses 3-5 adjacent parts into one bigger part at the next level (the merged part appears on the right; the originals dim and are deleted). The gauge at the top tracks active parts per partition against two thresholds.

**Merge / compaction**: the background process that fuses small parts into fewer, bigger ones, keeping the per-partition count bounded.

**Too many parts**: the failure mode that fires when the write rate sustainably exceeds the merge rate. At parts_to_delay_insert (1000) INSERTs are artificially throttled; at parts_to_throw_insert (3000) they are rejected outright.
[Diagram labels. Gauge: ACTIVE PARTS 40 / 3,000, with markers at delay_insert (1000) and throw_insert (3000). WRITE FIREHOSE: 5 batches/sec, INSERT … VALUES (10k rows). PART STACK: newest on top, +20 more parts below. MERGE WORKER: 3 in → 1 out, merge rate 5 parts/sec. MERGED PART: level L+1, rows = sum of inputs; old parts dim and delete. Caption: Balanced: merge rate ≥ write rate. The per-partition part count hovers.]
Watch the steady-state baseline. Batched INSERTs land as new parts on the stack; the background merge worker picks 3-5 adjacent parts and fuses them into one bigger part at the next level. The active-parts counter hovers around 40 — well below the two thresholds.
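To make the balance between write rate and merge rate concrete, here is a minimal toy simulation of one partition. It is a sketch under stated assumptions, not the engine's actual scheduler: the names `simulate_part_count` and `first_second_above`, the fixed fan-in of 3, and the "merges per second" unit are all hypothetical, and the model ignores the delay throttle entirely (it only stops inserting above the throw threshold).

```python
def simulate_part_count(write_rate, merges_per_sec, fan_in=3, seconds=1200,
                        start_parts=40, throw_at=3000):
    """Toy model: active part count in one partition, sampled once per second."""
    parts = start_parts
    history = []
    for _ in range(seconds):
        # One INSERT creates one part; above the throw threshold INSERTs are rejected.
        if parts <= throw_at:
            parts += write_rate
        # Each merge consumes `fan_in` parts and emits one: net change is -(fan_in - 1).
        merges = min(merges_per_sec, parts // fan_in)
        parts -= merges * (fan_in - 1)
        history.append(parts)
    return history


def first_second_above(history, threshold):
    """Return the 1-based second at which the count first exceeds threshold, else None."""
    for second, parts in enumerate(history, start=1):
        if parts > threshold:
            return second
    return None


balanced = simulate_part_count(write_rate=5, merges_per_sec=3)    # removes up to 6/sec
overloaded = simulate_part_count(write_rate=5, merges_per_sec=1)  # removes only 2/sec
print(max(balanced), first_second_above(overloaded, 1000), first_second_above(overloaded, 3000))
```

In this model the balanced run never rises above its starting count, while the overloaded run gains a net 3 parts per second, passes 1000 after 321 seconds, and passes 3000 after 987 seconds.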
Implementation
MergeWorker.run
background loop: pick adjacent same-level parts, sort-merge, swap
def merge_worker():
    while True:
        for partition in active_partitions():
            # candidates = adjacent parts at the SAME level
            picks = pick_adjacent_same_level(partition)
            if len(picks) < 2:
                continue  # nothing to fuse in this partition yet
            merged = sort_merge(picks)   # in sorted-key order
            emit_larger_part(merged)     # level += 1
            delete(picks)                # originals dim and go
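The sort-merge step in the loop above can be sketched as a k-way merge over parts that are each already sorted by key. This is an illustration only: the `Part` dataclass, its `(sort_key, payload)` row layout, and the name `merge_parts` are assumptions, and for brevity the level bump is folded into the same function rather than a separate emit step.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Part:
    level: int
    rows: list  # (sort_key, payload) tuples, already sorted by sort_key


def merge_parts(picks):
    """k-way merge of already-sorted parts into one part one level up."""
    # heapq.merge streams the inputs in key order without re-sorting everything.
    merged_rows = list(heapq.merge(*(p.rows for p in picks), key=lambda row: row[0]))
    return Part(level=max(p.level for p in picks) + 1, rows=merged_rows)


picks = [
    Part(0, [(1, "a"), (4, "d")]),
    Part(0, [(2, "b"), (5, "e")]),
    Part(0, [(3, "c")]),
]
merged = merge_parts(picks)
print(merged.level, [key for key, _ in merged.rows])
```

Because every input is already sorted, the merge is a single sequential pass over the inputs, and the output row count is the sum of the input row counts.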
MergeTree.insert_path
admission control: throttle at delay, reject at throw
import time

def insert_path(block, partition):
    active_parts = count_active_parts(partition)
    if active_parts > parts_to_throw_insert:      # default 3000
        # reject outright: the client sees the error and must retry later
        raise RuntimeError(f"Too many parts ({active_parts})")
    if active_parts > parts_to_delay_insert:      # default 1000
        time.sleep(backoff(active_parts))         # artificial throttle
    new_part = write_part(block, partition)       # 1 INSERT = 1 part
    register(new_part)
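The insert path leaves `backoff` undefined. One plausible shape, shown here purely as an assumption, is a linear ramp: no delay at the delay threshold, rising to a capped maximum at the throw threshold. The constant `MAX_DELAY_SECONDS` and the linear curve are hypothetical choices for illustration, not the behaviour of any particular engine.

```python
PARTS_TO_DELAY_INSERT = 1000   # from the scene: throttling starts here
PARTS_TO_THROW_INSERT = 3000   # from the scene: rejection starts here
MAX_DELAY_SECONDS = 1.0        # hypothetical cap on the artificial delay


def backoff(active_parts,
            delay_at=PARTS_TO_DELAY_INSERT,
            throw_at=PARTS_TO_THROW_INSERT,
            max_delay=MAX_DELAY_SECONDS):
    """Seconds to sleep before an INSERT, growing linearly between the thresholds."""
    if active_parts <= delay_at:
        return 0.0
    # Fraction of the way from the delay threshold to the throw threshold, capped at 1.
    fraction = min(1.0, (active_parts - delay_at) / (throw_at - delay_at))
    return max_delay * fraction


for count in (40, 1000, 2000, 3000):
    print(count, backoff(count))
```

The point of a ramp rather than a fixed sleep is that writers slow down progressively, which gives the merge worker a chance to catch up before the hard rejection threshold is reached.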
MergeTree.pick_adjacent_same_level
partition-key effect: merge is bounded by partition walls
def pick_adjacent_same_level(partition):
    parts = list_parts(partition)   # ONE partition only
    # Merges NEVER cross partition boundaries, so a
    # high-cardinality partition key (e.g. hourly)
    # leaves each partition with too few adjacent
    # same-level parts to fuse, and merge starves.
    return longest_adjacent_run_at_same_level(parts)
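The helper `longest_adjacent_run_at_same_level` is not shown in the scene. A minimal sketch, assuming each part is a `(name, level)` tuple listed in stack order and that a run is capped at 5 parts (the upper end of the 3-5 range in the diagram), could look like this:

```python
def longest_adjacent_run_at_same_level(parts, max_run=5):
    """Return the longest run of neighbouring parts that share a level.

    `parts` is a list of (name, level) tuples in stack order (an assumed layout).
    The run is capped at `max_run` parts; on ties the earliest run wins.
    """
    best = []
    run = []
    for part in parts:
        if run and part[1] == run[-1][1]:
            run.append(part)     # same level as the neighbour: extend the run
        else:
            run = [part]         # level changed: start a new run
        if min(len(run), max_run) > len(best):
            best = run[:max_run]
    return best


busy_partition = [("p1", 1), ("p2", 0), ("p3", 0), ("p4", 0), ("p5", 1)]
hourly_partition = [("p9", 0)]   # a fine-grained partition holding a single part
print(longest_adjacent_run_at_same_level(busy_partition))
print(longest_adjacent_run_at_same_level(hourly_partition))
```

The busy partition yields a run of three level-0 parts, which the merge worker can fuse. The single-part partition yields a run of length one, so the `len(picks) < 2` check in the worker skips it, which is the starvation effect described in the comment above.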