Build a B-tree storage engine (SQLite-style) (11 scenes)
Scene 01 · Find the row — why a file scan won't do
A SQL engine has to answer point, range, and indexed lookups against ONE file — the only family of structures that can do all three is an ordered tree.
Scene 01
Find the row — why a file scan won't do
Diagram
On the left, **users.db** — one file holding every **row** of the **users table**, drawn as scattered colored cells. Each cell's badge is the row's **primary key (PK)** — its rowid. On the right, three queries fire against the file: a **point** lookup by id, a **range** scan over an id band, and an **indexed-column** lookup by email. The cost meter underneath reports rows scanned per query.
FILEusers.dblayout: unsorted (hash-like)#17grace#3bob#22judy#8#1alice#14frank#24ken#5carol#19ivy#11eve#7dan#21#4#16#2#23#9#13#6#20#18#10#15#12POINTSELECT * FROM users WHERE id = 14point lookup by primary key0 scannedRANGESELECT * FROM users WHERE id BETWEEN 5AND 120 scannedINDEXEDSELECT * FROM users WHERE email ='eve@x'0 scannedROWS SCANNED0 / 24Three queries, three full scans. Cost is N for every shape.
users.db — one file, every row of the users table
point: one row by PK
Three queries fire one after another — point, range, indexed. Watch the cursor walk every row in the file for each one. The counter tells you the cost.
Implementation
select_row(file, target_id)
unsorted users.db: every query walks every cell
1def select_row(file, target_id):
2 # no order to exploit — read the cells in disk order
3 for cell in file.cells: # O(N) page reads
4 if cell.rowid == target_id:
5 return cell.row # point hit (still scanned)
6 return None # range / indexed: full file
insert_row_into_sorted(file, row)
sorted-flat users.db: log-N to find, but N to make room
1def insert_row_into_sorted(file, row):
2 # file is kept in PK order on disk
3 pos = bisect_left(file.cells, row.rowid) # O(log N)
4 # make room at pos by sliding the tail one slot right
5 for i in range(len(file.cells) - 1, pos - 1, -1):
6 file.cells[i + 1] = file.cells[i] # O(N) writes
7 file.cells[pos] = row
8 file.length += 1