Build a Bitcask-style KV store (9 scenes)
Scene 03 · One hash lookup, one pread
get is a constant-time keydir lookup + at most one pread; bounded by one disk seek, zero when the page cache is hot.
Previously

The writer was sequential because the disk cares. The keydir doesn't — it's a hash, and lookups are O(1). One pread closes the loop.

Scene 03
One hash lookup, one pread
Diagram
Pread cursor flies from the keydir row to value_sz bytes in the data file; the page-cache panel turns warm when bytes are reused.
DATA DIRECTORY001.dataclosed · immutablecrc:7a|t04|k…k1 = v1-o…crc:9b|t05|k…k7 = v7002.dataclosed · immutablecrc:c2|t11|k…k1 = v1crc:1f|t12|k…k2 = v2crc:55|t13|k…k3 = v3003.dataactive · acceptingcrc:33|t14|k…k4 = v4crc:e1|t15|k…k5 = v5KEYDIR (in RAM)keyfile_idvalue_posvalue_szk1002143264Bk2002152896Bk3002162448Bk40038480Bk500318056Bk700122464Bpread 64 B @ 1432OPERATIONGET k1002.data@1432 · 64B 0 µsPAGE CACHEwarm: 0 / 64 pageshit rate: hit 0% · 0 preadshitmisslast 0Cold cache — first get pays one pread (~ms). The seek is bounded; a B-tree would pay log N.
Watch a single GET k1. The keydir row pulses, an arrow flies to file 002 at offset 1432, and the pread cursor highlights exactly value_sz bytes — not the whole record (value_pos already skips the header). The pread counter ticks up by one.
Implementation
KeydirEntry (struct)
fixed-size RAM entry per live key — no value bytes
1struct KeydirEntry:
2 file_id # which .data file holds the live record
3 value_pos # byte offset of the VALUE (header skipped)
4 value_sz # exact length to pread
5 tstamp # last-write timestamp (wins on replay)
Bitcask.get
O(1) keydir lookup, then one pread at the recorded offset
1def get(key):
2 entry = keydir.get(key) # O(1) hash probe
3 if entry is None:
4 return None # key never existed / deleted
5 fd = fd_cache.open(entry.file_id)
6 buf = pread(fd,
7 n=entry.value_sz, # exactly value_sz bytes
8 offset=entry.value_pos)
9 return buf # caller may verify CRC
OS.pread (kernel)
the syscall is the same; the kernel decides if disk is touched
1def pread(fd, n, offset):
2 page = (offset // PAGE_SIZE) * PAGE_SIZE
3 if page in page_cache: # HOT — bytes already in RAM
4 return copy_from(page_cache[page], offset, n)
5 block = disk.seek_and_read(fd, page) # COLD — one seek
6 page_cache[page] = block
7 return copy_from(block, offset, n)