Build a Bitcask-style KV store (9 scenes)
Scene 01 · Log on disk, hash in RAM
A Bitcask is exactly two structures: an append-only log of records on disk + a single in-memory hash (the keydir). Every other property follows from this split.
Scene 01
Log on disk, hash in RAM
Diagram
Left: data-file directory · active at bottom. Right: keydir hash. Bottom: latest op.
An empty Bitcask: no data files written, no keydir rows. Watch what a PUT actually does — one record lands on disk, one row appears in the keydir. Two structures, growing in lockstep.
Implementation
Record on disk
fixed-size header + two variable blobs; appended at tail
1# one entry in <file_id>.data, written end-to-end:2record = struct(3 crc: u32, # checksum of the bytes that follow4 tstamp: u32, # logical write time5 ksz: u32, # key size in bytes6 vsz: u32, # value size in bytes7 key: bytes[ksz],8 value: bytes[vsz],9)10# closed files are immutable forever; only the active file appends.
Bitcask.put
append a record at the active tail, then retarget the keydir slot
1def put(key, value):2 tstamp = now()3 bytes = encode(crc, tstamp, len(key), len(value), key, value)4 offset = active_file.size # current end-of-file5 active_file.append(bytes) # sequential write6 fsync_if_policy(active_file) # sync_strategy7 keydir[key] = Entry(8 file_id = active_file.id,9 value_pos = offset + header_sz + len(key),10 value_sz = len(value),11 tstamp = tstamp,12 ) # one slot per LIVE key13 return ok
Bitcask.get
one hash lookup; one pread at the recorded offset
1def get(key):2 entry = keydir.lookup(key)3 if entry is None: return not_found4 fd = open_or_reuse(entry.file_id)5 bytes = pread(fd, entry.value_pos, entry.value_sz)6 return bytes # at most one disk seek