Build a Bitcask-style KV store (9 scenes)
Scene 6.5 · fsync — pick two of safe, fast, simple
sync_strategy is a three-way trade (o_sync / interval / none). fsync controls RECENCY; CRC controls VALIDITY — two separate guarantees.
Previously
CRCs answered VALIDITY — the log is parseable no matter what. fsync answers RECENCY — how much of the most-recent tail you keep. Two separate guarantees, and Bitcask makes you pick the recency budget.
Diagram
Active file's tail. Durable-up-to marker trails the write head; yellow glow = bytes the OS hasn't fsynced. Throughput meter + kill -9.
A stream of 100k writes/sec lands in the active file. Watch the durable-up-to marker walk behind the write head; the yellow glow between them is the vulnerable region: bytes the OS hasn't flushed to disk yet. The interval=1s policy is shown here; you'll switch policies in a moment.
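To put a number on the vulnerable region, here is a small back-of-the-envelope sketch. It assumes a steady write rate and ignores the time the fsync itself takes, so treat the result as a rough upper bound rather than a measurement. The function names and the 128-byte record size are hypothetical, chosen only for illustration.

```python
def max_records_at_risk(writes_per_sec: float, interval_s: float) -> float:
    """Upper bound on acknowledged records not yet covered by an fsync."""
    return writes_per_sec * interval_s


def max_bytes_at_risk(writes_per_sec: float, interval_s: float,
                      avg_record_bytes: float) -> float:
    """Same bound in bytes; avg_record_bytes is an assumed figure."""
    return max_records_at_risk(writes_per_sec, interval_s) * avg_record_bytes


# The scene's numbers: 100k writes/sec under interval=1s.
records = max_records_at_risk(100_000, 1.0)
# 128 bytes per record is a made-up size for illustration only.
size = max_bytes_at_risk(100_000, 1.0, 128)
print(f"{records:,.0f} records, {size / 1e6:.1f} MB at risk")
```

Under these assumptions, a kill -9 just before the next flush could cost up to 100,000 acknowledged writes. Halving the interval halves the bound, at the price of twice as many fsync calls.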
Implementation
Writer.append_with_policy
the put hot path — fsync is RECENCY, branched on sync_strategy
def append_with_policy(record, policy):
    active_file.write(encode(record))   # append to tail
    if policy == 'o_sync':
        os.fsync(active_file.fd)        # block until on platter
        durable_up_to = active_file.tell()
    elif policy == 'interval':
        pass                            # interval_fsync_thread handles it
    else:                               # 'none': Riak default
        pass                            # OS writeback decides; loss window 5-30s
    return ack
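The pseudocode above can be turned into something runnable along these lines. This is a minimal sketch, not Bitcask's actual writer: the `Writer` class, its `append` method, and the `durable_up_to` attribute are hypothetical names, and it assumes an ordinary buffered Python file, which is why there is a `flush()` before every `os.fsync()`.

```python
import os

POLICIES = ("o_sync", "interval", "none")


class Writer:
    """Hypothetical append-only writer; illustrative, not Bitcask's API."""

    def __init__(self, path: str, policy: str = "none"):
        if policy not in POLICIES:
            raise ValueError(f"unknown sync policy: {policy!r}")
        self.policy = policy
        self.f = open(path, "ab")
        # Sync once at open so bytes already in the file count as durable.
        os.fsync(self.f.fileno())
        self.durable_up_to = self.f.tell()

    def append(self, encoded: bytes) -> int:
        """Append one encoded record and return the new write-head offset."""
        self.f.write(encoded)
        self.f.flush()                  # Python buffer -> OS page cache
        if self.policy == "o_sync":
            os.fsync(self.f.fileno())   # page cache -> storage device
            self.durable_up_to = self.f.tell()
        # 'interval' and 'none' return here without syncing: the record is
        # acknowledged while it may still live only in the page cache.
        return self.f.tell()

    def close(self) -> None:
        self.f.close()
```

The gap between the returned offset and `durable_up_to` is the yellow region from the diagram. Under `o_sync` it is always zero when `append` returns; under the other two policies it grows until something else syncs the file.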
IntervalFsync.run
background flusher — the recency window is now - last_fsync
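def interval_fsync_thread(active_file, interval_s):
    while running:
        sleep(interval_s)
        end = active_file.tell()        # snapshot the write head before syncing
        started = now()
        os.fsync(active_file.fd)        # only bytes before `end` are guaranteed
        last_fsync_time = started
        durable_up_to = end             # appends that raced the fsync stay vulnerable
        # vulnerable bytes = bytes appended since last_fsync_time;
        # bounded by interval_s seconds (plus the fsync's own duration) at line rate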
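One way to make the background flusher concrete is sketched below. It is an illustration under assumptions, not Bitcask's implementation: `IntervalSyncedLog` and its methods are hypothetical names, and the sketch uses a lock so that the offset snapshot and the writer's appends cannot interleave. The fsync itself runs outside the lock so writers are not stalled while the device works.

```python
import os
import threading
import time


class IntervalSyncedLog:
    """Hypothetical log that fsyncs from a background thread every interval_s."""

    def __init__(self, path: str, interval_s: float):
        self.interval_s = interval_s
        self.durable_up_to = 0          # conservative: nothing proven durable yet
        self.last_fsync_time = None
        self._f = open(path, "ab")
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def append(self, data: bytes) -> int:
        """Append and acknowledge immediately; durability comes later."""
        with self._lock:
            self._f.write(data)
            return self._f.tell()

    def sync_once(self) -> None:
        with self._lock:
            self._f.flush()             # Python buffer -> OS page cache
            end = self._f.tell()        # snapshot the write head first
        started = time.monotonic()
        os.fsync(self._f.fileno())      # covers at least the bytes before `end`
        self.durable_up_to = end
        self.last_fsync_time = started

    def _run(self) -> None:
        # Event.wait returns False on timeout, True once stop is requested.
        while not self._stop.wait(self.interval_s):
            self.sync_once()

    def close(self) -> None:
        self._stop.set()
        self._thread.join()
        self.sync_once()                # final sync so a clean shutdown loses nothing
        self._f.close()
```

Snapshotting the offset before calling `os.fsync` is the important detail: bytes appended while the sync is in flight may or may not reach the device, so the marker only advances to what was certainly handed to the OS beforehand.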
Recovery.validate_log
startup scan — CRC is VALIDITY, runs regardless of policy
def validate_log(file):
    valid_end = 0
    for record, offset in scan(file):
        expected = crc32(record.body)
        if record.crc != expected:
            break                       # torn tail: drop everything from here
        valid_end = offset + record.size
    file.truncate(valid_end)            # log is now parseable
    return valid_end
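For a runnable version of the startup scan, the sketch below assumes a deliberately simple record layout: a 4-byte CRC32 and a 4-byte body length, followed by the body. That layout, and the `HEADER` and `encode` names, are assumptions made for this example and are not Bitcask's on-disk format. It also reads the whole file into memory, which is fine for a demo but not for large logs.

```python
import struct
import zlib

# Hypothetical layout: big-endian crc32(body), then len(body), then body.
HEADER = struct.Struct(">II")


def encode(body: bytes) -> bytes:
    return HEADER.pack(zlib.crc32(body), len(body)) + body


def validate_log(path: str) -> int:
    """Truncate the log after its last intact record; return the valid length."""
    valid_end = 0
    with open(path, "r+b") as f:
        data = f.read()
        while True:
            header_end = valid_end + HEADER.size
            if header_end > len(data):
                break                   # clean end of file, or a torn header
            crc, length = HEADER.unpack_from(data, valid_end)
            body = data[header_end:header_end + length]
            if len(body) < length or zlib.crc32(body) != crc:
                break                   # torn or corrupt record: stop here
            valid_end = header_end + length
        f.truncate(valid_end)           # everything kept is now parseable
    return valid_end
```

Running it on a log whose last record lost its final bytes keeps every record before the tear and drops the partial one. Running it a second time changes nothing, which is the property that lets recovery run on every startup regardless of the sync policy.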