Arqly — System Design Mastery

#03 Distributed Unique ID Generator

Generate globally unique, monotonic-ish IDs at scale.


Build a globally unique, k-sortable ID generator that every write in the platform depends on. The hard part isn't picking a bit layout (Twitter Snowflake's 41/10/12 split of timestamp, worker, and sequence bits has been right since 2010); it's that ID generation is the single highest-leverage piece of state in the entire infrastructure. An ID service that is 99.9% available caps every service that calls it synchronously at 99.9% or worse. A worker-id collision silently corrupts data at rest until a downstream consumer hits a primary-key violation hours later. A leap-second handling bug (for example, a clock stepped backward instead of smeared) rewinds a host's clock, and the next ID issued can duplicate one already in storage.
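To make the 41/10/12 layout and the clock-rewind failure concrete, here is a minimal single-process Snowflake sketch that refuses to issue IDs when the wall clock moves backward (a halt-on-rewind policy). It assumes a hypothetical custom epoch `EPOCH_MS` and an injectable `clock` parameter so tests can simulate a rewind; raising `ClockRewindError` is one policy among several (some implementations sleep through small rewinds instead), and the worker id is assumed to come from a lease held elsewhere.

```python
import threading
import time

# Classic Snowflake layout in a signed 64-bit integer:
# 1 unused sign bit | 41-bit ms timestamp | 10-bit worker id | 12-bit sequence.
TIMESTAMP_BITS = 41
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1
# Hypothetical custom epoch (2020-01-01T00:00:00Z); choose once and never change it.
EPOCH_MS = 1_577_836_800_000


class ClockRewindError(RuntimeError):
    """Raised instead of issuing an ID when the wall clock has moved backward."""


class Snowflake:
    def __init__(self, worker_id, clock=None):
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError(f"worker_id must be in [0, {MAX_WORKER}]")
        self.worker_id = worker_id
        # Wall-clock milliseconds; injectable so a test can simulate a rewind.
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Halt-on-rewind: fail the call rather than risk reissuing an old ID.
                raise ClockRewindError(f"clock moved back {self._last_ms - now} ms")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 4096 sequence values used this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            elapsed = now - EPOCH_MS
            if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
                raise OverflowError("timestamp outside the 41-bit window for this epoch")
            self._last_ms = now
            return ((elapsed << (WORKER_BITS + SEQUENCE_BITS))
                    | (self.worker_id << SEQUENCE_BITS)
                    | self._sequence)
```

Raising pushes the decision to the caller: an RPC front end could map it to an error so clients retry against another worker, rather than hand out an ID that may already exist in storage.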

The canonical design here serves three caller classes side by side, because that is what production teams actually run: (1) UUIDv7 generated in-process for the ~90% of services that can store a 128-bit ID (zero network calls, ~2 µs per ID, immune to coordinator outages); (2) a Snowflake RPC fallback for legacy callers that need 64-bit BIGINTs, sortable cursors, or polyglot interop; (3) a Leaf-segment range allocator for human-facing sequential IDs (invoice numbers, order numbers) where customers expect short, monotonic integers. The architecture's job is to keep all three correct under the failure modes that have historically caused silent duplicate-ID incidents (clock rewind, worker-id collision, segment-failover burn) and to detect any duplicate that does slip through within the next audit window.
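The in-process UUIDv7 path needs nothing but a clock and a random source. This minimal sketch lays out the RFC 9562 fields by hand (48-bit Unix millisecond timestamp, 4-bit version, 2-bit variant, 74 random bits); the `uuid7` function name and its optional `unix_ms` parameter are illustrative assumptions, not a particular library's API. IDs from different milliseconds sort by time, which is what gives B-tree indexes their insert locality; IDs within the same millisecond are ordered randomly, and RFC 9562 describes optional counter-based methods for callers that need strict in-process monotonicity.

```python
import os
import time
import uuid
from typing import Optional


def uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7 following the RFC 9562 bit layout:
    48-bit Unix ms timestamp | 4-bit version (7) | 12-bit rand_a | 2-bit variant (10) | 62-bit rand_b.
    """
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 of them used
    rand_a = (rand >> 68) & 0xFFF                 # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits
    value = (unix_ms & ((1 << 48) - 1)) << 80     # timestamp in the most significant bits
    value |= 0x7 << 76                            # version field
    value |= rand_a << 64
    value |= 0b10 << 62                           # RFC 9562 variant
    value |= rand_b
    return uuid.UUID(int=value)
```

Because the timestamp occupies the most significant bits, comparing two of these UUIDs (or their 16-byte big-endian encodings) orders them by creation millisecond first.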

This is a single-region active-active design with cross-region DR for the segment table. Multi-region active-active ID generation is feasible (per-region DC-id bits in Snowflake, a per-region etcd cluster), but it adds coordination cost without buying meaningful availability over the in-process default, because the ~90% of callers on UUIDv7 never make an RPC at all.
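Carving DC-id bits out of the worker field is a fixed trade inside the 63 usable bits of a signed BIGINT. This sketch compares the single-region 41/10/12 layout with a per-region 41/5/5/12 split (the 5-bit datacenter plus 5-bit worker split used by the original Twitter Snowflake) and checks the arithmetic: 2^41 ms is roughly 69.7 years of timestamp from the chosen epoch, and 12 sequence bits allow 4,096 IDs per worker per millisecond. `SnowflakeLayout`, `SINGLE_REGION`, and `PER_REGION` are hypothetical names.

```python
from dataclasses import dataclass

MS_PER_YEAR = 365.25 * 24 * 3600 * 1000


@dataclass(frozen=True)
class SnowflakeLayout:
    timestamp_bits: int
    dc_bits: int
    worker_bits: int
    sequence_bits: int

    def __post_init__(self):
        total = self.timestamp_bits + self.dc_bits + self.worker_bits + self.sequence_bits
        if total != 63:  # keep the sign bit clear so the ID fits a signed BIGINT
            raise ValueError(f"layout uses {total} bits, expected 63")

    def lifetime_years(self) -> float:
        # Time until the timestamp field overflows, measured from the custom epoch.
        return (1 << self.timestamp_bits) / MS_PER_YEAR

    def capacity(self):
        # (datacenters, workers per datacenter, IDs per worker per millisecond)
        return 1 << self.dc_bits, 1 << self.worker_bits, 1 << self.sequence_bits

    def compose(self, elapsed_ms, dc, worker, seq) -> int:
        fields = ((elapsed_ms, self.timestamp_bits), (dc, self.dc_bits),
                  (worker, self.worker_bits), (seq, self.sequence_bits))
        value = 0
        for field, bits in fields:
            if not 0 <= field < (1 << bits):
                raise ValueError(f"{field} does not fit in {bits} bits")
            value = (value << bits) | field
        return value

    def decompose(self, value):
        seq = value & ((1 << self.sequence_bits) - 1)
        value >>= self.sequence_bits
        worker = value & ((1 << self.worker_bits) - 1)
        value >>= self.worker_bits
        dc = value & ((1 << self.dc_bits) - 1)
        value >>= self.dc_bits
        return value, dc, worker, seq


SINGLE_REGION = SnowflakeLayout(41, 0, 10, 12)
PER_REGION = SnowflakeLayout(41, 5, 5, 12)  # 5 DC bits + 5 worker bits
```

Every DC bit halves the worker pool available per region, so the split belongs in the same up-front decision as epoch exhaustion planning; changing it later means IDs issued under the old and new layouts decode differently.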

Reading:
Twitter Engineering: Announcing Snowflake (2010)
Discord: How Discord Stores Trillions of Messages (Snowflake layout)
Instagram Engineering: Sharding & IDs at Instagram (PostgreSQL PL/pgSQL next_id)
Flickr Code: Ticket Servers, Distributed Unique Primary Keys on the Cheap
Meituan Tech: Leaf, an open-source ID generator (segment and Snowflake modes, double buffer)
RFC 9562: Universally Unique IDentifiers (UUIDs), including UUIDv7
Sony: Sonyflake (39-bit time in 10 ms units + 16-bit machine ID)
Cloudflare: How and why the leap second affected Cloudflare DNS (2017)
Meta Engineering: NTP service migration (chrony, 100 µs precision)
Jepsen: MySQL 8.0.34 (semi-sync replication and binlog freshness)
PlanetScale: MySQL semi-sync (durability, consistency, split-brains)
Shopify Engineering: Building Resilient Payment Systems (ULID vs UUIDv4)
Stripe Blog: Designing robust and predictable APIs with idempotency
Google SRE Book: Ch. 22, Addressing Cascading Failures
AWS Builders' Library: Timeouts, retries, and backoff with jitter

Snowflake bit budget (timestamp / worker / sequence)

UUIDv7 (RFC 9562) — index-locality vs randomness

worker-id leasing via ZooKeeper / etcd ephemeral nodes

Leaf-segment double-buffer range allocation

Flickr dual-master auto_increment offset

fail-static under coordinator outage

monotonic clock vs wall clock — halt-on-rewind

outbox for segment-allocate + audit-event atomicity

downstream uniqueness probe as first-class SLI

epoch / worker-id bit-budget exhaustion planning
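The Leaf-segment double-buffer allocation in the list above can be sketched in a few dozen lines. All names are hypothetical: an in-memory `FakeAllocTable` stands in for the MySQL segment table (one atomic `max_id = max_id + step` update per segment), `step` and `prefetch_ratio` are tunables with illustrative defaults, and a single-thread `ThreadPoolExecutor` stands in for whatever background loader a real service would use. What it shows is the double buffer: while IDs are served from the current range, the next range is already being claimed, so the database round-trip stays off the hot path.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeAllocTable:
    """Hypothetical in-memory stand-in for a segment table keyed by business tag.

    allocate() plays the role of an atomic `UPDATE ... SET max_id = max_id + step`
    followed by reading back max_id, and returns the newly claimed range.
    """

    def __init__(self):
        self._max_id = {}
        self._lock = threading.Lock()

    def allocate(self, tag, step):
        with self._lock:
            new_max = self._max_id.get(tag, 0) + step
            self._max_id[tag] = new_max
        return range(new_max - step + 1, new_max + 1)


class SegmentAllocator:
    """Serve IDs from the current segment while the next one loads in the background."""

    def __init__(self, table, tag, step=1000, prefetch_ratio=0.1):
        self._table, self._tag, self._step = table, tag, step
        # Start loading the next segment once this many IDs of the current one are used.
        self._prefetch_after = max(1, int(step * prefetch_ratio))
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._current = table.allocate(tag, step)  # first segment loaded synchronously
        self._pos = 0
        self._pending = None  # Future for the second buffer, if one is in flight

    def _prefetch(self):
        self._pending = self._pool.submit(self._table.allocate, self._tag, self._step)

    def next_id(self):
        with self._lock:
            if self._pos == len(self._current):
                # Current buffer drained: swap in the next one, blocking only if
                # the background fetch has not finished yet.
                if self._pending is None:
                    self._prefetch()
                self._current, self._pos = self._pending.result(), 0
                self._pending = None
            value = self._current[self._pos]
            self._pos += 1
            if self._pending is None and self._pos >= self._prefetch_after:
                self._prefetch()
            return value

    def close(self):
        self._pool.shutdown(wait=True)
```

Each allocator hands out strictly increasing IDs, and allocators sharing a tag receive disjoint ranges, but the numbering is not gap-free: if a process dies, the unused tail of its current and prefetched segments is burned. Human-facing sequences such as invoice numbers therefore come out monotonic per allocator, with occasional gaps.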