Build a distributed search engine (Elasticsearch / OpenSearch style) (12 scenes)
Scene 05 · Shards — hash(routing) mod N
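Each document lives on shard = hash(routing) mod number_of_primary_shards, and that mod is exactly why the number of primary shards is fixed once the index is created: change N and most existing documents would hash to a different shard than the one that actually holds them.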
Previously
Everything so far fits on one machine. 5 million books at 2 KB of body each is roughly 10 GB of body text, which is fine for one node. 500 million books is roughly 1 TB, which is not, and the routing decision has to be made before the document even reaches an inverted index.
Diagram
Indexing clients on the left send PUT /books/_doc/{id}. The hash router in the middle computes shard = hash(routing) % N to pick a primary shard. Each primary shard on the right is a complete Lucene index — its own segments, its own translog, its own commit point — receiving the docs whose hash lands on it.
Five books, five primary shards. Each PUT computes hash(_id) % 5 and lands on exactly one shard. Watch each one route — the same _id will always land on the same shard, every time, forever.
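The five-book walkthrough can be reproduced with a minimal sketch. It assumes CRC32 as a stand-in for the routing hash (the actual hash function is not named here) and uses made-up book ids. Python's built-in hash() is avoided because it is randomized per process for strings, which would break the "same _id, same shard" property.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # matches the five-shard example above


def stable_hash(key: str) -> int:
    # Assumption: CRC32 stands in for the engine's routing hash.
    # It is deterministic across processes and machines.
    return zlib.crc32(key.encode("utf-8"))


def route_shard(doc_id: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    # shard = hash(routing) % N, with _id used as the routing key
    return stable_hash(doc_id) % num_shards


# Hypothetical document ids, for illustration only.
book_ids = ["book-1", "book-2", "book-3", "book-4", "book-5"]

placement = {doc_id: route_shard(doc_id) for doc_id in book_ids}
for doc_id, shard in placement.items():
    print(f"{doc_id} -> shard {shard}")
```

Each id lands on exactly one shard, and rerunning the script gives the same placement. Nothing guarantees that five ids spread across five distinct shards; the hash only promises that the mapping is repeatable.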
Implementation
Router.routeShard
modular hashing — same key, same shard, forever
def routeShard(doc):
    key = doc.routing or doc._id
    return hash(key) % num_primary_shards
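To see why the modulus pins the shard count, the sketch below routes the same ids with N = 5 and then N = 6 and counts how many change shards. CRC32 is again an assumed stand-in for the real routing hash, and the ids are synthetic.

```python
import zlib


def route_shard(key: str, num_shards: int) -> int:
    # Assumption: CRC32 stands in for the engine's routing hash.
    return zlib.crc32(key.encode("utf-8")) % num_shards


# Synthetic document ids, for illustration only.
keys = [f"book-{i}" for i in range(10_000)]

# An id "moves" if its shard under N=6 differs from its shard under N=5.
moved = sum(1 for k in keys if route_shard(k, 5) != route_shard(k, 6))
fraction_moved = moved / len(keys)

print(f"{fraction_moved:.0%} of ids map to a different shard after 5 -> 6")
```

For a uniformly distributed hash h, h % 5 equals h % 6 only when h mod 30 is below 5, so about five in six ids would be looked up on a shard that never stored them. Growing an index this way therefore means rewriting the data, not just changing a setting.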