Replicas help reads, not writes

Build a distributed search engine (Elasticsearch / OpenSearch style) (12 scenes)

Scene 06 · Replicas help reads, not writes

Replicas linearly add read capacity and HA but do not speed up writes — every primary fans every write out to all in-sync replicas before acking.

Scene 06

Watch

Five primary shards, one replica each. Indexing clients send writes to the primary, which fans them out to every replica before acknowledging. Reads can hit any copy — primary or replica — picked by adaptive_replica_selection. Cluster status is GREEN.

Implementation

Primary.index

fan out to every in-sync replica, then ack

1def index(doc):
2    shard = hash(doc._id) % num_primary_shards
3    primary = routing_table.primary_for(shard)
4    primary.applyLocally(doc)
5    acks = []
6    for r in primary.in_sync_replicas:
7        acks.append(r.send(doc))   # parallel fan-out
8    wait_all(acks)                  # block on slowest
9    return ok(client)               # ack now

Coordinator.search

pick any copy per shard via adaptive_replica_selection

1def search(query):
2    for shard in all_primary_shards:
3        copies = primary + in_sync_replicas
4        copy = adaptive_replica_selection(copies)
5        scatter(copy, query)
6    # reads scale linearly with (1 + num_replicas)

PreviousShards — hash(routing) mod N NextScatter, reduce, fetch