Why Replicate?

Replication means keeping a copy of the same data on multiple machines that are connected via a network. This is fundamental for building systems that are resilient to hardware failures and geographically closer to users.

Core Benefits

  1. High Availability: the system stays up even if one node goes down.
  2. Disconnected Operation: applications can keep working during a network interruption.
  3. Latency Reduction: data is placed geographically closer to users.
  4. Scalability: read throughput increases by distributing queries across replicas.

Single-Leader Replication

This is the most common approach (used in MySQL, PostgreSQL, MongoDB). One node is designated as the Leader.

  1. The Leader receives all write requests.
  2. It writes to its local storage and sends the data change to all Followers.
  3. Followers apply the changes in the same order.
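The three steps above can be sketched as follows. This is a minimal, in-memory illustration with hypothetical `Leader` and `Follower` classes, not a real database's replication protocol: the leader writes locally, then forwards each log entry to every follower, which applies entries in the leader's order.

```python
class Follower:
    def __init__(self):
        self.data = {}
        self.log = []

    def apply(self, entry):
        # Followers apply entries in the order the leader sent them.
        self.log.append(entry)
        key, value = entry
        self.data[key] = value


class Leader:
    def __init__(self, followers):
        self.data = {}
        self.log = []
        self.followers = followers

    def write(self, key, value):
        # 1. The leader writes to its local storage first.
        entry = (key, value)
        self.log.append(entry)
        self.data[key] = value
        # 2. It then sends the data change to all followers.
        for f in self.followers:
            f.apply(entry)


followers = [Follower(), Follower()]
leader = Leader(followers)
leader.write("x", 5)
print(followers[0].data)  # {'x': 5}
```

Because every follower applies the same log in the same order, all replicas converge to the same state.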

Synchronous vs. Asynchronous Replication

[Sequence diagram: the Client sends Write(x=5) to the Leader. The Leader replicates to the synchronous follower and waits for its ACK before replying OK to the client; the asynchronous follower receives its Replicate message afterward, and its ACK is not required before the OK.]
Common Pitfall: The Trade-off

Synchronous replication guarantees that the follower has an up-to-date copy, but a write blocks (and eventually fails) if the follower is slow or unreachable. Asynchronous replication responds quickly but risks data loss if the leader fails before the follower receives the update.
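The trade-off can be sketched in a few lines. This is a toy model with hypothetical `sync_write`/`async_write` functions (the follower is just a list); it only illustrates when the leader replies OK relative to replication:

```python
def replicate(follower, entry, healthy=True):
    """Returns True if the follower acknowledged the entry."""
    if healthy:
        follower.append(entry)
        return True
    return False  # follower unreachable or too slow


def sync_write(leader_log, follower, entry, follower_healthy=True):
    leader_log.append(entry)
    # Synchronous: OK is only returned once the follower has ACKed,
    # so a slow or dead follower makes the write fail.
    if not replicate(follower, entry, follower_healthy):
        raise TimeoutError("follower did not acknowledge")
    return "OK"  # follower is guaranteed to have a copy


def async_write(leader_log, follower, entry, follower_healthy=True):
    leader_log.append(entry)
    # Asynchronous: reply OK regardless of replication. Fast, but if
    # the leader crashes now, the follower may never get this entry.
    replicate(follower, entry, follower_healthy)
    return "OK"
```

With an unhealthy follower, `sync_write` raises while `async_write` still returns "OK" and leaves the follower's log empty: that gap is exactly the data-loss window.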

Quorums (Leaderless Replication)

Popularized by Amazon's Dynamo and used in Cassandra/Riak. There is no leader; any node can accept writes.

To ensure consistency, we use a Quorum. If we have n replicas, every write must be confirmed by w nodes, and every read must query r nodes.

Theorem 5.1
Quorum Condition

To guarantee that a read sees the latest write, the overlap between the write set and read set must be non-empty:

(5.1)
w + r > n

Commonly, n = 3, w = 2, r = 2.

Configuration      w + r > n?     Property
n=3, w=2, r=2      4 > 3 (Yes)    Strong consistency (read-after-write)
n=3, w=3, r=1      4 > 3 (Yes)    Heavy writes, fast reads
n=3, w=1, r=1      2 > 3 (No)     Fast but inconsistent (eventual only)
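The quorum condition can be checked by brute force: enumerate every possible w-node write set and r-node read set and verify they intersect exactly when w + r > n. The sketch below (a hypothetical `quorum_overlaps` helper) confirms this for the configurations in the table:

```python
from itertools import combinations


def quorum_overlaps(n, w, r):
    """True iff every w-node write set intersects every r-node read set."""
    nodes = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(nodes, w)
               for rs in combinations(nodes, r))


# The overlap holds exactly when the quorum condition w + r > n does.
for n, w, r in [(3, 2, 2), (3, 3, 1), (3, 1, 1)]:
    assert quorum_overlaps(n, w, r) == (w + r > n)
```

Intuitively this is the pigeonhole principle: if the write set and read set together touch more than n nodes, they cannot be disjoint, so at least one node in the read set has the latest write.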

In the next chapter, we will look at how to handle datasets that are too large to fit on a single node via partitioning.