Brewer's Theorem

The CAP theorem, formulated by Eric Brewer, states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:

  1. Consistency (C): Every read receives the most recent write or an error.
  2. Availability (A): Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
  3. Partition Tolerance (P): The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
Nice to know
P is not optional

Because distributed systems are built on asynchronous networks, network partitions are inevitable. Therefore, a distributed system must choose Partition Tolerance. The real choice is between Consistency and Availability during a partition.

| System Type | Benefit | Common Example |
| --- | --- | --- |
| CP | Consistent + Partition Tolerant | HBase, MongoDB |
| AP | Available + Partition Tolerant | Cassandra, DynamoDB |
| CA | Consistent + Available | RDBMS (on a single node) |

The Tradeoff: CP vs AP

When a network partition occurs, the nodes on different sides of the partition cannot communicate.

CP (Consistency + Partition Tolerance)

If the system chooses consistency over availability, it must ensure that no stale data is returned. Therefore, nodes that cannot communicate with the primary or a majority of nodes will simply return an error (or time out) rather than return potentially outdated information.

  • Examples: HBase, MongoDB (default), Redis, ZooKeeper.
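A minimal sketch of the CP choice, using hypothetical in-memory stand-ins for replicas (not any real database's API): a read succeeds only if at least a quorum of replicas responds, and otherwise the system raises an error rather than risk returning stale data.

```python
class QuorumError(Exception):
    """Raised when too few replicas respond to guarantee consistency."""

def cp_read(key, replicas, quorum):
    """Return the newest value seen by at least `quorum` replicas, or fail.

    `replicas` is a list of dicts standing in for replica stores;
    an unreachable replica (on the far side of a partition) is None.
    Each stored value is a (version, data) pair.
    """
    responses = [r[key] for r in replicas if r is not None and key in r]
    if len(responses) < quorum:
        # CP behavior: refuse to answer rather than return stale data.
        raise QuorumError(f"only {len(responses)} of {quorum} required replicas reachable")
    # Pick the value with the highest version number.
    return max(responses)[1]

# Three replicas; one side of the partition is unreachable (None).
replicas = [{"x": (2, "new")}, {"x": (1, "old")}, None]
print(cp_read("x", replicas, quorum=2))  # → new
```

With `quorum=3` the same call raises `QuorumError`, which is exactly the error-or-timeout behavior described above.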

AP (Availability + Partition Tolerance)

If the system chooses availability over consistency, it will return the most recent version of the data it has, even if it cannot guarantee it is the absolute latest version.

  • Examples: Cassandra, CouchDB, DynamoDB, Riak.
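By contrast, an AP-style read (same hypothetical replica stand-ins as above) answers from whatever replicas are reachable, even if the only reachable copy is stale:

```python
def ap_read(key, replicas):
    """Return the newest value among the *reachable* replicas.

    During a partition this may be stale, but it is never an error
    as long as any single replica can answer. Each stored value is
    a (version, data) pair; unreachable replicas are None.
    """
    responses = [r[key] for r in replicas if r is not None and key in r]
    if not responses:
        raise KeyError(key)  # no replica at all can answer
    return max(responses)[1]

# Partition: only the replica holding the old version is reachable.
replicas = [None, {"x": (1, "old")}, None]
print(ap_read("x", replicas))  # → old (stale, but available)
```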

System Trade-offs Analysis

(Figure: comparison of example systems along Consistency, Availability, Latency, Scalability, and Partition-tolerance axes.)

Visualizing a Network Partition (P)

(Diagram: Clients 1 and 2 issue reads/writes against Nodes A, B, and C; the heartbeat link between the nodes is severed, leaving the cluster partitioned.)
| System Type | During Partition | Use Case |
| --- | --- | --- |
| CP | Returns error/timeout | Banking, Billing, Inventory |
| AP | Returns stale data | Social Media Feeds, Shopping Carts |

Beyond CAP: The PACELC Theorem

The PACELC theorem, described by Daniel Abadi in 2010, extends the CAP theorem by addressing the tradeoff between latency and consistency even when there is no network partition.

PACELC stands for:

  • If there is a Partition, how does the system choose between Availability and Consistency?
  • Else (no partition), how does the system choose between Latency and Consistency?
$$
\text{PACELC: } P \Rightarrow (A \lor C),\quad E \Rightarrow (L \lor C) \tag{3.1}
$$

Common PACELC Classifications

  1. PA/EL: Choose availability during partition, low latency during normal operation (e.g., DynamoDB, Cassandra).
  2. PC/EC: Choose consistency during partition, consistency during normal operation (e.g., BigTable, HBase).
  3. PA/EC: Choose availability during partition, but prioritize consistency during normal operation (e.g., MongoDB with primary-only reads).
Intuition
The Latency Tradeoff

Even in a perfectly healthy network, sending data to multiple replicas for consistency takes more time than returning a result from a single local node. This latency tax is the core of the else side of PACELC.
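The latency tax can be made concrete with a toy write path (simulated network delay, hypothetical function names): an EC-style write blocks until every replica acknowledges, while an EL-style write returns after the local write and leaves the remaining replicas to catch up in the background.

```python
import time

REPLICA_DELAY = 0.05  # simulated one-way network delay per remote ack

def write_ec(value, replicas):
    """Consistency-first (EC): block until every replica has the value."""
    start = time.perf_counter()
    for replica in replicas:
        time.sleep(REPLICA_DELAY)  # wait for the remote acknowledgement
        replica.append(value)
    return time.perf_counter() - start

def write_el(value, replicas):
    """Latency-first (EL): write locally, replicate asynchronously."""
    start = time.perf_counter()
    replicas[0].append(value)  # local write only
    elapsed = time.perf_counter() - start
    # (In a real system the remaining replicas catch up in the background.)
    return elapsed

ec_latency = write_ec("v1", [[], [], []])
el_latency = write_el("v1", [[], [], []])
print(f"EC write: {ec_latency:.3f}s, EL write: {el_latency:.3f}s")
```

The EC write pays one replica delay per acknowledgement even though nothing is partitioned, which is precisely the Else-side tradeoff PACELC makes explicit.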