The CAP theorem, formulated by Eric Brewer, states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:
- Consistency (C): Every read receives the most recent write or an error.
- Availability (A): Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
- Partition Tolerance (P): The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
Because distributed systems are built on asynchronous networks, network partitions are inevitable. Therefore, a distributed system must choose Partition Tolerance. The real choice is between Consistency and Availability during a partition.
| System Type | Benefit | Common Example |
|---|---|---|
| CP | Consistent + Partition Tolerant | HBase, MongoDB |
| AP | Available + Partition Tolerant | Cassandra, DynamoDB |
| CA | Consistent + Available | RDBMS (on a single node) |
When a network partition occurs, the nodes on different sides of the partition cannot communicate.
CP (Consistency + Partition Tolerance)
If the system chooses consistency over availability, it must ensure that no stale data is returned. Therefore, nodes that cannot communicate with the primary or a majority of nodes will simply return an error (or time out) rather than return potentially outdated information.
- Examples: HBase, MongoDB (default), Redis, Zookeeper.
AP (Availability + Partition Tolerance)
If the system chooses availability over consistency, it will return the most recent version of the data it has, even if it cannot guarantee it is the absolute latest version.
- Examples: Cassandra, CouchDB, DynamoDB, Riak.
System Trade-offs Analysis
Visualizing a Network Partition (P)
| System Type | During Partition | Use Case |
|---|---|---|
| CP | Returns Error/Timeout | Banking, Billing, Inventory |
| AP | Returns Stale Data | Social Media Feeds, Shopping Carts |
The PACELC theorem, described by Daniel Abadi in 2010, extends the CAP theorem by addressing the tradeoff between latency and consistency even when there is no network partition.
PACELC stands for:
- If there is a Partition, how does the system choose between Availability and Consistency?
- Else (no partition), how does the system choose between Latency and Consistency?
Common PACELC Classifications
- PA/EL: Choose availability during partition, low latency during normal operation (e.g., DynamoDB, Cassandra).
- PC/EC: Choose consistency during partition, consistency during normal operation (e.g., BigTable, HBase).
- PA/EC: Choose availability during partition, but prioritize consistency during normal operation (e.g., MongoDB with primary-only reads).
Even in a perfectly healthy network, sending data to multiple replicas for consistency takes more time than returning a result from a single local node. This latency tax is the core of the else side of PACELC.