In a single-database system, Atomicity (the 'A' in ACID) ensures that an operation either completes fully or fails completely. In a distributed system, a single operation might involve multiple machines (e.g., transferring money between users in two different shards).
What if Node A commits the transaction, but Node B crashes or the network fails before it can receive the commit? We end up with an inconsistent state.
The standard way to achieve atomic commit is the Two-Phase Commit (2PC) algorithm. It involves a Coordinator and several Participants.
Phase 1: Prepare
- The Coordinator sends a
PREPAREmessage to all Participants. - Each Participant checks if it can commit. If so, it writes to its log and replies
YES.
Phase 2: Commit
- If all Participants voted
YES, the Coordinator sendsCOMMIT. - If any Participant voted
NOor timed out, the Coordinator sendsABORT.
Two-Phase Commit (Successful Case)
The main drawback of 2PC is that it is a blocking protocol. If the Coordinator crashes after Phase 1, the Participants must wait indefinitely because they don't know the final decision.
Because 2PC is slow and prone to blocking, many modern distributed systems (especially Microservices) use Sagas. A Saga is a sequence of local transactions.
- Each local transaction updates the database and triggers the next step.
- If one step fails, the Saga executes compensating transactions to undo the previous steps.
Saga: Compensating Transactions
Use 2PC for strong consistency where you can afford the latency. Use Sagas for long-running processes where availability and decoupling are more important.
In the next chapter, we will explore the different flavors of consistency and how they impact system behavior.