The Atomicity Problem

In a single-database system, Atomicity (the 'A' in ACID) ensures that an operation either completes fully or fails completely. In a distributed system, a single operation might involve multiple machines (e.g., transferring money between users in two different shards).

!Common pitfall
Partial Failures

What if Node A commits the transaction, but Node B crashes or the network fails before it can receive the commit? We end up with an inconsistent state.

Atomic Commit & 2PC

The standard way to achieve atomic commit is the Two-Phase Commit (2PC) algorithm. It involves a Coordinator and several Participants.

Phase 1: Prepare

  1. The Coordinator sends a PREPARE message to all Participants.
  2. Each Participant checks if it can commit. If so, it writes to its log and replies YES.

Phase 2: Commit

  1. If all Participants voted YES, the Coordinator sends COMMIT.
  2. If any Participant voted NO or timed out, the Coordinator sends ABORT.

Two-Phase Commit (Successful Case)

CoordinatorParticipant 1Participant 2PREPAREPREPAREYESYESCOMMITCOMMIT
TTheorem 7.1
The Coordinator's Dilemma

The main drawback of 2PC is that it is a blocking protocol. If the Coordinator crashes after Phase 1, the Participants must wait indefinitely because they don't know the final decision.

The Saga Pattern

Because 2PC is slow and prone to blocking, many modern distributed systems (especially Microservices) use Sagas. A Saga is a sequence of local transactions.

  1. Each local transaction updates the database and triggers the next step.
  2. If one step fails, the Saga executes compensating transactions to undo the previous steps.

Saga: Compensating Transactions

ClientInventoryPaymentReserve ItemReservedProcess PaymentPayment FailedCancel Reservation
Intuition
2PC vs. Saga

Use 2PC for strong consistency where you can afford the latency. Use Sagas for long-running processes where availability and decoupling are more important.

In the next chapter, we will explore the different flavors of consistency and how they impact system behavior.