What is a Distributed System?

A distributed system is a collection of independent computers that appears to its users as a single coherent system. The nodes in a distributed system communicate and coordinate their actions by passing messages.

System ScalabilityRelationship between Number of Nodes and Throughput in an ideal distributed system.
Intuition
The Core Idea

The goal is to make a collection of independent computers appear to its users as a single coherent system.

DDefinition 1.1
Distributed System

A system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another in order to achieve a common goal.

Key Characteristics

  1. Concurrency of components: Multiple machines execute simultaneously.
  2. Lack of a global clock: It's difficult to establish a true ordering of events.
  3. Independent failure of components: One machine can fail while others continue running.
Intuition
Why do we build them?

We build distributed systems to scale beyond the limits of a single machine, to provide high availability and fault tolerance, and to reduce latency by placing data closer to users geographically.

The 8 Fallacies

When engineers first start building distributed systems, they often make assumptions that are true for a single machine but false for a network of machines. These are known as the 8 Fallacies of Distributed Computing.

AssumptionRealityImpact
The network is reliablePackets drop, connections breakNeed retries and timeouts
Latency is zeroSpeed of light limits communicationNeed batching and caching
Bandwidth is infiniteNetwork pipes get congestedNeed data compression
The network is secureData can be interceptedNeed encryption (TLS/SSL)
Topology doesn't changeNodes join and leave constantlyNeed dynamic routing/discovery
There is one administratorDifferent teams manage different partsNeed strict API contracts
Transport cost is zeroMoving data costs moneyNeed data locality optimization
The network is homogeneousDifferent hardware and OSsNeed standardized protocols
!Common pitfall
Designing for Failure

Because the network is unreliable, any RPC (Remote Procedure Call) can fail. You must design your system expecting that any external call might time out or return an error.

In the next section, we will explore the profound implications of the lack of a global clock.

← Previous
Beginning
Course Progression
1 of 10
Next →
Time & Clocks