Introduction & Fallacies

What is a Distributed System?

A distributed system is a collection of independent computers that appears to its users as a single coherent system. The nodes in a distributed system communicate and coordinate their actions by passing messages.

Figure: System Scalability. Relationship between the number of nodes and throughput in an ideal distributed system.

Intuition

The Core Idea

The goal is to make a collection of independent computers appear to its users as a single coherent system.

Definition 1.1

Distributed System

A system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another in order to achieve a common goal.
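To make "communicate and coordinate by passing messages" concrete, here is a minimal sketch in which two threads stand in for two networked computers. This is only an illustration: the names `worker`, `run_demo`, and the message fields `type` and `value` are made up for this example, and in-process queues stand in for a real network, so none of the failure modes discussed later appear here.

```python
import queue
import threading


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """A stand-in 'node': it shares no state with its peer and only
    reacts to messages that arrive in its inbox."""
    while True:
        msg = inbox.get()
        if msg["type"] == "stop":
            break
        if msg["type"] == "square":
            outbox.put({"type": "result", "value": msg["value"] ** 2})


def run_demo() -> list:
    worker_inbox = queue.Queue()
    coordinator_inbox = queue.Queue()
    node = threading.Thread(target=worker, args=(worker_inbox, coordinator_inbox))
    node.start()

    # The coordinator never calls the worker directly; it only sends requests.
    for n in (2, 3, 4):
        worker_inbox.put({"type": "square", "value": n})

    # Replies come back as messages too. The timeout guards against a
    # worker that never answers.
    results = [coordinator_inbox.get(timeout=1)["value"] for _ in range(3)]

    worker_inbox.put({"type": "stop"})
    node.join()
    return results


print(run_demo())  # [4, 9, 16]
```

The two sides coordinate toward a common goal (squaring a list of numbers) without sharing memory, which is the essential shape of the definition above.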

Key Characteristics

Concurrency of components: Multiple machines execute simultaneously.
Lack of a global clock: Each machine keeps its own clock, and those clocks drift apart, so timestamps alone cannot establish a true ordering of events across machines.
Independent failure of components: One machine can fail while others continue running.
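The lack of a global clock is easiest to see with numbers. The sketch below is a toy simulation, not a real clock API: `Node`, `skew_ms`, and the chosen offsets are hypothetical values picked for illustration. Two writes happen in a known real-world order, but sorting by each node's local timestamp reverses them.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    skew_ms: int  # hypothetical offset of this node's clock from true time

    def local_time(self, true_time_ms: int) -> int:
        """What this node's own clock reads at a given true time."""
        return true_time_ms + self.skew_ms


def order_by_timestamp(events):
    """Order events the naive way: by the timestamp each node attached."""
    return [name for name, timestamp in sorted(events, key=lambda e: e[1])]


a = Node("A", skew_ms=50)   # A's clock runs 50 ms fast
b = Node("B", skew_ms=-50)  # B's clock runs 50 ms slow

# In true time, the write on A (t=1000) happens before the write on B (t=1020).
events = [
    ("write on A", a.local_time(1000)),  # stamped 1050
    ("write on B", b.local_time(1020)),  # stamped 970
]

observed = order_by_timestamp(events)
print(observed)  # ['write on B', 'write on A']
```

A skew of only 50 ms in each direction is enough to invert the order of two events that were 20 ms apart.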

Intuition

Why do we build them?

We build distributed systems to scale beyond the limits of a single machine, to provide high availability and fault tolerance, and to reduce latency by placing data closer to users geographically.

The 8 Fallacies

When engineers first start building distributed systems, they often make assumptions that are true for a single machine but false for a network of machines. These are known as the 8 Fallacies of Distributed Computing.

Assumption	Reality	Design implication
The network is reliable	Packets drop, connections break	Need retries and timeouts
Latency is zero	Speed of light limits communication	Need batching and caching
Bandwidth is infinite	Network pipes get congested	Need data compression
The network is secure	Data can be intercepted or tampered with	Need encryption in transit (TLS) and authentication
Topology doesn't change	Nodes join and leave constantly	Need dynamic routing/discovery
There is one administrator	Different teams manage different parts	Need strict API contracts
Transport cost is zero	Moving data costs money	Need data locality optimization
The network is homogeneous	Different hardware and OSs	Need standardized protocols
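The "latency is zero" row can be checked with simple arithmetic. The sketch below estimates a lower bound on round-trip time from distance alone. It assumes that signals in optical fiber travel at roughly two thirds of the speed of light in a vacuum, and the 6,000 km distance and the count of 100 requests are arbitrary illustrative values. Real round trips are slower, since routing, queuing, and processing all add delay.

```python
SPEED_OF_LIGHT_KM_S = 299_792  # speed of light in a vacuum, km/s
FIBER_FACTOR = 2 / 3           # rough assumption for propagation in optical fiber


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time imposed by propagation delay alone."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


rtt = min_round_trip_ms(6000)  # hypothetical distance between two data centers
print(f"one round trip: at least {rtt:.0f} ms")  # about 60 ms

# 100 dependent requests issued one after another pay the round trip 100 times;
# one batched request pays it once.
sequential_ms = 100 * rtt
print(f"100 sequential round trips: at least {sequential_ms / 1000:.1f} s")  # about 6.0 s
```

This is why the table pairs nonzero latency with batching and caching: both reduce the number of round trips rather than trying to make each one faster.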

Common pitfall

Designing for Failure

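Because the network is unreliable, any RPC (Remote Procedure Call) can fail. You must design your system expecting that any external call might time out or return an error. A timeout is also ambiguous: the caller cannot tell whether the request was lost before it arrived or was executed and only the reply was lost, so the failure-handling logic has to be safe in both cases.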
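One common way to act on this advice is to wrap remote calls in a bounded retry loop with exponential backoff. The following is a minimal sketch under stated assumptions, not a production client: `call_with_retries`, `RemoteCallError`, and `make_flaky` are hypothetical names invented here, and the remote call is simulated by a local function. It also assumes the operation is idempotent, because a retry after a timeout may execute the request a second time.

```python
import random
import time


class RemoteCallError(Exception):
    """Hypothetical error type standing in for a failed remote call."""


def call_with_retries(rpc, max_attempts=3, base_delay_s=0.1, sleep=time.sleep):
    """Call rpc(), retrying on failure with exponential backoff plus jitter.

    Only safe if rpc is idempotent: after a timeout, the first attempt
    may already have taken effect on the remote side.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return rpc()
        except (RemoteCallError, TimeoutError) as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                # Double the base delay on each attempt and add random jitter
                # so that many clients do not retry in lockstep.
                delay = base_delay_s * (2 ** attempt)
                sleep(delay + random.uniform(0, delay))
    # Every attempt failed: surface the last error to the caller.
    raise last_error


def make_flaky(failures):
    """Build a simulated remote call that times out `failures` times, then succeeds."""
    state = {"left": failures}

    def rpc():
        if state["left"] > 0:
            state["left"] -= 1
            raise TimeoutError("simulated timeout")
        return "ok"

    return rpc


# Two simulated timeouts, then success on the third attempt.
# The no-op sleep keeps the demo instant.
result = call_with_retries(make_flaky(2), sleep=lambda s: None)
print(result)  # ok
```

The retry limit matters as much as the retries themselves: an unbounded loop turns one slow dependency into a caller that never returns.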

In the next section, we will explore the profound implications of the lack of a global clock.
