Chapter One

Reliable, Scalable, and Maintainable Applications

Most applications today are data-intensive rather than compute-intensive. Raw CPU power is rarely the limiting factor; the bigger problems are usually the amount of data, the complexity of data, and the speed at which it is changing. This chapter explores the three pillars of data systems: reliability, scalability, and maintainability.

The Three Pillars

Reliability

The system should continue to work correctly, even in the face of adversity (hardware or software faults, and even human error).

  • Fault tolerance
  • Error handling
  • Recovery mechanisms
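Error handling and recovery often begin with retrying operations that failed because of a transient fault such as a timeout. The sketch below assumes a hypothetical `TransientError` exception marks retryable faults; it uses capped exponential backoff with random jitter, a common pattern rather than a method prescribed by this chapter.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for faults worth retrying, such as a timeout."""


def call_with_retries(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op, retrying TransientError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the fault instead of hiding it
            # Upper bound doubles per attempt (0.1 s, 0.2 s, 0.4 s, ...) up to max_delay;
            # sleeping a random fraction of it keeps clients from retrying in lockstep.
            sleep(random.uniform(0, min(max_delay, base_delay * 2**attempt)))
    raise AssertionError("unreachable")


# Usage: an operation that times out twice, then succeeds.
calls = {"n": 0}


def flaky_read() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("simulated timeout")
    return "ok"


print(call_with_retries(flaky_read, sleep=lambda s: None))  # prints "ok" on the third attempt
```

Retrying is only safe for operations that are idempotent or otherwise tolerate being executed more than once; the `sleep` parameter exists so tests can skip real waiting.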

Scalability

As the system grows (in data volume, traffic volume, or complexity), there should be reasonable ways of dealing with that growth.

  • Load metrics
  • Performance tuning
  • Elastic scaling
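Elastic scaling can be made concrete with a proportional rule of the kind documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current load per replica / target load per replica). The function below is a minimal sketch under that assumption; the min/max bounds and the 10% tolerance band that suppresses flapping are illustrative parameters, not values from this chapter.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_load_per_replica: float,
    target_load_per_replica: float,
    min_replicas: int = 1,
    max_replicas: int = 20,
    tolerance: float = 0.1,
) -> int:
    """Proportional rule: resize the fleet so per-replica load approaches the target."""
    if target_load_per_replica <= 0:
        raise ValueError("target load must be positive")
    ratio = current_load_per_replica / target_load_per_replica
    # Ignore small deviations so the fleet does not oscillate up and down.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(4, 90, 60))  # 6: each replica is at 150% of target, so scale out
print(desired_replicas(4, 20, 60))  # 2: load dropped, so scale in
```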

Maintainability

Over time, many different people will work on the system, and they should all be able to work on it productively.

  • Operability
  • Simplicity
  • Evolvability

Interactive: Fault Injection Simulator

An interactive simulator lets readers inject faults into a small fault-tolerant system and watch it recover. The system consists of an app server, a database with a replica database, and a cache with a backup cache.
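The simulator's redundant components (a replica behind the database, a backup behind the cache) rely on failover: when one node stops responding, requests go to another. A minimal client-side sketch, assuming a hypothetical in-memory `Node` class and `NodeDown` exception standing in for real servers and network errors:

```python
from __future__ import annotations

from dataclasses import dataclass, field


class NodeDown(Exception):
    """Raised when a (simulated) node cannot serve a request."""


@dataclass
class Node:
    name: str
    data: dict[str, str] = field(default_factory=dict)
    healthy: bool = True

    def get(self, key: str) -> str | None:
        if not self.healthy:
            raise NodeDown(self.name)
        return self.data.get(key)


def read_with_failover(nodes: list[Node], key: str) -> tuple[str, str | None]:
    """Try nodes in priority order and return (name of serving node, value)."""
    failed = []
    for node in nodes:
        try:
            return node.name, node.get(key)
        except NodeDown as exc:
            failed.append(str(exc))  # note the fault, then fall through to the next node
    raise NodeDown(f"all nodes failed: {failed}")


primary = Node("Database", {"user:1": "alice"})
replica = Node("Replica DB", {"user:1": "alice"})
print(read_with_failover([primary, replica], "user:1"))  # ('Database', 'alice')
primary.healthy = False  # inject a fault into the primary
print(read_with_failover([primary, replica], "user:1"))  # ('Replica DB', 'alice')
```

A real replica can lag behind its primary, so reads served after failover may be slightly stale, and real systems detect failures with timeouts rather than a `healthy` flag.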

Understanding Scalability

Describing Load

Load can be described with a few numbers called load parameters; which ones matter depends on the system's architecture. For example, they could be requests per second, database read/write ratios, or the ratio of active to inactive users. Understanding your load characteristics is essential for proper capacity planning.

  • RPS: requests per second
  • R/W: read/write ratio
  • DAU: daily active users
  • P99: 99th-percentile response time (99% of requests are at least this fast)
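The metrics above can be computed from a window of request records. This sketch assumes a hypothetical log format of `(operation, response_time_ms)` pairs and uses the nearest-rank definition of a percentile, which is one of several common definitions:

```python
import math
from collections import Counter

# Hypothetical log format: one (operation, response_time_ms) pair per request.
LogEntry = tuple[str, float]


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def summarize(log: list[LogEntry], window_seconds: float) -> dict[str, float]:
    """Turn a window of requests into RPS, read/write ratio, and p99 response time."""
    ops = Counter(op for op, _ in log)
    return {
        "rps": len(log) / window_seconds,
        "read_write_ratio": ops["read"] / ops["write"] if ops["write"] else math.inf,
        "p99_ms": percentile([ms for _, ms in log], 99),
    }


# 100 requests in a 10-second window: 90 reads and 10 writes, taking 1..100 ms.
log = [("read", float(ms)) for ms in range(1, 91)] + [
    ("write", float(ms)) for ms in range(91, 101)
]
print(summarize(log, window_seconds=10))
# {'rps': 10.0, 'read_write_ratio': 9.0, 'p99_ms': 99.0}
```

Percentiles are computed from sorted raw samples here; averaging per-server percentiles instead would give a mathematically meaningless result.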

Approaches for Coping with Load

Vertical Scaling (Scale Up)

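Move to a more powerful machine (more CPUs, memory, and disk). This is simple, but costs grow faster than linearly and there is an upper limit to how large a single machine can get.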

Horizontal Scaling (Scale Out)

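Distribute the load across multiple smaller machines. This is more complex, especially for stateful systems such as databases, but capacity can in principle keep growing by adding machines.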
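One simple way to distribute keyed data or requests across machines is hash partitioning. The sketch assumes hypothetical node names and uses `hash(key) mod N`, the most basic scheme rather than one this chapter recommends:

```python
import hashlib
from collections import Counter


def node_for_key(key: str, nodes: list[str]) -> str:
    """Assign a key to one of the nodes using hash(key) mod N."""
    # Python's built-in hash() for str is randomized per process, so use a stable digest
    # to make every client agree on where a key lives.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
keys = [f"user:{i}" for i in range(10_000)]
print(Counter(node_for_key(k, nodes) for k in keys))
```

Each node should receive roughly a third of the keys. The catch is rebalancing: growing from three to four nodes with `mod N` reassigns about three quarters of all keys, which is why partitioned databases often use a fixed number of partitions or consistent hashing instead.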

Maintainability: Three Keys

Operability

Making life easy for operations teams. Good monitoring, automation, and documentation ensure that the system can be maintained and debugged efficiently.

Simplicity

Managing complexity. Abstractions, clean code, and appropriate design patterns make systems easier to understand and modify.

Evolvability

Making change easy. Agile design, loose coupling, and extensibility ensure that the system can adapt to changing requirements over time.
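Simplicity and evolvability both benefit from hiding implementation details behind a small, stable interface. A minimal sketch with hypothetical names (`KeyValueStore`, `InMemoryStore`, `AuditedStore`, `rename_user`): application code depends only on the interface, so storage can be swapped or wrapped, here with an audit log that also helps operability, without changing callers.

```python
from __future__ import annotations

from typing import Protocol


class KeyValueStore(Protocol):
    """The narrow contract callers depend on; implementations can change behind it."""

    def get(self, key: str) -> str | None: ...

    def put(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Simplest possible implementation, e.g. for tests or a first prototype."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> str | None:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class AuditedStore:
    """Wraps any store and records writes, without touching callers."""

    def __init__(self, inner: KeyValueStore) -> None:
        self._inner = inner
        self.audit_log: list[str] = []

    def get(self, key: str) -> str | None:
        return self._inner.get(key)

    def put(self, key: str, value: str) -> None:
        self.audit_log.append(f"put {key}")
        self._inner.put(key, value)


def rename_user(store: KeyValueStore, user_id: int, name: str) -> None:
    """Application code: knows only the KeyValueStore contract."""
    store.put(f"user:{user_id}:name", name)


# Swapping or wrapping the implementation requires no change to rename_user.
store = AuditedStore(InMemoryStore())
rename_user(store, 1, "alice")
print(store.get("user:1:name"), store.audit_log)  # alice ['put user:1:name']
```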