Chapter One
Reliable, Scalable, and Maintainable Applications
Most applications today are data-intensive rather than compute-intensive. Raw CPU power is rarely the limiting factor; the bigger problems are usually the amount of data, its complexity, and the speed at which it changes. This chapter explores the three pillars of data systems: reliability, scalability, and maintainability.
The Three Pillars
Reliability
The system should continue to work correctly, even in the face of adversity (hardware or software faults, and even human error).
- Fault tolerance
- Error handling
- Recovery mechanisms
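One common way to mask a transient fault is to retry the failed operation with a growing delay between attempts. The sketch below is a minimal illustration under stated assumptions, not a prescribed design: `TransientError`, `call_with_retries`, and `make_flaky` are hypothetical names introduced here, and the flaky operation simply simulates an injected fault.

```python
import time


class TransientError(Exception):
    """Hypothetical error type standing in for a recoverable fault."""


def call_with_retries(operation, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Run `operation`, retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                # The fault could not be masked; surface it to the caller.
                raise
            # Delay doubles after each failed attempt: base, 2*base, 4*base, ...
            sleep(base_delay * 2 ** (attempt - 1))


def make_flaky(failures):
    """Return an operation that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("injected fault")
        return "ok"

    return operation


if __name__ == "__main__":
    # Two injected faults are absorbed by three attempts.
    print(call_with_retries(make_flaky(2)))
```

Retrying only helps with faults that are likely to clear on their own. A fault that persists past `max_attempts` is re-raised so that the caller can apply a different recovery mechanism.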
Scalability
As the system grows (in data volume, traffic volume, or complexity), there should be reasonable ways of dealing with that growth.
- Load metrics
- Performance tuning
- Elastic scaling
Maintainability
Over time, many different people will work on the system, and they should all be able to work on it productively.
- Operability
- Simplicity
- Evolvability
Understanding Scalability
Describing Load
Load can be described with a few numbers, often called load parameters. Which ones matter depends on the system: requests per second, the ratio of reads to writes in a database, or the ratio of active to inactive users. Understanding your load characteristics is essential for capacity planning.
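As a rough illustration, two of these numbers can be computed from a log of requests. This is a minimal sketch: the `Request` record, the `describe_load` function, and the sample data are all made up for the example and do not come from any real system.

```python
from dataclasses import dataclass


@dataclass
class Request:
    timestamp: float  # seconds since the start of the observation window
    kind: str         # "read" or "write"


def describe_load(requests, window_seconds):
    """Summarize a request log as requests per second and read/write ratio."""
    reads = sum(1 for r in requests if r.kind == "read")
    writes = sum(1 for r in requests if r.kind == "write")
    return {
        "requests_per_second": len(requests) / window_seconds,
        # With no writes the ratio is undefined; report infinity instead.
        "read_write_ratio": reads / writes if writes else float("inf"),
    }


# Illustrative data: six requests observed over a two-second window.
sample = [
    Request(0.1, "read"), Request(0.4, "read"), Request(0.9, "write"),
    Request(1.2, "read"), Request(1.7, "read"), Request(1.8, "write"),
]
summary = describe_load(sample, window_seconds=2.0)
print(summary)  # {'requests_per_second': 3.0, 'read_write_ratio': 2.0}
```

A read-heavy ratio like this one might suggest caching or read replicas, while a write-heavy ratio points to different bottlenecks, which is why the numbers need to be measured before planning capacity.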
Approaches for Coping with Load
Vertical Scaling (Scale Up)
Move to a more powerful machine. This is the simpler approach, but a single machine has an upper limit on capacity.
Horizontal Scaling (Scale Out)
Distribute the load across multiple machines. This is more complex to build and operate, but in theory capacity can keep growing as machines are added.
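One simple way to spread load across machines is to hash each request key and use the result to pick a node. The sketch below assumes a fixed list of nodes; `pick_node` and the node names are hypothetical, and this is only one of several possible routing schemes.

```python
import hashlib


def pick_node(key, nodes):
    """Map a key to one of `nodes` using a stable hash of the key."""
    # sha256 gives the same result across processes and machines, unlike
    # Python's built-in hash(), which is randomized per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]  # illustrative node names
    for key in ["user:1", "user:2", "user:3"]:
        print(key, "->", pick_node(key, nodes))
```

The same key always lands on the same node, so per-key state can stay local to one machine. The added complexity shows up when the node list changes: with plain modulo hashing most keys move to a different node, which is one reason techniques such as consistent hashing are often used instead.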
Maintainability: Three Keys
Operability
Making life easy for operations teams. Good monitoring, automation, and documentation ensure that the system can be maintained and debugged efficiently.
Simplicity
Managing complexity. Abstractions, clean code, and appropriate design patterns make systems easier to understand and modify.
Evolvability
Making change easy. Agile design, loose coupling, and extensibility ensure that the system can adapt to changing requirements over time.