← Back to Part I
3
Chapter Three
Storage and Retrieval
How do databases store and retrieve data on disk? Understanding storage engines is crucial for choosing the right database for your workload and optimizing performance.
Data Structures for Storage
Hash Index
Key-value mapping with O(1) lookups
Pros:
- • Fast lookups
- • Simple implementation
Cons:
- • No range queries
- • Memory intensive
LSM-Tree
Log-Structured Merge tree for write-heavy workloads
Pros:
- • Fast writes
- • Excellent compression
Cons:
- • Slower reads
- • Compaction overhead
B-Tree
Balanced tree structure for read-heavy workloads
Pros:
- • Predictable performance
- • Range queries
Cons:
- • Slower writes
- • More storage
Transaction Processing vs Analytics
oltp
OLTP
Online Transaction Processing
- →Many small queries
- →Random access
- →User-facing
olap
OLAP
Online Analytics Processing
- →Few large queries
- →Sequential scans
- →Business intelligence
Column-Oriented Storage
Columnar storage is optimized for analytics workloads. Instead of storing data row by row, it stores each column separately, enabling efficient compression and query performance for analytical queries.
Advantages
- • Excellent compression (similar data together)
- • Fast for aggregations
- • Only reads needed columns
- • Efficient for wide tables
Trade-offs
- • Slow for single-row lookups
- • Write overhead (column assembly)
- • Not ideal for transactional workloads
- • More complex for point queries