Chapter Three

Storage and Retrieval

How do databases store and retrieve data on disk? Understanding storage engines is crucial for choosing the right database for your workload and optimizing performance.

Data Structures for Storage

Hash Index

Key-value mapping with O(1) lookups

Pros:

• Fast lookups
• Simple implementation

Cons:

• No range queries
• Memory intensive

LSM-Tree

Log-Structured Merge tree for write-heavy workloads

Pros:

• Fast writes
• Excellent compression

Cons:

• Slower reads
• Compaction overhead

B-Tree

Balanced tree structure for read-heavy workloads

Pros:

• Predictable performance
• Range queries

Cons:

• Slower writes
• More storage

Transaction Processing vs Analytics

oltp

OLTP

Online Transaction Processing

→Many small queries
→Random access
→User-facing

olap

OLAP

Online Analytics Processing

→Few large queries
→Sequential scans
→Business intelligence

Column-Oriented Storage

Columnar storage is optimized for analytics workloads. Instead of storing data row by row, it stores each column separately, enabling efficient compression and query performance for analytical queries.

Advantages

• Excellent compression (similar data together)
• Fast for aggregations
• Only reads needed columns
• Efficient for wide tables

Trade-offs

• Slow for single-row lookups
• Write overhead (column assembly)
• Not ideal for transactional workloads
• More complex for point queries

← Chapter 2 Chapter 4: Encoding →