A CloudStreet Educational Book
Written by Opus 4.5
Read the book online - Hosted on GitHub Pages
Ever wondered what happens when you hit COMMIT? Why does that one query take 30 seconds while another returns instantly? What's actually going on when your database "recovers" after a crash?
This book takes you on a journey into the heart of database systems—the storage engines, B-trees, write-ahead logs, and MVCC implementations that power everything from your local SQLite database to planet-scale distributed systems. We'll explore how databases transform your SQL queries into disk operations, manage concurrent access from thousands of users, and guarantee your data survives power failures and hardware crashes.
Whether you're a developer trying to understand why your queries are slow, an engineer designing data-intensive systems, or simply curious about one of the most sophisticated pieces of software ever created, this book will give you the mental models to understand what's really happening beneath the abstraction layers.
- Backend developers who want to write better queries and design better schemas
- Software engineers building systems that interact heavily with databases
- System architects making decisions about data storage and retrieval
- The curious who want to understand the engineering marvels hiding behind SELECT * FROM users
- How data is physically organized on disk and in memory
- The data structures that make queries fast (and when they don't)
- How databases handle multiple users reading and writing simultaneously
- What guarantees ACID actually provides and how they're implemented
- Why write-ahead logging is essential for crash recovery
- How query optimizers decide the best way to execute your SQL
- The trade-offs between different storage engine architectures
- How distributed databases maintain consistency across machines
- Indexing Structures: B-Trees and Beyond
- LSM Trees and Write-Optimized Structures
- Hash Indexes and Specialized Structures
This book is designed to be read sequentially, as later chapters build on concepts introduced earlier. However, if you're already familiar with certain topics, feel free to skip ahead:
- New to databases? Start from Chapter 1 and work through sequentially.
- Know the basics? Skip to Part II for the data structure deep dives.
- Here for concurrency? Part III covers transactions, locking, and MVCC.
- Query performance issues? Part IV on query processing will be most relevant.
- Scaling up? Part V covers distributed systems and different storage architectures.
This book is built using mdBook. To build locally:
# Install mdBook
cargo install mdbook
# Build the book
mdbook build
# Serve locally with hot-reload
mdbook serve --open
Throughout this book, we use several conventions:
- Code blocks indicate SQL, pseudocode, or data structure representations
- Bold terms indicate important concepts being introduced
- Italics are used for emphasis and technical terms
- ASCII diagrams illustrate data structures and system architectures
- PostgreSQL is used as the primary reference implementation, with notes on how other databases differ
This book was written by Opus 4.5, Anthropic's AI assistant, as part of the CloudStreet educational series. The content synthesizes knowledge from database research papers, system documentation, and practical engineering experience into an accessible guide for working developers.
This work is part of the CloudStreet Educational Series.
"The database is the most important software component in most applications, yet it remains a black box to most developers. Let's open that box."
— Opus 4.5