Ever wondered what happens when you hit COMMIT? Journey into the storage engines, B-trees, WAL logs, and MVCC magic powering modern databases. From disk I/O to query optimization, learn how databases turn SQL into bits. Written by Opus 4.5 for developers who want to understand why their queries are slow.

cloudstreet-dev/Database-Internals

Database Internals: Where Your Data Actually Lives

A CloudStreet Educational Book

Written by Opus 4.5



Read Online

Read the book online - Hosted on GitHub Pages


About This Book

Ever wondered what happens when you hit COMMIT? Why does that one query take 30 seconds while another returns instantly? What's actually going on when your database "recovers" after a crash?

This book takes you on a journey into the heart of database systems—the storage engines, B-trees, write-ahead logs, and MVCC implementations that power everything from your local SQLite database to planet-scale distributed systems. We'll explore how databases transform your SQL queries into disk operations, manage concurrent access from thousands of users, and guarantee your data survives power failures and hardware crashes.

Whether you're a developer trying to understand why your queries are slow, an engineer designing data-intensive systems, or simply curious about one of the most sophisticated pieces of software ever created, this book will give you the mental models to understand what's really happening beneath the abstraction layers.

Who This Book Is For

  • Backend developers who want to write better queries and design better schemas
  • Software engineers building systems that interact heavily with databases
  • System architects making decisions about data storage and retrieval
  • The curious who want to understand the engineering marvels hiding behind SELECT * FROM users

What You'll Learn

  • How data is physically organized on disk and in memory
  • The data structures that make queries fast (and when they don't)
  • How databases handle multiple users reading and writing simultaneously
  • What guarantees ACID actually provides and how they're implemented
  • Why write-ahead logging is essential for crash recovery
  • How query optimizers decide the best way to execute your SQL
  • The trade-offs between different storage engine architectures
  • How distributed databases maintain consistency across machines
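
As a small taste of the write-ahead logging item above, here is a minimal sketch of the core idea: record a change durably before applying it, then rebuild state by replaying the log after a restart. The `TinyKV` class, the JSON-lines log format, and the `demo` helper are all hypothetical illustrations, not how PostgreSQL or any real engine implements its log. The sketch assumes every record reaches disk whole and ignores checkpoints, log truncation, and transactions.

```python
import json
import os
import tempfile


class TinyKV:
    """Toy key-value store: each write is logged to disk before it is applied in memory."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()  # rebuild in-memory state from the log (recovery)
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        # Re-apply every logged change in order; later records overwrite earlier ones.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)  # assumes no torn (partial) records
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        # 1. Append the intended change to the log and force it to stable storage.
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # 2. Only then apply the change to the in-memory state.
        self.data[key] = value

    def close(self):
        self.log.close()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "wal.log")
    db = TinyKV(path)
    db.put("user:1", "alice")
    db.put("user:1", "bob")
    db.close()  # the in-memory dict is now discarded, much as it would be in a crash

    recovered = TinyKV(path)  # a fresh instance sees only what the log contains
    state = dict(recovered.data)
    recovered.close()
    return state
```

Calling `demo()` writes two updates, discards the in-memory state, and recovers it from the log alone. Because the log is written and synced first, any change that `put` has acknowledged can be reconstructed after a restart.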

Table of Contents

Part I: Foundations

  1. Introduction: The Journey of a Query
  2. Storage Engines and File Formats
  3. Disk I/O and Page Management

Part II: Data Structures

  4. Indexing Structures: B-Trees and Beyond
  5. LSM Trees and Write-Optimized Structures
  6. Hash Indexes and Specialized Structures

Part III: Transactions and Concurrency

  7. Write-Ahead Logging (WAL)
  8. MVCC and Transaction Isolation
  9. Locking and Concurrency Control

Part IV: Query Processing

  10. Query Parsing and Planning
  11. Query Optimization
  12. Buffer Pools and Caching

Part V: Reliability and Scale

  13. Recovery and Crash Safety
  14. Column Stores vs Row Stores
  15. Distributed Databases and Replication

Appendices

How to Read This Book

This book is designed to be read sequentially, as later chapters build on concepts introduced earlier. However, if you're already familiar with certain topics, feel free to skip ahead:

  • New to databases? Start from Chapter 1 and work through sequentially.
  • Know the basics? Skip to Part II for the data structure deep-dives.
  • Here for concurrency? Part III covers transactions, locking, and MVCC.
  • Query performance issues? Part IV on query processing will be most relevant.
  • Scaling up? Part V covers distributed systems and different storage architectures.

Building Locally

This book is built using mdBook, which is installed through Cargo, so a Rust toolchain is required. To build locally:

# Install mdBook
cargo install mdbook

# Build the book
mdbook build

# Serve locally with hot-reload
mdbook serve --open

Conventions Used

Throughout this book, we use several conventions:

  • Code blocks indicate SQL, pseudocode, or data structure representations
  • Bold terms indicate important concepts being introduced
  • Italics are used for emphasis and technical terms
  • ASCII diagrams illustrate data structures and system architectures
  • PostgreSQL is used as the primary reference implementation, with notes on how other databases differ

About the Author

This book was written by Opus 4.5, Anthropic's AI assistant, as part of the CloudStreet educational series. The content synthesizes knowledge from database research papers, system documentation, and practical engineering experience into an accessible guide for working developers.

License

This work is part of the CloudStreet Educational Series.


"The database is the most important software component in most applications, yet it remains a black box to most developers. Let's open that box."

— Opus 4.5
