DaemonDB is a lightweight relational database engine built from scratch in Go. It implements core database concepts including B+ tree indexing, heap file storage, SQL parsing, and query execution, designed for educational purposes and learning database internals.
DaemonDB provides a clean, well-documented implementation of fundamental database components. The project implements a complete database stack from storage to query execution:
Key Features:
- Complete B+ tree with insert, search, delete, and range scan operations
- Heap file storage system with slot directory for O(1) row lookup
- SQL parser supporting DDL and DML statements
- Query executor with bytecode-based virtual machine
- Page-based storage architecture (4KB pages)
- Thread-safe operations with proper concurrency control
The database follows a layered architecture separating storage, indexing, and query processing:
┌─────────────────────────────────────────┐
│ SQL Query Layer │
│ (Parser → Code Generator → Executor) │
└──────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Index Layer (B+ Tree) │
│ PrimaryKey → RowPointer(File,Page,Slot)│
└──────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Storage Layer (Heap Files) │
│ Pages (4KB) with Slot Directory │
└─────────────────────────────────────────┘
- Query Parser: Lexical analysis and syntax parsing of SQL statements
- Code Generator: Converts AST to bytecode instructions
- Query Executor (VM): Executes bytecode, orchestrates B+ tree and heap file operations
- B+ Tree: Index structure mapping primary keys to row locations
- Heap File Manager: Manages row storage in page-based heap files
- Pager: Abstract interface for page-level I/O (currently in-memory)
The B+ tree serves as the primary index, providing O(log n) performance for key lookups and range scans.
- Internal Nodes: Store separator keys and child pointers (navigation only)
- Leaf Nodes: Store key-value pairs with linked list structure for range scans
- Node Capacity: 32 keys per node (configurable via
MaxKeys) - Automatic Splitting: Nodes split when capacity exceeded, with parent propagation
- Balanced Tree: All leaves at same depth, maintains B+ tree invariants
// Initialize tree
pager := bplus.NewInMemoryPager()
cache := bplus.NewBufferPool(10)
tree := bplus.NewBPlusTree(pager, cache, bytes.Compare)
// Insert data
tree.Insertion([]byte("S001"), []byte("RowPointerBytes"))
// Search data
result, _ := tree.Search([]byte("S001"))
fmt.Printf("Found: %s\n", string(result))
// Delete data
tree.Delete([]byte("S001"))
// Range scan
iter := tree.SeekGE([]byte("S001"))
for iter.Valid() {
key := iter.Key()
value := iter.Value()
iter.Next()
}Completed Features:
- ✅ Leaf node insertion with splitting
- ✅ Internal node splitting with parent propagation
- ✅ Recursive parent updates (handles multi-level splits)
- ✅ Delete operations with borrow/merge logic
- ✅ Binary search optimization for key lookups
- ✅ Range scan iterator (SeekGE, Next)
- ✅ Thread-safe operations with reader-writer locks
File Structure:
struct.go: Node and tree data structuresinsertion.go: Insert operationssplit_leaf.go: Leaf node splittingsplit_internal.go: Internal node splittingparent_insert.go: Parent propagation logicdeletion.go: Delete with borrow/mergesearch.go: Point lookupiterator.go: Range scan operationsfind_leaf.go: Leaf node navigationbinary_search.go: Binary search utilities
The heap file system stores actual row data in page-based files with a slot directory for efficient row access.
Each 4KB page contains:
- Header (32 bytes): FileID, PageNo, FreePtr, NumRows, SlotCount, etc.
- Data Area: Rows stored sequentially from offset 32
- Slot Directory: Grows backward from end of page (4 bytes per slot: offset + length)
┌─────────────────────────────────────────┐
│ Page (4096 bytes) │
├─────────────────────────────────────────┤
│ Header (32B): metadata │
├─────────────────────────────────────────┤
│ Row 1 | Row 2 | Row 3 | ... │
│ (grows forward) │
├─────────────────────────────────────────┤
│ ... │
│ Slot 3 | Slot 2 | Slot 1 | Slot 0 │
│ (grows backward) │
└─────────────────────────────────────────┘
// Create heap file manager
hfm, _ := heapfile.NewHeapFileManager("./data")
// Create heap file for a table
hfm.CreateHeapfile("students", fileID)
// Insert row
rowPtr, _ := hfm.InsertRow(fileID, rowData)
// Returns: RowPointer{FileID, PageNumber, SlotIndex}
// Read row
rowData, _ := hfm.GetRow(rowPtr)Completed Features:
- ✅ Page-based storage (4KB pages)
- ✅ Slot directory for O(1) row lookup within pages
- ✅ Automatic page creation when pages fill up
- ✅ Row insertion with space management
- ✅ Row retrieval using RowPointer
- ✅ Thread-safe file operations
File Structure:
struct.go: PageHeader, Slot, RowPointer, HeapFile structuresheapfile.go: HeapFile operations (insertRow, GetRow, findSuitablePage)heapfile_manager.go: HeapFileManager (CreateHeapfile, InsertRow, GetRow)page_io.go: Low-level page read/write operationspage_header.go: Header serialization/deserializationslots.go: Slot directory operations (readSlot, writeSlot, addSlot)
Architecture:
- 1 Table = 1 HeapFile (one
.heapfile per table) - 1 HeapFile = Multiple Pages (grows automatically as needed)
- 1 Page = Multiple Rows (~100-200 rows per page, depends on row size)
A complete SQL processing pipeline from lexical analysis to AST generation.
-- Table creation
CREATE TABLE students {
id int,
name string,
age int,
grade string
}
-- Data insertion
INSERT INTO students VALUES ("S001", "Alice", 20, "A")
INSERT INTO students VALUES ("S002", "Bob", 21, "B")
-- Data querying
SELECT * FROM students
SELECT name, grade FROM students
-- Data updates
UPDATE students SET grade = "A+" WHERE id = "S001"
-- Table management
DROP students- Lexer: Hand-written tokenizer for SQL keywords, identifiers, literals
- Parser: Recursive descent parser for syntax analysis
- AST: Abstract syntax tree generation for each statement type
- Code Generator: Converts AST to bytecode instructions
File Structure:
lexer/lexer.go: Tokenization implementationlexer/token.go: Token definitionsparser/parser.go: Recursive descent parserparser/ast.go: AST node definitionscode-generator/code_generator.go: Bytecode emission
The query executor uses a bytecode-based virtual machine (VDBE-style) to execute SQL statements.
SQL Query
↓
Parser → AST
↓
Code Generator → Bytecode Instructions
↓
VM.Execute()
├─→ HeapFileManager.InsertRow() → Write row data
├─→ B+ Tree.Insertion() → Index the row
└─→ Return result
Completed:
- ✅ CREATE TABLE execution
- ✅ INSERT execution (writes to heap file + indexes in B+ tree)
- ✅ Bytecode instruction set (OP_PUSH_VAL, OP_INSERT, OP_SELECT, etc.)
- ✅ Row serialization/deserialization
- ✅ Primary key extraction
- ✅ RowPointer serialization
In Progress:
- 🚧 SELECT execution (parser complete, executor TODO)
- 🚧 UPDATE execution (parser complete, executor TODO)
- 🚧 DELETE execution (parser complete, executor TODO)
File Structure:
executor.go: VM implementation and statement executionhelpers.go: Serialization utilities, table schema management
DaemonDB/
├── bplustree/ # B+ Tree index implementation
│ ├── struct.go # Data structures
│ ├── insertion.go # Insert operations
│ ├── deletion.go # Delete operations
│ ├── search.go # Point lookup
│ ├── iterator.go # Range scan
│ ├── split_leaf.go # Leaf splitting
│ ├── split_internal.go # Internal node splitting
│ ├── parent_insert.go # Parent propagation
│ ├── find_leaf.go # Leaf navigation
│ ├── binary_search.go # Binary search utilities
│ ├── pager.go # Pager interface (in-memory)
│ └── ...
├── heapfile_manager/ # Heap file storage system
│ ├── struct.go # PageHeader, Slot, RowPointer
│ ├── heapfile.go # HeapFile operations
│ ├── heapfile_manager.go # HeapFileManager
│ ├── page_io.go # Page read/write
│ ├── page_header.go # Header serialization
│ ├── slots.go # Slot directory operations
│ └── heapfile_test.go # Comprehensive tests
├── query_parser/ # SQL parsing
│ ├── lexer/ # Lexical analysis
│ ├── parser/ # Syntax analysis
│ └── code-generator/ # Bytecode generation
├── query_executor/ # Query execution
│ ├── executor.go # VM and execution
│ └── helpers.go # Utilities
├── main.go # Entry point
└── README.md # This file
go run main.goThen enter SQL queries:
CREATE TABLE students {
id string,
name string,
age int,
grade string
}
INSERT INTO students VALUES ("S001", "Alice", 20, "A")
INSERT INTO students VALUES ("S002", "Bob", 21, "B")cd heapfile_manager
go test -v -run TestAllThis runs comprehensive tests:
- Basic insert/read operations
- Multiple pages
- Slot directory functionality
- Page header management
cd bplustree
go run bplus.go| Component | Status | Description |
|---|---|---|
| B+ Tree Core | ✅ Complete | Full CRUD with parent propagation, internal splits |
| B+ Tree Iterator | ✅ Complete | Range scan operations (SeekGE, Next) |
| Heap File Storage | ✅ Complete | Page-based storage with slot directory |
| Heap File Operations | ✅ Complete | Insert, GetRow (Delete/Update TODO) |
| SQL Parser | ✅ Complete | Lexer and parser for DDL/DML |
| Code Generator | ✅ Complete | AST to bytecode conversion |
| Query Executor | 🚧 Partial | INSERT/CREATE TABLE working, SELECT/UPDATE/DELETE TODO |
| Concurrency | ✅ Complete | Thread-safe operations |
| File Persistence | 🚧 Partial | Heap files on disk, B+ tree in-memory |
| Buffer Pool | 📋 Planned | Full buffer pool API (Get/Put/Pin/Unpin) |
| Node Serialization | 📋 Planned | Encode/decode nodes to pages |
1. User: INSERT INTO students VALUES ("S001", "Alice", 20, "A")
↓
2. Parser: Parse SQL → AST
↓
3. Code Generator: AST → Bytecode
↓
4. VM.Execute():
a. SerializeRow() → Convert values to bytes
b. HeapFileManager.InsertRow() → Write to heap file
→ Returns: RowPointer(FileID=1, PageNumber=0, SlotIndex=3)
c. SerializeRowPointer() → Convert to 8 bytes
d. ExtractPrimaryKey() → "S001"
e. B+ Tree.Insertion("S001", RowPointerBytes)
→ Stores index: "S001" → RowPointer
1. User: SELECT * FROM students WHERE id = "S001"
↓
2. Parser: Parse SQL → AST
↓
3. Code Generator: AST → Bytecode
↓
4. VM.Execute():
a. B+ Tree.Search("S001")
→ Returns: RowPointer bytes
b. DeserializeRowPointer() → RowPointer(1, 0, 3)
c. HeapFileManager.GetRow(RowPointer)
→ Reads page 0, slot 3 → Returns row data
d. DeserializeRow() → Convert bytes to values
e. Return result to user
- B+ Tree Search: O(log n) for point lookups
- B+ Tree Range Scan: O(log n + k) where k is result size
- Heap File Insert: O(1) per row (amortized)
- Heap File Read: O(1) using slot directory
- Page Size: 4KB (disk-aligned)
- Node Capacity: 32 keys per node
- Concurrency: Reader-writer locks for optimal read performance
- Language: Go 1.19+
- Storage: Heap files on disk (4KB pages)
- Indexing: B+ tree (currently in-memory, disk persistence planned)
- Query Language: SQL with DDL/DML support
- Concurrency: Thread-safe with mutex locks
- Architecture: Index-organized (B+ tree points to heap file rows)
The project includes comprehensive tests:
# Test heap file system
cd heapfile_manager
go test -v
# Test specific functionality
go test -v -run TestHeapFileOperations
go test -v -run TestMultiplePages
go test -v -run TestSlotDirectory- Implement SELECT execution
- Implement UPDATE/DELETE operations in heap files
- Add buffer pool API (Get/Put/Pin/Unpin with eviction)
- Implement node serialization for B+ tree persistence
- Add file-backed pager for B+ tree
- Implement WHERE clause filtering
- Add transaction support
- Implement WAL (Write-Ahead Logging) for durability
This project is licensed under the MIT License.
This is an educational project built for learning database internals. Contributions and suggestions are welcome!
