As a venture into the C programming language, I decided to try out the following tutorial:
https://github.com/cstack/db_tutorial
This is going to be a database built from the ground up.
The creator of the tutorial breaks the abstract architecture of a database (using SQLite as an example) down into:
- A front end which:
  - Tokenizes input.
  - Parses the tokenized input.
  - Generates bytecode from the parsed input for the back-end VM.
- A back end which:
  - Receives bytecode instructions that operate on a self-balancing B-tree data structure.
  - Uses a pager for I/O and caching operations.
The first step in the tutorial is to create a function that accepts user input, so that we can provide it with both database queries and "meta-commands".
"Non-SQL statements like .exit are called “meta-commands”. They all start with a dot, so we check for them and handle them in a separate function."
First we add our do_meta_command() and prepare_statement() functions.
We also create our meta command and statement enums.
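Sketched out (enum and function names follow the tutorial, with the bodies trimmed down to the bare minimum), the meta-command and statement plumbing looks something like this:

```c
// Assumes the InputBuffer sketch above.
#include <stdlib.h>
#include <string.h>

typedef enum {
  META_COMMAND_SUCCESS,
  META_COMMAND_UNRECOGNIZED_COMMAND
} MetaCommandResult;

typedef enum {
  PREPARE_SUCCESS,
  PREPARE_UNRECOGNIZED_STATEMENT
} PrepareResult;

typedef enum { STATEMENT_INSERT, STATEMENT_SELECT } StatementType;

typedef struct {
  StatementType type;
} Statement;

// Meta-commands all start with a dot, e.g. ".exit".
MetaCommandResult do_meta_command(InputBuffer* input_buffer) {
  if (strcmp(input_buffer->buffer, ".exit") == 0) {
    exit(EXIT_SUCCESS);
  }
  return META_COMMAND_UNRECOGNIZED_COMMAND;
}

// The "SQL compiler": turn the raw input into an internal Statement.
PrepareResult prepare_statement(InputBuffer* input_buffer,
                                Statement* statement) {
  if (strncmp(input_buffer->buffer, "insert", 6) == 0) {
    statement->type = STATEMENT_INSERT;
    return PREPARE_SUCCESS;
  }
  if (strcmp(input_buffer->buffer, "select") == 0) {
    statement->type = STATEMENT_SELECT;
    return PREPARE_SUCCESS;
  }
  return PREPARE_UNRECOGNIZED_STATEMENT;
}
```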
Requirements:
- Support insert and select operations.
- Adhere to a simple schema.
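The simple schema in question is the tutorial's hard-coded three-column row (an id plus two fixed-size strings); roughly:

```c
#include <stdint.h>

#define COLUMN_USERNAME_SIZE 32
#define COLUMN_EMAIL_SIZE 255

typedef struct {
  uint32_t id;
  // +1 leaves room for a null terminator.
  char username[COLUMN_USERNAME_SIZE + 1];
  char email[COLUMN_EMAIL_SIZE + 1];
} Row;
```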
"SQLite uses a B-tree for fast lookups, inserts and deletes."
Converted the in-memory table data structure to an on-disk pager that writes to and reads from a file.
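The pager sits between the table and the file: it hands out page-sized chunks of memory and only touches the disk on a cache miss. A rough sketch of the struct and the cache-miss path (simplified from the tutorial-style get_page(), with bounds and error checks omitted):

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define PAGE_SIZE 4096
#define TABLE_MAX_PAGES 100

typedef struct {
  int file_descriptor;
  uint32_t file_length;
  void* pages[TABLE_MAX_PAGES];  // in-memory page cache
} Pager;

// Return page `page_num`, reading it from disk only on a cache miss.
void* get_page(Pager* pager, uint32_t page_num) {
  if (pager->pages[page_num] == NULL) {
    void* page = malloc(PAGE_SIZE);
    uint32_t num_pages = pager->file_length / PAGE_SIZE;
    if (pager->file_length % PAGE_SIZE) {
      num_pages += 1;  // account for a partial page at the end of the file
    }
    if (page_num < num_pages) {
      lseek(pager->file_descriptor, page_num * PAGE_SIZE, SEEK_SET);
      read(pager->file_descriptor, page, PAGE_SIZE);
    }
    pager->pages[page_num] = page;
  }
  return pager->pages[page_num];
}
```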
DEV NOTE: I started trying to separate functionality from the main C file. I ended up going down a long rabbit hole about the history and use of header files for compiling from multiple source files. I also ran into duplicate includes while trying to compile. For now, rather than get bogged down with these constraints, I just wrote a simple Python script that merges the separate C files into one right before compilation. See make main and the utils.py file.
We will now need a cursor in order to traverse the rows in our tables.
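At this stage the cursor is just a bookmark into the table: where it points and whether it is past the last row. Mine looks roughly like this (declarations only):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct Table Table;  // the table struct from earlier (pager + row count)

typedef struct {
  Table* table;
  uint32_t row_num;   // which row the cursor points at
  bool end_of_table;  // one past the last row (where the next insert goes)
} Cursor;

// The operations I ended up needing:
Cursor* table_start(Table* table);    // cursor at the first row
Cursor* table_end(Table* table);      // cursor one past the last row
void* cursor_value(Cursor* cursor);   // address of the row the cursor points at
void cursor_advance(Cursor* cursor);  // move to the next row
```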
The basic insertion functionality for our B-Tree has been implemented.
The next step will be to add search functionality.
Added table_find() function, which calls another new function, leaf_node_find(), to perform a binary search on our B-tree.
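By this point the cursor tracks a page_num and cell_num instead of a row_num, per the tutorial. The binary search happens over the sorted cells within a single leaf page; a sketch of leaf_node_find(), assuming tutorial-style leaf_node_* accessors that compute offsets into the page:

```c
// Binary search for `key` within one leaf page. Returns a cursor at the key
// if present, or at the position where it would need to be inserted.
Cursor* leaf_node_find(Table* table, uint32_t page_num, uint32_t key) {
  void* node = get_page(table->pager, page_num);
  uint32_t num_cells = *leaf_node_num_cells(node);

  Cursor* cursor = malloc(sizeof(Cursor));
  cursor->table = table;
  cursor->page_num = page_num;

  uint32_t min_index = 0;
  uint32_t one_past_max_index = num_cells;
  while (one_past_max_index != min_index) {
    uint32_t index = (min_index + one_past_max_index) / 2;
    uint32_t key_at_index = *leaf_node_key(node, index);
    if (key == key_at_index) {
      cursor->cell_num = index;
      return cursor;
    }
    if (key < key_at_index) {
      one_past_max_index = index;
    } else {
      min_index = index + 1;
    }
  }

  cursor->cell_num = min_index;
  return cursor;
}
```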
Added leaf node splitting functionality to create internal nodes. Also added root creation and tracking functions.
Search functionality was extended to recursively traverse our B-tree's internal nodes.
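Concretely, recursive search means table_find() no longer assumes the root is a leaf: for an internal node we pick the child whose key range contains the key and recurse until we reach a leaf. Roughly, again assuming tutorial-style helper accessors:

```c
Cursor* internal_node_find(Table* table, uint32_t page_num, uint32_t key) {
  void* node = get_page(table->pager, page_num);

  // Pick the child whose key range contains `key`...
  uint32_t child_index = internal_node_find_child(node, key);
  uint32_t child_page = *internal_node_child(node, child_index);
  void* child = get_page(table->pager, child_page);

  // ...and recurse until we reach a leaf.
  switch (get_node_type(child)) {
    case NODE_LEAF:
      return leaf_node_find(table, child_page, key);
    case NODE_INTERNAL:
      return internal_node_find(table, child_page, key);
  }
  return NULL;  // unreachable
}
```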
NOTE TO SELF: Regarding how we jump from leaf node to leaf node during longer scans: "To scan the entire table, we need to jump to the second leaf node after we reach the end of the first. To do that, we’re going to save a new field in the leaf node header called “next_leaf”, which will hold the page number of the leaf’s sibling node on the right. The rightmost leaf node will have a next_leaf value of 0 to denote no sibling (page 0 is reserved for the root node of the table anyway)."
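In code terms, that just means cursor_advance() follows the next_leaf pointer when it runs off the end of a leaf; something like this, with leaf_node_next_leaf() being the assumed accessor for that header field:

```c
void cursor_advance(Cursor* cursor) {
  uint32_t page_num = cursor->page_num;
  void* node = get_page(cursor->table->pager, page_num);

  cursor->cell_num += 1;
  if (cursor->cell_num >= *leaf_node_num_cells(node)) {
    // Ran off the end of this leaf: jump to its right sibling, if any.
    uint32_t next_page_num = *leaf_node_next_leaf(node);
    if (next_page_num == 0) {
      // Rightmost leaf: 0 means no sibling, so we're at the end of the table.
      cursor->end_of_table = true;
    } else {
      cursor->page_num = next_page_num;
      cursor->cell_num = 0;
    }
  }
}
```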
There are definitely some ways in which I want to expand on this project:
- One is that I need more tests. Maybe I'll implement some coverage checks to quantify progress along that dimension.
- Create some kind of network-accessible server for it so that I can build and run it on remote servers and access it with a URI string.
- Add a user system along with password authentication.
- Elaborate significantly on the querying language.
Got not only multiple tables working but also multiple databases.
Added INTO and FROM clauses to query language.
Note that the current file naming convention is {DATABASE_NAME}-{TABLE_NAME}.db, e.g. mydb-mytable.db.
Example:
```
./main mydb
db > insert 1 hello hi INTO mytable
db > insert 2 hello2 hi2 INTO mytable2
db > select FROM mytable
(1, hello, hi)
db > select FROM mytable2
(2, hello2, hi2)
```
File names:
```
mydb-mytable.db
mydb-mytable2.db
```
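Since the per-table file naming is my own extension rather than anything from the tutorial, here is a hypothetical sketch of how a filename like that could be built (the helper name is made up for illustration):

```c
#include <stdio.h>

// Hypothetical helper: build "{DATABASE_NAME}-{TABLE_NAME}.db" into `out`.
void table_filename(char* out, size_t out_size,
                    const char* db_name, const char* table_name) {
  snprintf(out, out_size, "%s-%s.db", db_name, table_name);
}

// Example: table_filename(buf, sizeof(buf), "mydb", "mytable")
// produces "mydb-mytable.db".
```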