This repository houses the (header-only) implementation of an external-memory fractal tree using the C++ STXXL library.
A fractal tree (a variant of the buffered repository tree) is a tree data structure similar to a B-Tree. Each node occupies one block of memory of size B, and has up to sqrt(B) keys that function just like those in a B-Tree. However, each node additionally fills its block up with up with a buffer of size B-sqrt(B). On insertion, new key-datum-pairs are quickly inserted into the root node buffer (and later pushed down to the buffers in the next level when a buffer is full), speeding up insertions by a factor 1/sqrt(B) compared to a B-Tree. Fractal Tree searches are a factor of 2 slower than B-Tree searches.
With a Block Size of B, N items in the tree, and for range-search X items in the result, fractal trees and b-trees need the following number of disk accesses (I/Os):
Tree Type | Insertion | Search | Range-Search |
---|---|---|---|
Fractal Tree | |||
B-Tree |
Here are some results of measurements I ran (for details, see the report):
See the run-fractal-tree.cpp file:
#include "include/fractal_tree/fractal_tree.h"
int main()
{
using key_type = unsigned int;
using data_type = unsigned int;
using value_type = std::pair<key_type, data_type>;
constexpr unsigned block_size = 2u << 12u; // 4kB blocks
constexpr unsigned cache_size = 2u << 15u; // 32kB cache
using ftree_type = stxxl::ftree<key_type, data_type, block_size, cache_size>;
ftree_type f;
// insert 1MB of data
for (key_type k = 0; k < (2u << 20u) / sizeof(value_type); k++) {
f.insert(value_type(k, 2*k));
}
// find datum of given key
std::pair<data_type, bool> datum_and_found = f.find(1);
assert(datum_and_found.second);
assert(datum_and_found.first == 2);
// find values in key range [100, 1000]
std::vector<value_type> range_values = f.range_find(100, 1000);
std::vector<value_type> correct_range_values {};
for (key_type k = 100; k <= 1000; k++) {
correct_range_values.emplace_back(k, 2*k);
}
assert(range_values == correct_range_values);
return 0;
}
- clone the repo
- cd into the repo
- run
git submodule init
- run
git submodule update --init --recursive
More implementation details, an introduction to external memory trees, and benchmarks can be found in this report.