This is the source code for the paper submitted to SIGMOD'25: Enabling Efficient In-Memory Full-Text Search with Vectorization on Compacted Columnar Format.
Rust version: 1.68.0
OS: Ubuntu 20.04.1
CPU: x86_64 with support for the AVX512, AVX, SSE4.0 and SSE SIMD instruction extensions.
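A quick way to check whether a machine meets the CPU requirement above is to look at the flags the Linux kernel reports. The following is a minimal sketch, not part of the repository: the helper name `simd_flags` is hypothetical, the flag names are the `/proc/cpuinfo` spellings, and mapping "SSE4.0" to `sse4_1`/`sse4_2` is an assumption.

```shell
#!/bin/sh
# Print which of the SIMD extensions of interest appear in a CPU flags string.
# simd_flags is a hypothetical helper; flag names follow /proc/cpuinfo.
# Assumption: "SSE4.0" in the requirements corresponds to sse4_1 / sse4_2.
simd_flags() {
    # $1: space-separated CPU flags
    for want in avx512f avx sse4_2 sse4_1 sse; do
        case " $1 " in
            *" $want "*) printf '%s\n' "$want" ;;
        esac
    done
}

# On Linux, read the flags from the first "flags" line of /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
    simd_flags "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
fi
```

If an extension is missing from the output, the CPU (or the virtual machine configuration) does not expose it.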
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- --default-toolchain=1.68.0
Pull the customized Apache DataFusion library:
git submodule update --init
cargo build --release
Command: (the binary is generated by the cargo build)
./target/release/fastfull-search
help info:
Usage: fastfull-search [OPTIONS] --handler <HANDLER> --base <BASE> [PATH]...
Arguments:
[PATH]... file path
Options:
--handler <HANDLER> [possible values: base, split-base, split-o1, load-data, boolean-query, posting-table, tantivy]
-p, --partition-num <PARTITION_NUM>
-b, --batch-size <BATCH_SIZE>
--base <BASE>
-d, --dump-path <DUMP_PATH>
-h, --help Print help
Command: (the binary is generated by the cargo build)
./target/release/do_query <idx_dir> <thread_num>