This repository contains the source code for the hash functions and hashing schemes used in our "Can Learned Models Replace Hash Functions?" VLDB submission.
- Clone the repository together with its submodules by running the following command:
  git clone --recurse-submodules https://github.com/DominikHorn/hashing-benchmark.git
- To download the SOSD datasets:
  - In the data folder, run:
    bash download.sh
- To run the hash table experiments:
  - Change the path of the SOSD datasets in the file src/support/datasets.hpp
  - To build and run the hash table experiments, run the following command:
    bash benchmark.sh
  - The results of the hash table experiments are stored in JSON format in "results.json", and other stats are logged in "log_stats.out".
- To run the range query experiments:
  - Change the path of the SOSD datasets in the file src/support/datasets.hpp
  - To build and run the range query experiments, run the following command:
    bash benchmark_range.sh
  - The results of the range query experiments are stored in JSON format in "results.json", and other stats are logged in "log_stats.out".
- To run the join experiments:
  - Change the path of the SOSD datasets in the file include/join/utils/datasets.hpp
  - Change the path of OUTPUT_FOLDER in the file scripts/evaluation/join_tuner.sh by changing the variable output_folder_path
  - To run the join experiments, run the following command:
    sh scripts/evaluation/join_tuner.sh
  - The results of the join experiments are stored in CSV format in OUTPUT_FOLDER.
Hash table implementations using different combinations of hashing schemes and hash functions:
- include/chained.hpp: chained hash table using traditional hash functions
- include/chained_model.hpp: chained hash table using learned hash functions
- include/chained_exotic.hpp: chained hash table using perfect hash functions
- include/probe.hpp: linear probing hash table using traditional hash functions
- include/probe_model.hpp: linear probing hash table using learned hash functions
- include/probe_exotic.hpp: linear probing hash table using perfect hash functions
- include/cuckoo.hpp: cuckoo hash table using traditional hash functions
- include/cuckoo_model.hpp: cuckoo hash table using learned hash functions
- include/cuckoo_exotic.hpp: cuckoo hash table using perfect hash functions
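Each file above pairs a hashing scheme with a family of hash functions. As a rough illustration only, the C++ sketch below drives one linear probing table with either a traditional hash function (the MurmurHash3 64-bit finalizer) or a learned one, here assumed to be a single linear model of the key CDF fitted by least squares on a sorted key sample. All names (MurmurSlotHash, LinearCdfHash, LinearProbingTable) are hypothetical and not taken from include/probe.hpp or include/probe_model.hpp; the models used in the repository may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Traditional hash function: the 64-bit finalizer of MurmurHash3 (fmix64),
// reduced to a slot index with a modulo.
struct MurmurSlotHash {
  std::size_t operator()(std::uint64_t k, std::size_t num_slots) const {
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return static_cast<std::size_t>(k % num_slots);
  }
};

// Hypothetical learned hash function: one linear model approximating the
// key CDF, fitted by least squares on (key, rank / n) pairs.
class LinearCdfHash {
 public:
  // `sorted_sample` must be non-empty and sorted in ascending order.
  explicit LinearCdfHash(const std::vector<std::uint64_t>& sorted_sample) {
    assert(!sorted_sample.empty());
    const double n = static_cast<double>(sorted_sample.size());
    auto x = [&](std::size_t i) { return static_cast<double>(sorted_sample[i]); };
    auto y = [&](std::size_t i) { return static_cast<double>(i) / n; };
    double mean_x = 0.0, mean_y = 0.0;
    for (std::size_t i = 0; i < sorted_sample.size(); ++i) {
      mean_x += x(i) / n;
      mean_y += y(i) / n;
    }
    double cov = 0.0, var = 0.0;
    for (std::size_t i = 0; i < sorted_sample.size(); ++i) {
      cov += (x(i) - mean_x) * (y(i) - mean_y);
      var += (x(i) - mean_x) * (x(i) - mean_x);
    }
    slope_ = var > 0.0 ? cov / var : 0.0;
    intercept_ = mean_y - slope_ * mean_x;
  }

  // Scales the predicted CDF value, clamped to [0, 1], to a slot index.
  std::size_t operator()(std::uint64_t key, std::size_t num_slots) const {
    const double cdf =
        std::clamp(slope_ * static_cast<double>(key) + intercept_, 0.0, 1.0);
    const auto slot = static_cast<std::size_t>(cdf * static_cast<double>(num_slots));
    return std::min(slot, num_slots - 1);
  }

 private:
  double slope_ = 0.0;
  double intercept_ = 0.0;
};

// Linear probing table parameterized by the hash function, so the same
// probing logic runs with a traditional or a learned hash function.
template <typename HashFn>
class LinearProbingTable {
 public:
  LinearProbingTable(std::size_t num_slots, HashFn hash)
      : slots_(num_slots), hash_(std::move(hash)) {
    assert(num_slots > 0);
  }

  // Inserts or updates `key`; returns false if the table is full.
  bool insert(std::uint64_t key, std::uint64_t value) {
    std::size_t pos = hash_(key, slots_.size());
    for (std::size_t step = 0; step < slots_.size(); ++step) {
      auto& slot = slots_[pos];
      if (!slot || slot->first == key) {
        slot = std::make_pair(key, value);
        return true;
      }
      pos = (pos + 1) % slots_.size();  // probe the next slot
    }
    return false;
  }

  // Returns the value stored for `key`, or std::nullopt if it is absent.
  std::optional<std::uint64_t> lookup(std::uint64_t key) const {
    std::size_t pos = hash_(key, slots_.size());
    for (std::size_t step = 0; step < slots_.size(); ++step) {
      const auto& slot = slots_[pos];
      if (!slot) return std::nullopt;  // an empty slot ends the probe chain
      if (slot->first == key) return slot->second;
      pos = (pos + 1) % slots_.size();
    }
    return std::nullopt;
  }

 private:
  std::vector<std::optional<std::pair<std::uint64_t, std::uint64_t>>> slots_;
  HashFn hash_;
};
```

Passing the hash function as a template parameter keeps the probing code identical for both variants, so only the key-to-slot mapping changes between them.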
Non-partitioned hash join implementation using different combinations of hashing schemes and functions:
- include/join: contains npj_join_runner.cpp, which provides the main implementation, along with other helper and configuration files
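A non-partitioned hash join builds a single hash table over one input relation and probes it with every tuple of the other. The minimal, single-threaded C++ sketch below counts join matches using a chained table; the Tuple layout, the npj_count_matches function and the multiply-shift hash are illustrative assumptions, not the code in npj_join_runner.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One tuple of a join input relation (hypothetical layout: key + payload).
struct Tuple {
  std::uint64_t key;
  std::uint64_t payload;
};

// Example traditional hash function: multiply-shift hashing with the 64-bit
// golden-ratio constant, reduced to a bucket index with a modulo.
struct MultiplyShiftHash {
  std::size_t operator()(std::uint64_t key, std::size_t num_buckets) const {
    return static_cast<std::size_t>(((key * 0x9E3779B97F4A7C15ULL) >> 32) % num_buckets);
  }
};

// Single-threaded non-partitioned hash join: build one chained hash table over
// the build relation, then probe it with every tuple of the probe relation.
// Returns the number of matching (build, probe) pairs.
template <typename HashFn>
std::size_t npj_count_matches(const std::vector<Tuple>& build,
                              const std::vector<Tuple>& probe,
                              std::size_t num_buckets, HashFn hash) {
  assert(num_buckets > 0);
  // Build phase: bucket b holds every build tuple whose key hashes to b.
  std::vector<std::vector<Tuple>> buckets(num_buckets);
  for (const Tuple& t : build) buckets[hash(t.key, num_buckets)].push_back(t);

  // Probe phase: compare each probe tuple against its bucket's chain.
  std::size_t matches = 0;
  for (const Tuple& t : probe) {
    for (const Tuple& candidate : buckets[hash(t.key, num_buckets)]) {
      if (candidate.key == t.key) ++matches;
    }
  }
  return matches;
}
```

In this sketch, a learned hash function could be swapped in by passing any other functor with the same (key, num_buckets) call signature.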
Optimization stuff
- include/convenience/: commonly used C++ macros (e.g., forceinline) and related functionality
- include/support.hpp: simple tape storage implementation to eliminate small allocations in hash tables
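The two utilities above can be sketched as follows, assuming that "forceinline" expands to the compiler's always-inline attribute and that "tape storage" means an arena that hands out hash table nodes from large preallocated chunks. The FORCEINLINE macro, the Tape class and the ChainNode type are hypothetical names for illustration; include/convenience/ and include/support.hpp may be implemented differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical force-inline macro (the repository's own macro may be named
// and defined differently): ask the compiler to always inline hot functions.
#if defined(_MSC_VER)
#define FORCEINLINE __forceinline
#else
#define FORCEINLINE inline __attribute__((always_inline))
#endif

// Tape (arena) storage: hands out objects from large chunks so a hash table
// does not call the allocator once per small node. Objects are never freed
// individually; all chunks are released together when the tape is destroyed.
template <typename T>
class Tape {
 public:
  explicit Tape(std::size_t chunk_size = 1024) : chunk_size_(chunk_size) {
    assert(chunk_size_ > 0);
  }

  // Returns a pointer to a value-initialized T. Pointers stay valid for the
  // tape's lifetime because existing chunks are never reallocated.
  FORCEINLINE T* alloc() {
    if (chunks_.empty() || used_ == chunk_size_) {
      chunks_.push_back(std::make_unique<T[]>(chunk_size_));
      used_ = 0;
    }
    return &chunks_.back()[used_++];
  }

  std::size_t num_chunks() const { return chunks_.size(); }

 private:
  std::size_t chunk_size_;
  std::size_t used_ = 0;
  std::vector<std::unique_ptr<T[]>> chunks_;
};

// Example node type for a chained hash table bucket.
struct ChainNode {
  std::uint64_t key = 0;
  std::uint64_t value = 0;
  ChainNode* next = nullptr;
};
```

With a chunk size of 4, ten allocations touch the system allocator only three times instead of ten.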
Testing and benchmarking driver code
- src/benchmarks/:
  - passive_stats.hpp: benchmark code for collecting passive stats of hash tables
  - template_tables.hpp: benchmark code for collecting insert and probe stats of hash tables
  - tables.hpp: some hash table benchmark experiments
  - template_tables_range.hpp: benchmark code for collecting range query stats of hash tables
- src/support/: code shared by different benchmarks and tests for loading datasets and generating probe distributions
- src/benchmarks.cpp: original entry point for the benchmarks target
- src/tests.cpp: original entry point for the tests target
- cleanup.py: deduplicates and sorts the measurements JSON file
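The README does not say which probe distributions src/support/ generates. As an assumption, the C++ sketch below shows two common choices for lookup workloads: uniform probes, where every stored key is equally likely, and Zipf-like probes, where a few keys receive most lookups. The function names and the Zipf parameter s are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Uniform probe distribution: every stored key is equally likely to be looked up.
std::vector<std::uint64_t> uniform_probes(const std::vector<std::uint64_t>& keys,
                                          std::size_t count, std::uint64_t seed) {
  assert(!keys.empty());
  std::mt19937_64 rng(seed);
  std::uniform_int_distribution<std::size_t> pick(0, keys.size() - 1);
  std::vector<std::uint64_t> probes;
  probes.reserve(count);
  for (std::size_t i = 0; i < count; ++i) probes.push_back(keys[pick(rng)]);
  return probes;
}

// Zipf-like probe distribution: the key at index i is drawn with weight
// 1 / (i + 1)^s, so a few "hot" keys receive most of the lookups.
std::vector<std::uint64_t> zipf_probes(const std::vector<std::uint64_t>& keys,
                                       std::size_t count, double s,
                                       std::uint64_t seed) {
  assert(!keys.empty());
  std::vector<double> weights(keys.size());
  for (std::size_t i = 0; i < keys.size(); ++i) {
    weights[i] = 1.0 / std::pow(static_cast<double>(i + 1), s);
  }
  std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
  std::mt19937_64 rng(seed);
  std::vector<std::uint64_t> probes;
  probes.reserve(count);
  for (std::size_t i = 0; i < count; ++i) probes.push_back(keys[pick(rng)]);
  return probes;
}
```

Fixing the seed makes each probe sequence reproducible across benchmark runs for a given standard library implementation.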
Building and running scripts
- setup.sh: original script to set up the repo (submodule checkout, CMake configuration, etc.)
- requirements.txt: Python requirements
- CMakeLists.txt: CMake target definitions
- thirdparty/: CMake dependency declarations
- build-debug.sh: makes a debug build
- build.sh: makes a production build
- run.sh: original script to build and execute the benchmark target
- perf.sh: like run.sh, but with perf instrumentation
- only_new.py: helper script for run.sh that extracts all data points already measured from results.json, so that only new data points are run
- test.sh: original script to build and execute the tests
- benchmark.sh: script to run the probe- and insert-related benchmarking code
- scripts/evaluation/join_tuner.sh: script to run the join experiments
- *results*.json: benchmark results from internal measurements
- README.md: this file