Description
Hi!
Recently, I ran many Profile-Guided Optimization (PGO) benchmarks on multiple projects (including many databases such as PostgreSQL, ClickHouse, Redis, and MongoDB). The results are available here, and the database-related results can be checked here. That's why I think it's worth trying to apply PGO to HeavyDB to improve its performance.
I can suggest the following things to do:
- Evaluate PGO's results on HeavyDB.
- If PGO helps achieve better performance, add a note about it to HeavyDB's documentation. That way, users and maintainers will be aware of another optimization opportunity for HeavyDB.
- Provide PGO integration in the build scripts. This can help users and maintainers easily apply PGO to their own workloads.
- Optimize prebuilt HeavyDB binaries with PGO.
Here are some examples of how PGO is already integrated into other projects' build scripts:
- Rustc: a CI script for the multi-stage build
- GCC:
- Clang: Docs
- Python:
- Go: Bash script
- V8: Bazel flag
- ChakraCore: Scripts
- Chromium: Script
- Firefox: Docs
- Thunderbird has PGO support too
- PHP: Makefile command and old Centminmod scripts
- MySQL: CMake script
- YugabyteDB: GitHub commit
- FoundationDB: Script
- Zstd: Makefile
- Foot: Scripts
- Windows Terminal: GitHub PR
- Pydantic-core: GitHub PR
- file.d: GitHub PR
- OceanBase: CMake flag
- ISPC: CMake scripts
- NodeJS: Configure script
- Android Open Source Project (AOSP):
- Official documentation
- Committed PGO profiles: repository
- DMD: Custom build rule
- LDC: GitHub action
- tsv-utils: Makefile
- Erlang OTP: Makefile
- Clingo (PGO enabled only in Spack): Package recipe
- SWI-Prolog:
- hck: Justfile
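As a rough illustration of what such a build-script integration usually does, here is a minimal sketch of the common three-stage Clang PGO flow (instrumented build, training run, optimized rebuild) for a CMake project. It only assembles and prints the commands, it does not run them. Assumptions: Clang and `llvm-profdata` are on `PATH`, compiler flags can be passed through `CMAKE_C_FLAGS`/`CMAKE_CXX_FLAGS`, and the helper name `pgo_build_plan`, the directory layout, and the training command `./run_benchmarks.sh` are hypothetical placeholders, not HeavyDB's actual build options or benchmark entry point.

```python
import os
import shlex


# Hypothetical helper: assembles (but does not run) the commands for a
# three-stage Clang PGO build of a CMake project.
def pgo_build_plan(source_dir, training_cmd, work_dir="pgo-work"):
    # Absolute paths, so the profile flags work from any build subdirectory.
    work = os.path.abspath(work_dir)
    profiles = os.path.join(work, "profiles")
    merged = os.path.join(work, "merged.profdata")
    instr_build = os.path.join(work, "build-instrumented")
    opt_build = os.path.join(work, "build-optimized")

    def configure(build_dir, flag):
        return [
            "cmake", "-S", source_dir, "-B", build_dir,
            "-DCMAKE_BUILD_TYPE=Release",
            "-DCMAKE_C_COMPILER=clang",
            "-DCMAKE_CXX_COMPILER=clang++",
            f"-DCMAKE_C_FLAGS={flag}",
            f"-DCMAKE_CXX_FLAGS={flag}",
        ]

    return [
        # Stage 1: build with Clang's profile instrumentation enabled.
        configure(instr_build, "-fprofile-instr-generate"),
        ["cmake", "--build", instr_build],
        # Stage 2: run a representative workload against the instrumented
        # binaries; %p expands to the process ID, so each process writes
        # its own .profraw file.
        ["mkdir", "-p", profiles],
        ["env", f"LLVM_PROFILE_FILE={profiles}/%p.profraw", *training_cmd],
        # Merge the raw profiles (a shell is needed to expand the glob).
        ["sh", "-c",
         f"llvm-profdata merge -output={shlex.quote(merged)} "
         f"{shlex.quote(profiles)}/*.profraw"],
        # Stage 3: rebuild with the merged profile.
        configure(opt_build, f"-fprofile-instr-use={merged}"),
        ["cmake", "--build", opt_build],
    ]


if __name__ == "__main__":
    # "./run_benchmarks.sh" is a placeholder for whatever drives the workload.
    for cmd in pgo_build_plan("heavydb", ["./run_benchmarks.sh"]):
        print(shlex.join(cmd))
```

The training command is expected to start the binaries from the instrumented build directory; how that is wired up for HeavyDB (server startup, benchmark client) would need to be worked out separately.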
Here are some examples of what PGO-related documentation could look like in the project:
- ClickHouse: https://clickhouse.com/docs/en/operations/optimizing-performance/profile-guided-optimization
- Databend: https://databend.rs/doc/contributing/pgo
- Vector: https://vector.dev/docs/administration/tuning/pgo/
- Nebula: https://docs.nebula-graph.io/3.5.0/8.service-tuning/enable_autofdo_for_nebulagraph/
- GCC: Official docs, section "Building with profile feedback" (even AutoFDO build is supported)
- Clang:
- tsv-utils: https://github.com/eBay/tsv-utils/blob/master/docs/BuildingWithLTO.md
After PGO, I also suggest evaluating Post-Link Optimization (PLO) with LLVM BOLT as an additional optimization step.
Here are some BOLT results:
- Rustc:
- CPython: GitHub PR
- YDB: GitHub comment
- Clang:
- LDC: GitHub comment
- HHVM, Proxygen and others: Facebook paper
- NodeJS: Blog
- Chromium: Blog
- MySQL, MongoDB, memcached, Verilator: Paper
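For reference, a sampling-based BOLT pass typically follows three steps: profile the workload with `perf`, convert the samples with `perf2bolt`, then rewrite the binary with `llvm-bolt`. The sketch below only assembles these commands. Assumptions: the binary was linked with `-Wl,--emit-relocs` (recommended by the BOLT docs), the CPU supports LBR branch sampling (`-j any,u`), the optimization flags mirror the BOLT README at the time of writing and may differ between LLVM versions, and the helper name `bolt_plan`, the binary path, and the training command are hypothetical.

```python
import shlex


# Hypothetical helper: assembles (but does not run) the commands for a
# sampling-based BOLT pass over an already linked binary.
def bolt_plan(binary, training_cmd, perf_data="perf.data", fdata="perf.fdata"):
    return [
        # Sample the training workload with user-space branch records (LBR).
        ["perf", "record", "-e", "cycles:u", "-j", "any,u",
         "-o", perf_data, "--", *training_cmd],
        # Convert the perf samples into BOLT's profile format.
        ["perf2bolt", "-p", perf_data, "-o", fdata, binary],
        # Rewrite the binary with a profile-driven code layout.
        ["llvm-bolt", binary, "-o", f"{binary}.bolt", f"-data={fdata}",
         "-reorder-blocks=ext-tsp", "-reorder-functions=hfsort",
         "-split-functions", "-split-all-cold", "-dyno-stats"],
    ]


if __name__ == "__main__":
    # Both the binary path and the training script are placeholders.
    for cmd in bolt_plan("build/bin/heavydb", ["./run_benchmarks.sh"]):
        print(shlex.join(cmd))
```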
I am not familiar with HeavyDB (yet), but I guess we can start by training PGO on the HeavyDB benchmarks and then comparing HeavyDB's performance before and after PGO.
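For the before/after comparison, one common approach is to compute a per-query speedup (baseline time divided by PGO time) and summarize it with a geometric mean. Below is a minimal sketch assuming each benchmark run produces a mapping from a query name to its run time; the helper name `pgo_speedups` and the timings in the usage example are made-up placeholders, not real HeavyDB measurements.

```python
from statistics import geometric_mean


# Hypothetical helper: compares two benchmark runs query by query.
def pgo_speedups(baseline, pgo):
    """Return per-query speedups (baseline / PGO) and their geometric mean.

    Both arguments map a query name to its run time; both must use the same
    unit. Only queries present in both runs are compared. A speedup above
    1.0 means the PGO build was faster for that query.
    """
    common = sorted(baseline.keys() & pgo.keys())
    if not common:
        raise ValueError("the two runs have no queries in common")
    per_query = {q: baseline[q] / pgo[q] for q in common}
    return per_query, geometric_mean(list(per_query.values()))


if __name__ == "__main__":
    # Illustrative placeholder timings in seconds, not measured results.
    per_query, overall = pgo_speedups({"q1": 2.0, "q2": 4.0},
                                      {"q1": 1.0, "q2": 4.0})
    print(per_query)         # {'q1': 2.0, 'q2': 1.0}
    print(f"{overall:.3f}")  # 1.414
```

The geometric mean keeps a single large speedup on one query from dominating the summary, which is why it is often preferred over the arithmetic mean for ratios.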