Bolt is a C++ acceleration library providing composable, extensible and performant data processing toolkit. It is designed to provide generic and unified interfaces which can be pluggable into "any framework" running on "any harware" to consume "any data source".
Initially derived from Velox project, Bolt is created by ByteDance to embrace and unify the contributions from the community. It has been validated on Spark/Flink/Presto/ElasticSearch framework running on x64&ARM CPU/DPU/GPU accessing Parquet/ORC/Text/CSV/Lance file format managed under Hive/Paimon table to provide enterprise-grade cost optimization, results consistency and feature parity
"Contributions may come in many forms, and all of them are valuable". The governance model of Bolt community will be in line with the Apache Way and "Community over Code" spirit. While we are working out the detailed governance model on a tree-tier structure of Contributor / Maintainers / Project Management Committee(PMC), we are committed to treating the open source repository as the source of truth, including but not limited to
- Public CI pipelines
- Clear dependency management as code
- Equal code review opportunity for manintainers
- Transparent design discussion
This will ensure the smooth & credible experience for code contribution
Bolt focuses on the physical execution layer of DBMS while providing first-class and high performance support for popular frameworks and storage formats.
Frameworks:
- Apache Gluten for Spark
- OpenSearch for ElasticSearch
- Flink (Coming Soon)
- ...
Storage Formats:
- Parquet
- ORC
- TXT
- CSV
- Paimon
- Lance (Coming Soon)
- ...
Bolt is designed as a seamless acceleration layer that requires minimum code changes to the existing user jobs. Results/Performance comparison against original frameworks is performed on regular basis to capture regression & corner cases. Key features including
- Adaptive Task Parallelism
- Native Memory Management & Dynamic offheap threshold
- Operator Fusion
- JIT for hotspot expression
- Native Shuffle Support
- ...
Conan is an open source and multi-platform package manager. We provide scripts to help developers setup and install Bolt dependencies.
scripts/setup-dev-env.shThis script only exports conan recipes to local cache. For the first time, dependencies will be built from source and installed into local cache. You can setup your own conan server to accelerate building.
git clone https://github.com/bytedance/bolt.git
cd boltRun make in the root directory to compile the sources. For development, use
make debug to build a non-optimized debug version, or make release to build
an optimized version. Use make unittest to build and run tests.
# building bolt for spark
make release_spark
# In main branch, by default, BUILD_VERSION is main.
make release_spark BUILD_VERSION=main# Take gluten for example:
class GluenConan(ConanFile):
def requirements(self):
bolt_version="main"
self.requires(f"bolt/{bolt_version}", transitive_headers=True, transitive_libs=True)Check our contributing guide to learn about how to contribute to the project.
WIP
Bolt is licensed under the Apache 2.0 License. A copy of the license can be found here.
