Releases: LLNL/Aluminum
v1.4.2
This is a minor update primarily adding additional debugging support to Aluminum and support for newer versions of CUDA and ROCm.
- Add additional sanity checks for MPI initialization.
- Add various checks to ensure arguments are sane.
- Add support for Caliper annotations to Aluminum APIs (build with `ALUMINUM_ENABLE_CALIPER=Yes`).
- Add an option to disable all background streams for non-blocking operations (build with `ALUMINUM_DISABLE_BACKGROUND_STREAMS=Yes`).
- Various compilation fixes.
- Support for building with CUDA 12.
- Support for building with ROCm 6.
v1.4.1
v1.4.0
This release addresses various issues and adds a new `MultiSendRecv` operation.
- The default internal stream pool size has changed to 1. This is to mitigate issues on ROCm platforms, but no performance impact was observed on other platforms.
- Fix a compilation error when building on CUDA 12 platforms.
- On ROCm platforms only: zero-size RCCL `Send`, `Recv`, and `Sendrecv` messages are skipped. This is to work around apparent hangs in RCCL with such messages and will be removed once the issue is fixed upstream.
- Fix a memory copy issue in the host-transfer `Alltoallv`.
- Updated to cxxopts 3.
- Added a compile-time traits API for describing what operations, types, etc. are supported by each backend.
- Added the `MultiSendRecv` operation, which supports an arbitrary sequence of sends and receives among ranks as a single operation.
- Various internal reorganizations for the test and benchmark code.
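
To illustrate what a compile-time traits API of this kind can look like, here is a minimal sketch. Every name in it (`HostTag`, `GpuCollectiveTag`, `ReduceOp`, `SupportsReduceOp`, `supports_avg`) is hypothetical and chosen for illustration; none of it is taken from Aluminum's headers, and the support matrix it encodes is an assumption, not a statement about any real backend.

```cpp
#include <type_traits>

// Hypothetical backend tags; these are NOT Aluminum's actual backend types.
struct HostTag {};
struct GpuCollectiveTag {};

// Hypothetical set of reduction operators.
enum class ReduceOp { sum, avg };

// Primary template: by default, a backend does not support an operator.
template <typename Backend, ReduceOp Op>
struct SupportsReduceOp : std::false_type {};

// Assumption for this sketch: every backend supports summation.
template <typename Backend>
struct SupportsReduceOp<Backend, ReduceOp::sum> : std::true_type {};

// Assumption for this sketch: only the GPU collective backend supports averaging.
template <>
struct SupportsReduceOp<GpuCollectiveTag, ReduceOp::avg> : std::true_type {};

// Convenience query that callers can use in static_assert or if constexpr.
template <typename Backend>
constexpr bool supports_avg() {
  return SupportsReduceOp<Backend, ReduceOp::avg>::value;
}

// Unsupported combinations can be rejected at compile time.
static_assert(supports_avg<GpuCollectiveTag>(), "avg expected on this backend");
static_assert(!supports_avg<HostTag>(), "avg not expected on this backend");
```

The design point is that a query such as `supports_avg<Backend>()` is resolved by the compiler, so code can choose a fallback path with `if constexpr` or fail with a clear `static_assert` message instead of failing at run time.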
v1.3.1
v1.3.0
v1.2.3
v1.2.2
v1.2.1
v1.2.0
This release adds better support for low-precision data.
- Support fp16 (IEEE half-precision) in all backends when support is available.
- Support bfloat16 in all backends when support is available.
- The NCCL/RCCL backend now supports averaging as a reduction operator (`avg`).
- Aluminum now requires at least CUDA 11 / ROCm 5 when GPU support is requested.
v1.1.0
The highlight of this release is that Aluminum now has a logo.
There were some other, slightly less interesting, changes, too. Notably, Aluminum now fully supports multi-threaded communication. There are also significant improvements to support on HIP/ROCm platforms and extensive internal cleanups.
- Aluminum has a logo now!
- Support the `AL_MPI_SERIALIZED` compile-time flag, which will run blocking MPI calls on the progress engine for situations where all calls need to come from the same thread.
- Support `AL_THREAD_MULTIPLE`, which enables safe multi-threaded communication in Aluminum.
- Significant improvements in benchmarking/testing infrastructure.
- Removed support for custom MPI allreduce algorithms. Aluminum now uses the native MPI implementations.
- Added an `al_info` binary to provide basic info on Aluminum.
- Better progress engine binding on HIP/ROCm systems.
- The host-transfer backend uses stream memory operations on HIP/ROCm systems when available.
- Aluminum no longer relies on hipify for HIP/ROCm systems.
- Aluminum's CMake exports components to identify backend support at build time.
- Significant internal code reorganizations/cleanup for CUDA stuff and the progress engine.
- Various bugfixes and other minor improvements.