Releases · LLNL/Aluminum

11 Feb 20:57

ndryden

v1.4.2

2de42e7

v1.4.2 Latest

This is a minor update that primarily adds more debugging support to Aluminum, along with support for newer versions of CUDA and ROCm.

Add additional sanity checks for MPI initialization.
Add various checks to ensure arguments are sane.
Add support for Caliper annotations to Aluminum APIs (build with ALUMINUM_ENABLE_CALIPER=Yes).
Add an option to disable all background streams for non-blocking operations (build with ALUMINUM_DISABLE_BACKGROUND_STREAMS=Yes).
Various compilation fixes.
Support for building with CUDA 12.
Support for building with ROCm 6.

18 Aug 18:12

ndryden

v1.4.1

b6b018f

v1.4.1

This is a bugfix release addressing a compilation issue with libc++ (see #209).

17 Aug 18:49

ndryden

v1.4.0

3c08739

v1.4.0

This release addresses various issues and adds a new MultiSendRecv operation.

The default internal stream pool size has changed to 1. This is to mitigate issues on ROCm platforms, but no performance impact was observed on other platforms.
Fix a compilation error when building on CUDA 12 platforms.
On ROCm platforms only: zero-size RCCL Send, Recv, and Sendrecv messages are skipped. This is to work around apparent hangs in RCCL with such messages and will be removed once the issue is fixed upstream.
Fix a memory copy issue in the host-transfer Alltoallv.
Updated to cxxopts 3.
Added a compile-time traits API for describing what operations, types, etc. are supported by each backend.
Added the MultiSendRecv operation, which supports an arbitrary sequence of sends and receives among ranks as a single operation.
Various internal reorganizations for the test and benchmark code.
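
To illustrate what a compile-time traits API of this kind can look like, here is a minimal sketch of the general technique. It is not Aluminum's actual interface: the backend tags (`HostBackend`, `GpuBackend`), the `Op` enum, and the `IsOpSupported` / `IsTypeSupported` templates are all hypothetical names chosen for this example, and the support matrix is made up.

```cpp
#include <type_traits>

// Hypothetical backend tags and operation list, for illustration only.
struct HostBackend {};
struct GpuBackend {};
enum class Op { Allreduce, SendRecv, MultiSendRecv };

// Primary template: by default, a backend supports no operation.
template <typename Backend, Op O>
struct IsOpSupported : std::false_type {};

// Each backend opts in to the operations it implements (made-up matrix).
template <> struct IsOpSupported<HostBackend, Op::Allreduce> : std::true_type {};
template <> struct IsOpSupported<GpuBackend, Op::Allreduce> : std::true_type {};
template <> struct IsOpSupported<GpuBackend, Op::MultiSendRecv> : std::true_type {};

// Data type support is described the same way.
template <typename Backend, typename T>
struct IsTypeSupported : std::false_type {};

template <> struct IsTypeSupported<HostBackend, float> : std::true_type {};
template <> struct IsTypeSupported<GpuBackend, float> : std::true_type {};
template <> struct IsTypeSupported<GpuBackend, double> : std::true_type {};

// Callers can query support at compile time and pick a code path.
template <typename Backend>
constexpr bool can_multi_sendrecv() {
  return IsOpSupported<Backend, Op::MultiSendRecv>::value;
}

static_assert(can_multi_sendrecv<GpuBackend>(),
              "hypothetical GPU backend supports MultiSendRecv");
static_assert(!can_multi_sendrecv<HostBackend>(),
              "hypothetical host backend does not");
```

Because the queries resolve at compile time, an unsupported combination can be rejected with a `static_assert` instead of failing at run time.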

09 May 22:06

ndryden

v1.3.1

ed3e487

v1.3.1

This is a minor release that mainly fixes some linking issues on ROCm platforms.

Fix RCCL includes and linking.
Various improvements to the benchmarking and testing infrastructure.
Improved documentation.

10 Mar 18:13

ndryden

v1.3.0

9ec3675

v1.3.0

This release adds in-place SendRecv support to Aluminum.
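
As a rough illustration of what "in-place" means here, the sketch below simulates several ranks inside one process, each using a single buffer as both the send source and the receive destination (similar in spirit to MPI's `MPI_Sendrecv_replace`). It does not use Aluminum's API; the function name `ring_sendrecv_inplace` and the ring pattern are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simulated in-place ring exchange: bufs[r] is rank r's only buffer.
// Rank r sends its buffer to rank (r + 1) % n and receives from rank
// (r + n - 1) % n into the same buffer.
void ring_sendrecv_inplace(std::vector<std::vector<int>>& bufs) {
  const std::size_t n = bufs.size();
  if (n < 2) return;  // Nothing to exchange with fewer than two ranks.

  // Outgoing data must be staged before any buffer is overwritten;
  // otherwise a rank could forward data it has just received.
  const std::vector<std::vector<int>> staged = bufs;

  for (std::size_t r = 0; r < n; ++r) {
    const std::size_t src = (r + n - 1) % n;
    assert(staged[src].size() == bufs[r].size());  // Equal-size messages.
    bufs[r] = staged[src];
  }
}
```

The staging copy is the key point: an in-place exchange needs the outgoing data preserved somewhere until it has been sent, since the receive overwrites the same memory.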

09 Mar 21:01

ndryden

v1.2.3

0191b71

v1.2.3

This is a bugfix release adding threads linkage to the CMake export.

08 Mar 22:13

ndryden

v1.2.2

19413a6

v1.2.2

This is primarily a bugfix release.

Fixed an issue in progress engine binding that could lead to hangs. (See #182.)
Traces include the stream of an operation.
Tuning parameters are now configured via CMake rather than by manually editing tuning_params.hpp.

02 Mar 17:38

ndryden

v1.2.1

730a04e

v1.2.1

This is a minor bugfix release.

Fixed builds of the MPI-CUDA tests and MPI-CUDA RMA library.
Use locks to protect tracing when built with AL_THREAD_MULTIPLE.
Match the benchmarking script's type support to that of the tests.
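
The idea of protecting tracing with locks can be sketched as follows. This is a generic illustration, not Aluminum's tracing code: `TraceLog` and `trace_from_threads` are hypothetical names, and the sketch always takes the lock, whereas the release note describes doing so only in AL_THREAD_MULTIPLE builds.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical trace log whose entries may be recorded from many threads.
class TraceLog {
public:
  void record(const std::string& entry) {
    // Serialize appends so concurrent callers cannot corrupt the vector.
    std::lock_guard<std::mutex> lock(mutex_);
    entries_.push_back(entry);
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return entries_.size();
  }

private:
  mutable std::mutex mutex_;
  std::vector<std::string> entries_;
};

// Spawn several threads that all write to one log; return the entry count.
std::size_t trace_from_threads(int num_threads, int entries_per_thread) {
  assert(num_threads >= 0 && entries_per_thread >= 0);
  TraceLog log;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&log, t, entries_per_thread] {
      for (int i = 0; i < entries_per_thread; ++i) {
        log.record("thread " + std::to_string(t) + " op " + std::to_string(i));
      }
    });
  }
  for (auto& w : workers) w.join();
  return log.size();
}
```

Without the mutex, concurrent `push_back` calls on the same vector would be a data race; with it, every entry is recorded exactly once.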

02 Feb 23:08

ndryden

v1.2.0

e51dc07

v1.2.0

This release adds better support for low-precision data.

Support fp16 (IEEE half-precision) in all backends when support is available.
Support bfloat16 in all backends when support is available.
The NCCL/RCCL backend now supports averaging as a reduction operator (avg).
Aluminum now requires at least CUDA 11 / ROCm 5 when GPU support is requested.
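
For readers unfamiliar with averaging as a reduction operator, the sketch below shows the result an averaging allreduce is expected to produce: the element-wise sum over all ranks, divided by the number of ranks. It simulates the ranks in a single process and does not call Aluminum, NCCL, or RCCL; `allreduce_avg_sim` is a hypothetical helper written for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// contributions[r] holds the input buffer of simulated rank r.
// Returns the buffer every rank would hold after an averaging allreduce.
std::vector<float> allreduce_avg_sim(
    const std::vector<std::vector<float>>& contributions) {
  assert(!contributions.empty());
  const std::size_t count = contributions.front().size();

  // Element-wise sum across all ranks.
  std::vector<float> result(count, 0.0f);
  for (const auto& buf : contributions) {
    assert(buf.size() == count);  // All ranks contribute equal-size buffers.
    for (std::size_t i = 0; i < count; ++i) {
      result[i] += buf[i];
    }
  }

  // Divide by the number of ranks to turn the sum into an average.
  const float num_ranks = static_cast<float>(contributions.size());
  for (float& x : result) {
    x /= num_ranks;
  }
  return result;
}
```

Having the backend perform the averaging saves callers a separate scaling pass after a sum reduction.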

30 Jan 23:22

ndryden

v1.1.0

808f68d

v1.1.0

The highlight of this release is that Aluminum now has a logo.

There were some other, slightly less interesting, changes, too. Notably, Aluminum now fully supports multi-threaded communication. There are also significant improvements to support on HIP/ROCm platforms and extensive internal cleanups.

Aluminum has a logo now!
Support the AL_MPI_SERIALIZED compile-time flag, which will run blocking MPI calls on the progress engine for situations where all calls need to come from the same thread.
Support the AL_THREAD_MULTIPLE compile-time flag, which enables safe multi-threaded communication in Aluminum.
Significant improvements in benchmarking/testing infrastructure.
Removed support for custom MPI allreduce algorithms. Aluminum now uses the native MPI implementations.
Added an al_info binary to provide basic info on Aluminum.
Better progress engine binding on HIP/ROCm systems.
The host-transfer backend uses stream memory operations on HIP/ROCm systems when available.
Aluminum no longer relies on hipify for HIP/ROCm systems.
Aluminum's CMake exports components to identify backend support at build time.
Significant internal code reorganizations/cleanup for CUDA stuff and the progress engine.
Various bugfixes and other minor improvements.
