Releases: LLNL/Aluminum

v1.4.2

11 Feb 20:57
2de42e7

This is a minor update that primarily adds debugging support to Aluminum, along with support for newer versions of CUDA and ROCm.

  • Add additional sanity checks for MPI initialization.
  • Add various checks to ensure arguments are sane.
  • Add support for Caliper annotations to Aluminum APIs (build with ALUMINUM_ENABLE_CALIPER=Yes).
  • Add an option to disable all background streams for non-blocking operations (build with ALUMINUM_DISABLE_BACKGROUND_STREAMS=Yes).
  • Various compilation fixes.
  • Support for building with CUDA 12.
  • Support for building with ROCm 6.

v1.4.1

18 Aug 18:12
b6b018f

This is a bugfix release addressing a compilation issue with libc++ (see #209).

v1.4.0

17 Aug 18:49
3c08739

This release addresses various issues and adds a new MultiSendRecv operation.

  • The default internal stream pool size has changed to 1. This mitigates issues on ROCm platforms; no performance impact was observed on other platforms.
  • Fix a compilation error when building on CUDA 12 platforms.
  • On ROCm platforms only: zero-size RCCL Send, Recv, and Sendrecv messages are skipped. This is to work around apparent hangs in RCCL with such messages and will be removed once the issue is fixed upstream.
  • Fix a memory copy issue in the host-transfer Alltoallv.
  • Updated to cxxopts 3.
  • Added a compile-time traits API for describing what operations, types, etc. are supported by each backend.
  • Added the MultiSendRecv operation, which supports an arbitrary sequence of sends and receives among ranks as a single operation.
  • Various internal reorganizations for the test and benchmark code.
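
To make the idea behind MultiSendRecv concrete, here is a minimal single-process sketch of what "an arbitrary sequence of sends and receives among ranks as a single operation" can mean. It is not Aluminum's API: the `Send` struct, the `Inbox` alias, and `simulate_multi_sendrecv` are hypothetical names invented for illustration, and no actual communication takes place.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical type for illustration only; this is NOT Aluminum's API.
// One entry describes a single point-to-point transfer.
struct Send {
  int src;                // sending rank
  int dst;                // receiving rank
  std::vector<int> data;  // payload (may be empty)
};

// Maps (receiver, sender) to the payload the receiver got from that sender.
using Inbox = std::map<std::pair<int, int>, std::vector<int>>;

// Simulates delivering a whole batch of transfers as one logical operation.
// If the batch repeats a (dst, src) pair, the later payload overwrites the
// earlier one; a real implementation would need a defined policy for this.
Inbox simulate_multi_sendrecv(int num_ranks, const std::vector<Send>& sends) {
  Inbox inbox;
  for (const auto& s : sends) {
    assert(s.src >= 0 && s.src < num_ranks);
    assert(s.dst >= 0 && s.dst < num_ranks);
    inbox[{s.dst, s.src}] = s.data;
  }
  return inbox;
}
```

The point of the sketch is only that the caller describes every transfer up front and hands them over together, rather than issuing one send or receive at a time.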

v1.3.1

09 May 22:06
ed3e487

This is a minor release that mainly fixes some linking issues on ROCm platforms.

  • Fix RCCL includes and linking.
  • Various improvements to the benchmarking and testing infrastructure.
  • Improved documentation.

v1.3.0

10 Mar 18:13
9ec3675

This adds in-place SendRecv support to Aluminum.

v1.2.3

09 Mar 21:01
0191b71

This is a bugfix release adding threads linkage to the CMake export.

v1.2.2

08 Mar 22:13
19413a6

This is primarily a bugfix release.

  • Fixed an issue in progress engine binding that could lead to hangs. (See #182.)
  • Traces include the stream of an operation.
  • Tuning parameters are now configured via CMake rather than by manually editing tuning_params.hpp.

v1.2.1

02 Mar 17:38
730a04e

This is a minor bugfix release.

  • Fixed builds of the MPI-CUDA tests and MPI-CUDA RMA library.
  • Use locks to protect tracing when built with AL_THREAD_MULTIPLE.
  • Match the benchmarking script's type support to that of the tests.

v1.2.0

02 Feb 23:08
e51dc07

This release adds better support for low-precision data.

  • Support fp16 (IEEE half-precision) in all backends when support is available.
  • Support bfloat16 in all backends when support is available.
  • The NCCL/RCCL backend now supports averaging as a reduction operator (avg).
  • Aluminum now requires at least CUDA 11 / ROCm 5 when GPU support is requested.
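
As a rough illustration of what an averaging reduction computes, the sketch below simulates an allreduce with `avg` in a single process: each output element is the sum of that element across all ranks, divided by the number of ranks. The function name `simulate_allreduce_avg` is hypothetical and the code does not use Aluminum, NCCL, or RCCL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT part of Aluminum: per_rank[r] holds the input
// buffer of rank r. Returns the buffer every rank would hold after an
// allreduce with an averaging operator.
std::vector<float> simulate_allreduce_avg(
    const std::vector<std::vector<float>>& per_rank) {
  assert(!per_rank.empty());
  const std::size_t n = per_rank.front().size();
  std::vector<float> result(n, 0.0f);
  // Element-wise sum across ranks.
  for (const auto& buf : per_rank) {
    assert(buf.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
      result[i] += buf[i];
    }
  }
  // Divide by the number of ranks to turn the sum into an average.
  const float num_ranks = static_cast<float>(per_rank.size());
  for (auto& x : result) {
    x /= num_ranks;
  }
  return result;
}
```

With low-precision types such as fp16 or bfloat16, the order of summation and the point at which the division happens can affect rounding; this sketch uses `float` and ignores that.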

v1.1.0

30 Jan 23:22
808f68d

The highlight of this release is that Aluminum now has a logo.

There were some other, slightly less interesting, changes, too, notably full support for multi-threaded communication in Aluminum. There are also significant improvements to support on HIP/ROCm platforms and extensive internal cleanups.

  • Aluminum has a logo now!
  • Support the AL_MPI_SERIALIZED compile-time flag, which will run blocking MPI calls on the progress engine for situations where all calls need to come from the same thread.
  • Support the AL_THREAD_MULTIPLE compile-time flag, which enables safe multi-threaded communication in Aluminum.
  • Significant improvements in benchmarking/testing infrastructure.
  • Removed support for custom MPI allreduce algorithms. Aluminum now uses the native MPI implementations.
  • Added an al_info binary to provide basic info on Aluminum.
  • Better progress engine binding on HIP/ROCm systems.
  • The host-transfer backend uses stream memory operations on HIP/ROCm systems when available.
  • Aluminum no longer relies on hipify for HIP/ROCm systems.
  • Aluminum's CMake exports components to identify backend support at build time.
  • Significant internal code reorganizations/cleanup for the CUDA code and the progress engine.
  • Various bugfixes and other minor improvements.