Releases: LLNL/RAJA
v0.5.0
Please download the RAJA-0.5.0.tar.gz file above. The others will not work due to the way RAJA uses git submodules.
This release contains a variety of bug fixes, removes nvcc compiler
warnings, adds unit tests to expand coverage, and includes a variety of
other code cleanup and improvements. The most notable changes in this
version include:
- New RAJA User Guide and Tutorial, along with a set of example codes that illustrate basic usage of RAJA features and accompany the tutorial. The examples are in the RAJA/examples directory. The user guide is available online here: RAJA User Guide and Tutorial.
- RAJA::IndexSet is now deprecated. You may still use it until it is removed in a future release -- you will see a notification message at compile time that it is deprecated. Index set functionality is now available via RAJA::TypedIndexSet, where you specify all segment types as template parameters when you declare an instance of it (see the sketch after this list). This change allows us to remove all virtual methods from the index set, pass index set objects to CUDA GPU kernels with all of their functionality, and support any arbitrary segment type, even user-defined ones. Please see the User Guide for details. Segment dependencies are being developed for the typed index set and will be available in a future release.
- RAJA::nested::forall changes:
  - Addition of CUDA and OpenMP collapse policies for nested loops. The OpenMP collapse policy does what the OpenMP collapse clause does. The CUDA collapse policy collapses a loop nest into a single CUDA kernel based on how the nested policies specify that loop levels should be distributed over blocks and threads.
  - Added a new policy, RAJA::cuda_loop_exec, to enable inner loops to run sequentially inside a CUDA kernel with RAJA::nested::forall.
  - Fixed RAJA::nested::forall so it now works with RAJA's CUDA Reducer types.
  - Removed TypedFor policies. For type safety of nested loop iteration variables, it makes more sense to use TypedRangeSegment, since the variables are associated with the loop kernel and not the execution policy, which may be applied to multiple loops with different variables.
- Fixed OpenMP scans to calculate chunks of work based on the actual number of threads the OpenMP runtime makes available.
- Enhancements and fixes to RAJA/CHAI interoperability.
- Added aliases for several camp types in the RAJA namespace; e.g., camp::make_tuple can now be accessed as RAJA::make_tuple. This change makes the RAJA API more consistent and clear.
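A minimal sketch of the new typed index set usage follows. It assumes v0.5.0-era constructor and policy spellings (a RAJA::ListSegment built from a pointer and length, and the segment-iteration policy RAJA::ExecPolicy<RAJA::seq_segit, RAJA::seq_exec>); the User Guide is the authoritative reference.

```cpp
#include "RAJA/RAJA.hpp"

#include <vector>

int main()
{
  constexpr RAJA::Index_type N = 100;
  std::vector<double> a(N, 1.0);
  double* a_ptr = a.data();

  // Segment types are template parameters of the index set type, so no
  // virtual dispatch is needed when iterating over segments.
  using ISet = RAJA::TypedIndexSet<RAJA::RangeSegment, RAJA::ListSegment>;
  ISet iset;

  // A contiguous range [0, 50) ...
  iset.push_back(RAJA::RangeSegment(0, 50));

  // ... plus an arbitrary (illustrative) list of indices.
  std::vector<RAJA::Index_type> idx{50, 53, 57, 61};
  iset.push_back(RAJA::ListSegment(idx.data(),
                                   static_cast<RAJA::Index_type>(idx.size())));

  // Outer policy iterates over segments; inner policy executes each segment.
  using Pol = RAJA::ExecPolicy<RAJA::seq_segit, RAJA::seq_exec>;
  RAJA::forall<Pol>(iset, [=](RAJA::Index_type i) {
    a_ptr[i] *= 2.0;
  });

  return 0;
}
```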
v0.4.1
Please download the RAJA-0.4.1.tar.gz file above. The others will not work due to the way RAJA uses git submodules.
Bugfix for warnings when using the -Wpedantic flag.
v0.4.0
Please download the RAJA-0.4.0.tar.gz file above. The others will not work due to the way RAJA uses git submodules.
The v0.4.0 release contains minor fixes for issues in the previous v0.3.1 release, plus improvements to documentation and reduction performance, improved portability across a growing set of compilers and environments (e.g., Windows), namespace refactoring to avoid cyclic dependencies and leverage argument-dependent lookup, etc. In addition, the RAJA backend for Intel TBB is now off by default, whereas previously it was on by default.
A few major changes are included in this version:
- Changes to the way RAJA is configured and built. We are now using the BLT build system, which is a Git submodule of RAJA. In addition to requiring the '--recursive' option to be passed to 'git clone', this introduces the following major change: RAJA_ENABLE_XXX options passed to CMake are now just ENABLE_XXX.
- A new API and implementation for nested-loop RAJA constructs has been added. It is still a work in progress, but users are welcome to try it out and provide feedback. Eventually, RAJA::nested::forall will replace RAJA::forallN.
v0.3.1
This release contains some new RAJA features, plus a bunch of internal changes, including more tests, conversion of nearly all unit tests to use Google Test, improved testing coverage, and compilation portability improvements (e.g., Intel, nvcc, msvc). Also, the suffix for all RAJA source files has been changed from *.cxx to *.cpp for consistency with the header file suffix conversion in the last release. The source file suffix change should not require users to change anything.
New features included in this release:
- Execution policy modifications and additions: seq_exec is now strictly sequential (no SIMD, etc.), simd_exec will force SIMD vectorization, and the new loop_exec policy will allow the compiler to optimize however it can, including SIMD. So, loop_exec is really what our previous simd_exec policy was before, and 'no vector' pragmas have been added to all sequential implementations. NOTE: SIMD changes are still being evaluated with different compilers on different platforms. More information will be provided as we learn more.
- Added support for atomic operations (min, max, inc, dec, and, or, xor, exchange, and CAS) for all programming model backends. These appear in the RAJA::atomic namespace (see the sketch after this list).
- Support added for the Intel Threading Building Blocks backend (considered experimental at this point)
- Added macros that will be used to mark features for future deprecation (please watch for this as we will be deprecating some features in the next release)
- Added support for C++17 if CMake knows about it
- Removed the limit on the number of ordered OpenMP reductions that can be used in a kernel
- Removed a compile-time error from memutils; added a portable aligned allocator
- Improved ListSegment implementation
- RAJA::Index_type is now ptrdiff_t instead of int
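A rough sketch combining the new loop_exec policy with the atomic interface follows. The RAJA::atomic namespace spellings and the auto_atomic policy name reflect this release's API as best understood here (later RAJA releases expose atomics directly in the RAJA namespace); treat the exact names as assumptions.

```cpp
#include "RAJA/RAJA.hpp"

#include <cstdio>
#include <vector>

int main()
{
  constexpr RAJA::Index_type N = 1000;
  constexpr int NUM_BINS = 10;

  std::vector<int> bins(NUM_BINS, 0);
  int* bins_ptr = bins.data();

  // loop_exec leaves optimization (including SIMD) to the compiler, which is
  // what simd_exec effectively did before; simd_exec now forces vectorization
  // and seq_exec is strictly sequential.
  RAJA::forall<RAJA::loop_exec>(RAJA::RangeSegment(0, N),
    [=](RAJA::Index_type i) {
      // Illustrative binning of index i; the atomic add keeps the update
      // safe regardless of which backend executes the loop.
      int b = static_cast<int>(i % NUM_BINS);
      RAJA::atomic::atomicAdd<RAJA::atomic::auto_atomic>(&bins_ptr[b], 1);
    });

  std::printf("bin 0 count = %d\n", bins_ptr[0]);
  return 0;
}
```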
Notable bug fixes included in this release:
- Fixed strided_numeric_iterator to apply stride sign in comparison
- Bug in RangeStrideSegment when using CUDA is fixed
- Fixed reducer logic for openmp_ordered policy
v0.3.0
This release contains breaking changes and is not backward compatible with prior versions. The largest change is a re-organization of header files, and the switch to .hpp as the file extension for all headers.
New features included in this release:
- Re-organization of header files
- Renaming of file extensions
- OpenMP 4.5 support
- CHAI support
v0.2.5
This release includes some small fixes, as well as an initial re-organization of the RAJA header files as we move towards a more flexible usage model.
v0.2.4
This version includes the following changes:
- Initial support of clang-cuda
- New, faster OpenMP reductions
N.B. The default OpenMP reductions are no longer performed in an ordered fashion, so results may not be reproducible. The old reductions are still available with the policy RAJA::omp_reduce_ordered.
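For reference, here is a minimal sketch of opting back into the ordered, reproducible reduction. It assumes an OpenMP-enabled build, and that the RangeSegment-based forall overload and the .get() accessor match this release.

```cpp
#include "RAJA/RAJA.hpp"

#include <cstdio>
#include <vector>

int main()
{
  constexpr RAJA::Index_type N = 1000;
  std::vector<double> x(N, 0.1);
  const double* x_ptr = x.data();

  // The default OpenMP reduction policy (omp_reduce) is the new, faster,
  // unordered one; omp_reduce_ordered restores the old ordered combination
  // so results are reproducible run to run.
  RAJA::ReduceSum<RAJA::omp_reduce_ordered, double> sum(0.0);

  RAJA::forall<RAJA::omp_parallel_for_exec>(RAJA::RangeSegment(0, N),
    [=](RAJA::Index_type i) {
      sum += x_ptr[i];
    });

  std::printf("sum = %f\n", sum.get());
  return 0;
}
```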
v0.2.3
Hotfix to update the URLs used for fetching clang during Travis builds.
v0.2.2
Bugfix release that addresses an error when launching forall CUDA kernels with a 0-length range.
v0.2.1
This release contains fixes for compiler warnings and removes the usage of the custom FindCUDA CMake package.