
Support HIP/ROCm backends for GPUs #101

Merged
merged 36 commits into from
Jun 5, 2020


@benson31 (Collaborator) commented May 5, 2020

Most of the GPU calls have been factored out into clean, separate HIP and CUDA backends. Even though HIP is a thin layer over CUDA on NVIDIA platforms, we use only a small portion of either API, so keeping the two separate seemed worthwhile. (Additionally, I don't want CUDA users to have to install ROCm on systems that don't need it just to get back to CUDA.) Moreover, I could envision an optimization for one platform being neutral or even harmful on the other; keeping the backends isolated keeps their optimization paths independent.

Similarly, cuBLAS and rocBLAS have been separated, since the two have surprisingly divergent APIs. The differences are abstracted behind the gpu_blas namespace.
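A minimal sketch of how a single gpu_blas entry point can hide which vendor library sits underneath. Apart from the gpu_blas namespace and the HYDROGEN_HAVE_* macros, everything here is hypothetical: `Axpy`, its signature, and the `host_impl`/`cuda_impl`/`rocm_impl` namespaces are illustrative and are not Hydrogen's actual API. Only the host reference backend is implemented, so the sketch compiles and runs without a GPU; a real cuBLAS or rocBLAS backend would also need a library handle and device pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference backend so the sketch runs without a GPU.
namespace host_impl
{
// y <- alpha * x + y over n strided elements.
template <typename T>
void Axpy(std::size_t n, T alpha, T const* x, std::size_t incx,
          T* y, std::size_t incy)
{
    for (std::size_t i = 0; i < n; ++i)
        y[i * incy] += alpha * x[i * incx];
}
} // namespace host_impl

// Pick exactly one backend at compile time. cuda_impl and rocm_impl are
// hypothetical namespaces that would wrap cuBLAS and rocBLAS in their own
// source files (not shown); selecting them here without those files fails
// to compile rather than silently doing nothing.
#if defined(HYDROGEN_HAVE_CUDA)
namespace blas_backend = cuda_impl;
#elif defined(HYDROGEN_HAVE_ROCM)
namespace blas_backend = rocm_impl;
#else
namespace blas_backend = host_impl;
#endif

// Callers only ever see gpu_blas::Axpy, whatever the backend.
namespace gpu_blas
{
using ::blas_backend::Axpy;
} // namespace gpu_blas

inline void ExampleAxpy()
{
    std::vector<float> x{1.f, 2.f, 3.f};
    std::vector<float> y{10.f, 20.f, 30.f};
    gpu_blas::Axpy(x.size(), 2.f, x.data(), 1, y.data(), 1);
    assert(y[0] == 12.f && y[1] == 24.f && y[2] == 36.f);
}
```

Because the backend is chosen with a namespace alias, calling code never names cuBLAS or rocBLAS directly, and each backend's internals can diverge (or be tuned) without touching the other.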

This port supports Aluminum in both CUDA mode and HIP mode.

I had not initially tested with hipCUB, but hipCUB support has since been added and seems fine.

I have not tested with GPU half types under HIP; please review this PR as-is and I will work on that functionality. If it makes it in before this merges, super. Otherwise, I can do a follow-on PR.

As part of this refactor, the preprocessing macros have changed slightly. HYDROGEN_HAVE_GPU should now be used to protect any generic GPU-specific code. HYDROGEN_HAVE_CUDA and HYDROGEN_HAVE_ROCM should be used to protect code that is GPU-backend specific, for CUDA and HIP/ROCm, respectively. Cleaning this up accounts for a large portion of the changes in this PR.
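A small sketch of the guard layering described above, assuming the build system simply defines or leaves undefined each macro. `DescribeBuild` is a hypothetical helper, not part of Hydrogen; it only shows where generic GPU code and backend-specific code sit relative to each other.

```cpp
#include <cassert>
#include <string>

// Reports which code paths this translation unit was compiled with.
// HYDROGEN_HAVE_GPU guards generic GPU code; HYDROGEN_HAVE_CUDA and
// HYDROGEN_HAVE_ROCM guard code tied to one specific backend.
inline std::string DescribeBuild()
{
    std::string desc;
#ifdef HYDROGEN_HAVE_GPU
    desc = "gpu";     // generic GPU code: shared by CUDA and HIP/ROCm
#if defined(HYDROGEN_HAVE_CUDA)
    desc += ":cuda";  // CUDA-only code
#elif defined(HYDROGEN_HAVE_ROCM)
    desc += ":rocm";  // HIP/ROCm-only code
#endif
#else
    desc = "cpu";     // no GPU support compiled in
#endif
    return desc;
}

inline void ExampleDescribeBuild()
{
#ifndef HYDROGEN_HAVE_GPU
    // Without HYDROGEN_HAVE_GPU, only the CPU path survives.
    assert(DescribeBuild() == "cpu");
#endif
}
```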

As a final note here: the SyncInfo object changed slightly. Instead of a struct with public event and stream, this is now a class that uses Event() and Stream() to access the event and stream handles, respectively. This is another large portion of the changes in the PR.
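A before-and-after sketch of the SyncInfo change. The handle types are hypothetical stand-ins for the real stream and event types (`cudaStream_t`/`cudaEvent_t` or `hipStream_t`/`hipEvent_t`) so the example compiles without a GPU toolkit; the class and member names are illustrative, and Hydrogen's real class differs.

```cpp
#include <cassert>

// Hypothetical stand-ins for the backend's stream and event handles.
struct StreamHandle { int id; };
struct EventHandle { int id; };

// Before: a plain struct; callers read the public members directly.
struct SyncInfoBefore
{
    StreamHandle stream_;
    EventHandle event_;
};

// After: members are private and reached through accessors.
class SyncInfoAfter
{
public:
    SyncInfoAfter(StreamHandle stream, EventHandle event) noexcept
        : stream_{stream}, event_{event}
    {}

    StreamHandle Stream() const noexcept { return stream_; }
    EventHandle Event() const noexcept { return event_; }

private:
    StreamHandle stream_;
    EventHandle event_;
};

inline void ExampleSyncInfo()
{
    SyncInfoAfter si{StreamHandle{3}, EventHandle{7}};
    // Old call site: si.stream_   New call site: si.Stream()
    assert(si.Stream().id == 3 && si.Event().id == 7);
}
```

One likely benefit of routing access through `Stream()` and `Event()` is that the class can later change how it stores or manages its handles without another sweep over every call site.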

@benson31 benson31 added the enhancement, review requested, and hip rocm ("Things related to HIP/ROCm support") labels May 5, 2020
@benson31 benson31 self-assigned this May 5, 2020

@ndryden (Collaborator) left a comment

Most comments are on find/replace errors. Overall I don't see any big issues.

include/El/blas_like/level1/Copy/util.hpp (outdated, resolved)
include/El/blas_like/level1/Hadamard.hpp (outdated, resolved)
include/El/core/Element/impl.hpp (outdated, resolved)
include/El/core/Matrix/impl.hpp (resolved)
include/El/core/Memory/decl.hpp (resolved)
src/core/imports/mpi/Gather.hpp (resolved)
src/hydrogen/device/ROCm.cpp (outdated, resolved)
tests/blas_like/Gemm.cpp (outdated, resolved)

@timmoon10 (Collaborator) left a comment

Currently at 140/179. So far looks good.

include/El/core/Memory/impl.hpp (outdated, resolved)
include/hydrogen/device/gpu/cuda/CUDALaunchKernel.hpp (outdated, resolved)
include/hydrogen/device/gpu/cuda/SyncInfo.hpp (resolved)
include/hydrogen/device/gpu/cuda/SyncInfo.hpp (outdated, resolved)
@timmoon10 timmoon10 self-requested a review May 8, 2020 23:59

@timmoon10 (Collaborator) left a comment

158/179. One correctness error in include/hydrogen/meta/MetaUtilities.hpp.

include/hydrogen/device/gpu/rocm/SyncInfo.hpp (outdated, resolved)
include/hydrogen/device/gpu/rocm/SyncInfo.hpp (resolved)
include/hydrogen/meta/MetaUtilities.hpp (outdated, resolved)
@timmoon10 timmoon10 self-requested a review May 9, 2020 01:00

@timmoon10 (Collaborator) left a comment

Overall, looks good to me. My comments are all nitpicks.

tests/core/DistMatrix.cpp (outdated, resolved)
src/hydrogen/device/cuBLAS.cpp (resolved)
src/hydrogen/device/cuBLAS.cpp (outdated, resolved)
src/hydrogen/device/rocBLAS.cpp (resolved)
src/hydrogen/device/rocBLAS.cpp (outdated, resolved)

@bvanessen (Collaborator) left a comment

LGTM. I see that there are still some unaddressed comments from @timmoon10.

@benson31 benson31 merged commit d2feee8 into LLNL:hydrogen Jun 5, 2020
@benson31 benson31 deleted the feature-rocm-port branch June 5, 2020 22:37

4 participants