-
Notifications
You must be signed in to change notification settings - Fork 565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove direct UCX and NCCL dependencies #5038
Merged
Merged
Changes from all commits
Commits
Show all changes
9 commits
Select commit
Hold shift + click to select a range
de7c83b
Remove direct dependence on ucx.
vyasr 11c8721
Try removing nccl as well.
vyasr a2d829c
Remove all NCCL references.
vyasr b974490
Merge 23.02 into PR
dantegd f4f9649
UPD remove ucx/nccl from all recipes/envs
dantegd 858bdb4
FIX remove straggling references
dantegd 3d25dca
UPD Updates from PR review
dantegd 838b83d
FIX typo
dantegd 2838327
Merge branch 'branch-23.02' into remove_direct_ucx
vyasr File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file was deleted.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file was deleted.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I think about this, I believe we are still going to need to link against NCCL to use
raft/comms/mpi_comms.hpp
but only for the mg tests and nowhere else. Thempi_comms.hpp
are essentially creating a NCCL cluster using MPI and then handing off to NCCL everywhere (think of it like replacing Dask w/ MPI).Maybe we could use the
raft::distributed
COMPONENT for that? (or maybe not, since it also brings in UCX...).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Components are neat, added
raft::distributed
linkage and notes of added dependencies (MPI, NCCL) for C++ MG testsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are cuml's MG tests using libraft for NCCL in the same way that raft-dask uses libraft for UCX? I'm not quite sure how this code works; does cuml's MG code not directly use NCCL, but the MG tests do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cuml's MG googletests should be using MPI, however they should still be tied to the comms interface so shouldn't need to know NCCL exists at all (this NCCL can be provided transitively through the raft::distributed component). Note also that wherher the underlying MPI configuration has been configured with UCX underneath is up to the user running the tests (so the raft MPI communicator will use ucx by proxy and not directly)
Cuml's mg pytests all use raft-dask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right the python tests are fine. I guess the difference is that with the pytests raft-dask is a Python package whose underlying extension modules are linked to NCCL, and then cuml Python only interacts with it at the Python layer, whereas the cuml C++ tests are either linking directly to libraft or building against raft's headers, and in either case the distributed components are not precompiled anyway so there's always NCCL linking required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are using PIMPL for the comms, the mg tests shouldnt need nccl during compile time. But yes, the mg tests binary will need to have raft::distributed on the link libraries (which should pass nccl through)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup OK cool. makes sense. Dante has already made the necessary change here by adding the distributed component in get_raft.cmake, so I think we're good here.