[SYCL][CUDA] add non-uniform groups and algorithms support for ext_oneapi_cuda #9182
Closed
To avoid duplicating logic and introducing even more overloads of the group algorithms, it is desirable to move some of the implementation details into the detail::spirv namespace. This commit makes a few changes to enable that to happen:
- spirv:: functions with a Group template now take a group object, to enable run-time information (e.g. group membership) to pass through.
- ControlBarrier and the OpGroup* instruction used to implement reduce/scan now forward to spirv::, similar to other group functions and algorithms.
- The calc helper used to map functors to SPIR-V instructions is updated to use the new spirv:: functions, instead of calling __spirv intrinsics.
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Nested detail namespaces cause problems for name lookup. Signed-off-by: John Pennycook <john.pennycook@intel.com>
Enables the following functions to be used with ballot_group arguments:
- group_barrier
- group_broadcast
- any_of_group
- all_of_group
- none_of_group
- reduce_over_group
- exclusive_scan_over_group
- inclusive_scan_over_group
Signed-off-by: John Pennycook <john.pennycook@intel.com>
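As a rough illustration of what it means for an algorithm to take a ballot_group argument, the following host-side C++ model splits 32 lanes by a predicate and reduces only over the lanes that belong to one resulting group. It is a sketch, not the implementation in this PR: the names `kLanes`, `ballot_mask`, and `reduce_over_members` are hypothetical, the 32-lane width is an assumption, and no SYCL or SPIR-V call is involved.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumed sub-group width for this host-side model.
constexpr int kLanes = 32;

// Hypothetical helper: build a membership mask with one bit per lane whose
// predicate is true. Lanes with a false predicate form the complementary group.
inline std::uint32_t ballot_mask(const std::array<bool, kLanes> &pred) {
  std::uint32_t mask = 0;
  for (int lane = 0; lane < kLanes; ++lane)
    if (pred[lane])
      mask |= (std::uint32_t{1} << lane);
  return mask;
}

// Hypothetical helper: sum the values of the lanes named in `mask` only,
// modelling a plus-reduction restricted to the members of one group.
inline int reduce_over_members(std::uint32_t mask,
                               const std::array<int, kLanes> &vals) {
  int sum = 0;
  for (int lane = 0; lane < kLanes; ++lane)
    if ((mask >> lane) & 1u)
      sum += vals[lane];
  return sum;
}
```

In this model, the two groups produced by one predicate are described by `mask` and `~mask`, and each reduction ignores the lanes of the other group.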
Fixes compilation at -O0.
Tests the ability to create an instance of each new group type, and the correctness of the core member functions. Signed-off-by: John Pennycook <john.pennycook@intel.com>
This commit adds tests for using ballot_group and the following algorithms:
- group_barrier
- group_broadcast
- any_of_group
- all_of_group
- none_of_group
- reduce_over_group
- exclusive_scan_over_group
- inclusive_scan_over_group
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com> cluster/ballot/opportunistic_group cuda support. Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Works for all non uniform groups for int type. Fixed cluster_group full mask bug. Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
opportunistic_group/ballot_group still missing shfl based impl. Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Some formatting. Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
draft scan impl Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
I'm closing this draft impl. I will open a new PR from the finalised branch.
This PR adds cuda support for fixed_size_group, ballot_group, and opportunistic_group. All group algorithm support specified in the extension document is also added, except for inclusive_scan and exclusive_scan.
Status summary
TODO:
GroupAll for fixed_size_group and ballot_group needs different impls in the intel backend (see https://github.com/intel/llvm/pull/9181/files), whereas in the cuda backend both group types can use the same impl. I can simply absorb the cuda impls into a cuda::GroupAll and call this from the appropriate group-specific spirv::GroupAll specialization.
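
The forwarding described in this TODO could look roughly like the host-side sketch below. Everything in it is an assumption made for illustration: `FixedSizeGroupModel`, `BallotGroupModel`, and the `cuda_model` and `spirv_model` namespaces are hypothetical stand-ins, and the predicate bits are passed in as a plain mask rather than gathered by a warp vote on the device.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the two group types. Each one only records
// which lanes of a 32-wide sub-group are members of the group.
struct FixedSizeGroupModel { std::uint32_t member_mask; };
struct BallotGroupModel { std::uint32_t member_mask; };

namespace cuda_model {
// Shared helper used by both group types. `votes` carries one predicate bit
// per lane; the result is true only if every member lane voted true.
inline bool GroupAll(std::uint32_t member_mask, std::uint32_t votes) {
  return (votes & member_mask) == member_mask;
}
} // namespace cuda_model

namespace spirv_model {
// Group-specific entry points that forward to the single shared helper,
// so the per-group code differs only in how the mask is obtained.
inline bool GroupAll(FixedSizeGroupModel g, std::uint32_t votes) {
  return cuda_model::GroupAll(g.member_mask, votes);
}
inline bool GroupAll(BallotGroupModel g, std::uint32_t votes) {
  return cuda_model::GroupAll(g.member_mask, votes);
}
} // namespace spirv_model
```

The point of this shape is that only the group-specific overloads need to know about the group type, while the mask-based logic lives in one place.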