Skip to content

[SYCL] Add support for native root group implementation #11043

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 29 commits into from

Conversation

0x12CC
Copy link
Contributor

@0x12CC 0x12CC commented Sep 1, 2023

Add new PI APIs and relevant runtime changes to enable root groups composed of more than one work group. This changeset includes the following changes:

  • Add piextKernelSuggestMaxCooperativeGroupCount API to query the maximum number of work groups that can run cooperatively for a given kernel.
  • Add piextEnqueueCooperativeKernelLaunch API to launch kernels cooperatively.
  • Add MKernelIsCooperative flag to track if a kernel needs to be launched using piextEnqueueCooperativeKernelLaunch or piEnqueueKernelLaunch. Presently, this flag is only set if the kernel has the use_root_sync_key property. It's expected that other features may set this flag in the future if they require all work groups to run concurrently.
  • Modify the root group barrier implementation to use __spirv_ControlBarrier with Device scope rather than Workgroup scope.

@0x12CC
Copy link
Contributor Author

0x12CC commented Sep 1, 2023

@AlexeySachkov, @Pennycook, I've marked this PR as draft since the changes to sycl/plugins/level_zero/pi_level_zero.cpp and sycl/plugins/opencl/pi_opencl.cpp likely need to be split into a separate PR in oneapi-src/unified-runtime. Other changes should be ready for review so please feel free to provide early feedback.

@Pennycook
Copy link
Contributor

@0x12CC - I want to make you aware of some recent changes to the specification; we renamed a few things, and moved the queries into a separate extension specification (sycl_ext_oneapi_launch_queries). I don't think you need to make these changes in this PR (since we should just be able to rename things afterwards) but I wanted to let you know.

Signed-off-by: Michael Aziz <michael.aziz@intel.com>
Signed-off-by: Michael Aziz <michael.aziz@intel.com>
Signed-off-by: Michael Aziz <michael.aziz@intel.com>
Signed-off-by: Michael Aziz <michael.aziz@intel.com>
Signed-off-by: Michael Aziz <michael.aziz@intel.com>
Signed-off-by: Michael Aziz <michael.aziz@intel.com>
Signed-off-by: Michael Aziz <michael.aziz@intel.com>
Signed-off-by: Michael Aziz <michael.aziz@intel.com>
Signed-off-by: Michael Aziz <michael.aziz@intel.com>
Signed-off-by: Michael Aziz <michael.aziz@intel.com>
Signed-off-by: Michael Aziz <michael.aziz@intel.com>
Signed-off-by: Michael Aziz <michael.aziz@intel.com>
Signed-off-by: Michael Aziz <michael.aziz@intel.com>
@0x12CC 0x12CC marked this pull request as ready for review September 14, 2023 14:59
@0x12CC 0x12CC requested review from a team and victor-eds as code owners September 14, 2023 14:59
Signed-off-by: Michael Aziz <michael.aziz@intel.com>
Signed-off-by: Michael Aziz <michael.aziz@intel.com>
Signed-off-by: Michael Aziz <michael.aziz@intel.com>
Signed-off-by: Michael Aziz <michael.aziz@intel.com>
@0x12CC 0x12CC requested a review from ldrumm September 20, 2023 21:25
@0x12CC
Copy link
Contributor Author

0x12CC commented Sep 20, 2023

Thanks for the reviews, @victor-eds, @ldrumm! I've made all of the requested changes.

@0x12CC 0x12CC temporarily deployed to WindowsCILock September 20, 2023 21:48 — with GitHub Actions Inactive
@0x12CC 0x12CC temporarily deployed to WindowsCILock September 20, 2023 22:30 — with GitHub Actions Inactive
@JackAKirk
Copy link
Contributor

This looks at a glance like it would likely be compatible with cuda's grid_group and their cooperative kernel launch api, but has someone gone through and convinced themselves that the set of interfaces in runtime/UR will support corresponding cuda functionality, as well as any other backends that may have similar functionality?

@0x12CC
Copy link
Contributor Author

0x12CC commented Sep 21, 2023

This looks at a glance like it would likely be compatible with cuda's grid_group and their cooperative kernel launch api, but has someone gone through and convinced themselves that the set of interfaces in runtime/UR will support corresponding cuda functionality, as well as any other backends that may have similar functionality?

There's been some discussion of the relevant CUDA APIs in the PR for the UR changes: oneapi-src/unified-runtime#849.

Copy link
Contributor

@jandres742 jandres742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move changes to UR files.

@0x12CC
Copy link
Contributor Author

0x12CC commented Sep 28, 2023

@ldrumm, @jandres742, I've removed all of the changes related to UR and made the other requested changes. This PR now only includes the new PI function and the relevant changes to the SYCL runtime. The implementation for these new PI functions will be in the UR repository. Please review this updated changeset.

@0x12CC 0x12CC temporarily deployed to WindowsCILock September 28, 2023 16:45 — with GitHub Actions Inactive
@0x12CC 0x12CC requested a review from jandres742 September 28, 2023 16:45
@jandres742
Copy link
Contributor

@ldrumm, @jandres742, I've removed all of the changes related to UR and made the other requested changes. This PR now only includes the new PI function and the relevant changes to the SYCL runtime. The implementation for these new PI functions will be in the UR repository. Please review this updated changeset.

thanks @0x12CC . Right, please see guidlines here https://github.com/oneapi-src/unified-runtime/pull/902/files

@@ -573,6 +583,13 @@ pi_result piextKernelGetNativeHandle(pi_kernel Kernel,
return pi2ur::piextKernelGetNativeHandle(Kernel, NativeHandle);
}

pi_result piextKernelSuggestMaxCooperativeGroupCount(pi_kernel Kernel,
pi_uint32 *GroupCountRet) {
(void)Kernel;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be also a redirection to pi2ur, even if all that we are doing is returning 1.

@0x12CC 0x12CC temporarily deployed to WindowsCILock September 28, 2023 17:14 — with GitHub Actions Inactive
Co-authored-by: Nikita Kornev <nikita.kornev@intel.com>
@0x12CC 0x12CC closed this Jan 11, 2024
@0x12CC 0x12CC deleted the native_root_group branch January 11, 2024 18:55
@0x12CC
Copy link
Contributor Author

0x12CC commented Jan 11, 2024

I've closed this PR since the branch is outdated and created #12367 to replace it. The new PR includes all of the suggestions/fixes from this PR.

@bader
Copy link
Contributor

bader commented Jan 11, 2024

I've closed this PR since the branch is outdated and created #12367 to replace it. The new PR includes all of the suggestions/fixes from this PR.

I recommend updating the branch instead of opening a new PR. It's hard to track whether comments were addressed or not. It's also unclear where to continue discussions if reviews have follow-up comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.