Skip to content

[SYCL] [CUDA]Add experimental cuda interop #6162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

AidanBeltonS
Copy link
Contributor

This PR adds an experimental CUDA-interop that reflects the proposed CUDA backend spec in KhronosGroup/SYCL-Docs#197. The changes work with the CUDA CTS interop checks KhronosGroup/SYCL-CTS#336.

The changes are kept experimental so as to not break oneMKL, and oneDNN which use CUDA interops.

The experimental interop is enabled by defining the SYCL_EXT_ONEAPI_BACKEND_CUDA_EXPERIMENTAL macro and including the header file sycl/ext/oneapi/experimental/backend/cuda.hpp.

@AidanBeltonS AidanBeltonS force-pushed the aidan/cuda-interop-context branch from 488747b to 28621e6 Compare May 18, 2022 10:31
@AidanBeltonS AidanBeltonS changed the title Aidan/cuda interop context [SYCL] Add experimental cuda interop May 18, 2022
@AidanBeltonS AidanBeltonS changed the title [SYCL] Add experimental cuda interop [SYCL] [CUDA]Add experimental cuda interop May 18, 2022
pi_context *piContext) {
(void)num_devices;
(void)devices;
(void)ownNativeHandle;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't you want to check (also in other places) that unused arguments are either unset, or set to reasonable values?

@smaslov-intel
Copy link
Contributor

The changes look good high-level, but can't you split them into multiple PRs?
Also, are you adding E2E tests to check the interop?

@smaslov-intel
Copy link
Contributor

Also, are you adding E2E tests to check the interop?

I see you are adding tests in KhronosGroup/SYCL-CTS#336
When/how are they being run?

@smaslov-intel
Copy link
Contributor

The experimental interop is enabled by defining the SYCL_EXT_ONEAPI_BACKEND_CUDA_EXPERIMENTAL macro

What does "experimental" denotes here and when can/will it become non-experimental?

@AidanBeltonS
Copy link
Contributor Author

The changes look good high-level, but can't you split them into multiple PRs? Also, are you adding E2E tests to check the interop?

Sure, I think this can be split into a few PRs. It might be a bit tricky as there may be some dependencies between the different PRs. Would you prefer this to be separated into a few submissions?
I can add some E2E tests to the llvm-test-suite, as well.

I see you are adding tests in KhronosGroup/SYCL-CTS#336
When/how are they being run?

I am running the CUDA CTS tests locally to verify this PR passes the SYCL-CTS. I run them when I make changes to either this PR or the CUDA CTS tests.

What does "experimental" denotes here and when can/will it become non-experimental?

This is experimental as there is still some discussion regarding the CUDA backend spec. In addition both oneMKL and oneDNN use the exiting interop, if these changes were to overwrite the current interop behaviour both of these libraries would break.
I think this can be non-experimental when the CUDA backend spec is finalised and agreed, and oneMKL and oneDNN are updated to use this interop.

@pvchupin pvchupin requested a review from gmlueck May 25, 2022 16:31
@smaslov-intel
Copy link
Contributor

Would you prefer this to be separated into a few submissions?

Yes, please. Even splitting to 2 will make it more manageable.

both oneMKL and oneDNN use the exiting interop, if these changes were to overwrite the current interop behaviour both of these libraries would break.

Please clarify what is the "current interop"? I am not seeing any interop support in the current CUDA Plugin.

@AidanBeltonS
Copy link
Contributor Author

Yes, please. Even splitting to 2 will make it more manageable.

Sure I will work on splitting this up.

Please clarify what is the "current interop"? I am not seeing any interop support in the current CUDA Plugin.

Although there is no support in the CUDA plugin for creating SYCL objects with native types, there is functionality to return native types from SYCL objects.

The returned native types are being changed by the CUDA backend spec, for example CUdeviceptr is being replaced with DataT *. So by current I am referring to the input and return native types defined in include/CL/sycl/detail/backend_traits_cuda.hpp compared to the experimental native types defined in /include/sycl/ext/oneapi/experimental/backend/backend_traits_cuda.hpp.

@smaslov-intel
Copy link
Contributor

Sure I will work on splitting this up.

Thank you!

The returned native types are being changed by the CUDA backend spec, for example CUdeviceptr is being replaced with DataT *. So by current I am referring to the input and return native types defined in include/CL/sycl/detail/backend_traits_cuda.hpp compared to the experimental native types defined in /include/sycl/ext/oneapi/experimental/backend/backend_traits_cuda.hpp.

I see. I think it would still be better to have a plan from the beginning how the new interop becomes the default and the only one eventually. And, then, how to get it out of the "experimental" envelope. @gmlueck : would you have any suggestions here?

@steffenlarsen
Copy link
Contributor

I see. I think it would still be better to have a plan from the beginning how the new interop becomes the default and the only one eventually. And, then, how to get it out of the "experimental" envelope.

Since this is an ABI change we cannot reasonably have a deprecation period for the change, so I don't personally see any other option than promoting it out of experimental (replacing the old API) once we are in an ABI break window. Preferably this would happen when the backend specification is deemed fully ready and is implemented in this experimental guard.

@smaslov-intel
Copy link
Contributor

I see. I think it would still be better to have a plan from the beginning how the new interop becomes the default and the only one eventually. And, then, how to get it out of the "experimental" envelope.

Since this is an ABI change we cannot reasonably have a deprecation period for the change, so I don't personally see any other option than promoting it out of experimental (replacing the old API) once we are in an ABI break window. Preferably this would happen when the backend specification is deemed fully ready and is implemented in this experimental guard.

This sounds right. We need to keep track of this change (remove "experimental") needed when ABI break is allowed. Shouldn't we add deprecation messages on the "current" interop now?

@steffenlarsen
Copy link
Contributor

Shouldn't we add deprecation messages on the "current" interop now?

Since the switch is guarded by SYCL_EXT_ONEAPI_BACKEND_CUDA_EXPERIMENTAL I am not sure if we want to deprecate the current API as it would have to tell the user to define SYCL_EXT_ONEAPI_BACKEND_CUDA_EXPERIMENTAL in order for the warnings to go away. Most often we can't issue deprecation warnings prior to ABI break as we can't always offer a deprecation period, though I suppose we could do it in this case.

@smaslov-intel
Copy link
Contributor

We can only remove old interface after it was deprecated for some time. I think it is OK to tell users that we are moving to the new version, which is currently implemented under SYCL_EXT_ONEAPI_BACKEND_CUDA_EXPERIMENTAL and they should start coding with it, because it will become the default (and the only one) in the next major update or so.

@github-actions github-actions bot added the Stale label Nov 30, 2022
@github-actions github-actions bot closed this Dec 31, 2022
@AerialMantis
Copy link
Contributor

I'd like to keep this pull request open, we've not been very active on it recently, but we plan to come back to it again soon.

@AerialMantis AerialMantis reopened this Jan 16, 2023
@AerialMantis AerialMantis temporarily deployed to aws January 16, 2023 13:26 — with GitHub Actions Inactive
@bader bader removed the Stale label Jan 16, 2023
Copy link
Contributor

This pull request is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days.

@github-actions github-actions bot added the Stale label Sep 16, 2024
Copy link
Contributor

This pull request was closed because it has been stalled for 30 days with no activity.

@github-actions github-actions bot closed this Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants