
Conversation

npmiller
Contributor

This patch fixes: #6635

In #6483, the get_native implementation for device in the CUDA plugin was mistakenly moved to the experimental interface header, so it was no longer available from the regular interface, causing build issues.

For the CUDA plugin there are currently two interfaces for CUDA interop: the "legacy" one, used by projects such as oneMKL and oneDNN, and the "experimental" one, defined in the sycl/ext/oneapi/experimental/backend/cuda.hpp header, which implements the interop as described in the CUDA backend specification proposed here: KhronosGroup/SYCL-Docs#197

@npmiller npmiller requested a review from a team as a code owner August 26, 2022 15:22
@sergey-semenov
Contributor

Thanks for the fix! Could you please add a test as well?

@npmiller
Contributor Author

I've updated the test for the experimental interface so it covers both the legacy and experimental CUDA interop interfaces.

sergey-semenov
sergey-semenov previously approved these changes Aug 30, 2022
@v-klochkov
Contributor

Non-blocking comment:
In my opinion, it may be better to keep CUDA-specific functions in "sycl/include/sycl/ext/oneapi/backend/cuda.hpp" (it would need to be created next to "sycl/include/sycl/ext/oneapi/backend/level_zero.hpp").

Tagging @smaslov-intel and @bader for more opinions.

@smaslov-intel
Contributor

Non-blocking comment: In my opinion, it may be better to keep CUDA-specific functions in "sycl/include/sycl/ext/oneapi/backend/cuda.hpp" (it would need to be created next to "sycl/include/sycl/ext/oneapi/backend/level_zero.hpp").

Tagging @smaslov-intel and @bader for more opinions.

There is already sycl/include/sycl/ext/oneapi/experimental/backend/cuda.hpp, from where the specialization is being moved. I don't feel strongly, but I think it'd be good to keep everything backend-specific isolated in its respective support header. @gmlueck, what do you think from the SYCL spec perspective?

@gmlueck
Contributor

gmlueck commented Aug 31, 2022

There is already sycl/include/sycl/ext/oneapi/experimental/backend/cuda.hpp, from where the specialization is being moved. I don't feel strongly, but I think it'd be good to keep everything backend-specific isolated in its respective support header. @gmlueck, what do you think from the SYCL spec perspective?

I agree. The spec says that an application must include the backend interop header in order to use the backend interop APIs. Therefore, it's better to declare the CUDA interop APIs in the CUDA backend interop header.

@npmiller
Contributor Author

So to clarify a bit, currently we have the two following headers regarding CUDA interop:

  • include/sycl/ext/oneapi/experimental/backend/cuda.hpp
  • include/sycl/backend/cuda.hpp

The first one implements the backend interop as described by the CUDA backend specification proposed in KhronosGroup/SYCL-Docs#197. In the long term, the hope is that once this specification makes it into the main SYCL spec, this experimental header will be moved to include/sycl/backend/cuda.hpp, which is the correct location according to the SYCL specification. That would be a breaking change, forcing projects such as oneMKL and oneDNN to update their code.

Currently these projects are not using this experimental header, but are using the older "legacy" CUDA interop.

This older CUDA interop interface was mostly implemented using the generic interop template functions, so it didn't really make use of a target-specific interop header; in fact, this get_native function was originally in the common device.hpp header.

The second header listed above is currently empty and marked as deprecated, for two reasons: first, it wasn't needed for the CUDA interop to work; and second, this header would in theory be reserved for a backend specified in the main SYCL specification, so the file should have been include/sycl/ext/oneapi/backend/cuda.hpp to begin with:

__SYCL_WARNING("sycl/backend/cuda.hpp is deprecated and no required anymore")

So circling back to this patch: technically the proper solution would be to create a new include/sycl/ext/oneapi/backend/cuda.hpp header for the "legacy" CUDA interop interface and move the template specialization there (and maybe duplicate it in the experimental header too, since it uses the same specialization). However, this is a breaking change for projects such as oneMKL and oneDNN, which would then need to update their code to include the new header.

Alternatively, and that's what this patch does, we can keep the specialization in the common code, which will keep working for projects using the "legacy" CUDA interop interface, and eventually clean it up when the official CUDA backend specification is finalized (which should perhaps be noted in a FIXME comment).

I don't really mind doing either option, so let me know which one you'd prefer, but be aware that moving it out of the common code will break other projects.

@gmlueck
Contributor

gmlueck commented Sep 1, 2022

So the "legacy" CUDA interop is currently undocumented?

I assume we will want to deprecate and remove that legacy version once we promote the experimental API to the supported header at include/sycl/backend/cuda.hpp. Is there anything we need to do now that will help us deprecate the legacy API later? For example, do the names of the legacy APIs collide with the ones currently in "experimental"? If so, does it help to move the legacy APIs to a separate header now?

@npmiller
Contributor Author

npmiller commented Sep 1, 2022

That's correct.

Currently the only thing that conflicts between the two interfaces is the specialization of the types defined in the SYCL specification. These are already in separate type-trait headers, and users can currently switch between the two with the SYCL_EXT_ONEAPI_BACKEND_CUDA_EXPERIMENTAL macro:

  • sycl/ext/oneapi/experimental/backend/backend_traits_cuda.hpp
  • sycl/detail/backend_traits_cuda.hpp

The "legacy" interface doesn't need anything else, except for the device get_native specialization in this patch, which luckily is currently a suitable specialization for both the legacy and experimental interfaces.

So when we deprecate the legacy interface, with the way things are currently organized it should be fairly straightforward to keep it around and guard it with a macro similar to the one above.

@gmlueck
Contributor

gmlueck commented Sep 1, 2022

So when we deprecate the legacy interface, with the way things are organized currently it should be fairly straightforward to keep it around and guard it with a macro similar as above.

We should not require the macro to enable the "real" interface once it is available. The best outcome is that users disambiguate the version of get_native they get via the include file that they use. Can we do it that way? If we need users to define a macro, we should require "legacy" users to define the macro, however this is a potentially breaking change for those users (because they need to define a macro when they did not previously need this).

@npmiller
Contributor Author

npmiller commented Sep 2, 2022

We should not require the macro to enable the "real" interface once it is available. The best outcome is that users disambiguate the version of get_native they get via the include file that they use. Can we do it that way?

I agree, this would be ideal. The current headers aren't set up for it, but I could probably refactor them to make that work. However, the current "legacy" interface doesn't require users to include any headers, so that would still be a breaking change for them.

If we need users to define a macro, we should require "legacy" users to define the macro, however this is a potentially breaking change for those users (because they need to define a macro when they did not previously need this).

Yes, that's what I meant: currently the macro is for the "real" interface, but once it's ratified we could reverse it so that the macro guards the "legacy" interface instead, and yes, that is also a breaking change.

Essentially, with the way the "legacy" interface is set up, we will need a breaking change to nicely deprecate it while supporting both.

So my initial point is essentially that it may make sense to hold off on that breaking change until we're ready to deprecate the "legacy" interface and do both at once; that way we can encourage users to update at the same time.

steffenlarsen
steffenlarsen previously approved these changes Sep 5, 2022
@steffenlarsen
Contributor

@npmiller - Precommit fails to build it. Could you please investigate why?

@npmiller
Contributor Author

npmiller commented Sep 6, 2022

I'll keep an eye on the run, but the pre-commit should be fixed. The issue was having the CUDA template specialization when the CUDA backend traits are not included (meaning the CUDA plugin is disabled), and it only seemed to show up on older gcc 7.x.

So I simply guarded the template specialization with the same macros as the backend traits include. I've also added a similar macro guard for the OpenCL and Level Zero template specializations in this file. In local testing these would cause the same issues with older gcc, but I suspect the CI always has those two plugins enabled, so it would never trip the issue.

@steffenlarsen
Contributor

Failure in OCL x64 is known and should be fixed with intel/llvm-test-suite#1228. I have re-approved, so let's see what Precommit says.

@steffenlarsen
Contributor

Precommit has been appeased. @v-klochkov | @smaslov-intel | @gmlueck - It looks like the ongoing discussion was initially considered non-blocking. Is that still the case? If so I think this is ready to be merged, but maybe the discussion should be moved to a discussion thread?

@gmlueck
Contributor

gmlueck commented Sep 6, 2022

And so my initial point is essentially that maybe it makes sense to hold off on that breaking change until we're ready to deprecate the "legacy" interface and do both at once, that way we can encourage users to update at the same time

Yes, that makes sense to me.

@v-klochkov
Contributor

From my side, it is still a non-blocking comment. It was added to raise the question and draw attention to it.
Having an approval from @steffenlarsen should be enough to merge.

Development

Successfully merging this pull request may close these issues.

[CUDA][HIP] get_native() API doesn't work for device