
Conversation

npmiller
Contributor

This patch fixes: #6635

In #6483, the get_native implementation for device in the CUDA plugin was mistakenly moved to the experimental interface header, so it was no longer available from the regular interface, causing build issues.

For the CUDA plugin there are currently two interfaces for CUDA interop: the "legacy" one, used by projects such as oneMKL and oneDNN, and the "experimental" one, defined in the sycl/ext/oneapi/experimental/backend/cuda.hpp header, which implements the interop as described in the CUDA backend specification proposed here: KhronosGroup/SYCL-Docs#197

@npmiller npmiller requested a review from a team as a code owner August 26, 2022 15:22
@sergey-semenov
Contributor

Thanks for the fix! Could you please add a test as well?

@npmiller
Contributor Author

I've updated the test for the experimental interface so it covers both the legacy and experimental CUDA interop interfaces.

sergey-semenov
sergey-semenov previously approved these changes Aug 30, 2022
@v-klochkov
Contributor

Non-blocking comment:
In my opinion, it may be better to keep CUDA-specific functions in "sycl/include/sycl/ext/oneapi/backend/cuda.hpp" (it would need to be created next to "sycl/include/sycl/ext/oneapi/backend/level_zero.hpp").

Tagging @smaslov-intel and @bader for more opinions.

@smaslov-intel
Contributor

Non-blocking comment: In my opinion, it may be better to keep CUDA-specific functions in "sycl/include/sycl/ext/oneapi/backend/cuda.hpp" (it would need to be created next to "sycl/include/sycl/ext/oneapi/backend/level_zero.hpp").

Tagging @smaslov-intel and @bader for more opinions.

There is already sycl/include/sycl/ext/oneapi/experimental/backend/cuda.hpp, from where the specialization is being moved. I don't feel strongly, but I think it'd be good to keep everything backend-specific isolated in its respective support header. @gmlueck, what do you think from the SYCL spec perspective?

@gmlueck
Contributor

gmlueck commented Aug 31, 2022

There is already sycl/include/sycl/ext/oneapi/experimental/backend/cuda.hpp, from where the specialization is being moved. I don't feel strongly, but I think it'd be good to keep everything backend-specific isolated in its respective support header. @gmlueck, what do you think from the SYCL spec perspective?

I agree. The spec says that an application must include the backend interop header in order to use the backend interop APIs. Therefore, it's better to declare the CUDA interop APIs in the CUDA backend interop header.

@npmiller
Contributor Author

So to clarify a bit, currently we have the two following headers regarding CUDA interop:

  • include/sycl/ext/oneapi/experimental/backend/cuda.hpp
  • include/sycl/backend/cuda.hpp

The first one implements the backend interop as described by the CUDA backend specification proposed in KhronosGroup/SYCL-Docs#197. In the long term, the hope is that once this specification makes it into the main SYCL spec, this experimental header will be moved to include/sycl/backend/cuda.hpp, which is the correct location according to the SYCL specification. That would be a breaking change, forcing projects such as oneMKL and oneDNN to update their code.

Currently these projects are not using this experimental header, but are using the older "legacy" CUDA interop.

This older CUDA interop interface was mostly implemented using the generic interop template functions, so it didn't really make use of a target-specific interop header; in fact, this get_native function was originally in the common device.hpp header.

The second header listed above is currently empty and marked as deprecated, for two reasons: first, it wasn't needed for the CUDA interop to work; and second, this header would in theory be reserved for a backend specified in the main SYCL specification, so the file should have been include/sycl/ext/oneapi/backend/cuda.hpp to begin with:

__SYCL_WARNING("sycl/backend/cuda.hpp is deprecated and no required anymore")

So circling back to this patch: technically the proper solution would be to create a new include/sycl/ext/oneapi/backend/cuda.hpp header for the "legacy" CUDA interop interface and move the template specialization there (and maybe duplicate it in the experimental header too, since it uses the same specialization). However, this is a breaking change for projects such as oneMKL and oneDNN, which would then need to update their code to include the new header.

Alternatively, and that's what this patch does, we can keep the specialization in the common code, which will keep working for projects using the "legacy" CUDA interop interface, and eventually clean it up when the official CUDA backend specification is finalized (which should perhaps be noted in a FIXME comment).

I don't really mind doing either option, so let me know which one you'd prefer, but be aware that moving it out of the common code will break other projects.

@gmlueck
Contributor

gmlueck commented Sep 1, 2022

So the "legacy" CUDA interop is currently undocumented?

I assume we will want to deprecate and remove that legacy version once we promote the experimental API to the supported header at include/sycl/backend/cuda.hpp. Is there anything we need to do now that will help us deprecate the legacy API later? For example, do the names of the legacy APIs collide with the ones currently in "experimental"? If so, does it help to move the legacy APIs to a separate header now?

@npmiller
Contributor Author

npmiller commented Sep 1, 2022

That's correct.

Currently the only thing that conflicts between the two interfaces is the specialization of the types defined in the SYCL specification. These are already in separate type-trait headers, and users can currently switch between the two with the SYCL_EXT_ONEAPI_BACKEND_CUDA_EXPERIMENTAL macro:

  • sycl/ext/oneapi/experimental/backend/backend_traits_cuda.hpp
  • sycl/detail/backend_traits_cuda.hpp

The "legacy" interface doesn't need anything else, except for the device get_native specialization in this patch, which luckily is currently a suitable specialization for both the legacy and experimental interfaces.

So when we deprecate the legacy interface, with the way things are currently organized it should be fairly straightforward to keep it around and guard it with a macro similar to the one above.

@gmlueck
Contributor

gmlueck commented Sep 1, 2022

So when we deprecate the legacy interface, with the way things are organized currently it should be fairly straightforward to keep it around and guard it with a macro similar as above.

We should not require the macro to enable the "real" interface once it is available. The best outcome is that users disambiguate the version of get_native they get via the include file that they use. Can we do it that way? If we need users to define a macro, we should require "legacy" users to define the macro, however this is a potentially breaking change for those users (because they need to define a macro when they did not previously need this).

@npmiller
Contributor Author

npmiller commented Sep 2, 2022

We should not require the macro to enable the "real" interface once it is available. The best outcome is that users disambiguate the version of get_native they get via the include file that they use. Can we do it that way?

I agree, this would be ideal. The current headers aren't set up for it, but I could probably refactor them to make that work. However, the current "legacy" interface doesn't require users to include any headers, so that would still be a breaking change for them.

If we need users to define a macro, we should require "legacy" users to define the macro, however this is a potentially breaking change for those users (because they need to define a macro when they did not previously need this).

Yes, that's what I meant: currently the macro is for the "real" interface, but once it's ratified we could reverse it so that the macro guards the "legacy" interface instead, and yes, that is also a breaking change.

Essentially, with the way the "legacy" interface is set up, we will need a breaking change to nicely deprecate it while supporting both.

So my initial point is essentially that it may make sense to hold off on that breaking change until we're ready to deprecate the "legacy" interface and do both at once; that way we can encourage users to update at the same time.

steffenlarsen
steffenlarsen previously approved these changes Sep 5, 2022
@steffenlarsen
Contributor

@npmiller - Precommit fails to build it. Could you please investigate why?

@npmiller
Contributor Author

npmiller commented Sep 6, 2022

I'll keep an eye on the run, but the pre-commit should be fixed. The issue was having the CUDA template specialization when the CUDA backend traits are not included (meaning the CUDA plugin is disabled), and it only seemed to show up on older gcc 7.x.

So I simply guarded the template specialization with the same macros as the backend traits include. I've also added a similar macro guard for the OpenCL and Level Zero template specializations in this file. In local testing these would cause the same issues with older gcc, but I suspect the CI always has those two plugins enabled, so it would never trip the issue.

@steffenlarsen
Contributor

Failure in OCL x64 is known and should be fixed with intel/llvm-test-suite#1228. I have re-approved, so let's see what Precommit says.

@steffenlarsen
Contributor

Precommit has been appeased. @v-klochkov | @smaslov-intel | @gmlueck - It looks like the ongoing discussion was initially considered non-blocking. Is that still the case? If so I think this is ready to be merged, but maybe the discussion should be moved to a discussion thread?

@gmlueck
Contributor

gmlueck commented Sep 6, 2022

And so my initial point is essentially that maybe it makes sense to hold off on that breaking change until we're ready to deprecate the "legacy" interface and do both at once, that way we can encourage users to update at the same time

Yes, that makes sense to me.

@v-klochkov
Contributor

From my side, it is still a non-blocking comment. It was added to raise the question and draw attention to it.
Having an approval from @steffenlarsen should be enough to merge.

Development

Successfully merging this pull request may close these issues.

[CUDA][HIP] get_native() API doesn't work for device