-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL][Bindless] Update and add support for SPV_INTEL_bindless_image extension new revision #13753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Bindless] Update and add support for SPV_INTEL_bindless_image extension new revision #13753
Conversation
… extension new revision Add support to emit instructions that convert handles to images, samplers and sampled images
There will be a corresponding SPIRV-LLVM-Translator pull request soon. |
Now up here: KhronosGroup/SPIRV-LLVM-Translator#2559 Needs to be merged first and then eventually pulled into DPC++. Then two tests in this PR should pass. |
// Image types used for generating SPIR-V | ||
#ifdef __SYCL_DEVICE_ONLY__ | ||
template <int NDims> | ||
using OCLImageTyRead = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this part of the specification or should it be moved into the detail
namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah it should be in the detail namespace. Moved it.
return __invoke__ImageReadLod<DataT>(imageHandle.raw_handle, coords, level); | ||
return __invoke__ImageReadLod<DataT>( | ||
CONVERT_HANDLE_TO_SAMPLED_IMAGE(imageHandle.raw_handle, | ||
OCLImageTyRead<coordSize>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't you put OCLImageTyRead
inside CONVERT_HANDLE_TO_SAMPLED_IMAGE
? Or better yet inside __invoke__ImageReadLod
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree that putting OCLImageTyRead
inside CONVERT_HANDLE_TO_SAMPLED_IMAGE
is better. But I am not sure about putting it inside __invoke__ImageReadLod
. That would require also putting sampled_opencl_image_type
inside __invoke__ImageReadLod
Because these functions are used to also create the PTX instructions for the CUDA backend I feel putting it inside __invoke__ImageReadLod
would complicate things. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have moved OCLImageTyRead
into CONVERT_HANDLE_TO_SAMPLED_IMAGE
for now unless something different is decided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, this file is outside SYCL RT codeownership, so the decision is ultimately yours :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will leave it as is for now. Potentially revisit this later.
@@ -230,6 +230,18 @@ template <typename SampledType, typename TempRetT, typename TempArgT> | |||
extern __DPCPP_SYCL_EXTERNAL TempRetT __spirv_ImageSampleCubemap(SampledType, | |||
TempArgT); | |||
|
|||
template <typename RetT> | |||
extern __DPCPP_SYCL_EXTERNAL RetT | |||
__spirv_ConvertHandleToImageINTEL(unsigned long); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unsigned long is 32bit on windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use following code?
template <class RetT, class HandleT>
extern __DPCPP_SYCL_EXTERNAL RetT __spirv_ConvertHandleToImageINTEL(HandleT);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Added.
…side 'CONVERT_HANDLE_TO_SAMPLED_IMAGE' as sampled images only ever use read image type
#if defined(__NVPTX__) | ||
#define FETCH_UNSAMPLED_IMAGE(DataT, raw_handle, coords) \ | ||
__invoke__ImageFetch<DataT>(raw_handle, coords) | ||
#elif defined(__SPIR__) | ||
#define FETCH_UNSAMPLED_IMAGE(DataT, raw_handle, coords) \ | ||
__invoke__ImageRead<DataT>(raw_handle, coords) | ||
#else | ||
#define FETCH_UNSAMPLED_IMAGE(DataT, raw_handle, coords) \ | ||
__invoke__ImageFetch<DataT>(raw_handle, coords) | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these all the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite. In the CUDA backend, __invoke__ImageFetch
is used for a basic unsampled image read. When this is encountered by llvm-spirv
, it crashes as it tries to emit, OpImageFetch
, which is not implemented in llvm-spirv
. But for a basic unsampled image read, we want to emit OpImageRead
instead. So __invoke__ImageRead
is used when compiling to SPIR-V.
Interestingly, and slightly annoyingly, OpImageFetch
must always return a vector of four components. But that is prob not a big deal and is something to think about much later when adding support for fetching data from sampled images.
Also, turns out, llvm-spirv
does not have an assert to check if an instruction is fully implemented before trying to call Inst->init();
at line 185 of SPIRVInstruction.h
and jumping to random memory. That was a bit annoying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed one of the redundant branches.
@@ -10394,7 +10394,8 @@ static void getOtherSPIRVTransOpts(Compilation &C, | |||
",+SPV_INTEL_fpga_argument_interfaces" | |||
",+SPV_INTEL_fpga_invocation_pipelining_attributes" | |||
",+SPV_INTEL_fpga_latency_control" | |||
",+SPV_INTEL_task_sequence"; | |||
",+SPV_INTEL_task_sequence" | |||
",+SPV_INTEL_bindless_images"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also update clang/test/Driver/sycl-spirv-ext.c
to cover your new entry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I missed that. Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Friendly ping @aelovikov-intel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments about tests could be resolved in a different PR.
|
||
// Macros are required because it is not legal for a function to return | ||
// a variable of type 'opencl_image_type'. | ||
#if defined(__SPIR__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd personally merge the #if
s into a single one providing three macros at once. Can be ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, actually after combining them they did look better. Done.
int main() { | ||
|
||
sycl::device dev; | ||
sycl::queue q(dev); | ||
auto ctxt = q.get_context(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need any of this for this test. One can use SYCL_EXTERNAL
and only run device compilation to produce much compact IR. See https://github.com/intel/llvm/blob/sycl/sycl/test/check_device_code/vector/vector_as.cpp for an example.
try { | ||
q.submit([&](sycl::handler &cgh) { | ||
cgh.parallel_for<image_addition>(1, [=](sycl::id<1> id) { | ||
#ifdef __SYCL_DEVICE_ONLY__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That can become troublesome if customers have to structure their code like this. In the worst case it can lead to different lambda captures between host/device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Customers do not need to structure their code like this. I was just intending to invoke the new __spirv_
builtins manually to verify they worked. The guards are needed as opencl image types do not compile on host.
There is the other test, sycl/test/extensions/bindless_images.cpp
which does also check that the SPIR-V instructions are emitted correctly, but it does not check that __spirv_ConvertHandleToSamplerINTEL
works correctly as it is currently not used in the SYCL bindless extension. So I wanted to keep this to ensure all of the new builtins were covered.
Good idea. I will double check with my team. Should be fine. |
@aelovikov-intel On your comment that I can use If ok for you, can you approve and I will move forward to getting this merged. |
The updated SPV_INTEL_bindless_images spec can be seen in this PR: intel/llvm#13753 (cherry picked from commit e9a9a3f)
@intel/llvm-gatekeepers Should be ready to merge. There are some failing tests but all of them look unrelated to the changes here. |
The issue causing no_sycl_hpp_in_e2e_tests.cpp have been addressed by #13889. |
…ion (KhronosGroup#2559) The updated SPV_INTEL_bindless_images spec can be seen in this PR: intel/llvm#13753
…ion (KhronosGroup#2559) The updated SPV_INTEL_bindless_images spec can be seen in this PR: intel/llvm#13753
The updated SPV_INTEL_bindless_images spec can be seen in this PR: intel/llvm#13753 Original PR KhronosGroup#2559
…ion (KhronosGroup#2559) The updated SPV_INTEL_bindless_images spec can be seen in this PR: intel/llvm#13753
…ion (#2559) (#3002) The updated SPV_INTEL_bindless_images spec can be seen in this PR: intel/llvm#13753
…ion (#2559) (#3008) The updated SPV_INTEL_bindless_images spec can be seen in this PR: intel/llvm#13753
…#2559) The updated SPV_INTEL_bindless_images spec can be seen in this PR: intel/llvm#13753
…on (KhronosGroup#2559) The updated SPV_INTEL_bindless_images spec can be seen in this PR: intel/llvm#13753
…ion (#2559) (#3009) The updated SPV_INTEL_bindless_images spec can be seen in this PR: intel/llvm#13753
…#2559) The updated SPV_INTEL_bindless_images spec can be seen in this PR: intel/llvm#13753
…on (KhronosGroup#2559) The updated SPV_INTEL_bindless_images spec can be seen in this PR: intel/llvm#13753
…ion (#2559) (#3016) The updated SPV_INTEL_bindless_images spec can be seen in this PR: intel/llvm#13753
Add support to emit instructions that convert handles to images, samplers and sampled images