Description
Describe the bug
I've found a scenario where a simple function used by my SYCL kernel fails to build for a CPU target (but not for a GPU target), with either JIT or AOT SPIR-V compilation. The SPIR-V disassembly suggests that the failure only happens when the compiler recognizes that the function can be reduced to the OpBitReverse operation. Breaking up the function so that the compiler doesn't recognize the OpBitReverse optimization allows the kernel to compile for the CPU target without a problem.
To Reproduce
Call this function from a SYCL kernel (e.g. just add the call to any simple SYCL tutorial's kernel):
uint8_t reverse_byte(uint8_t a)
{
    a = ((0x55 & a) << 1) | (0x55 & (a >> 1));
    a = ((0x33 & a) << 2) | (0x33 & (a >> 2));
    return (a << 4) | (a >> 4);
}
Compile with vanilla options:
icpx -fsycl <whatever>.cpp -o <whatever>
The program fails at runtime on a CPU target:
The program was built for 1 devices
Build program log for 'Intel(R) Xeon(R) W-10885M CPU @ 2.40GHz':
Compilation started
Unsupported SPIR-V module
SPIRV module requires unsupported capability 0
Compilation failed
-11 (CL_BUILD_PROGRAM_FAILURE)
NOTE: I only see this problem with a CPU target. When I target a GPU it has no problem.
Now replace that function definition with the following two, without changing anything at the reverse_byte call site in the kernel:
uint8_t parital_reverse_byte(uint8_t a)
{
    a = ((0x55 & a) << 1) | (0x55 & (a >> 1));
    return ((0x33 & a) << 2) | (0x33 & (a >> 2));
}
uint8_t reverse_byte(uint8_t a)
{
    a = parital_reverse_byte(a);
    return (a << 4) | (a >> 4);
}
Recompile and run. Now there are no issues at runtime.
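To rule out the possibility that the split version simply computes something different, here is a minimal host-only sketch that compares the two variants over all 256 inputs. The names reverse_byte_fused, reverse_byte_split, and count_split_mismatches are hypothetical and exist only so that both variants can live in one file; they are not part of my kernel code. This only checks that the results match on the host and says nothing about what the device compiler emits.

```cpp
#include <cstdint>

// Original single-function version from this report
// (renamed here only so it can coexist with the split version).
uint8_t reverse_byte_fused(uint8_t a)
{
    a = ((0x55 & a) << 1) | (0x55 & (a >> 1));
    a = ((0x33 & a) << 2) | (0x33 & (a >> 2));
    return (a << 4) | (a >> 4);
}

// Split version from this report; identifier spelled as in the report.
uint8_t parital_reverse_byte(uint8_t a)
{
    a = ((0x55 & a) << 1) | (0x55 & (a >> 1));
    return ((0x33 & a) << 2) | (0x33 & (a >> 2));
}

uint8_t reverse_byte_split(uint8_t a)
{
    a = parital_reverse_byte(a);
    return (a << 4) | (a >> 4);
}

// Hypothetical helper: counts the inputs on which the two versions disagree.
int count_split_mismatches()
{
    int mismatches = 0;
    for (int v = 0; v < 256; ++v) {
        const uint8_t x = static_cast<uint8_t>(v);
        if (reverse_byte_fused(x) != reverse_byte_split(x)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Calling count_split_mismatches() from a small main should report no disagreements, so the only difference between the two cases should be how the device compiler lowers them.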
SPIR-V disassembly (via -fsycl-device-only -fsycl-device-obj=spirv and https://github.com/KhronosGroup/SPIRV-Tools) shows that only in the first case does the compiler optimize the reverse_byte function down to the OpBitReverse operation. So my suspicion is that there is no backing implementation for OpBitReverse for CPU targets (or at least for my CPU, see above). But I don't know how to decipher "SPIRV module requires unsupported capability 0" any further to know for sure that that's the issue.
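For context on why the compiler can fold reverse_byte into a single bit-reverse operation: the three steps swap adjacent bits, then adjacent bit pairs, then the two nibbles, which together reverse all 8 bits. The host-only sketch below checks that against a bit-by-bit reference. naive_reverse_byte and count_reverse_mismatches are hypothetical helpers written for this illustration, not part of my kernel.

```cpp
#include <cstdint>

// Function from this report, unchanged.
uint8_t reverse_byte(uint8_t a)
{
    a = ((0x55 & a) << 1) | (0x55 & (a >> 1)); // swap adjacent bits
    a = ((0x33 & a) << 2) | (0x33 & (a >> 2)); // swap adjacent bit pairs
    return (a << 4) | (a >> 4);                // swap nibbles
}

// Hypothetical reference: move bit i of the input to bit (7 - i) of the result.
uint8_t naive_reverse_byte(uint8_t a)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; ++i) {
        r = static_cast<uint8_t>((r << 1) | ((a >> i) & 1));
    }
    return r;
}

// Hypothetical helper: counts the inputs on which the two functions disagree.
int count_reverse_mismatches()
{
    int mismatches = 0;
    for (int v = 0; v < 256; ++v) {
        const uint8_t x = static_cast<uint8_t>(v);
        if (reverse_byte(x) != naive_reverse_byte(x)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

If count_reverse_mismatches() returns 0, the function is a full 8-bit reversal, which is consistent with the compiler recognizing the pattern in the first case.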
Environment:
- OS: Linux (Ubuntu 20.04.1)
- Target device and vendor: Intel(R) Xeon(R) W-10885M CPU @ 2.40GHz
- DPC++ version: Intel(R) oneAPI DPC++/C++ Compiler 2022.2.1 (2022.2.1.20221020)
- Dependencies version: n/a
Additional context
None.