Description
Is your feature request related to a problem? Please describe
In some (many?) cases, the kernels are intended to run on GPUs, most of which have reduced FP64 performance compared to FP32. However, it's very easy in C++ to accidentally introduce double-precision arithmetic.
Such issues can be detected with a profiler, but it might be helpful to have an opt-in, compile-time diagnostic that warns about double-precision ops in device kernels. No warning would be emitted unless the user explicitly requests it.
Toy example (based on real cases). Excerpt:

```cpp
sycl::event ev = queue.submit([&](sycl::handler &cgh) {
  auto buffer_dev = buffer.get_access<sycl::access_mode::write>(cgh);
  cgh.parallel_for<class DummyKernel>(
      sycl::range<1>{numThreads}, [=](sycl::id<1> threadId) {
        int x = threadId.get(0);
        float x2 = x * 2.0;      // Forgot F suffix
        float y = fma(x, 2, x2); // Used fma instead of sycl::fma<float>
        buffer_dev[threadId] = y;
      });
});
```
Part of the SPIR-V output (compiled and dumped with `clang++ -fsycl -ffast-math -Wall simple.cpp -O3 -o simple && SYCL_DUMP_IMAGES=1 ./simple`):

```
%conv_i_i = OpUConvert %uint %33
%conv2_i_i = OpConvertSToF %double %conv_i_i
%mul_i_i = OpFMul %double %conv2_i_i %double_2
%conv3_i_i = OpFConvert %float %mul_i_i
%conv5_i_i = OpFConvert %double %conv3_i_i
%47 = OpExtInst %double %1 fma %conv2_i_i %double_2 %conv5_i_i
%conv6_i_i = OpFConvert %float %47
```
So, we're getting unnecessary conversions between FP64 and FP32, and FP64 ops where FP32 was probably intended.
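For comparison, here is a sketch of what the kernel body was presumably meant to be (same names as in the excerpt above; `sycl::fma` with `float` arguments resolves to the FP32 overload):

```cpp
cgh.parallel_for<class DummyKernel>(
    sycl::range<1>{numThreads}, [=](sycl::id<1> threadId) {
      int x = threadId.get(0);
      float x2 = x * 2.0f;  // 'f' suffix keeps the multiply in FP32
      float y = sycl::fma(static_cast<float>(x), 2.0f, x2);  // FP32 fma
      buffer_dev[threadId] = y;
    });
```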
Describe the solution you would like
Have an optional `-Wsycl-fp64-on-device` warning switch.
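To illustrate the idea, the diagnostic could point at the implicit promotion in the style of existing Clang warnings. The output below is purely hypothetical; neither the flag nor this wording exists today:

```
$ clang++ -fsycl -Wsycl-fp64-on-device -c simple.cpp
simple.cpp:11:24: warning: implicit double-precision operation in device code [-Wsycl-fp64-on-device]
        float x2 = x * 2.0; // Forgot F suffix
                   ~~^~~~~
```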
Describe alternatives you have considered
- A kernel attribute enabling the diagnostic for a single kernel? (A hypothetical sketch follows this list.)
- Things like `float x2 = x * 2.0` can probably get optimized away in `-ffast-math` mode?
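For the first alternative, one possible spelling (the attribute name below is made up, purely to illustrate the idea) could mirror existing SYCL kernel attributes:

```cpp
cgh.parallel_for<class DummyKernel>(
    sycl::range<1>{numThreads},
    [=](sycl::id<1> threadId) [[sycl_detail::warn_fp64_on_device]] {  // hypothetical
      // FP64 operations in this kernel would trigger the diagnostic
    });
```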
Additional context
The example above compiles correctly, so it's not a bug. But the result is likely not what the user intended, so a diagnostic would be a quick way to catch such problems without even having to run a profiler.