Description
Is your feature request related to a problem? Please describe
In some (many?) cases, the kernels are intended to run on GPUs, most of which have reduced FP64 performance compared to FP32. However, it's very easy in C++ to accidentally introduce double-precision arithmetic.
Such issues can be detected with a profiler, but it might be helpful to have an opt-in, compile-time diagnostic that warns about double-precision ops in device kernels. No warning would be emitted unless the user explicitly requests it.
Toy example (based on real cases). Excerpt:

```cpp
sycl::event ev = queue.submit([&](sycl::handler &cgh) {
  auto buffer_dev = buffer.get_access<sycl::access_mode::write>(cgh);
  cgh.parallel_for<class DummyKernel>(
      sycl::range<1>{numThreads}, [=](sycl::id<1> threadId) {
        int x = threadId.get(0);
        float x2 = x * 2.0;      // Forgot F suffix
        float y = fma(x, 2, x2); // Used fma instead of sycl::fma<float>
        buffer_dev[threadId] = y;
      });
});
```
Part of the SPIR-V output (compiled and dumped with `clang++ -fsycl -ffast-math -Wall simple.cpp -O3 -o simple && SYCL_DUMP_IMAGES=1 ./simple`):

```
%conv_i_i = OpUConvert %uint %33
%conv2_i_i = OpConvertSToF %double %conv_i_i
%mul_i_i = OpFMul %double %conv2_i_i %double_2
%conv3_i_i = OpFConvert %float %mul_i_i
%conv5_i_i = OpFConvert %double %conv3_i_i
%47 = OpExtInst %double %1 fma %conv2_i_i %double_2 %conv5_i_i
%conv6_i_i = OpFConvert %float %47
```
So, we're getting unnecessary conversions between FP64 and FP32, and FP64 ops where FP32 was probably intended.
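For comparison, here is a sketch of what the kernel body was presumably meant to be (same names as in the excerpt above; `sycl::fma` with `float` arguments resolves to the FP32 overload):

```cpp
cgh.parallel_for<class DummyKernel>(
    sycl::range<1>{numThreads}, [=](sycl::id<1> threadId) {
      int x = threadId.get(0);
      float x2 = x * 2.0f;  // 'f' suffix keeps the multiply in FP32
      float y = sycl::fma(static_cast<float>(x), 2.0f, x2);  // FP32 fma
      buffer_dev[threadId] = y;
    });
```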
Describe the solution you would like
Have an optional `-Wsycl-fp64-on-device` warning switch.
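To illustrate the idea, the diagnostic could point at the implicit promotion in the style of existing Clang warnings. The output below is purely hypothetical; neither the flag nor this wording exists today:

```
$ clang++ -fsycl -Wsycl-fp64-on-device -c simple.cpp
simple.cpp:11:24: warning: implicit double-precision operation in device code [-Wsycl-fp64-on-device]
        float x2 = x * 2.0; // Forgot F suffix
                   ~~^~~~~
```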
Describe alternatives you have considered
- A kernel attribute enabling the diagnostic for a single kernel? (A hypothetical sketch follows this list.)
- Things like `float x2 = x * 2.0` can probably get optimized away in `-ffast-math` mode?
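For the first alternative, one possible spelling (the attribute name below is made up, purely to illustrate the idea) could mirror existing SYCL kernel attributes:

```cpp
cgh.parallel_for<class DummyKernel>(
    sycl::range<1>{numThreads},
    [=](sycl::id<1> threadId) [[sycl_detail::warn_fp64_on_device]] {  // hypothetical
      // FP64 operations in this kernel would trigger the diagnostic
    });
```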
Additional context
The example above compiles correctly, so it's not a bug. But the result is likely not what the user intended, so a diagnostic would be a quick way to catch such problems without even having to run a profiler.