Description
Describe the bug
Submitting a kernel that requests an invalid sub-group size does not result in a runtime error. The runtime should probably throw an error, as it does on Intel devices when an invalid sub-group size is requested. This behavior was observed with both the CUDA and HIP plugins.
The supported sub-group size is presumably 32 for CUDA and 64 for HIP.
From the SYCL spec:
Each device supports only certain sub-group sizes as defined by info::device::sub_group_sizes. In addition, some device features may be incompatible with certain sub-group sizes. If a kernel is decorated with this attribute and then submitted to a device that does not support the sub-group size or if the kernel uses a feature that the device does not support with this sub-group size, the implementation must throw a synchronous exception with the errc::kernel_not_supported error code.
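The check the spec describes amounts to a membership test against the list returned by info::device::sub_group_sizes. Below is a minimal user-side sketch of that test, written in plain C++ so it does not depend on a SYCL toolchain. The helper name is_supported_sub_group_size is hypothetical and not part of SYCL; in real code the vector would come from d.get_info<sycl::info::device::sub_group_sizes>().

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of SYCL): returns true when `requested`
// appears in the list of sub-group sizes reported by the device.
// In SYCL code, `supported` would be the result of
// d.get_info<sycl::info::device::sub_group_sizes>().
inline bool is_supported_sub_group_size(const std::vector<std::size_t>& supported,
                                        std::size_t requested) {
    return std::find(supported.begin(), supported.end(), requested) != supported.end();
}
```

If a device reports only 32 (as assumed for CUDA above), a request for 8 fails this check. That is the case in which the spec calls for errc::kernel_not_supported.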
To Reproduce
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
  // List every GPU and the sub-group sizes it reports.
  auto const& gpu_devices = sycl::device::get_devices(sycl::info::device_type::gpu);
  std::cout << "Number of Root GPUs: " << gpu_devices.size() << std::endl;
  for (const auto& d : gpu_devices) {
    std::cout << "Found Root GPU-ID: " << d.get_info<sycl::info::device::name>() << std::endl;
    std::vector<size_t> sg_sizes = d.get_info<sycl::info::device::sub_group_sizes>();
    std::cout << "Supported sub-group sizes: ";
    for (size_t i = 0; i < sg_sizes.size(); i++) {
      std::cout << sg_sizes[i] << " ";  // separate the sizes so they stay readable
    }
    std::cout << std::endl;
  }

  const int N{1024};
  sycl::queue Q{sycl::gpu_selector{}};
  int* ptr = sycl::malloc_device<int>(N, Q);
  // A sub-group size of 8 is presumably unsupported on the CUDA and HIP backends,
  // so this submission should throw errc::kernel_not_supported, but it does not.
  Q.parallel_for(N, [=](sycl::item<1> id) [[sycl::reqd_sub_group_size(8)]] { ptr[id] = id; }).wait();
  sycl::free(ptr, Q);  // release the device allocation
  return 0;
}
Environment (please complete the following information):
CUDA PI built with CUDA-11.6.2, HIP PI built with ROCm-5.1.0
llvm compiler: ac6a4f5