Description
Describe the bug
As suggested in #14785, we are about to deprecate parallel_for
and single_task
overloads from the sycl_ext_oneapi_kernel_properties
extension, and use the alternative interface provided by the sycl_ext_oneapi_enqueue_functions
extension. With this new interface, if a user wants to specify properties from the sycl_ext_oneapi_kernel_properties
extension for a kernel, they must use a named function object which exposes the properties via get(sycl::ext::oneapi::experimental::properties_tag)
. (See note in this doc: https://github.com/intel/llvm/blob/974aec94af2ab81014895cf961895b5d2c06fc29/sycl/doc/extensions/experimental/sycl_ext_oneapi_enqueue_functions.asciidoc)
However, this method doesn't work when there're reductions present. For instance, in the example below, the kernel created using nd_launch
won't have the device_has<...>
properties as expected:
#include <sycl/sycl.hpp>
using namespace sycl;
using namespace ext::oneapi::experimental;
static constexpr auto device_has_all = device_has<
aspect::ext_oneapi_cuda_async_barrier, aspect::custom, aspect::fp16,
aspect::fp64, aspect::image, aspect::online_compiler, aspect::online_linker,
aspect::queue_profiling, aspect::usm_device_allocations,
aspect::usm_system_allocations, aspect::ext_intel_pci_address, aspect::cpu,
aspect::gpu, aspect::accelerator, aspect::ext_intel_gpu_eu_count,
aspect::ext_intel_gpu_subslices_per_slice,
aspect::ext_intel_gpu_eu_count_per_subslice,
aspect::ext_intel_max_mem_bandwidth, aspect::ext_intel_mem_channel,
aspect::usm_atomic_host_allocations, aspect::usm_atomic_shared_allocations,
aspect::atomic64, aspect::ext_intel_device_info_uuid,
aspect::ext_oneapi_srgb, aspect::ext_intel_gpu_eu_simd_width,
aspect::ext_intel_gpu_slices, aspect::ext_oneapi_native_assert,
aspect::host_debuggable, aspect::ext_intel_gpu_hw_threads_per_eu,
aspect::usm_host_allocations, aspect::usm_shared_allocations,
aspect::ext_intel_free_memory, aspect::ext_intel_device_id>;
struct TestKernelHasDevice_nd_item1_2 {
template <typename T1, typename T2> void operator()(nd_item<1>, T1&, T2&) const {}
auto get(properties_tag) {
return properties{device_has_all};
}
};
int main() {
queue Q;
range<1> R1{1};
nd_range<1> NDR1{R1, R1};
auto Redu1 = reduction<int>(nullptr, plus<int>());
auto Redu2 = reduction<float>(nullptr, multiplies<float>());
nd_launch<class WGSizeKernel0>(Q, NDR1, TestKernelHasDevice_nd_item1_2{}, Redu1, Redu2);
return 0;
}
To reproduce
Use the example code above and put into into a file, say /llvm/sycl/test/test.cpp, and then run
/path_to_your_workspace/llvm/build/bin/clang --driver-mode=g++ -fsycl-device-only -S -Xclang -emit-llvm /path_to_your_workspace/llvm/sycl/test/test.cpp -o temp_issue_out.txt
. Then in the output file, you could see the created kernels' attribute list doesn't contain the device_has properties. For example:
Search for "WGSizeKernel0" in the output file. Notice it has attribute list #0:
And check #0 at the end of the file. It doesn't have the device_has<...> properties as expected:
(Here's an example of attribute list with these properties for reference:
)