Skip to content

Can't Specify Properties in sycl_ext_oneapi_kernel_properties with Reductions #16320

Open
@HPS-1

Description

@HPS-1

Describe the bug

As suggested in #14785, we are about to deprecate parallel_for and single_task overloads from the sycl_ext_oneapi_kernel_properties extension, and use the alternative interface provided by the sycl_ext_oneapi_enqueue_functions extension. With this new interface, if a user wants to specify properties from the sycl_ext_oneapi_kernel_properties extension for a kernel, they must use a named function object which exposes the properties via get(sycl::ext::oneapi::experimental::properties_tag). (See note in this doc: https://github.com/intel/llvm/blob/974aec94af2ab81014895cf961895b5d2c06fc29/sycl/doc/extensions/experimental/sycl_ext_oneapi_enqueue_functions.asciidoc)

However, this method doesn't work when there're reductions present. For instance, in the example below, the kernel created using nd_launch won't have the device_has<...> properties as expected:

#include <sycl/sycl.hpp>

using namespace sycl;
using namespace ext::oneapi::experimental;

static constexpr auto device_has_all = device_has<
    aspect::ext_oneapi_cuda_async_barrier, aspect::custom, aspect::fp16,
    aspect::fp64, aspect::image, aspect::online_compiler, aspect::online_linker,
    aspect::queue_profiling, aspect::usm_device_allocations,
    aspect::usm_system_allocations, aspect::ext_intel_pci_address, aspect::cpu,
    aspect::gpu, aspect::accelerator, aspect::ext_intel_gpu_eu_count,
    aspect::ext_intel_gpu_subslices_per_slice,
    aspect::ext_intel_gpu_eu_count_per_subslice,
    aspect::ext_intel_max_mem_bandwidth, aspect::ext_intel_mem_channel,
    aspect::usm_atomic_host_allocations, aspect::usm_atomic_shared_allocations,
    aspect::atomic64, aspect::ext_intel_device_info_uuid,
    aspect::ext_oneapi_srgb, aspect::ext_intel_gpu_eu_simd_width,
    aspect::ext_intel_gpu_slices, aspect::ext_oneapi_native_assert,
    aspect::host_debuggable, aspect::ext_intel_gpu_hw_threads_per_eu,
    aspect::usm_host_allocations, aspect::usm_shared_allocations,
    aspect::ext_intel_free_memory, aspect::ext_intel_device_id>;

struct TestKernelHasDevice_nd_item1_2 {
  template <typename T1, typename T2> void operator()(nd_item<1>, T1&, T2&) const {}
  auto get(properties_tag) {
      return properties{device_has_all};
  }
};

int main() {
  queue Q;
  range<1> R1{1};
  nd_range<1> NDR1{R1, R1};

  auto Redu1 = reduction<int>(nullptr, plus<int>());
  auto Redu2 = reduction<float>(nullptr, multiplies<float>());

  nd_launch<class WGSizeKernel0>(Q, NDR1, TestKernelHasDevice_nd_item1_2{}, Redu1, Redu2);

  return 0;
}

To reproduce

Use the example code above and put into into a file, say /llvm/sycl/test/test.cpp, and then run
/path_to_your_workspace/llvm/build/bin/clang --driver-mode=g++ -fsycl-device-only -S -Xclang -emit-llvm /path_to_your_workspace/llvm/sycl/test/test.cpp -o temp_issue_out.txt. Then in the output file, you could see the created kernels' attribute list doesn't contain the device_has properties. For example:

Search for "WGSizeKernel0" in the output file. Notice it has attribute list #0:
image
And check #0 at the end of the file. It doesn't have the device_has<...> properties as expected:
image
(Here's an example of attribute list with these properties for reference:
image)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions