Skip to content

Create SYCL events for submissions to native queues #13706

@fknorr

Description

@fknorr

Is your feature request related to a problem? Please describe

SYCL allows submitting native backend operations to (in-order) queues without paying the synchronization overhead cost of a host_task through sycl::get_native(queue).

sycl::queue q(device, sycl::property::queue::in_order{});
const auto evt1 = q.submit(/* some SYCL operation */);
auto cuda_stream = sycl::get_native<sycl::backend::ext_oneapi_cuda>(q);
cudaNativeFunctionAsync(..., cuda_stream); // does not wait for evt1 on the host, only on device!
const auto evt2 = ??;

As seen above we can get a sycl::event for the SYCL operation / kernel submission, but not for the cudaNative submission. Such an event would however be desirable so that another operation (on a different queue) could specify a dependency on that exact submission, something which is not possible when manually doing cudaEventRecord.

Describe the solution you would like

Multiple ideas, in descending complexity:

  1. Support for AdaptiveCpp's enqueue_custom_operation, see docs
  2. An extension function sycl::event sycl::queue::ext_record_event() that performs the equivalent of a cudaEventRecord on an in-order queue
  3. A working implementation of sycl::make_event<backend::ext_oneapi_cuda>(cudaEvent_t, context &)
  4. A pointer on what internal function needs to be called as workaround in the meantime

Describe alternatives you have considered

I have attempted

sycl::event record_cuda_event(sycl::queue &queue) {
    const auto stream = sycl::get_native<sycl::backend::ext_oneapi_cuda>(queue);
    cudaEvent_t event;
    cudaEventCreateWithFlags(&event, cudaEventDisableTiming);
    cudaEventRecord(event, stream);
    return sycl::detail::make_event(sycl::detail::pi::cast<pi_native_handle>(event), queue.get_context(), sycl::backend::ext_oneapi_cuda);
}

but the returned event does not make progress when queried using event.get_infosycl::info::event::command_execution_status()`.

Using the official API

return sycl::make_event<sycl::backend::ext_oneapi_cuda>(event, context);

instead fails to compile with

include/sycl/backend.hpp:356:1: note: candidate template ignored: requirement 'detail::InteropFeatureSupportMap<sycl::backend::ext_oneapi_cuda>::MakeEvent == true' was not satisfied [with Backend = sycl::backend::ext_oneapi_cuda]

Additional context

Using host_task as a replacement is not desirable because it needs to wait (on the host) for the previous operations on the (in-order) queue to complete, negating the latency-hiding benefits of eagerly submitting device work in-order.

Please advise if there is any workaround using (non-portable / unstable) internal APIs at the moment to create an event from such a manual submission, or to convert (wrap) a cudaEvent_t to a sycl::event so that other in-order queues may wait on it.

Above code / workarounds was tried with DPC++ e330855 (May 7, 2024).

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions