-
Notifications
You must be signed in to change notification settings - Fork 803
Description
Is your feature request related to a problem? Please describe
SYCL allows submitting native backend operations to (in-order) queues without paying the synchronization overhead cost of a host_task
through sycl::get_native(queue)
.
sycl::queue q(device, sycl::property::queue::in_order{});
const auto evt1 = q.submit(/* some SYCL operation */);
auto cuda_stream = sycl::get_native<sycl::backend::ext_oneapi_cuda>(q);
cudaNativeFunctionAsync(..., cuda_stream); // does not wait for evt1 on the host, only on device!
const auto evt2 = ??;
As seen above we can get a sycl::event
for the SYCL operation / kernel submission, but not for the cudaNative
submission. Such an event would however be desirable so that another operation (on a different queue) could specify a dependency on that exact submission, something which is not possible when manually doing cudaEventRecord
.
Describe the solution you would like
Multiple ideas, in descending complexity:
- Support for AdaptiveCpp's
enqueue_custom_operation
, see docs - An extension function
sycl::event sycl::queue::ext_record_event()
that performs the equivalent of acudaEventRecord
on an in-order queue - A working implementation of
sycl::make_event<backend::ext_oneapi_cuda>(cudaEvent_t, context &)
- A pointer on what internal function needs to be called as workaround in the meantime
Describe alternatives you have considered
I have attempted
sycl::event record_cuda_event(sycl::queue &queue) {
const auto stream = sycl::get_native<sycl::backend::ext_oneapi_cuda>(queue);
cudaEvent_t event;
cudaEventCreateWithFlags(&event, cudaEventDisableTiming);
cudaEventRecord(event, stream);
return sycl::detail::make_event(sycl::detail::pi::cast<pi_native_handle>(event), queue.get_context(), sycl::backend::ext_oneapi_cuda);
}
but the returned event does not make progress when queried using event.get_infosycl::info::event::command_execution_status()`.
Using the official API
return sycl::make_event<sycl::backend::ext_oneapi_cuda>(event, context);
instead fails to compile with
include/sycl/backend.hpp:356:1: note: candidate template ignored: requirement 'detail::InteropFeatureSupportMap<sycl::backend::ext_oneapi_cuda>::MakeEvent == true' was not satisfied [with Backend = sycl::backend::ext_oneapi_cuda]
Additional context
Using host_task
as a replacement is not desirable because it needs to wait (on the host) for the previous operations on the (in-order) queue to complete, negating the latency-hiding benefits of eagerly submitting device work in-order.
Please advise if there is any workaround using (non-portable / unstable) internal APIs at the moment to create an event from such a manual submission, or to convert (wrap) a cudaEvent_t
to a sycl::event
so that other in-order queues may wait on it.
Above code / workarounds was tried with DPC++ e330855 (May 7, 2024).