Skip to content

[SYCL][Doc] Update SYCL_INTEL_data_flow_pipes extension for FPGA host… #8789

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Mar 30, 2023
Merged
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ This document describes an extension that adds pipes to SYCL. Pipes are first i

== Notice

Copyright (c) 2019-2021 Intel Corporation. All rights reserved.
Copyright (c) 2019-2023 Intel Corporation. All rights reserved.

== Status

Expand All @@ -49,10 +49,25 @@ Revision: 3
== Contact
Michael Kinsner, Intel (michael 'dot' kinsner 'at' intel 'dot' com)

== Contributors

Michael Kinsner, Intel +
Shuo Niu, Intel +
Bo Lei, Intel +
Marco Jacques, Intel +
Joe Garvey, Intel +
Aditi Kumaraswamy, Intel +
Robert Ho, Intel +
Sherry Yuan, Intel +
Peter Colberg, Intel +
Zibai Wang, Intel

== Dependencies

This extension is written against the SYCL 2020 specification, Revision 3.

It also depends on the `sycl_ext_oneapi_properties` extension.

The use of blocking pipe reads or writes requires support for https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/INTEL/SPV_INTEL_blocking_pipes.asciidoc[SPV_INTEL_blocking_pipes] if SPIR-V is used by an implementation.

== Overview
Expand Down Expand Up @@ -183,7 +198,7 @@ class pipe {
// Non-blocking
static DataT read( bool &Success );
static void write( const DataT &Data, bool &Success );

// Static members
using value_type = DataT;
size_t min_capacity = MinCapacity;
Expand Down Expand Up @@ -644,15 +659,15 @@ Automated mechanisms are possible to provide uniquification across calls, and co

The Intel FPGA experimental `pipe` class is implemented in `sycl/ext/intel/experimental/pipes.hpp` which is included in `sycl/ext/intel/fpga_extensions.hpp`.

In the experimental API version, read/write methods take in a property list as function argument, which can contain the latency control properties `latency_anchor_id` and/or `latency_constraint`.
In the experimental API version, the device side read/write methods take in a property list as function argument, which can contain the latency control properties `latency_anchor_id` and/or `latency_constraint`.

* `sycl::ext::intel::experimental::latency_anchor_id<N>`, where `N` is an integer: An ID to associate with the current read/write function call, which can then be referenced by other `latency_constraint` properties elsewhere in the program to define relative latency constaints. ID must be unique within the application, and a diagnostic is required if that condition is not met.
* `sycl::ext::intel::experimental::latency_constraint<A, B, C>`: A tuple of three values which cause the current read/write function call to act as an endpoint of a latency constraint relative to a specified `latency_anchor_id` defined by a different instruction.
** `A` is an integer: The ID of the target anchor defined on a different instruction through a `latency_anchor_id` property.
** `B` is an enum value: The type of control from the set {`latency_control_type::exact`, `latency_control_type::max`, `latency_control_type::min`}.
** `C` is an integer: The relative clock cycle difference between the target anchor and the current function call, that the constraint should infer subject to the type of the control (exact, max, min).

=== Synopsis
=== Device side pipe read/write

[source,c++]
----
Expand Down Expand Up @@ -687,9 +702,8 @@ template <int Target, latency_control_type Type, int Cycle>
inline constexpr latency_constraint_key::value_t<Target, Type, Cycle>
latency_constraint;

template <typename Name,
typename DataT,
size_t MinCapacity = 0>
template <class Name, class DataT, int32_t MinCapacity = 0,
class PropertiesT = decltype(oneapi::experimental::properties{})>
class pipe {
// Blocking
static DataT read();
Expand All @@ -716,7 +730,7 @@ class pipe {
} // namespace sycl::ext::intel::experimental
----

=== Usage
=== Latency Control example

[source,c++]
----
Expand Down Expand Up @@ -748,6 +762,69 @@ myQueue.submit([&](handler &cgh) {
});
----

== Host Side pipe read/write

If the read/write member functions of a pipe are called from the host side, a `sycl::queue` is added to the parameters. The `memory_order` parameter is also added to the parameters for future work.

[source,c++,Host pipe read write members,linenums]
----
template <class Name, class DataT, int32_t MinCapacity = 0,
class PropertiesT = decltype(oneapi::experimental::properties{})>
class pipe {
// Blocking
static _dataT read(queue &Q, memory_order Order = memory_order::seq_cst);
static void write(queue &Q, const _dataT &Data, memory_order Order = memory_order::seq_cst);
// Non-blocking
static _dataT read(queue &Q, bool &Success, memory_order Order = memory_order::seq_cst);
static void write(queue &Q, const _dataT &Data, bool &Success, memory_order Order = memory_order::seq_cst);
}
----

== Simple example of host-to-device write&read

[source,c++,First example,linenums]
----
using default_pipe_properties = decltype(sycl::ext::oneapi::experimental::properties(sycl::ext::intel::experimental::uses_valid<true>));

// Classes used to name the kernels
class TestTask;
class H2DPipeID;
class D2HPipeID;

using H2DPipe = sycl::ext::intel::experimental::pipe<H2DPipeID, int, 10, default_pipe_properties>;
using D2HPipe = sycl::ext::intel::experimental::pipe<D2HPipeID, int, 10, default_pipe_properties>;

struct BasicKernel {
void operator()() const {
auto a = H2DPipe::read();
D2HPipe::write(a+1);
}
};

int main() {
queue q(testconfig_selector{});
H2DPipe::write(q, 1);

  q.submit([&](handler &h) {
    h.single_task<TestTask>(BasicKernel{});
  });
auto b = D2HPipe::read(q);
std::cout << b << std::endl; // It should print 2;
}
----

== Issues for experimental API

. Although the memory_order parameter hasn't been used in the implementation, the choice of seq_cst for the default value of the `sycl::memory_order` parameter of the read/write functions is still open for discussion. While seq_cst is more consistent with C++ atomics, it is a change from how pipes work today, which is equivalent to memory_order::relaxed. Another consideration is that SYCL 2020 atomic_ref uses a third approach where the default must be specified as a template parameter of the class itself.
+
--
*RESOLUTION*: Not resolved. Still under discussion.
--

== Future work

. In the future, the `sycl::memory_order` parameter of read/write functions will control how other memory accesses, including regular, non-atomic memory accesses, are to be ordered around the pipe read/write operation. The default memory order is `sycl::memory_order::seq_cst`. Currently, `sycl::memory_order` parameter is defined but not being used in the implementation.

== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
Expand Down Expand Up @@ -776,6 +853,7 @@ extension's APIs the implementation supports.
|2|2019-11-13|Michael Kinsner|Incorporate feedback
|3|2020-04-27|Michael Kinsner|Clarify that pipe operations behave as-if they are relaxed atomic operations. Make SYCL2020 the baseline
|4|2021-12-02|Shuo Niu|Add experimental latency control API
|5|2023-03-27|Zibai Wang|Experimental API change only. Add memory order parameter and compile-time properties. Add host pipe read/write functions.
|========================================

//************************************************************************
Expand Down