Skip to content

[SYCL][Doc] Update SYCL_INTEL_data_flow_pipes extension for FPGA host pipe support #5838

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ This document describes an extension that adds pipes to SYCL. Pipes are first i

== Notice

Copyright (c) 2019-2021 Intel Corporation. All rights reserved.
Copyright (c) 2019-2022 Intel Corporation. All rights reserved.

== Status

Expand All @@ -53,10 +53,24 @@ Revision: 3
== Contact
Michael Kinsner, Intel (michael 'dot' kinsner 'at' intel 'dot' com)

== Contributors

Michael Kinsner, Intel +
Shuo Niu, Intel +
Bo Lei, Intel +
Marco Jacques, Intel +
Joe Garvey, Intel +
Aditi Kumaraswamy, Intel +
Robert Ho, Intel +
Sherry Yuan, Intel +
Peter Colberg, Intel

== Dependencies

This extension is written against the SYCL 2020 specification, Revision 3.

It also depends on the `sycl_ext_oneapi_properties` extension.

The use of blocking pipe reads or writes requires support for https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/INTEL/SPV_INTEL_blocking_pipes.asciidoc[SPV_INTEL_blocking_pipes] if SPIR-V is used by an implementation.

== Overview
Expand Down Expand Up @@ -116,7 +130,7 @@ A pipe type is a specialization of the pipe class:
----
template <typename name,
typename dataT,
size_t min_capacity = 0>
typename propertiesT = properties<>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using properties here is nice, but this is an API-breaking change to a supported extension. Existing code using the old API will no longer compiler, right?

Copy link
Contributor

@sherry-yuan sherry-yuan Mar 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not possible to keep both class variant at the same time in experimental.

// existing implementation
template <class _name, class _dataT, int32_t _min_capacity = 0> 
class pipe {

};
// updated implementation
template <class _name, class _dataT, typename PropertyList>
class pipe {

};

Will give error:

./Playground/file0.cpp:4:62: error: template parameter 'int _min_capacity'
    4 | template <class _name, class _dataT, int32_t _min_capacity = 0>
      |                                                              ^
./Playground/file0.cpp:10:7: note: redeclared here as 'class PropertyList'
   10 | class pipe {
      |       ^~~~

The choice is between: breaking ABI vs defining a new name/namespace for pipes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly the difference between an "experimental" extension vs. a "supported" one. If it is "supported", we promise not to break API compatibility with existing code without going through a deprecation process.

If the FPGA team wants the freedom to change APIs without deprecation, we should consider making this an experimental API (which includes moving it to the the experimental namespace).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed this breaks the ABI, and I don't see any trick using SFINAE via std::enable_if to retain compatibility.

@GarveyJoe, am I missing anything obvious?

@gmlueck, would you accept a global macro to opt pipe into the properties parameter, along with a deprecation notice for the _min_capacity parameter when the macro is undefined? The properties parameter would become the default at some later point, after compile-time constant properties have been moved out of experimental.

Copy link
Contributor Author

@pcolberg pcolberg Mar 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly the difference between an "experimental" extension vs. a "supported" one. If it is "supported", we promise not to break API compatibility with existing code without going through a deprecation process.

If the FPGA team wants the freedom to change APIs without deprecation, we should consider making this an experimental API (which includes moving it to the the experimental namespace).

Yes, this feature would be implemented in sycl::ext::intel::experimental::pipe for now, but that still leaves the question of how to allow for a deprecation period in which both the old and the new template parameter are supported once this is moved out of experimental.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pcolberg @GarveyJoe @gmlueck Is there anything we still need to resolve in this conversation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conclusion was to retain _min_capacity as the third parameter and add compile-time properties as a fourth parameter, which has been merged in #5886 for the experimental namespace.

class pipe;
----

Expand All @@ -129,7 +143,7 @@ A difference in any of the three template parameters identifies a different pipe
using pipe<class foo, int>;
using pipe<class bar, int>;
using pipe<class bar, float>;
using pipe<class bar, float, 5>;
using pipe<class bar, float, decltype(properties{min_capacity<5>})>;
----


Expand Down Expand Up @@ -174,19 +188,21 @@ The pipe class exposes static member functions for writing a data word to a pipe

Blocking and non-blocking forms of the read and write members are defined, with the form chosen based on overload resolution.

The `sycl::memory_order` parameter of read/write functions controls how other memory accesses, including regular, non-atomic memory accesses, are to be ordered around the pipe read/write operation. The default memory order is `sycl::memory_order::seq_cst`.

[source,c++,Read write members,linenums]
----
template <typename name,
typename dataT,
size_t min_capacity = 0>
typename propertiesT = properties<>>
class pipe {
// Blocking
static dataT read();
static void write( const dataT &data );
static dataT read( memory_order order = memory_order::seq_cst );
static void write( const dataT &data, memory_order order = memory_order::seq_cst );

// Non-blocking
static dataT read( bool &success_code );
static void write( const dataT &data, bool &success_code );
static dataT read( bool &success_code, memory_order order = memory_order::seq_cst );
static void write( const dataT &data, bool &success_code, memory_order order = memory_order::seq_cst );
}
----

Expand All @@ -196,7 +212,7 @@ The template parameters of the device type are defined as:

* `name`: Type that is the basis of pipe identification. Typically a user-defined class, in a user namespace. Forward declaration of the type is sufficient, and the type does not need to be defined.
* `dataT`: The type of data word/packet contained within a pipe. This is the data type that is read during a successful `pipe::read` operation, or written during a successful `pipe::write` operation. The type must be standard layout and trivially copyable.
* `min_capacity`: User defined minimum number of words in units of `dataT` that the pipe must be able to store without any being read out. A minimum capacity is required in some algorithms to avoid deadlock, or for performance tuning. An implementation can include more capacity than this parameter, but not less.
* `propertiesT`: The list of properties that are associated with the pipe.

== Pipe types and {cpp} scope

Expand Down Expand Up @@ -254,6 +270,44 @@ Type aliases in {cpp} through the `using` mechanism do not change the type of a
pipe<type_alias, int>::write(0);
----

== Host pipe read/write

The read/write member functions of a host pipe have different signatures when they are called from the host side, in which case a `sycl::queue` is added to the parameters.

[source,c++,Host pipe read write members,linenums]
----
template <typename name,
typename dataT,
typename propertiesT = properties<>>
class pipe {
// Blocking
static dataT read( queue &q, memory_order order = memory_order::seq_cst );
static void write( queue &q, const dataT &data, memory_order order = memory_order::seq_cst );

// Non-blocking
static dataT read( queue &q, bool &success_code, memory_order order = memory_order::seq_cst );
static void write( queue &q, const dataT &data, bool &success_code, memory_order order = memory_order::seq_cst );
}
----

== Simple example of host-to-device write

[source,c++,First example,linenums]
----
int data = 3;
using pipe_prop = decltype(experimental::properties{min_capacity<5>})
using my_pipe = pipe<class some_pipe, int>;
myQueue.submit([&](handler& cgh) {
// enqueue kernels
});
my_pipe::write( myQueue, data);

myQueue.submit([&](handler& cgh) {
// enqueue a kernel that uses data previously written from host
int data = my_pipe::read();
});
----

== Host pipe map/unmap

Pipes expose two additional static member functions that are available within host code, and which map to the OpenCL C host pipe extension map/unmap interface. These member functions provide higher bandwidth or otherwise more efficient communication on some platforms, by allowing block transfers of larger data sets.
Expand All @@ -262,7 +316,7 @@ Pipes expose two additional static member functions that are available within ho
----
template <typename name,
typename dataT,
size_t min_capacity = 0>
typename propertiesT = properties<>>
class pipe {
template <pipe_property::writeable host_writeable>
static dataT* map(size_t requested_size, size_t &mapped_size);
Expand Down Expand Up @@ -303,7 +357,7 @@ Multiple reads or multiple writes to the same pipe from more than one kernel are
When there are accesses to a pipe from different work-items or host threads, the order of data written to or read from the pipe is not defined. Specifically, regarding multiple accesses to the same pipe:

1. *Accesses to a single pipe within a single work-item of a kernel or thread of the host program:* Operations on the same pipe occur in program order with respect to the work-item or host thread. No "concurrent" accesses or reordering of accesses are observable from the perspective of the single pipe. If there are multiple pipe access operations to the same pipe within a single kernel, they execute in program order from the perspective of a single work-item.
2. *Accesses to multiple pipes within a single work-item of a kernel or thread of the host program:* Different pipes are treated in the same way as non-aliased memory, in that accesses to one pipe may be reordered relative to accesses to another pipe. There is no expectation of program ordering of pipe operations across different pipes, only for a single pipe. If a happens-before relationship across pipes is required, synchronization mechanisms such as atomics or barriers must be used.
2. *Accesses to multiple pipes within a single work-item of a kernel or thread of the host program:* Different pipes are treated in the same way as non-aliased memory, in that accesses to one pipe may be reordered relative to accesses to another pipe. There is no expectation of program ordering of pipe operations across different pipes, only for a single pipe unless a memory order stronger than `memory_order_relaxed` or some other synchronization mechanism, such as a barrier, is used.
3. *Accesses to a single pipe within two work-items of the same kernel (same or different invocations of a single kernel), and/or threads of the host program:* No ordering guarantees are made on the order of pipe operations across device work-items or host threads. For example, if two work-items executing a kernel write to a pipe, there are no guarantees that the work-item with lower _id_ (for any definition of _id_) executes before the pipe write from a higher _id_. The execution order of work-items executing a kernel are not defined by SYCL, may be dynamically reordered, and may not be deterministic. If ordering guarantees are required across work-items and/or host threads, synchronization mechanisms such as atomics or barriers must be used.

=== Restrictions on pipes accessed by both kernels and the host program
Expand Down Expand Up @@ -428,8 +482,6 @@ Pipes in the context of this extension step outside the OpenCL and SYCL memory m

. There is no implicit synchronizes-with relationship between different pipes and/or with non-pipe memory in a named address space (e.g. global, local, private). Specifically, there is no implicit global or local release of side effects through a pipe access, and observation of data or control information on one pipe does not imply any knowledge through happens-before relationship with a different pipe or with memory not associated with the pipe.

. Pipe read and write operations behave as if they are SYCL relaxed atomic load and store operations. When paired with sycl::atomic_fences to establish a sychronizes-with relationship, pipe operations can provide guarantees on side effect visibility in memory, as defined by the SYCL memory model.

. At a work-group barrier, there is an implicit acquire and release of side effects for any pipes operated on within the kernel, either before or after the barrier. This occurs without an explicit memory fence being applied to or around the barrier.

. There are no guarantees on pipe operation side effect latency. Writes to a pipe will eventually be visible to read operations on the pipe, without a synchronization point, but that visibility is not guaranteed to be by the time that the next instruction is executed by a writing work-item, for example. There may be arbitrary latency between a write to a pipe and visibility of the data on a read endpoint of the pipe. Likewise, there may be arbitrary latency between a read from a pipe, and visibility at a write endpoint that there is capacity available to write to (assuming that capacity was full prior to the read).
Expand Down Expand Up @@ -613,6 +665,11 @@ Automated mechanisms are possible to provide uniquification across calls, and co
*RESOLUTION*: Not resolved. Looking for input, because this is a valid design pattern in some cases.
--

- The choice of seq_cst for the default value of the `sycl::memory_order` parameter of the read/write functions is still open for discussion. While seq_cst is more consistent with C++ atomics, it is a change from how pipes work today, which is equivalent to memory_order::relaxed. Another consideration is that SYCL 2020 atomic_ref uses a third approach where the default must be specified as a template parameter of the class itself.
--
*RESOLUTION*: Not resolved. Still under discussion.
--

. Arbitration is allowed by default (more than one read or write endpoint) within a single kernel. Should there be an additional pipe template parameter to disable arbitration, as part of the type? Downsides are that restriction as part of the type requires compiler support, since the pipe and read/write member functions are stateless, and adding additional parameters to the type increases likelihood of accidentally creating two pipes with slightly different parameterizations.
+
--
Expand Down Expand Up @@ -752,6 +809,7 @@ extension's APIs the implementation supports.
|2|2019-11-13|Michael Kinsner|Incorporate feedback
|3|2020-04-27|Michael Kinsner|Clarify that pipe operations behave as-if they are relaxed atomic operations. Make SYCL2020 the baseline
|4|2021-12-02|Shuo Niu|Add experimental latency control API
|5|2022-03-18|Peter Colberg|Add memory order parameter and compile-time properties. Add host pipe read/write functions.
|========================================

//************************************************************************
Expand Down