Skip to content

[SYCL][Doc] Add SYCL_INTEL_FPGA_data_flow_pipes_properties extension #5839

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,280 @@
= SYCL_INTEL_FPGA_data_flow_pipes_properties
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding some general comments here, so the conversation can be threaded.

  • My question about breaking API compatibility in [SYCL][Doc] Update SYCL_INTEL_data_flow_pipes extension for FPGA host pipe support #5838 is clearly related to this PR also.

  • It's not clear to me why these properties should be in a separate extension. Why not add them to the existing "sycl_ext_intel_dataflow_pipes" extension?

  • Regardless of which extension they are in, we should define the behavior when these properties are used in a kernel that is submitted to a device that is not an FPGA. I think the two options are:

    • These properties are an "optional kernel feature" as defined in section 5.7 of the core SYCL spec. In that case, submitting a kernel that uses these properties to a non-FPGA device would throw a synchronous exception at the time you submit the kernel.
    • These properties are defined to be ignored when the kernel is submitted to a non-FPGA device. This is similar to what we did for the device global properties.
  • The min_capacity property does not seem specific to FPGA at all. Wouldn't this be relevant even if we implemented pipe on other devices?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gmlueck for the quick review!

@GarveyJoe, what are your thoughts on these and the other questions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API compatibility: I answered this in an email but I'll add it here for completeness: I think the solution is to keep min_capacity as a template parameter rather than pull it into the properties list. While this might not be how we would have written it if we were building this from scratch, this will allow us to avoid breaking existing code. min_capacity is sufficiently different from the other properties to warrant being separated from them. While the other properties merely control the interface of the pipes, min_capacity is a platform-agnostic property of the algorithm itself.

Separate extension: I'm not opposed to a single, combined extension.

Behaviour on non-FPGA targets: I prefer the 2nd option.

min_capacity: Greg is correct, min_capacity is different from the other properties. It can't be safely dropped as it is an algorithmic requirement needed to prevent deadlock. It must be respected on all targets.


:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en

:blank: pass:[ +]

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}

// This is necessary for asciidoc, but not for asciidoctor
:cpp: C++

== Introduction
IMPORTANT: This specification is a draft.

NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are
trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc.
used by permission by Khronos.

NOTE: This document is better viewed when rendered as html with asciidoctor.
GitHub does not render image icons.

This document describes an extension that adds compile-time constant properties
and static member functions to pipes.

== Notice

Copyright (c) 2022 Intel Corporation. All rights reserved.

== Status

Working Draft

This is a preview extension specification, intended to provide early access to
a feature for review and community feedback. When the feature matures, this
specification may be released as a formal extension.

Because the interfaces defined by this specification are not final and are
subject to change they are not intended to be used by shipping software
products.

== Version

Built On: {docdate} +
Revision: A

== Contact

Peter Colberg, Intel (peter 'dot' colberg 'at' intel 'dot' com)

== Contributors

Bo Lei, Intel +
Marco Jacques, Intel +
Joe Garvey, Intel +
Aditi Kumaraswamy, Intel +
Robert Ho, Intel +
Sherry Yuan, Intel +
Peter Colberg, Intel

== Dependencies

This extension is written against the SYCL 2020 specification, revision 3.

It also depends on the `SYCL_INTEL_data_flow_pipes` and
`sycl_ext_oneapi_properties` extensions.

== Overview

This extension introduces properties that establish differences in the
implementation of `sycl::pipe`. These properties are FPGA specific. An example
of the syntax can be seen below.

[source,c++]
----
using pipe = pipe<class some_pipe, int, decltype(properties{uses_ready<true>})>;
----

== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification section 6.3.3 "Feature test macros". Therefore, an implementation
supporting this extension must predefine the macro
`SYCL_EXT_INTEL_FPGA_PIPE_PROPERTIES` to one of the values defined in the table
below. Applications can test for the existence of this macro to determine if
the implementation supports this feature, or applications can test the macro's
value to determine which of the extension's APIs the implementation supports.

[%header,cols="1,5"]
|===
|Value |Description
|1 |Initial extension version. Base features are supported.
|===

=== Pipe properties

Below is a list of compile-time-constant properties which `pipe` supports.

```c++
namespace sycl::ext::intel::experimental {

struct min_capacity {
template<int Capacity>
using value_t = property_value<min_capacity, std::integral_constant<int, Capacity>>;
};

struct ready_latency {
template<int Latency>
using value_t = property_value<ready_latency, std::integral_constant<int, Latency>>;
};

struct bits_per_symbol {
template<int Bits>
using value_t = property_value<bits_per_symbol, std::integral_constant<int, Bits>>;
};

struct uses_valid {
template<bool Valid>
using value_t = property_value<uses_valid, std::integral_constant<bool, Valid>>;
};

struct uses_ready {
template<bool Ready>
using value_t = property_value<uses_ready, std::integral_constant<bool, Ready>>;
};

struct in_csr {
template<bool Csr>
using value_t = property_value<in_csr, std::integral_constant<bool, Csr>>;
};

Comment on lines +137 to +141
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we potentially use the same in_csr property defined in device global? Given the behaviour will be similar

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that mean that the device_global extension needs to be enabled to use this property?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could define the property in both extensions, so you could use it with either one. You would not need to enable the device global extension just to use the property.

Note that there is a proposal to rename the "implement_in_csr" property like this:

enum class hw_interface_enum : /* unspecified */ {
  csr,
  dedicated
};

struct hw_interface_key {
  template<hw_interface_enum Interface>
  using value_t =
      property_value<hw_interface_key, std::integral_constant<int, Interface>>;
};

template <hw_interface_enum Interface>
inline constexpr hw_interface_key::value_t<Interface> hw_interface;
inline constexpr hw_interface_key::value_t<hw_interface_enum::csr>
    hw_interface_csr;
inline constexpr hw_interface_key::value_t<hw_interface_enum::dedicated>
    hw_interface_dedicated;

Does that new spelling still make sense for pipes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense, and it also makes this property more extensible.

Copy link
Contributor

@rho180 rho180 Mar 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "dedicated" isn't quite appropriate for host pipes. We can and will share a single hardware pipe among multiple logical pipes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing comes to mind immediately (but that doesn't mean there isn't one that makes sense for both). @GarveyJoe do you have any ideas?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding myself

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"conduit" or "streaming" could both work. Alternatively, we could go back to it being a bool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see either of those for pipe, but it's less clear to me they're descriptive of device_global (@artemrad ?) For bool, how confident are we won't eventually want another option?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should in_csr be a separate protocol? It isn't an avalon-st interface anymore. It's closer to an avalon-mm interface. Some of the protocol-dependent properties don't even make sense for an "in_csr" pipe such as ready_latency.

struct first_symbol_in_high_order_bits {
template<bool HighOrder>
using value_t = property_value<first_symbol_in_high_order_bits, std::integral_constant<bool, HighOrder>>;
};

struct protocol {
enum class protocol_name {
avalon,
};

template<protocol_name Protocol>
using value_t = property_value<protocol, std::integral_constant<protocol_name, Protocol>>;
};

} // namespace sycl::ext::intel::experimental
```

--
[options="header"]
|====
| Property | Description
|`min_capacity`
| Valid values: Non-negative integer value.

Default value: 0

User defined minimum number of words in units of data type size that the pipe
must be able to store without any being read out. A minimum capacity is required
in some algorithms to avoid deadlock, or for performance tuning. An
implementation can include more capacity than this parameter, but not less.

This property is not guaranteed to be respected if the pipe is an inter-kernel
pipe. The compiler is allowed to optimize the pipe if both sides are visible.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how this can be ignored for inter-kernel pipes. As you say, the algorithm may depend on a certain minimum capacity.

In any event, isn't it always assumed that a compiler can optimize away code if it can prove that doing so has no effect? I don't think we need to say that in the spec if that's all we mean.



|`ready_latency`
| Valid values: Non-negative integer value.

Default value: 0

The number of cycles between when the ready signal is deasserted and when the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a looser definition than that in the avalon streaming specification. Specifically, the avalon spec also mandates that the upstream source cannot assert valid until ready_latency cycles after the downstream source has asserted ready. Is this intended? If we intend to mimic the avalon spec exactly should we change this to: "The ready_latency property of the stream as defined in the Avalon-RT specification"?

pipe can no longer accept new inputs.

This property is not guaranteed to be respected if the pipe is an inter-kernel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove min_capacity from the list of properties than this applies to all properties. Perhaps we should factor it out of the table.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of whether or not we factor it out, I think we need to tweak this wording to make it clearer when the properties can be ignored. We could say something like:

"On FPGA targets this property provides control over the hardware interface built for the pipe at the boundary of a device image. If both endpoints of this pipe reside in the same device_image then this property is meaningless and can be ignored. On non-FPGA targets this property is meaningless and can be ignored. "

pipe. The compiler is allowed to optimize the pipe if both sides are visible.

|`bits_per_symbol`
| Valid values: A positive integer value that evenly divides by the data type size.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this say:

Suggested change
| Valid values: A positive integer value that evenly divides by the data type size.
| Valid values: A positive integer value that evenly divides the data type size.

My understanding is that you can have multiple symbols per element in the pipe but not multiple pipe elements per symbol. The most common case is that the bits_per_symbol is 8, i.e. one symbol is one byte.


Default value: Datatype size

Describes how the data is broken into symbols on the data bus.

Data is broken down according to how you set the first_symbol_in_high_order_bits
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to mention first_symbol_in_high_order_bits here. It already has its own definition further down and mentioning it here doesn't make the definition of bits_per_symbol any more clear. If you extended this with an example that showed how the two properties interacted that might be helpful but, as is, this sentence contributes nothing.

property. By default, data is broken down in little endian order.

This property is not guaranteed to be respected if the pipe is an inter-kernel
pipe. The compiler is allowed to optimize the pipe if both sides are visible.

|`uses_valid`
| Valid values: true or false

Default value: true

Controls whether a valid signal is present on the pipe interface. If false, the
upstream source must provide valid data on every cycle that ready is asserted.

This is equivalent to changing the pipe read calls to tryRead and assuming that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no API named "tryRead"; the non-blocking version is also named "read". It merely has different function arguments. Also, the argument is named "success_code" not "success". Instead, let's say:

"This is equivalent to changing the pipe read calls to non-blocking reads and assuming that success_code is always true."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current API is unclear and confusing.
I think try_read and try_write should be introduced, as suggested by #832 and the cited PR comments there.
But this is a long story, waiting for 3 years already. We have not implemented it on AMD FPGA yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keryell, for simplicity I don't want to combine this change with a possible change to the non-blocking API, but let's discuss this further in #832.

success is always true.

If set to false, min_capacity and ready_latency must be 0.

This property is not guaranteed to be respected if the pipe is an inter-kernel
pipe. The compiler is allowed to optimize the pipe if both sides are visible.

|`uses_ready`
| Valid values: true or false

Default value: true

Controls whether a ready signal is present. If false, the downstream sink must
be able to accept data on every cycle that valid is asserted. This is
equivalent to changing the pipe read calls to tryWrite and assuming that success
Copy link
Contributor

@GarveyJoe GarveyJoe Apr 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
equivalent to changing the pipe read calls to tryWrite and assuming that success
equivalent to changing the pipe write calls to non-blocking writes and assuming that success_code

is always true.

If set to false, ready_latency must be 0.

This property is not guaranteed to be respected if the pipe is an inter-kernel
pipe. The compiler is allowed to optimize the pipe if both sides are visible.

|`in_csr`
| Valid Values: true or false

Default Value: false

Controls whether the host pipe is implemented using the Control and Status register (CSR).

This property is not guaranteed to be respected if the pipe is an inter-kernel
pipe. The compiler is allowed to optimize the pipe if both sides are visible.

|`first_symbol_in_high_order_bits`
| Valid values: true or false

Default value: false

Specifies whether the data symbols in the pipe are in big-endian
Copy link
Contributor

@GarveyJoe GarveyJoe Apr 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By saying "the data symbols in the pipe" it sounds like we're referring to how symbols are ordered across the depth of the pipe, which is certainly not our intention. We should make it clear that this only refers to the ordering of symbols within a given word/transaction. Some alternatives are:

... the data symbols in each word written to/read from the pipe are ...
... the data symbols in each transaction are ...
... the data symbols in the data channel of the pipe are ...
The first_symbol_in_high_order_bits property of the stream as defined in the Avalon-ST specification.

order.

This property is not guaranteed to be respected if the pipe is an inter-kernel
pipe. The compiler is allowed to optimize the pipe if both sides are visible.

|`protocol`
| Specifies the protocol for the pipe interface.
Currently, the only protocol supported is `avalon`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems strange to me to have a property with only one legal value that is also the default. I understand we're trying to leave the door open for other protocols in the future, but if that happens we could add the property at that time.

It also seems weird to present protocol as its own independent property that is orthogonal to the other properties when a bunch of the other properties (bits_per_symbol, first_symbol_in_high_order_bits and ready_latency) are properties of the avalon ST spec and don't necessarily apply to other interface protocols. For example, AXI doesn't seem to have an endianness property so first_symbol_in_high_order_bits doesn't make sense with that protocol. Not to mention "first_symbol_in_high_order_bits" is the name pulled directly from the avalon spec and if we were trying to be generic we would call it "endianness" instead.

I see four possible solutions:

  1. Drop the protocol property completely and accept that for now all our properties are avalon-centric. This is the least work now but might require the most refactoring in the future to support other protocols.
  2. Make the protocol-dependent properties part of the protocol property. This way different protocols can have different sub-properties as determined by their various protocol specifications.
  3. Try to come up with generic names for every property that will apply to any possible future protocol. I think this is doomed to failure (see the endianness AXI interfaction).
  4. Keep separate properties like we have in this PR but add extra restrictions to the protocol-dependent properties that say: "this property is only valid if specified in conjunction with the protocol property"

Other protocols may be supported in the future.


The default protocol is `avalon`.
|====
--

== Revision History

[cols="5,15,15,70"]
[grid="rows"]
[options="header"]
|========================================
|Rev|Date|Author|Changes
|1|2022-03-18|Peter Colberg|*Initial public working draft*
|========================================

//************************************************************************
//Other formatting suggestions:
//
//* Use *bold* text for host APIs, or [source] syntax highlighting.
//* Use +mono+ text for device APIs, or [source] syntax highlighting.
//* Use +mono+ text for extension names, types, or enum values.
//* Use _italics_ for parameters.
//************************************************************************