Skip to content

Commit

Permalink
[SYCL] Extension spec for queue index (#7520)
Browse files Browse the repository at this point in the history
Add a proposed extension specification that allows a queue to be tied to
a "queue index".
  • Loading branch information
gmlueck committed Dec 9, 2022
1 parent 1b68ffd commit f5fb759
Showing 1 changed file with 214 additions and 0 deletions.
214 changes: 214 additions & 0 deletions sycl/doc/extensions/proposed/sycl_ext_intel_queue_index.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,214 @@
= sycl_ext_intel_queue_index

:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en
:dpcpp: pass:[DPC++]

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}


== Notice

[%hardbreaks]
Copyright (C) 2022-2022 Intel Corporation. All rights reserved.

Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
permission by Khronos.


== Contact

To report problems with this extension, please open a new issue at:

https://github.com/intel/llvm/issues


== Dependencies

This extension is written against the SYCL 2020 revision 6 specification. All
references below to the "core SYCL specification" or to section numbers in the
SYCL specification refer to that revision.


== Status

This is a proposed extension specification, intended to gather community
feedback. Interfaces defined in this specification may not be implemented yet
or may be in a preliminary state. The specification itself may also change in
incompatible ways before it is finalized. *Shipping software products should
not rely on APIs defined in this specification.*


== Overview

Backends such as Level Zero and OpenCL expose an "index" to a device's work
submission queue, which allows the application to fine tune the way work is
submitted to a device. This extension exposes that same concept to SYCL
applications.

Most SYCL applications should not need to use this extension because the SYCL
implementation automatically selects an efficient way to submit work to a
device, including automatic selection of a queue index when necessary.
Therefore, this extension is aimed at advanced users who understand the device
hardware and think they can outperform the default implementation by specifying
an explicit queue index.

Note that this extension can be supported on any backend, even if the backend
has no notion of a "queue index". Backends that have no native support for a
queue index can report that a device has only a single available queue index.
Applications can then only request one possible queue index, and the backend
can treat this as the default behavior (i.e. the backend can ignore the index).


== Specification

=== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification. An implementation supporting this extension must predefine the
macro `SYCL_EXT_INTEL_QUEUE_INDEX` to one of the values defined in the table
below. Applications can test for the existence of this macro to determine if
the implementation supports this feature, or applications can test the macro's
value to determine which of the extension's features the implementation
supports.

[%header,cols="1,5"]
|===
|Value
|Description

|1
|Initial version of this extension.
|===

=== New device information descriptor

This extension adds the following new device information descriptor which
allows the application to query the number of available queue indices for the
device.

```
namespace sycl::ext::intel::info::device {

struct max_compute_queue_indices;

} // namespace sycl::ext::intel::info::device
```

The return type for this information descriptor is `int`, and the value is a
positive integer telling the number queue indices that are available for the
device. These indices are numbered sequentially starting at `0`.

=== New queue property

This extension adds the following new queue property which can be specified to
the queue constructor via the `property_list` parameter.

```
namespace sycl::ext::intel::property::queue {

class compute_index {
public:
compute_index(int idx);
int get_index();
};

} // namespace sycl::ext::intel::property::queue
```

The `compute_index` property is a hint to the implementation which can affect
work submission concurrency. When two queues for the same device have
different queue indices, there is a greater chance that commands submitted to
the two queues will be concurrently submitted to the device.

It is an error to specify a queue index that is out of range for the queue's
device. The `queue` constructor throws an `exception` with `errc::invalid` if
the index is less than `0` or if the index is greater than or equal to the
value returned by `max_compute_queue_indices` for the queue's device.

The constructor and member functions of the `compute_index` property have the
following semantics.

[%header,cols="1,3"]
|===
|Function
|Description

|`compute_index(int idx)`
|Constructs a property with the given queue index.

|`int get_index()`
|Returns the queue index associated with the property.
|===


== Example usage

The following code snippet shows how to create a SYCL queue using a specific
queue index.

```
#include <sycl/sycl.hpp>

using sycl;
using sycl::ext::intel;

void foo(device d) {
int max_index = d.get_info<info::device::max_compute_queue_indices>();
int index = /* choose value between 0 and max_index-1 */;
queue q{d, property::queue::compute_index{index}};
}
```


== Behavior on Intel GPU devices

:multi-CCS: https://github.com/intel/compute-runtime/blob/master/level_zero/doc/experimental_extensions/MULTI_CCS_MODES.md
:sycl_ext_intel_cslice: https://github.com/intel/llvm/pull/7513

This non-normative section describes the behavior of the `compute_index`
property for some specific Intel GPU devices when using {dpcpp}. These details
are not part of the extension specification, and this behavior may not apply to
other devices.

On many Intel devices, there is just one available queue index, and there is
therefore no advantage to using the `compute_index` property. However, this
property can sometimes be useful when running on Data Center GPU Flex series
devices (aka ATS-M) or Data Center GPU Max series devices (aka PVC).

Some models of ATS-M support multiple queue indices with the semantics
described in the sections above. When a single process submits kernels to
different queue indices, there is a greater likelihood that the kernels will
be submitted concurrently.

PVC also supports multiple queue indices on each tile, but these queue indices
have a different semantic. In order to expose multiple queue indices on PVC,
the device driver must be configured in {multi-CCS}[multi-CCS] mode. In this
mode, the PVC root device still has just one queue index, however each "tile"
has multiple queue indices. Therefore, the application must first create
sub-devices to access each tile, and then the application can construct a queue
on these sub-devices using the `compute_index` property.

The semantics of these PVC queue indices is different, though. On PVC, each
queue index corresponds to a fixed subset of the execution units. Queues using
different indices still have a greater likelihood of submitting kernels
concurrently, but each kernel also runs on its own partition of the execution
units. Therefore, the `compute_index` property is just an alternate way to
run on a partition of the device, exactly the same as creating a "cslice"
sub-device via the {sycl_ext_intel_cslice}[sycl_ext_intel_cslice] extension.

In both the ATS-M case and the PVC case, constructing a SYCL queue with
`compute_index` causes the runtime to submit kernels exclusively to that index
on the underlying Level Zero or OpenCL driver. Without this property, the
runtime is free to distribute kernels across the available queue indices.

0 comments on commit f5fb759

Please sign in to comment.