From 3b142916d2c4d64a728c34e82d6c0962c67231c6 Mon Sep 17 00:00:00 2001 From: tapplencourt Date: Fri, 10 Nov 2023 17:00:44 -0600 Subject: [PATCH] Run reflow --- adoc/chapters/acknowledgements.adoc | 2 +- adoc/chapters/architecture.adoc | 2086 ++++--- adoc/chapters/copyright-spec.adoc | 103 +- adoc/chapters/device_compiler.adoc | 574 +- adoc/chapters/extensions.adoc | 225 +- adoc/chapters/feature_sets.adoc | 44 +- adoc/chapters/glossary.adoc | 530 +- adoc/chapters/host_backend.adoc | 64 +- adoc/chapters/information_descriptors.adoc | 10 +- adoc/chapters/introduction.adoc | 245 +- adoc/chapters/opencl_backend.adoc | 608 +- adoc/chapters/programming_interface.adoc | 6421 ++++++++++---------- adoc/chapters/references.adoc | 8 +- adoc/chapters/what_changed.adoc | 526 +- 14 files changed, 5857 insertions(+), 5589 deletions(-) diff --git a/adoc/chapters/acknowledgements.adoc b/adoc/chapters/acknowledgements.adoc index f84b0fd9..4749aff8 100644 --- a/adoc/chapters/acknowledgements.adoc +++ b/adoc/chapters/acknowledgements.adoc @@ -67,7 +67,7 @@ * Jon Leech, Luna Princeps LLC * Kathleen Mattson, Miller & Mattson, LLC * Dave Miller, Miller & Mattson, LLC - * Stéphanie Even, Mercedes-Benz Research and Development NA + * Stéphanie Even, Mercedes-Benz Research and Development NA * Chris Gearing, Mobileye * Seiji Nishimura, NSITEXE, Inc. * Neil Trevett, NVIDIA diff --git a/adoc/chapters/architecture.adoc b/adoc/chapters/architecture.adoc index 12b80e82..b2165bbf 100644 --- a/adoc/chapters/architecture.adoc +++ b/adoc/chapters/architecture.adoc @@ -3,24 +3,26 @@ [[architecture]] = SYCL architecture -This chapter describes the structure of a SYCL application, and how the -SYCL generic programming model lays out on top of a number of <>s. +This chapter describes the structure of a SYCL application, and how the SYCL +generic programming model lays out on top of a number of <>s. == Overview -SYCL is an open industry standard for programming a heterogeneous system. 
The -design of SYCL allows standard {cpp} source code to be written such that it can -run on either an heterogeneous device or on the <>. +SYCL is an open industry standard for programming a heterogeneous system. +The design of SYCL allows standard {cpp} source code to be written such that it +can run on either an heterogeneous device or on the <>. The terminology used for SYCL inherits historically from OpenCL with some -SYCL-specific additions. However SYCL is a generic {cpp} programming model -that can be laid out on top of other heterogeneous APIs apart from OpenCL. +SYCL-specific additions. +However SYCL is a generic {cpp} programming model that can be laid out on top of +other heterogeneous APIs apart from OpenCL. SYCL implementations can provide <>s for various heterogeneous APIs, -implementing the SYCL general specification on top of them. We refer to this -heterogeneous API as the <>. The SYCL general specification -defines the behavior that all SYCL implementations must expose to SYCL users -for a SYCL application to behave as expected. +implementing the SYCL general specification on top of them. +We refer to this heterogeneous API as the <>. +The SYCL general specification defines the behavior that all SYCL +implementations must expose to SYCL users for a SYCL application to behave as +expected. A function object that can execute on a <> exposed by a <> is called a <>. @@ -29,36 +31,38 @@ To ensure maximum interoperability with different <>s, software developers can access the <> alongside the SYCL general API whenever they include the <> interoperability headers. However, interoperability is a <>-specific feature. -An application that uses interoperability does not conform to the -SYCL general application model, since it is not portable across backends. +An application that uses interoperability does not conform to the SYCL general +application model, since it is not portable across backends. 
// Note below I leave the reference to OpenCL intentionally The target users of SYCL are {cpp} programmers who want all the performance and portability features of a standard like OpenCL, but with the flexibility to use higher-level {cpp} abstractions across the host/device code boundary. -Developers can use most of the abstraction features of {cpp}, such as -templates, classes and operator overloading. - -However, some {cpp} language features are not permitted inside -kernels, due to the limitations imposed by the capabilities of the underlying -heterogeneous platforms. -These features include virtual functions, virtual inheritance, -throwing/catching exceptions, and run-time type-information. These features are -available outside kernels as normal. Within these constraints, developers can -use abstractions defined by SYCL, or they can develop their own on top. These -capabilities make SYCL ideal for library developers, middleware providers and -application developers who want to separate low-level highly-tuned algorithms -or data structures that work on heterogeneous systems from higher-level software -development. Software developers can produce templated algorithms that are easily -usable by developers in other fields. +Developers can use most of the abstraction features of {cpp}, such as templates, +classes and operator overloading. + +However, some {cpp} language features are not permitted inside kernels, due to +the limitations imposed by the capabilities of the underlying heterogeneous +platforms. +These features include virtual functions, virtual inheritance, throwing/catching +exceptions, and run-time type-information. +These features are available outside kernels as normal. +Within these constraints, developers can use abstractions defined by SYCL, or +they can develop their own on top. 
+These capabilities make SYCL ideal for library developers, middleware providers +and application developers who want to separate low-level highly-tuned +algorithms or data structures that work on heterogeneous systems from +higher-level software development. +Software developers can produce templated algorithms that are easily usable by +developers in other fields. [[sec:anatomy]] == Anatomy of a SYCL application -Below is an example of a typical <> which schedules a job to run -in parallel on any heterogeneous device available. +Below is an example of a typical <> which schedules a job to +run in parallel on any heterogeneous device available. // An AsciiDoctor "feature", the language is specified as the second // parameter of this attribute, even if we do not want it. So add a @@ -69,70 +73,68 @@ in parallel on any heterogeneous device available. include::{code_dir}/anatomy.cpp[lines=4..-1] ---- -At line 1, we [code]#{hash}include# the SYCL header files, which -provide all of the SYCL features that will be used. +At line 1, we [code]#{hash}include# the SYCL header files, which provide all of +the SYCL features that will be used. A SYCL application runs on a <>. -The application is structured in three scopes which specify the different sections; -<>, <> and <>. -The <> specifies a single kernel function that will -be, or has been, compiled by a <> and executed on a -<>. In this example <> is defined by lines -25 to 26. The <> specifies a unit of work which is -comprised of a <> and <>. In this -example <> is defined by lines 20 to 28. The -<> specifies all other code outside of a +The application is structured in three scopes which specify the different +sections; <>, <> and <>. +The <> specifies a single kernel function that will be, or has +been, compiled by a <> and executed on a <>. +In this example <> is defined by lines 25 to 26. +The <> specifies a unit of work which is comprised of a +<> and <>. +In this example <> is defined by lines 20 to 28. 
+The <> specifies all other code outside of a <>. -These three scopes are used to control the application flow and the -construction and lifetimes of the various objects used within SYCL, as -explained in <>. - -A <> is the scoped block of code that will be -compiled using a device compiler. This code may be defined by the -body of a lambda function or by the [code]#operator()# function of -a function object. Each instance of the -<> will be executed as a single, though not -necessarily entirely independent, flow of execution and has to adhere -to restrictions on what operations may be allowed to enable device -compilers to safely compile it to a range of underlying devices. +These three scopes are used to control the application flow and the construction +and lifetimes of the various objects used within SYCL, as explained in +<>. + +A <> is the scoped block of code that will be compiled +using a device compiler. +This code may be defined by the body of a lambda function or by the +[code]#operator()# function of a function object. +Each instance of the <> will be executed as a single, +though not necessarily entirely independent, flow of execution and has to adhere +to restrictions on what operations may be allowed to enable device compilers to +safely compile it to a range of underlying devices. The [code]#parallel_for# member function can be templated with a class. -This class is used to manually name the -kernel when desired, such as to avoid a compiler-generated name when debugging -a kernel defined through a lambda, to provide a known name with which to apply -build options to a kernel, or to ensure compatibility with multiple -compiler-pass implementations. +This class is used to manually name the kernel when desired, such as to avoid a +compiler-generated name when debugging a kernel defined through a lambda, to +provide a known name with which to apply build options to a kernel, or to ensure +compatibility with multiple compiler-pass implementations. 
The [code]#parallel_for# member function creates an instance of a <>, -which is the entity that will be enqueued within a -command group. In the case of [code]#parallel_for# the -<> will be executed over the given range from 0 to 1023. -The different member functions to -execute kernels can be found in <>. - -A <> is the syntactic scope wrapped by the construction -of a <> as seen on line 19. The -<> may invoke only a single +which is the entity that will be enqueued within a command group. +In the case of [code]#parallel_for# the <> will be +executed over the given range from 0 to 1023. +The different member functions to execute kernels can be found in +<>. + +A <> is the syntactic scope wrapped by the construction of +a <> as seen on line 19. +The <> may invoke only a single <>, and it takes a parameter of type command group [code]#handler#, which is constructed by the runtime. -All the requirements for a kernel to execute are -defined in this <>, as described in -<>. In this case the constructor used -for [code]#myQueue# on line 9 is the default constructor, which allows -the queue to select the best underlying device to execute on, leaving the -decision up to the runtime. - -In SYCL, data that is required within a <> must -be contained within a <>, <>, or <> allocation, as described in -<>. We -construct a buffer on line 16. Access to the <> is controlled via -an <> which is constructed on line 21. -The <> is used to -keep track of access to the data and the <> is used to request -access to the data on a queue, as well as to track the dependencies between -<>. In this example the <> is used to -write to the data buffer on line 26. +All the requirements for a kernel to execute are defined in this +<>, as described in <>. +In this case the constructor used for [code]#myQueue# on line 9 is the default +constructor, which allows the queue to select the best underlying device to +execute on, leaving the decision up to the runtime. 
+ +In SYCL, data that is required within a <> must be +contained within a <>, <>, or <> allocation, as described in +<>. +We construct a buffer on line 16. +Access to the <> is controlled via an <> which is constructed +on line 21. +The <> is used to keep track of access to the data and the <> +is used to request access to the data on a queue, as well as to track the +dependencies between <>. +In this example the <> is used to write to the data buffer on line 26. [[sec:normativerefs]] @@ -146,122 +148,135 @@ The documents in the following list are referred to within this SYCL specification, and their content is a requirement for this document. . *{cpp17}:* <>, referred to in this - specification as the {cpp} core language. The SYCL specification refers to - language in the following {cpp} defect reports and assumes a compiler that - implements them: <>. + specification as the {cpp} core language. + The SYCL specification refers to language in the following {cpp} defect + reports and assumes a compiler that implements them: <>. . *{cpp20}:* <>, referred to in this specification as the next {cpp} specification. +Programming languages — {cpp}>>, referred to in this specification as the next +{cpp} specification. [[sec:nonnormativerefs]] == Non-normative notes and examples -Unless stated otherwise, text within this SYCL specification is normative and defines -the required behavior of a SYCL implementation. Non-normative / informational notes -are included within this specification using a "`note`" callout, of the form: +Unless stated otherwise, text within this SYCL specification is normative and +defines the required behavior of a SYCL implementation. +Non-normative / informational notes are included within this specification using +a "`note`" callout, of the form: [NOTE] ==== -Information within a note callout, such as this text, is for informational purposes -and does not impose requirements on or specify behavior of a SYCL implementation. 
+Information within a note callout, such as this text, is for informational +purposes and does not impose requirements on or specify behavior of a SYCL +implementation. ==== -Source code examples within the specification are provided to aid with understanding, -and are non-normative. +Source code examples within the specification are provided to aid with +understanding, and are non-normative. -In case of any conflict between a non-normative note or source example, and normative -text within the specification, the normative text must be taken to be correct. +In case of any conflict between a non-normative note or source example, and +normative text within the specification, the normative text must be taken to be +correct. [[sec:platformmodel]] == The SYCL platform model -The SYCL platform model consists of a host connected to one or more heterogeneous devices, -called <>. <> are grouped together into one or multiple <>. -An implementation may also expose empty <> that do not contain any <>. +The SYCL platform model consists of a host connected to one or more +heterogeneous devices, called <>. +<> are grouped together into one or multiple <>. +An implementation may also expose empty <> that do not +contain any <>. A SYCL <> is constructed, either directly by the user or implicitly -when creating a <>, to hold all the runtime information required by -the SYCL runtime and the <> to operate on a device, or group of devices. +when creating a <>, to hold all the runtime information required by the +SYCL runtime and the <> to operate on a device, or group of devices. When a group of devices can be grouped together on the same context, they have -some visibility of each other's memory objects. The SYCL runtime can assume that memory -is visible across all devices in the same <>. -Not all devices exposed from the same <> can be grouped together -in the same <>. +some visibility of each other's memory objects. 
+The SYCL runtime can assume that memory is visible across all devices in the +same <>. +Not all devices exposed from the same <> can be grouped together in +the same <>. A SYCL application executes on the host as a standard {cpp} program. -<> are exposed through different <> to the SYCL application. -The SYCL application submits <> to <>. +<> are exposed through different <> to +the SYCL application. +The SYCL application submits <> to <>. Each <> enables execution on a given device. The <> then extracts operations from the <>, e.g. an explicit copy operation or a -<>. When the operation is a -<>, the <> uses a +<>. +When the operation is a <>, the <> uses a <>-specific mechanism to extract the device binary from the SYCL application and pass it to the heterogeneous API for execution on the <>. -A SYCL <> is divided into one or more compute units (CUs) which are each divided -into one or more processing elements (PEs). Computations on a device occur -within the processing elements. +A SYCL <> is divided into one or more compute units (CUs) which are each +divided into one or more processing elements (PEs). +Computations on a device occur within the processing elements. How computation is mapped to PEs is <> and <> specific. -Two devices exposed via two different backends can map computations differently to the -same device. +Two devices exposed via two different backends can map computations differently +to the same device. When a SYCL application contains <> objects, the SYCL implementation must provide an offline compilation mechanism that enables the integration of the device binaries into the SYCL application. -The output of the offline compiler can be an intermediate representation, such as -SPIR-V, that will be finalized during execution or a final device ISA. +The output of the offline compiler can be an intermediate representation, such +as SPIR-V, that will be finalized during execution or a final device ISA. 
A device may expose special purpose functionality as a _built-in_ function. The SYCL API exposes functions to query and dispatch said _built-in_ functions. -Some <> and <> may not support programmable kernels, and only support -_built-in_ functions. +Some <> and <> may not support +programmable kernels, and only support _built-in_ functions. // TODO: Conformance of these custom-devices? == The SYCL backend model -SYCL is a generic programming model for the {cpp} language that can target multiple -heterogeneous APIs, such as OpenCL. +SYCL is a generic programming model for the {cpp} language that can target +multiple heterogeneous APIs, such as OpenCL. -SYCL implementations enable these target APIs by implementing <>. +SYCL implementations enable these target APIs by implementing <>. For a SYCL implementation to be conformant on said <>, it must execute -the SYCL generic programming model on the backend. All SYCL implementations must -provide at least one backend. +the SYCL generic programming model on the backend. +All SYCL implementations must provide at least one backend. -The present document covers the SYCL generic interface available to -all <>. How the SYCL generic interface maps to a particular -<> is defined either by a separate <> specification -document, provided by the Khronos SYCL group, or by the SYCL -implementation documentation. Whenever there is a <> -specification document, this takes precedence over SYCL implementation -documentation. +The present document covers the SYCL generic interface available to all +<>. +How the SYCL generic interface maps to a particular <> is defined +either by a separate <> specification document, provided by the Khronos +SYCL group, or by the SYCL implementation documentation. +Whenever there is a <> specification document, this takes precedence +over SYCL implementation documentation. When a SYCL user builds their SYCL application, she decides which of the -<> will be used to build the SYCL application. 
This is called the set -of _active backends_. Implementations must ensure that the active -backends selected by the user can be used simultaneously by the SYCL -implementation at runtime. If two backends are available at compile time but -will produce an invalid SYCL application at runtime, the SYCL implementation -must emit a compilation error. +<> will be used to build the SYCL application. +This is called the set of _active backends_. +Implementations must ensure that the active backends selected by the user can be +used simultaneously by the SYCL implementation at runtime. +If two backends are available at compile time but will produce an invalid SYCL +application at runtime, the SYCL implementation must emit a compilation error. A SYCL application built with a number of active backends does not necessarily guarantee that said backends can be executed at runtime. -The subset of active backends available at runtime is called -_available backends_. -A backend is said to be _available_ if the host platform where the -SYCL application is executed exposes support for the heterogeneous API -required for the <>. +The subset of active backends available at runtime is called _available +backends_. +A backend is said to be _available_ if the host platform where the SYCL +application is executed exposes support for the heterogeneous API required for +the <>. It is implementation dependent whether certain backends require third-party -libraries to be available in the system. Failure to have all dependencies -required for all active backends at runtime will cause the SYCL application to -not run. +libraries to be available in the system. +Failure to have all dependencies required for all active backends at runtime +will cause the SYCL application to not run. -Once the application is running, users can query what SYCL platforms are available. +Once the application is running, users can query what SYCL platforms are +available. 
SYCL implementations will expose the devices provided by each backend grouped -into platforms. A backend must expose at least one platform. +into platforms. +A backend must expose at least one platform. Under the <> model, SYCL objects can contain one or multiple references to a certain <> native type. @@ -270,95 +285,99 @@ The mapping of SYCL objects to <> native types is defined by the <> specification document when available, or by the SYCL implementation otherwise. -To guarantee that multiple <> objects can interoperate with -each other, SYCL memory objects are not bound to a particular <>. -SYCL memory objects can be accessed from any device exposed by an -_available_ backend. -SYCL Implementations can potentially map SYCL memory objects to -multiple native types in different <>. +To guarantee that multiple <> objects can interoperate with each other, +SYCL memory objects are not bound to a particular <>. +SYCL memory objects can be accessed from any device exposed by an _available_ +backend. +SYCL Implementations can potentially map SYCL memory objects to multiple native +types in different <>. -Since SYCL memory objects are independent of any particular <>, -SYCL <> can request access to memory objects allocated +Since SYCL memory objects are independent of any particular <>, SYCL +<> can request access to memory objects allocated by any <>, and execute it on the backend associated with the <>. This requires the SYCL implementation to be able to transfer memory objects across <>. -USM allocations are subject to the limitations -described in <>. +USM allocations are subject to the limitations described in <>. 
-When a SYCL application runs on any number of <> without relying on -any <>-specific behavior or interoperability, it is said to be a -SYCL general application, and it is expected to run in any SYCL-conformant +When a SYCL application runs on any number of <> without +relying on any <>-specific behavior or interoperability, it is said to +be a SYCL general application, and it is expected to run in any SYCL-conformant implementation that supports the required features for the application. === Platform mixed version support -The SYCL generic programming model exposes a number of <>, each of -them either empty or exposing a number of <>. Each <> is bound -to a certain <>. SYCL <> associated with said <> -are associated with that <>. +The SYCL generic programming model exposes a number of <>, +each of them either empty or exposing a number of <>. +Each <> is bound to a certain <>. +SYCL <> associated with said <> are associated with +that <>. -Although the APIs in the SYCL generic programming model are defined according -to this specification and their version is indicated by the macro +Although the APIs in the SYCL generic programming model are defined according to +this specification and their version is indicated by the macro [code]#SYCL_LANGUAGE_VERSION#, this does not apply to APIs exposed by the -<>. Each <> provides its own document that defines its APIs, -and that document tells how to query for the device and platform versions. +<>. +Each <> provides its own document that defines its APIs, and that +document tells how to query for the device and platform versions. == SYCL execution model -As described in <>, a <> is comprised -of three scopes: <>, <>, and -<>. Code in the <> and -<> runs on the host and is governed by the -_SYCL application execution model_. Code in the kernel scope runs on a -device and is governed by the _SYCL kernel execution model_. +As described in <>, a <> is comprised of three +scopes: <>, <>, and <>. 
+Code in the <> and <> runs on the host +and is governed by the _SYCL application execution model_. +Code in the kernel scope runs on a device and is governed by the _SYCL kernel +execution model_. [NOTE] ==== A SYCL device does not necessarily correspond to a physical accelerator. -A SYCL implementation may choose to expose some or all of the host's -resources as a SYCL device; such an implementation would execute -code in <> on the host, but that code would still be governed by -the _SYCL kernel execution model_. +A SYCL implementation may choose to expose some or all of the host's resources +as a SYCL device; such an implementation would execute code in <> +on the host, but that code would still be governed by the _SYCL kernel execution +model_. ==== [[sec:executionmodel]] === SYCL application execution model -The SYCL application defines the execution order of the kernels by grouping -each kernel with its requirements into a <>. +The SYCL application defines the execution order of the kernels by grouping each +kernel with its requirements into a <>. <> are submitted for execution via a <> object, which defines the device where the kernel -will run. This specification sometimes refers to this as "`submitting the -kernel to a device`". The same <> object can be submitted to -different queues. When a <> is submitted to a SYCL <>, -the requirements of the kernel execution are captured. The implementation can -start executing a kernel as soon as its requirements have been satisfied. +will run. +This specification sometimes refers to this as "`submitting the kernel to a +device`". +The same <> object can be submitted to different queues. +When a <> is submitted to a SYCL <>, the requirements of +the kernel execution are captured. +The implementation can start executing a kernel as soon as its requirements have +been satisfied. 
==== Backend resources managed by the SYCL application -The SYCL runtime integrated with the SYCL application will manage -the resources required by the <> -to manage the heterogeneous devices it is providing access to. -This includes, but is not limited to, resource handlers, memory pools, -dispatch queues and other temporary handler objects. +The SYCL runtime integrated with the SYCL application will manage the resources +required by the <> to manage the heterogeneous devices it is +providing access to. +This includes, but is not limited to, resource handlers, memory pools, dispatch +queues and other temporary handler objects. -The SYCL programming interface represents the lifetime of the resources -managed by the SYCL application using RAII rules. +The SYCL programming interface represents the lifetime of the resources managed +by the SYCL application using RAII rules. Construction of a SYCL object will typically entail the creation of multiple -<> objects, which will be properly released on destruction of said -SYCL object. +<> objects, which will be properly released on destruction of said SYCL +object. The overall rules for construction and destruction are detailed in <>. -Those <> with a <> document will detail how the resource -management from SYCL objects map down to the <> objects. +Those <> with a <> document will detail how the +resource management from SYCL objects map down to the <> objects. -In SYCL, the minimum required object for submitting work to devices is -the <>, which contains references to a <>, <> -and a <> internally. +In SYCL, the minimum required object for submitting work to devices is the +<>, which contains references to a <>, <> and a +<> internally. The resources managed by SYCL are: @@ -369,50 +388,54 @@ The resources managed by SYCL are: // from changes in the programming . <>: all features of <>s are implemented by - platforms. A platform can be viewed as a given vendor's runtime and the - devices accessible through it. 
Some devices will only be accessible to - one vendor's runtime and hence multiple platforms may be present. SYCL manages - the different platforms for the user which are accessible through a - [code]#sycl::platform# object. In some cases, an implementation might also - choose to expose empty [code]#sycl::platform# objects, for example if - a vendor's runtime is available, but no devices supported by that runtime are - available in the system. - . <>: any <> resource that is acquired by the user is - attached to a context. A context contains a collection of devices that - the host can use and manages memory objects that can be shared between - the devices. Devices belonging to the same <> must be able to - access each other's global memory using some implementation-specific - mechanism. A given context can only wrap devices owned by a single - platform. A context is exposed to the user with a - [code]#sycl::context# object. + platforms. + A platform can be viewed as a given vendor's runtime and the devices + accessible through it. + Some devices will only be accessible to one vendor's runtime and hence + multiple platforms may be present. + SYCL manages the different platforms for the user which are accessible + through a [code]#sycl::platform# object. + In some cases, an implementation might also choose to expose empty + [code]#sycl::platform# objects, for example if a vendor's runtime is + available, but no devices supported by that runtime are available in the + system. + . <>: any <> resource that is acquired by the user + is attached to a context. + A context contains a collection of devices that the host can use and manages + memory objects that can be shared between the devices. + Devices belonging to the same <> must be able to access each + other's global memory using some implementation-specific mechanism. + A given context can only wrap devices owned by a single platform. + A context is exposed to the user with a [code]#sycl::context# object. . 
<>: platforms may provide devices for executing SYCL - kernels. In SYCL, a device is accessible through a - [code]#sycl::device# object. + kernels. + In SYCL, a device is accessible through a [code]#sycl::device# object. . <>: the SYCL functions that run on SYCL devices are defined as {cpp} function objects (a named function object type or a lambda - function). A kernel can be introspected through a - [code]#sycl::kernel# object. + function). + A kernel can be introspected through a [code]#sycl::kernel# object. + -- -Note that some <> may expose non-programmable functionality as -pre-defined kernels. +Note that some <> may expose non-programmable +functionality as pre-defined kernels. -- . <>: Kernels are stored internally in the SYCL application as device images, and these device images can be grouped into a - [code]#sycl::kernel_bundle# object. These objects provide a way for the - application to control the online compilation of kernels for devices. - . <>: SYCL kernels execute in command queues. The user must - create a [code]#sycl::queue# object, - which references an associated context, platform and - device. The context, platform and device may be chosen automatically, or - specified by the user. + [code]#sycl::kernel_bundle# object. + These objects provide a way for the application to control the online + compilation of kernels for devices. + . <>: SYCL kernels execute in command queues. + The user must create a [code]#sycl::queue# object, which references an + associated context, platform and device. + The context, platform and device may be chosen automatically, or specified + by the user. SYCL queues execute <> on a particular device of a particular context, but can have dependencies from any device on any available <>. -The SYCL implementation guarantees the correct initialization and -destruction of any resource handled by the underlying <>, except -for those the user has obtained manually via the SYCL interoperability API. 
+The SYCL implementation guarantees the correct initialization and destruction of +any resource handled by the underlying <>, except for those the +user has obtained manually via the SYCL interoperability API. [[sec:command-groups-exec-order]] ==== SYCL command groups and execution order @@ -420,84 +443,85 @@ for those the user has obtained manually via the SYCL interoperability API. By default, SYCL queues execute kernel functions in an out-of-order fashion based on dependency information. Developers only need to specify what data is required to execute a particular -kernel. The SYCL runtime will guarantee that kernels are executed in an order -that guarantees correctness. +kernel. +The SYCL runtime will guarantee that kernels are executed in an order that +guarantees correctness. By specifying access modes and types of memory, a directed acyclic dependency -graph (DAG) of kernels is built at runtime. This is achieved via the usage of -<> objects. A SYCL <> object defines a set -of requisites (_R_) and a kernel function (_k_). A <> is -_submitted_ to a queue when using the +graph (DAG) of kernels is built at runtime. +This is achieved via the usage of <> objects. +A SYCL <> object defines a set of requisites (_R_) and a kernel +function (_k_). +A <> is _submitted_ to a queue when using the [code]#sycl::queue::submit# member function. -A *requisite* (_r~i~_) is a requirement that must be fulfilled for -a kernel-function (_k_) to be executed on a particular device. -For example, a requirement may be that certain data is available on a -device, or that another command group has finished execution. -An implementation may evaluate the requirements of a command group at any -point after it has been submitted. -The _processing of a command group_ is the process by which a SYCL -runtime evaluates all the requirements in a given _R_. -The SYCL runtime will execute _k_ only when all _r~i~_ are satisfied (i.e., -when all requirements are satisfied). 
+A *requisite* (_r~i~_) is a requirement that must be fulfilled for a +kernel-function (_k_) to be executed on a particular device. +For example, a requirement may be that certain data is available on a device, or +that another command group has finished execution. +An implementation may evaluate the requirements of a command group at any point +after it has been submitted. +The _processing of a command group_ is the process by which a SYCL runtime +evaluates all the requirements in a given _R_. +The SYCL runtime will execute _k_ only when all _r~i~_ are satisfied (i.e., when +all requirements are satisfied). To simplify the notation, in the specification we refer to the set of -requirements of a command group named _foo_ as -_CG~foo~ = r~1~, {ldots}, r~n~_. +requirements of a command group named _foo_ as _CG~foo~ = r~1~, {ldots}, r~n~_. -The _evaluation of a requisite_ ({SYCLeval}(_r~i~_)) returns the status of -the requisite, which can be _True_ or _False_. +The _evaluation of a requisite_ ({SYCLeval}(_r~i~_)) returns the status of the +requisite, which can be _True_ or _False_. A _satisfied_ requisite implies the requirement is met. {SYCLeval}(_r~i~_) never alters the requisite, only observes the current status. -The implementation may not block to check the requisite, and the same check -can be performed multiple times. +The implementation may not block to check the requisite, and the same check can +be performed multiple times. -An *action* (_a~i~_) is a collection of implementation-defined -operations that must be performed in order to satisfy a requisite. -The set of actions for a given <> _A_ is permitted -to be empty if no operation is required to satisfy the requirement. +An *action* (_a~i~_) is a collection of implementation-defined operations that +must be performed in order to satisfy a requisite. +The set of actions for a given <> _A_ is permitted to be empty if +no operation is required to satisfy the requirement. 
The notation _a~i~_ represents the action required to satisfy _r~i~_. -Actions of different requisites can be satisfied in any order with -respect to +Actions of different requisites can be satisfied in any order with respect to each other without side effects (i.e., given two requirements _r~j~_ and _r~k~_, -_(r~j~, r~k~)_ {equiv} _(r~k~, r~j~)_). The intersection of two -actions is not necessarily empty. -*Actions* can include (but are not limited to): memory copy operations, -memory mapping operations, coordination with the host, or implementation-specific +_(r~j~, r~k~)_ {equiv} _(r~k~, r~j~)_). +The intersection of two actions is not necessarily empty. +*Actions* can include (but are not limited to): memory copy operations, memory +mapping operations, coordination with the host, or implementation-specific behavior. -Finally, _Performing an action_ ({SYCLperform}(_a~i~_)) executes the -action operations required to satisfy the requisite _r~j~_. Note that, after -{SYCLperform}(_a~i~_), the evaluation {SYCLeval}(_r~j~_) will return _True_ -until the kernel is executed. After the kernel execution, it is not defined -whether a different <> with the same requirements needs to -perform the action again, where actions of different requisites inside the -same <> object can be satisfied in any order with -respect to each -other without side effects: Given two requirements _r~j~_ and _r~k~_, -{SYCLperform}(_a~j~_) followed by {SYCLperform}(_a~k~_) is equivalent to -{SYCLperform}(_a~k~_) followed by {SYCLperform}(_a~j~_). - -The requirements of different <> submitted to the same -or different queues are evaluated in the relative order of submission. -<> objects whose intersection of requirement sets is -not empty are said to depend on each other. +Finally, _Performing an action_ ({SYCLperform}(_a~i~_)) executes the action +operations required to satisfy the requisite _r~j~_. 
+Note that, after {SYCLperform}(_a~i~_), the evaluation {SYCLeval}(_r~j~_) will +return _True_ until the kernel is executed. +After the kernel execution, it is not defined whether a different +<> with the same requirements needs to perform the action again, +where actions of different requisites inside the same <> object +can be satisfied in any order with respect to each other without side effects: +Given two requirements _r~j~_ and _r~k~_, {SYCLperform}(_a~j~_) followed by +{SYCLperform}(_a~k~_) is equivalent to {SYCLperform}(_a~k~_) followed by +{SYCLperform}(_a~j~_). + +The requirements of different <> submitted to the +same or different queues are evaluated in the relative order of submission. +<> objects whose intersection of requirement sets is not empty +are said to depend on each other. They are executed in order of submission to the queue. -If <> are submitted to different queues or by multiple -threads, the order of execution is determined by the SYCL runtime. -Note that independent <> objects can be submitted -simultaneously without affecting dependencies. +If <> are submitted to different queues or by +multiple threads, the order of execution is determined by the SYCL runtime. +Note that independent <> objects can be submitted simultaneously +without affecting dependencies. <> illustrates the execution order of three <> objects (_CG~a~,CG~b~,CG~c~_) with certain requirements submitted to the same queue. -Both _CG~a~_ and _CG~b~_ only have one requirement, _r~1~_ and _r~2~_ respectively. +Both _CG~a~_ and _CG~b~_ only have one requirement, _r~1~_ and _r~2~_ +respectively. _CG~c~_ requires both _r~1~_ and _r~2~_. This enables the SYCL runtime to potentially execute _CG~a~_ and _CG~b~_ -simultaneously, whereas _CG~c~_ cannot be executed until both _CG~a~_ and _CG~b~_ -have been completed. -The SYCL runtime evaluates the *requisites* and performs the -*actions* required (if any) for the _CG~a~_ and _CG~b~_. 
-When evaluating the *requisites* of _CG~c~_, they will be satisfied -once the _CG~a~_ and _CG~b~_ have finished. +simultaneously, whereas _CG~c~_ cannot be executed until both _CG~a~_ and +_CG~b~_ have been completed. +The SYCL runtime evaluates the *requisites* and performs the *actions* required +(if any) for the _CG~a~_ and _CG~b~_. +When evaluating the *requisites* of _CG~c~_, they will be satisfied once the +_CG~a~_ and _CG~b~_ have finished. // Formerly in three_cg_one_queue.tex @@ -522,16 +546,17 @@ syclQueue.submit(_CG~c~(r~1~,r~2~)_); image::{images}/three-cg-one-queue.svg[align="center",opts="{imageopts}"] |==== -<> uses three separate SYCL queue objects -to submit the same <> objects as before. -Regardless of using three different queues, the execution order -of the different <> objects is the same. -When different threads enqueue to different queues, the execution order -of the command group will be the order in which the submit member functions are executed. -In this case, since the different <> objects execute on -different devices, the *actions* required to satisfy the -*requirements* may be different (e.g, the SYCL runtime may -need to copy data to a different device in a separate context). +<> uses three separate SYCL queue objects to submit +the same <> objects as before. +Regardless of using three different queues, the execution order of the different +<> objects is the same. +When different threads enqueue to different queues, the execution order of the +command group will be the order in which the submit member functions are +executed. +In this case, since the different <> objects execute on different +devices, the *actions* required to satisfy the *requirements* may be different +(e.g, the SYCL runtime may need to copy data to a different device in a separate +context). 
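The dependency rule above, where command groups with intersecting requirement sets depend on each other and execute in submission order, can be sketched as a simplified plain-{cpp} model. This is a hypothetical illustration, not SYCL API. Requirements are reduced to opaque labels such as [code]#"r1"#, access modes and sub-buffer ranges are ignored, and [code]#dependencies# and [code]#intersects# are illustrative names. Applied to the _CG~a~_, _CG~b~_, _CG~c~_ example, it reports that _CG~c~_ waits for both other groups while _CG~a~_ and _CG~b~_ are independent.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: a command group is reduced to its set of requirement
// labels (e.g. "r1" for an access to buffer b1). Access modes are ignored.
using RequirementSet = std::set<std::string>;

// True when the two requirement sets share at least one element.
bool intersects(const RequirementSet& a, const RequirementSet& b) {
  for (const auto& r : a) {
    if (b.count(r) != 0) return true;
  }
  return false;
}

// For each command group, in submission order, list the indices of earlier
// command groups it depends on (those whose requirement sets intersect its
// own). The result is the edge list of the dependency DAG; groups with no
// path between them may execute concurrently.
std::vector<std::vector<std::size_t>> dependencies(
    const std::vector<RequirementSet>& submissions) {
  std::vector<std::vector<std::size_t>> deps(submissions.size());
  for (std::size_t later = 0; later < submissions.size(); ++later) {
    for (std::size_t earlier = 0; earlier < later; ++earlier) {
      if (intersects(submissions[earlier], submissions[later])) {
        deps[later].push_back(earlier);
      }
    }
  }
  return deps;
}
```

Because the model only looks at submission order and requirement sets, it yields the same edges whether the three groups are submitted to one queue or to three, provided the submit calls happen in the same order.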
// Formerly in three_cg_three_queue.tex @@ -561,87 +586,99 @@ image::{images}/three-cg-three-queue.svg[align="center",opts="{imageopts}"] ==== Controlling execution order with events -Submitting an action for execution returns an [code]#event# object. Programmers -may use these events to explicitly coordinate host and device execution. Host -code can wait for an event to complete, which will block execution on the host -until the action(s) represented by the event have completed. The [code]#event# -class is described in greater detail in <>. - -Events may also be used to explicitly order the execution of kernels. Host code may -wait for the completion of specific event, which blocks execution on the host until -that event's action has completed. Events may also define requisites between -<>. Using events in this manner informs the runtime -that one or more <> must complete before another -<> may begin executing. See <> for -greater detail. +Submitting an action for execution returns an [code]#event# object. +Programmers may use these events to explicitly coordinate host and device +execution. +Host code can wait for an event to complete, which will block execution on the +host until the action(s) represented by the event have completed. +The [code]#event# class is described in greater detail in +<>. + +Events may also be used to explicitly order the execution of kernels. +Host code may wait for the completion of specific event, which blocks execution +on the host until that event's action has completed. +Events may also define requisites between <>. +Using events in this manner informs the runtime that one or more +<> must complete before another <> +may begin executing. +See <> for greater detail. === SYCL kernel execution model When a kernel is submitted for execution, an index space is defined. An instance of the kernel body executes for each point in this index space. 
-This kernel instance is called a <> and is identified by its -point in the index space, which provides a <> for the work-item. Each -work-item executes the same code but the specific execution pathway through the -code and the data operated upon can vary by using the work-item global id to +This kernel instance is called a <> and is identified by its point in +the index space, which provides a <> for the work-item. +Each work-item executes the same code but the specific execution pathway through +the code and the data operated upon can vary by using the work-item global id to specialize the computation. -An index space of size zero is allowed. All aspects of kernel execution proceed -as normal with the exception that the kernel function itself is not executed. -Note this means the command queue will still schedule this kernel after satisfying -the requirements and this satisfies requirements of any dependent enqueued kernels. +An index space of size zero is allowed. +All aspects of kernel execution proceed as normal with the exception that the +kernel function itself is not executed. +Note this means the command queue will still schedule this kernel after +satisfying the requirements and this satisfies requirements of any dependent +enqueued kernels. ==== Basic kernels SYCL allows a simple execution model in which a kernel is invoked over an _N_-dimensional index space defined by [code]#range#, where _N_ is one, two -or three. Each work-item in such a kernel executes independently. +or three. +Each work-item in such a kernel executes independently. -Each work-item is identified by a value of type [code]#item#. The type -[code]#item# encapsulates a work-item identifier of type [code]#id# and -a [code]#range# representing the number of work-items executing the kernel. +Each work-item is identified by a value of type [code]#item#. 
+The type [code]#item# encapsulates a work-item identifier of type +[code]#id# and a [code]#range# representing the number of work-items +executing the kernel. ==== ND-range kernels Work-items can be organized into <>, providing a more -coarse-grained decomposition of the index space. Each work-group is assigned a -unique <> with the same dimensionality as the index space used for -the work-items. Work-items are each assigned a <>, unique within the -work-group, so that a single work-item can be uniquely identified by its global -id or by a combination of its local id and work-group id. The work-items in a -given work-group execute on the processing elements of a single compute unit. +coarse-grained decomposition of the index space. +Each work-group is assigned a unique <> with the same +dimensionality as the index space used for the work-items. +Work-items are each assigned a <>, unique within the work-group, so +that a single work-item can be uniquely identified by its global id or by a +combination of its local id and work-group id. +The work-items in a given work-group execute on the processing elements of a +single compute unit. When work-groups are used in SYCL, the index space is called an <>. -An ND-range is an -_N_-dimensional index space, where _N_ is one, two or three. In -SYCL, the ND-range is represented via the [code]#nd_range# class. An -[code]#nd_range# is made up of a global range and a local range, each +An ND-range is an _N_-dimensional index space, where _N_ is one, two or three. +In SYCL, the ND-range is represented via the [code]#nd_range# class. +An [code]#nd_range# is made up of a global range and a local range, each represented via values of type [code]#range#. -Additionally, there can be a global offset, represented via a value of type [code]#id#; this is deprecated in SYCL 2020. The types -[code]#range# and [code]#id# are each _N_-element -arrays of integers. 
The iteration space defined via an [code]#nd_range# -is an _N_-dimensional index space starting at the ND-range's global -offset whose size is its global range, split into work-groups of the -size of its local range. +Additionally, there can be a global offset, represented via a value of type +[code]#id#; this is deprecated in SYCL 2020. +The types [code]#range# and [code]#id# are each _N_-element arrays of +integers. +The iteration space defined via an [code]#nd_range# is an _N_-dimensional +index space starting at the ND-range's global offset whose size is its global +range, split into work-groups of the size of its local range. Each work-item in the ND-range is identified by a value of type -[code]#nd_item#. The type [code]#nd_item# encapsulates a -global id, local id and work-group id, all of type [code]#id# -(the iteration space offset also of type [code]#id#, but this is deprecated in SYCL 2020), as well as -global and local ranges and coordination mechanisms necessary to -make work-groups useful. Work-groups are assigned ids using a similar -approach to that used for work-item global ids. Work-items are -assigned to a work-group and given a local id with components in the -range from zero to the size of the work-group in that dimension minus -one. Hence, the combination of a work-group id and the local id -within a work-group uniquely defines a work-item. +[code]#nd_item#. +The type [code]#nd_item# encapsulates a global id, local id and work-group +id, all of type [code]#id# (the iteration space offset also of type +[code]#id#, but this is deprecated in SYCL 2020), as well as global and local +ranges and coordination mechanisms necessary to make work-groups useful. +Work-groups are assigned ids using a similar approach to that used for work-item +global ids. +Work-items are assigned to a work-group and given a local id with components in +the range from zero to the size of the work-group in that dimension minus one. 
+Hence, the combination of a work-group id and the local id within a work-group +uniquely defines a work-item. ==== Backend-specific kernels SYCL allows a <> to expose fixed functionality as -non-programmable built-in kernels. The availability and behavior of these -built-in kernels are <>-specific, and are not required to follow the -SYCL execution and memory models. Furthermore the interface exposed utilize -these built-in kernels is also <>-specific. +non-programmable built-in kernels. +The availability and behavior of these built-in kernels are +<>-specific, and are not required to follow the SYCL execution and +memory models. +Furthermore the interface exposed utilize these built-in kernels is also +<>-specific. See the relevant backend specification for details. [[sec:memory.model]] @@ -649,71 +686,74 @@ See the relevant backend specification for details. Since SYCL is a single-source programming model, the memory model affects both the application and the device kernel parts of a program. -On the SYCL application, the SYCL runtime will make sure data is available -for execution of the kernels. +On the SYCL application, the SYCL runtime will make sure data is available for +execution of the kernels. On the SYCL device kernel, the <> rules describing how the memory -behaves on a specific device are mapped to SYCL {cpp} constructs. Thus it is -possible to program kernels efficiently in pure {cpp}. +behaves on a specific device are mapped to SYCL {cpp} constructs. +Thus it is possible to program kernels efficiently in pure {cpp}. [[sub.section.memmodel.app]] === SYCL application memory model -The application running on the host uses SYCL <> objects using instances of -the [code]#sycl::buffer# class or <> allocation functions -to allocate memory in the global address -space, or can allocate specialized image memory using the -[code]#sycl::unsampled_image# and [code]#sycl::sampled_image# classes. 
- -In the SYCL application, memory objects are bound to all devices in which -they are used, regardless of the SYCL context where they reside. -SYCL memory objects (namely, <> and <> objects) -can encapsulate multiple underlying <> memory objects together with -multiple host memory allocations to enable the same object to be shared -between devices in different contexts, platforms or backends. <> -allocations uniquely identify a memory allocation and are bound to a SYCL context. +The application running on the host uses SYCL <> objects using instances +of the [code]#sycl::buffer# class or <> allocation functions to allocate +memory in the global address space, or can allocate specialized image memory +using the [code]#sycl::unsampled_image# and [code]#sycl::sampled_image# classes. + +In the SYCL application, memory objects are bound to all devices in which they +are used, regardless of the SYCL context where they reside. +SYCL memory objects (namely, <> and <> objects) can encapsulate +multiple underlying <> memory objects together with multiple host +memory allocations to enable the same object to be shared between devices in +different contexts, platforms or backends. +<> allocations uniquely identify a memory allocation and are bound to a +SYCL context. They are only valid on the backend used by the context. The order of execution of <> objects ensures a sequentially consistent access to the memory from the different devices to the memory -objects. Accessing a USM allocation does not alter the order of execution. -Users must explicitly inform the SYCL runtime of any requirements necessary -for a legal execution. - -To access a memory object, the user must create an <> object -which parameterizes the type of access to the memory object that a kernel or -the host requires. 
The <> object defines a requirement to access -a memory object, and this requirement is defined by construction of an -accessor, regardless of whether there are any uses in a kernel or by the -host. An accessor object specifies whether the -access is via global memory, constant memory or image samplers and their -associated access functions. The <> also specifies whether the -access is read-only (RO), write-only (WO) or read-write (RW). An optional -[code]#no_init# property can be added to an accessor to tell the system to -discard any previous contents of the data the accessor refers to, so there -are two additional requirement types: no-init-write-only (NWO) and -no-init-read-write (NRW). For simplicity, when a *requisite* represents an -accessor object in a certain access mode, we represent it as -MemoryObject~AccessMode~. For example, an accessor that -accesses memory object *buf1* in *RW* mode is represented as -_buf1~RW~_. A <> object that uses such an accessor is -represented as _CG(buf1~RW~)_. The *action* required to satisfy a -requisite and the location of the latest copy of a memory object will vary -depending on the implementation. - -<> illustrates an example where -<> objects are enqueued to two separate SYCL queues -executing in devices in different contexts. The *requisites* for the -<> execution are the same, but the *actions* to -satisfy them are different. For example, if the data is on the host before -execution, _A(b1~RW~)_ and _A(b2~RW~)_ can potentially be implemented as -copy operations from the host memory to [code]#context1# or -[code]#context2# respectively. After _CG~a~_ and _CG~b~_ are executed, -_A'(b1~RW~)_ will likely be an empty operation, since the result of the -kernel can stay on the device. On the other hand, the results of _CG~b~_ are -now on a different context than _CG~c~_ is executing, therefore _A'(b2~RW~)_ -will need to copy data across two separate contexts using an -implementation specific mechanism. +objects. 
+Accessing a USM allocation does not alter the order of execution. +Users must explicitly inform the SYCL runtime of any requirements necessary for +a legal execution. + +To access a memory object, the user must create an <> object which +parameterizes the type of access to the memory object that a kernel or the host +requires. +The <> object defines a requirement to access a memory object, and +this requirement is defined by construction of an accessor, regardless of +whether there are any uses in a kernel or by the host. +An accessor object specifies whether the access is via global memory, constant +memory or image samplers and their associated access functions. +The <> also specifies whether the access is read-only (RO), write-only +(WO) or read-write (RW). +An optional [code]#no_init# property can be added to an accessor to tell the +system to discard any previous contents of the data the accessor refers to, so +there are two additional requirement types: no-init-write-only (NWO) and +no-init-read-write (NRW). +For simplicity, when a *requisite* represents an accessor object in a certain +access mode, we represent it as MemoryObject~AccessMode~. +For example, an accessor that accesses memory object *buf1* in *RW* mode is +represented as _buf1~RW~_. +A <> object that uses such an accessor is represented as +_CG(buf1~RW~)_. +The *action* required to satisfy a requisite and the location of the latest copy +of a memory object will vary depending on the implementation. + +<> illustrates an example where <> objects +are enqueued to two separate SYCL queues executing in devices in different +contexts. +The *requisites* for the <> execution are the same, but the +*actions* to satisfy them are different. +For example, if the data is on the host before execution, _A(b1~RW~)_ and +_A(b2~RW~)_ can potentially be implemented as copy operations from the host +memory to [code]#context1# or [code]#context2# respectively. 
+After _CG~a~_ and _CG~b~_ are executed, _A'(b1~RW~)_ will likely be an empty +operation, since the result of the kernel can stay on the device. +On the other hand, the results of _CG~b~_ are now on a different context than +_CG~c~_ is executing, therefore _A'(b2~RW~)_ will need to copy data across two +separate contexts using an implementation specific mechanism. // TODO : The example below mentions OpenCL but I think is illustrative of a // potential implementation and behavior so I am inclined to leave it there @@ -751,23 +791,25 @@ image::{images}/device_to_device2.svg[align="center",opts="{imageopts}"] <> shows actions performed when three command groups are submitted to two distinct queues, and potential implementation in an OpenCL -<> by a SYCL runtime. Note that in this example, each SYCL buffer -(_b2,b2_) is implemented as separate [code]#cl_mem# objects per -context. +<> by a SYCL runtime. +Note that in this example, each SYCL buffer (_b1,b2_) is implemented as separate +[code]#cl_mem# objects per context. Note that the order of the definition of the accessors within the <> is irrelevant to the requirements they define. -All accessors always apply to the entire <> object where -they are defined. +All accessors always apply to the entire <> object where they are +defined. -When multiple <> in the same <> define different -requisites to the same memory object these requisites must be resolved. +When multiple <> in the same <> define +different requisites to the same memory object these requisites must be +resolved. Firstly, any requisites with different access modes but the same access target are resolved into a single requisite with the union of the different access -modes according to <>. The atomic access mode acts -as if it was read-write (RW) when determining the combined requirement. The -rules in <> are commutative and associative. +modes according to <>.
+The atomic access mode acts as if it was read-write (RW) when determining the +combined requirement. +The rules in <> are commutative and associative. [[table.access.mode.union]] .Combined requirement from two different accessor access modes within the same <>. The rules are commutative and associative @@ -789,13 +831,13 @@ rules in <> are commutative and associative. The result of this should be that there should not be any requisites with the same access target. -Secondly, the remaining requisites must adhere to the following rule. Only -one of the requisites may have write access (_W_ or _RW_), otherwise the -<> must throw an exception. All requisites create a -requirement for the data they represent to be made available in the specified -access target, however only the requisite with write access determines the side -effects of the <>, i.e. only the data which that requisite -represents will be updated. +Secondly, the remaining requisites must adhere to the following rule. +Only one of the requisites may have write access (_W_ or _RW_), otherwise the +<> must throw an exception. +All requisites create a requirement for the data they represent to be made +available in the specified access target, however only the requisite with write +access determines the side effects of the <>, i.e. only the data +which that requisite represents will be updated. For example: @@ -804,24 +846,23 @@ For example: * _CG(b1^G^~W~, b1^C^~RW~)_ is *not* permitted. Where _G_ and _C_ correspond to a [code]#target::device# and -[code]#target::constant_buffer# accessor and _H_ corresponds to a host -accessor. +[code]#target::constant_buffer# accessor and _H_ corresponds to a host accessor. -A buffer created from a range of an existing buffer is called -a [keyword]#sub-buffer#. +A buffer created from a range of an existing buffer is called a +[keyword]#sub-buffer#. A buffer may be overlaid with any number of sub-buffers. Accessors can be created to operate on these [keyword]#sub-buffers#. 
-Refer to <> for details on [keyword]#sub-buffer# -creation and restrictions. -A requirement to access a sub-buffer is represented by specifying its -range, e.g. _CG(b1~RW,[0,5)~)_ represents the requirement of accessing -the range _[0,5)_ buffer _b1_ in read write mode. - -If two accessors are constructed to -access the same buffer, but both are to non-overlapping sub-buffers of the -buffer, then the two accessors are said to not [keyword]#overlap#, otherwise the -accessors do overlap. Overlapping is the test that is used to determine the -scheduling order of command groups. +Refer to <> for details on [keyword]#sub-buffer# creation and +restrictions. +A requirement to access a sub-buffer is represented by specifying its range, +e.g. _CG(b1~RW,[0,5)~)_ represents the requirement of accessing the range +_[0,5)_ buffer _b1_ in read write mode. + +If two accessors are constructed to access the same buffer, but both are to +non-overlapping sub-buffers of the buffer, then the two accessors are said to +not [keyword]#overlap#, otherwise the accessors do overlap. +Overlapping is the test that is used to determine the scheduling order of +command groups. Command-groups with non-overlapping requirements may execute concurrently. // Formerly in overlap.tex @@ -854,24 +895,23 @@ back to the host or other devices after reading and for the runtime to maintain multiple read-only copies of the data on multiple devices. A special case of requirement is the one defined by a *host accessor*. -Host accessors are represented with -_H(MemoryObject~AccessMode~)_, e.g, +Host accessors are represented with _H(MemoryObject~AccessMode~)_, e.g, _H(b1~RW~)_ represents a host accessor to _b1_ in read-write mode. -Host accessors are a special type of accessor constructed from a memory -object outside a command group, and require that the data associated with -the given memory object is available on the host in the given pointer. 
+Host accessors are a special type of accessor constructed from a memory object +outside a command group, and require that the data associated with the given +memory object is available on the host in the given pointer. This causes the runtime to block on construction of this object until the requirement has been satisfied. -*Host accessor* objects are effectively barriers on all accesses to -a certain memory object. -<> shows an example of multiple command groups -enqueued to the same queue. Once the host accessor _H(b1~RW~)_ is reached, -the execution cannot proceed until _CG~a~_ is finished. +*Host accessor* objects are effectively barriers on all accesses to a certain +memory object. +<> shows an example of multiple command groups enqueued to the +same queue. +Once the host accessor _H(b1~RW~)_ is reached, the execution cannot proceed +until _CG~a~_ is finished. However, _CG~b~_ does not have any requirements on _b1_, therefore, it can execute concurrently with the barrier. -Finally, _CG~c~_ will be enqueued after _H(b1~RW~)_ is finished, -but still has to wait for _CG~b~_ to conclude for all its requirements to -be satisfied. +Finally, _CG~c~_ will be enqueued after _H(b1~RW~)_ is finished, but still has +to wait for _CG~b~_ to conclude for all its requirements to be satisfied. See <> for details on host-device coordination. // Formerly in host_acc.tex @@ -907,115 +947,126 @@ image::{images}/host-acc.svg[align="center",opts="{imageopts}"] The memory model for SYCL devices is based on the OpenCL 1.2 memory model. Work-items executing in a kernel have access to three distinct address spaces -(memory regions) and a virtual address space overlapping some concrete address spaces: - - * <> is accessible to all work-items in all work-groups. - Work-items can read from or write to any element of a global memory - object. Reads and writes to global memory may be cached depending on the - capabilities of the device. 
Global memory is persistent across kernel - invocations. Concurrent access to a location in an USM allocation by two or more executing - kernels where at least one kernel modifies that location is a data race; there is no guarantee - of correct results unless <> and atomic operations are used. +(memory regions) and a virtual address space overlapping some concrete address +spaces: + + * <> is accessible to all work-items in all + work-groups. + Work-items can read from or write to any element of a global memory object. + Reads and writes to global memory may be cached depending on the + capabilities of the device. + Global memory is persistent across kernel invocations. + Concurrent access to a location in a USM allocation by two or more + executing kernels where at least one kernel modifies that location is a data + race; there is no guarantee of correct results unless <> and + atomic operations are used. * <> is accessible to all work-items in a single - work-group. Attempting to access local memory in one work-group from - another work-group results in undefined behavior. This memory region can be - used to allocate variables that are shared by all work-items in a - work-group. Work-group-level visibility allows local memory to be - implemented as dedicated regions of the device memory where this is - appropriate. - * <> is a region of memory private to a work-item. + work-group. + Attempting to access local memory in one work-group from another work-group + results in undefined behavior. + This memory region can be used to allocate variables that are shared by all + work-items in a work-group. + Work-group-level visibility allows local memory to be implemented as + dedicated regions of the device memory where this is appropriate. + * <> is a region of memory private to a + work-item. Attempting to access private memory in one work-item from another work-item results in undefined behavior.
- * <> is a virtual address space which overlaps the - global, local and private address spaces. Therefore, an object that resides - in the global, local, or private address space can also be accessed through - the generic address space. + * <> is a virtual address space which overlaps + the global, local and private address spaces. + Therefore, an object that resides in the global, local, or private address + space can also be accessed through the generic address space. ==== Access to memory -Accessors in the device kernels provide access to the memory objects, -acting as pointers to the corresponding address space. +Accessors in the device kernels provide access to the memory objects, acting as +pointers to the corresponding address space. Pointers can be passed directly as kernel arguments if an implementation -supports <>. See <> for information on when it is legal -to dereference pointers passed from the host inside kernels. +supports <>. +See <> for information on when it is legal to dereference pointers +passed from the host inside kernels. -To allocate local memory within a kernel, the user can either pass -a [code]#sycl::local_accessor# object as a argument to an ND-range -kernel (that has a user-defined work-group size), or -can define a variable in work-group scope inside -[code]#sycl::parallel_for_work_group#. +To allocate local memory within a kernel, the user can either pass a +[code]#sycl::local_accessor# object as an argument to an ND-range kernel (that +has a user-defined work-group size), or can define a variable in work-group +scope inside [code]#sycl::parallel_for_work_group#. Any variable defined inside a [code]#sycl::parallel_for# scope or -[code]#sycl::parallel_for_work_item# scope will be allocated in private -memory. Any variable defined inside a [code]#sycl::parallel_for_work_group# -scope will be allocated in local memory. +[code]#sycl::parallel_for_work_item# scope will be allocated in private memory.
+Any variable defined inside a [code]#sycl::parallel_for_work_group# scope will +be allocated in local memory. Users can create accessors that reference sub-buffers as well as entire buffers. Within kernels, the underlying {cpp} pointer types can be obtained from an -accessor. The pointer types will contain a compile-time deduced address space. -So, for example, if a {cpp} pointer is obtained from an accessor to global memory, -the {cpp} pointer type will have a global address space attribute attached to it. +accessor. +The pointer types will contain a compile-time deduced address space. +So, for example, if a {cpp} pointer is obtained from an accessor to global +memory, the {cpp} pointer type will have a global address space attribute +attached to it. The address space attribute will be compile-time propagated to other pointer values when one pointer is initialized to another pointer value using a defined algorithm. When developers need to explicitly state the address space of a pointer value, -one of the explicit pointer classes can be used. There is a different explicit -pointer class for each address space: [code]#sycl::raw_local_ptr#, -[code]#sycl::raw_global_ptr#, [code]#sycl::raw_private_ptr#, -[code]#sycl::raw_generic_ptr#, -[code]#sycl::decorated_local_ptr#, -[code]#sycl::decorated_global_ptr#, [code]#sycl::decorated_private_ptr#, -or [code]#sycl::decorated_generic_ptr#. +one of the explicit pointer classes can be used. +There is a different explicit pointer class for each address space: +[code]#sycl::raw_local_ptr#, [code]#sycl::raw_global_ptr#, +[code]#sycl::raw_private_ptr#, [code]#sycl::raw_generic_ptr#, +[code]#sycl::decorated_local_ptr#, [code]#sycl::decorated_global_ptr#, +[code]#sycl::decorated_private_ptr#, or [code]#sycl::decorated_generic_ptr#. The classes with the [code]#decorated# prefix expose pointers that use an implementation-defined address space decoration, while the classes with the -[code]#raw# prefix do not. 
Buffer accessors with an access target -[code]#target::device# or [code]#target::constant_buffer# and local accessors -can be converted into explicit pointer classes ([code]#multi_ptr#). +[code]#raw# prefix do not. +Buffer accessors with an access target [code]#target::device# or +[code]#target::constant_buffer# and local accessors can be converted into +explicit pointer classes ([code]#multi_ptr#). For templates that need to adapt to different address spaces, a -[code]#sycl::multi_ptr# class is defined which is templated -via a compile-time constant enumerator value to specify the address space. +[code]#sycl::multi_ptr# class is defined which is templated via a compile-time +constant enumerator value to specify the address space. [[sec:memoryconsistency]] === SYCL memory consistency model -The SYCL memory consistency model is based upon the memory consistency -model of the {cpp} core language. Where SYCL offers extensions to classes and -functions that may affect memory consistency, the default behavior when these -extensions are not used always matches the behavior of standard {cpp}. +The SYCL memory consistency model is based upon the memory consistency model of +the {cpp} core language. +Where SYCL offers extensions to classes and functions that may affect memory +consistency, the default behavior when these extensions are not used always +matches the behavior of standard {cpp}. A SYCL implementation must guarantee that the same memory consistency model is -used across host and device code. Every <> must support the -memory model defined by the minimum version of {cpp} described in -<>; SYCL implementations supporting -additional versions of {cpp} must also support the corresponding memory models. +used across host and device code. +Every <> must support the memory model defined by the minimum +version of {cpp} described in <>; SYCL +implementations supporting additional versions of {cpp} must also support the +corresponding memory models. 
Within a work-item, operations are ordered according to the _sequenced before_ relation defined by the {cpp} core language. Ensuring memory consistency across different work-items requires careful usage -of <> operations, <> operations and atomic -operations. The ordering of operations across different work-items is -determined by the _happens before_ relation defined by the {cpp} core language, -with a single relation governing all address spaces (memory regions). - -On any SYCL device, local and global memory may be made consistent -across work-items in a single <> through use of a <> -operation. On SYCL devices supporting acquire-release or sequentially -consistent memory orderings, all memory visible to a set of work-items may be -made consistent across the work-items in that set through the use of -<> and atomic operations. +of <> operations, <> operations and atomic operations. +The ordering of operations across different work-items is determined by the +_happens before_ relation defined by the {cpp} core language, with a single +relation governing all address spaces (memory regions). + +On any SYCL device, local and global memory may be made consistent across +work-items in a single <> through use of a <> operation. +On SYCL devices supporting acquire-release or sequentially consistent memory +orderings, all memory visible to a set of work-items may be made consistent +across the work-items in that set through the use of <> and atomic +operations. Memory consistency between the host and SYCL device(s), or different SYCL -devices in the same context, can be guaranteed through library calls in the -host application, as defined in <>. On SYCL devices -supporting concurrent atomic accesses to USM allocations and acquire-release or -sequentially consistent memory orderings, cross-device memory consistency can -be enforced through the use of <> and atomic operations. 
+devices in the same context, can be guaranteed through library calls in the host +application, as defined in <>. +On SYCL devices supporting concurrent atomic accesses to USM allocations and +acquire-release or sequentially consistent memory orderings, cross-device memory +consistency can be enforced through the use of <> and atomic +operations. [[sec:memory-ordering]] ==== Memory ordering @@ -1042,16 +1093,16 @@ These memory orders are listed above from weakest ([code]#memory_order::relaxed#) to strongest ([code]#memory_order::seq_cst#). The complete set of memory orders is not guaranteed to be supported by every -device, nor across all combinations of devices within a platform. The set of -supported memory orders can be queried via the information descriptors for the -[code]#sycl::device# and [code]#sycl::context# classes. +device, nor across all combinations of devices within a platform. +The set of supported memory orders can be queried via the information +descriptors for the [code]#sycl::device# and [code]#sycl::context# classes. [NOTE] ==== -SYCL implementations are not required to support a memory order equivalent -to [code]#std::memory_order::consume#, and using this ordering within a SYCL -device kernel results in undefined behavior. Developers are encouraged to use -[code]#sycl::memory_order::acquire# instead. +SYCL implementations are not required to support a memory order equivalent to +[code]#std::memory_order::consume#, and using this ordering within a SYCL device +kernel results in undefined behavior. +Developers are encouraged to use [code]#sycl::memory_order::acquire# instead. 
==== [[sec:memory-scope]] @@ -1067,97 +1118,100 @@ constraints of a given atomic operation apply is controlled by a [code]#sycl::memory_scope# parameter, which can take one of the following values: - * [code]#sycl::memory_scope::work_item# The ordering constraint applies - only to the calling work-item; - * [code]#sycl::memory_scope::sub_group# The ordering constraint applies - only to work-items in the same <> as the calling work-item; - * [code]#sycl::memory_scope::work_group# The ordering constraint applies - only to work-items in the same <> as the calling - work-item; - * [code]#sycl::memory_scope::device# The ordering constraint applies only - to work-items executing on the same device as the calling work-item; + * [code]#sycl::memory_scope::work_item# The ordering constraint applies only + to the calling work-item; + * [code]#sycl::memory_scope::sub_group# The ordering constraint applies only + to work-items in the same <> as the calling work-item; + * [code]#sycl::memory_scope::work_group# The ordering constraint applies only + to work-items in the same <> as the calling work-item; + * [code]#sycl::memory_scope::device# The ordering constraint applies only to + work-items executing on the same device as the calling work-item; * [code]#sycl::memory_scope::system# The ordering constraint applies to any - work-item or host thread in the system that is currently permitted to - access the memory allocation containing the referenced object, as - defined by the capabilities of <> and <>. + work-item or host thread in the system that is currently permitted to access + the memory allocation containing the referenced object, as defined by the + capabilities of <> and <>. The memory scopes are listed above from narrowest ([code]#memory_scope::work_item#) to widest ([code]#memory_scope::system#). The complete set of memory scopes is not guaranteed to be supported by every -device. The set of supported memory scopes can be queried via the information +device. 
+The set of supported memory scopes can be queried via the information descriptors for the [code]#sycl::device# and [code]#sycl::context# classes. -The widest scope that can be applied to an atomic operation corresponds to -the set of work-items which can access the associated memory location. For -example, the widest scope that can be applied to atomic operations in -work-group local memory is [code]#sycl::memory_scope::work_group#. If a -wider scope is supplied, the behavior is as-if the narrowest scope containing -all work-items which can access the associated memory location was supplied. +The widest scope that can be applied to an atomic operation corresponds to the +set of work-items which can access the associated memory location. +For example, the widest scope that can be applied to atomic operations in +work-group local memory is [code]#sycl::memory_scope::work_group#. +If a wider scope is supplied, the behavior is as-if the narrowest scope +containing all work-items which can access the associated memory location was +supplied. [NOTE] ==== -The addition of memory scopes to the {cpp} memory model modifies the -definition of some concepts from the {cpp} core language. For example: -data races, the synchronizes-with relationship and sequential -consistency must be defined in a way that accounts for atomic -operations with differing (but compatible) scopes, in a manner -similar to the <>. Efforts to -formalize the memory model of SYCL are ongoing, and a formal memory model -will be included in a future version of the SYCL specification. +The addition of memory scopes to the {cpp} memory model modifies the definition +of some concepts from the {cpp} core language. +For example: data races, the synchronizes-with relationship and sequential +consistency must be defined in a way that accounts for atomic operations with +differing (but compatible) scopes, in a manner similar to the <>. 
+Efforts to formalize the memory model of SYCL are ongoing, and a formal memory +model will be included in a future version of the SYCL specification. ==== ==== Atomic operations -Atomic operations can be performed on memory in buffers and USM. The -[code]#sycl::atomic_ref# class must be used to provide safe atomic access -to the buffer or USM allocation from device code. +Atomic operations can be performed on memory in buffers and USM. +The [code]#sycl::atomic_ref# class must be used to provide safe atomic access to +the buffer or USM allocation from device code. ==== Forward progress This section, and any subsequent section referring to progress guarantees, uses -the following terms as defined in the {cpp} core language: thread of -execution; weakly parallel forward progress guarantees; parallel forward -progress guarantees; concurrent forward progress guarantees; and block with -forward progress guarantee delegation. +the following terms as defined in the {cpp} core language: thread of execution; +weakly parallel forward progress guarantees; parallel forward progress +guarantees; concurrent forward progress guarantees; and block with forward +progress guarantee delegation. Each work-item in SYCL is a separate thread of execution, providing at least -weakly parallel forward progress guarantees. Whether work-items provide -stronger forward progress guarantees is implementation-defined. +weakly parallel forward progress guarantees. +Whether work-items provide stronger forward progress guarantees is +implementation-defined. All implementations must additionally ensure that a work-item arriving at a <> does not prevent other work-items in the same -group from making progress. When a work-item arrives at a group barrier acting -on group _G_, implementations must eventually select and potentially strengthen -another work-item in group _G_ that has not yet arrived at the barrier. +group from making progress. 
+When a work-item arrives at a group barrier acting on group _G_, implementations +must eventually select and potentially strengthen another work-item in group _G_ +that has not yet arrived at the barrier. -When a host thread blocks on the completion of a command previously submitted -to a SYCL queue (for example, via the [code]#sycl::queue::wait# function), it +When a host thread blocks on the completion of a command previously submitted to +a SYCL queue (for example, via the [code]#sycl::queue::wait# function), it blocks with forward progress guarantee delegation. [NOTE] ==== -SYCL commands submitted to a queue are not guaranteed to begin executing until -a host thread blocks on their completion. In the absence of multiple host -threads, there is no guarantee that host and device code will execute -concurrently. +SYCL commands submitted to a queue are not guaranteed to begin executing until a +host thread blocks on their completion. +In the absence of multiple host threads, there is no guarantee that host and +device code will execute concurrently. ==== // Later, this label will move onto a new subsection - see below [[sec:progmodel.cpp]] == The SYCL programming model -A SYCL program is written in standard {cpp}. Host code and device code is -written in the same {cpp} source file, enabling instantiation of templated -kernels from host code and also enabling kernel source code to be shared -between host and device. -The device kernels are encapsulated {cpp} callable types (a function object -with [code]#operator()# or a lambda function), which have -been designated to be compiled as SYCL kernels. +A SYCL program is written in standard {cpp}. +Host code and device code are written in the same {cpp} source file, enabling +instantiation of templated kernels from host code and also enabling kernel +source code to be shared between host and device.
+The device kernels are encapsulated {cpp} callable types (a function object with +[code]#operator()# or a lambda function), which have been designated to be +compiled as SYCL kernels. -SYCL programs target heterogeneous systems. The kernels may be compiled and -optimized for multiple different processor architectures with very different -binary representations. +SYCL programs target heterogeneous systems. +The kernels may be compiled and optimized for multiple different processor +architectures with very different binary representations. // TODO: Add \subsection{SYCL {cpp} language requirements} before merging @@ -1168,21 +1222,23 @@ binary representations. === Minimum version of {cpp} The {cpp} features used in SYCL are based on a specific version of {cpp}. -Implementations of SYCL must support this minimum {cpp} version, which defines the -{cpp} constructs that can consequently be used by SYCL feature definitions +Implementations of SYCL must support this minimum {cpp} version, which defines +the {cpp} constructs that can consequently be used by SYCL feature definitions (for example, lambdas). -The minimum {cpp} version of this SYCL specification is determined by the normative {cpp} -core language defined in <>. All implementations -of this specification must support at least this core language, and features within this -specification are defined using features of the core language. Note that not all -core language constructs are supported within <> or code -invoked by a <>, as detailed by -<>. +The minimum {cpp} version of this SYCL specification is determined by the +normative {cpp} core language defined in <>. +All implementations of this specification must support at least this core +language, and features within this specification are defined using features of +the core language. +Note that not all core language constructs are supported within +<> or code invoked by a +<>, as detailed by <>. 
-Implementations may support newer {cpp} versions than the minimum required by SYCL. -Code written using newer features than the SYCL requirement, though, may -not be portable to other implementations that don't support the same {cpp} version. +Implementations may support newer {cpp} versions than the minimum required by +SYCL. +Code written using newer features than the SYCL requirement, though, may not be +portable to other implementations that don't support the same {cpp} version. [[sec:progmodel.futurecppversion]] @@ -1193,28 +1249,31 @@ in <>. The following features are pre-adopted by SYCL 2020 and made available in the [code]#sycl::# namespace: [code]#std::span#, [code]#std::dynamic_extent#, -[code]#std::bit_cast#. The implementations of pre-adopted features are -compliant with the next {cpp} specification, and are expected to forward directly -to standard {cpp} features in a future version of SYCL. +[code]#std::bit_cast#. +The implementations of pre-adopted features are compliant with the next {cpp} +specification, and are expected to forward directly to standard {cpp} features +in a future version of SYCL. The following features of SYCL 2020 use syntax based on the next {cpp} -specification: [code]#sycl::atomic_ref#. These features behave as -described in the next {cpp} specification, barring modifications to ensure -compatibility with other SYCL 2020 features and heterogeneous -programming. Any such modifications are documented in the corresponding -sections of this specification. +specification: [code]#sycl::atomic_ref#. +These features behave as described in the next {cpp} specification, barring +modifications to ensure compatibility with other SYCL 2020 features and +heterogeneous programming. +Any such modifications are documented in the corresponding sections of this +specification. 
=== Basic data parallel kernels -Data-parallel <> that execute as -multiple <> and where no work-group-local coordination is -required are enqueued with the [code]#sycl::parallel_for# function -parameterized by a [code]#sycl::range# parameter. These kernels will execute -the kernel function body once for each work-item in the specified <>. +Data-parallel <> that execute as multiple +<> and where no work-group-local coordination is required +are enqueued with the [code]#sycl::parallel_for# function parameterized by a +[code]#sycl::range# parameter. +These kernels will execute the kernel function body once for each work-item in +the specified <>. Functionality tied to <> of work-items, including -<> and <>, must not be used -within these kernels. +<> and <>, must not be used within +these kernels. Variables with <> semantics can be added to basic data parallel kernels using the features described in <>. @@ -1223,99 +1282,105 @@ kernels using the features described in <>. Data parallel <> can also execute in a mode where the set of <> is divided into <> of -user-defined dimensions. The user specifies the global <> and local -work-group size as parameters to the [code]#sycl::parallel_for# function with a -[code]#sycl::nd_range# parameter. In this mode of execution, -kernels execute over the <> in work-groups of the specified -size. It is possible to share data among work-items within the same -work-group in <> or <>, and the -[code]#group_barrier# function can be used to block a work-item until all -work-items in the same work-group arrive at the barrier. All work-groups in a -given [code]#parallel_for# will be the same size, and the global size -defined in the nd-range must either be a multiple of the work-group size in -each dimension, or the global size must be zero. When the global size -is zero, the kernel function is not executed, the local size is ignored, and -any dependencies are satisfied. - -Work-groups may be further subdivided into <>. 
The -work-items that compose a sub-group are selected in an implementation-defined -way, and therefore the size and number of sub-groups may differ for each -kernel. Moreover, different devices may make different guarantees with respect -to how sub-groups within a work-group are scheduled. The maximum number of -work-items in any sub-group in a kernel is based on a combination of the kernel -and its dispatch dimensions. The size of any sub-group in the dispatch is -between 1 and this maximum sub-group size, and the size of an individual -sub-group is invariant for the duration of a kernel's execution. Similarly -to work-groups, the [code]#group_barrier# function can be used to block a -work-item until all work-items in the same sub-group arrive at the barrier. +user-defined dimensions. +The user specifies the global <> and local work-group size as parameters +to the [code]#sycl::parallel_for# function with a [code]#sycl::nd_range# +parameter. +In this mode of execution, kernels execute over the <> in work-groups +of the specified size. +It is possible to share data among work-items within the same work-group in +<> or <>, and the [code]#group_barrier# +function can be used to block a work-item until all work-items in the same +work-group arrive at the barrier. +All work-groups in a given [code]#parallel_for# will be the same size, and the +global size defined in the nd-range must either be a multiple of the work-group +size in each dimension, or the global size must be zero. +When the global size is zero, the kernel function is not executed, the local +size is ignored, and any dependencies are satisfied. + +Work-groups may be further subdivided into <>. +The work-items that compose a sub-group are selected in an +implementation-defined way, and therefore the size and number of sub-groups may +differ for each kernel. +Moreover, different devices may make different guarantees with respect to how +sub-groups within a work-group are scheduled. 
+The maximum number of work-items in any sub-group in a kernel is based on a +combination of the kernel and its dispatch dimensions. +The size of any sub-group in the dispatch is between 1 and this maximum +sub-group size, and the size of an individual sub-group is invariant for the +duration of a kernel's execution. +Similarly to work-groups, the [code]#group_barrier# function can be used to +block a work-item until all work-items in the same sub-group arrive at the +barrier. Portable device code must not assume that work-items within a sub-group execute in any particular order, that work-groups are subdivided into sub-groups in a specific way, nor that the work-items within a sub-group provide specific forward progress guarantees. -Variables with <> semantics can be added to work-group data -parallel kernels using the features described in <>. +Variables with <> semantics can be added to work-group data parallel +kernels using the features described in <>. === Hierarchical data parallel kernels [NOTE] ==== -Based on developer and implementation feedback, the hierarchical -data parallel kernel feature described next is undergoing -improvements to better align with the frameworks and patterns -prevalent in modern programming. As this is a key part of the SYCL -API and we expect to make changes to it, we temporarily recommend -that new codes refrain from using this feature until the new API -is finished in a near-future version of the SYCL specification, -when full use of the updated feature will be recommended for use -in new code. Existing codes using this feature will of course be -supported by conformant implementations of this specification. +Based on developer and implementation feedback, the hierarchical data parallel +kernel feature described next is undergoing improvements to better align with +the frameworks and patterns prevalent in modern programming. 
+As this is a key part of the SYCL API and we expect to make changes to it, we +temporarily recommend that new codes refrain from using this feature until the +new API is finished in a near-future version of the SYCL specification, when +full use of the updated feature will be recommended for use in new code. +Existing codes using this feature will of course be supported by conformant +implementations of this specification. ==== -The SYCL compiler provides a way of specifying data parallel kernels -that execute within work-groups via a different syntax which -highlights the hierarchical nature of the parallelism. This mode is -purely a compiler feature and does not change the execution model of -the kernel. Instead of calling [code]#sycl::parallel_for# the -user calls [code]#sycl::parallel_for_work_group# with a -[code]#sycl::range# value representing the number of -work-groups to launch and optionally a second -[code]#sycl::range# representing the size of each work-group -for performance tuning. All code within the -[code]#parallel_for_work_group# scope effectively executes once -per work-group. Within the [code]#parallel_for_work_group# scope, -it is possible to call [code]#parallel_for_work_item# which -creates a new scope in which all work-items within the current -work-group execute. This enables a programmer to write code that looks -like there is an inner work-item loop inside an outer work-group loop, -which closely matches the effect of the execution model. All variables -declared inside the [code]#parallel_for_work_group# scope are -allocated in work-group local memory, whereas all variables declared -inside the [code]#parallel_for_work_item# scope are declared in -private memory. All [code]#parallel_for_work_item# calls within a -given [code]#parallel_for_work_group# execution must have the -same dimensions. 
+The SYCL compiler provides a way of specifying data parallel kernels that +execute within work-groups via a different syntax which highlights the +hierarchical nature of the parallelism. +This mode is purely a compiler feature and does not change the execution model +of the kernel. +Instead of calling [code]#sycl::parallel_for# the user calls +[code]#sycl::parallel_for_work_group# with a [code]#sycl::range# value +representing the number of work-groups to launch and optionally a second +[code]#sycl::range# representing the size of each work-group for performance +tuning. +All code within the [code]#parallel_for_work_group# scope effectively executes +once per work-group. +Within the [code]#parallel_for_work_group# scope, it is possible to call +[code]#parallel_for_work_item# which creates a new scope in which all work-items +within the current work-group execute. +This enables a programmer to write code that looks like there is an inner +work-item loop inside an outer work-group loop, which closely matches the effect +of the execution model. +All variables declared inside the [code]#parallel_for_work_group# scope are +allocated in work-group local memory, whereas all variables declared inside the +[code]#parallel_for_work_item# scope are declared in private memory. +All [code]#parallel_for_work_item# calls within a given +[code]#parallel_for_work_group# execution must have the same dimensions. === Kernels that are not launched over parallel instances Simple kernels for which only a single instance of the kernel function will be -executed are enqueued with the [code]#sycl::single_task# function. The -kernel enqueued takes no "`work-item id`" parameter and will only execute once. +executed are enqueued with the [code]#sycl::single_task# function. +The kernel enqueued takes no "`work-item id`" parameter and will only execute +once. 
The behavior is logically equivalent to executing a kernel on a single compute -unit with a single work-group comprising only one work-item. Such kernels may be -enqueued on multiple queues and devices and as a result may be executed in -task-parallel fashion. +unit with a single work-group comprising only one work-item. +Such kernels may be enqueued on multiple queues and devices and as a result may +be executed in task-parallel fashion. [[sec:pre-defined-kernels]] === Pre-defined kernels -Some <> may expose pre-defined functionality to users as kernels. -These kernels are not programmable, hence they are not bound by the SYCL -{cpp} programming model restrictions, and how they are written is +Some <> may expose pre-defined functionality to users as +kernels. +These kernels are not programmable, hence they are not bound by the SYCL {cpp} +programming model restrictions, and how they are written is implementation-defined. @@ -1323,25 +1388,27 @@ implementation-defined. === Coordination and Synchronization Coordination between the host and any devices can be expressed in the host SYCL -application using calls into the SYCL runtime. Coordination between work-items -executing inside of device code can be expressed using group barriers. +application using calls into the SYCL runtime. +Coordination between work-items executing inside of device code can be expressed +using group barriers. Some function calls synchronize with other function calls performed by another -thread (potentially on another device). Other functions are defined in terms of -their synchronization operations. Such functions can be used to ensure that -the host and any devices do not access data concurrently, and/or to reason -about the ordering of operations across the host and any devices. +thread (potentially on another device). +Other functions are defined in terms of their synchronization operations. 
+Such functions can be used to ensure that the host and any devices do not access +data concurrently, and/or to reason about the ordering of operations across the +host and any devices. ==== Host-Device Coordination The following operations can be used to coordinate host and device(s): - * _Buffer destruction_: The destructors for - [code]#sycl::buffer#, [code]#sycl::unsampled_image# and - [code]#sycl::sampled_image# objects block until all submitted work on - those objects completes and copy the data back to host memory before - returning. These destructors only block if the object was constructed with - attached host memory and if data needs to be copied back to the host. + * _Buffer destruction_: The destructors for [code]#sycl::buffer#, + [code]#sycl::unsampled_image# and [code]#sycl::sampled_image# objects block + until all submitted work on those objects completes and copy the data back + to host memory before returning. + These destructors only block if the object was constructed with attached + host memory and if data needs to be copied back to the host. + -- More complex forms of buffer destruction can be specified by the user by @@ -1349,51 +1416,52 @@ constructing buffers with other kinds of references to memory, such as [code]#shared_ptr# and [code]#unique_ptr#. -- * _Host Accessors_: The constructor for a host accessor blocks until all - kernels that modify the same buffer (or image) in any queues complete - and then copies data back to host memory before the constructor returns. + kernels that modify the same buffer (or image) in any queues complete and + then copies data back to host memory before the constructor returns. Any command groups with requirements to the same memory object cannot execute until the host accessor is destroyed as shown on <>. - * _Command group enqueue_: The <> internally ensures - that any command groups added to queues have the correct event - dependencies added to those queues to ensure correct operation. 
Adding - command groups to queues never blocks, and the [code]#sycl::event# returned - by the queue's submit function contains event information related to the - specific command group. - * _Queue operations_: The user can manually use queue operations, - such as [code]#sycl::queue::wait()# to block execution of the calling thread until all - the command groups submitted to the queue have finished execution. Note - that this will also affect the dependencies of those command groups in + * _Command group enqueue_: The <> internally ensures that any + command groups added to queues have the correct event dependencies added to + those queues to ensure correct operation. + Adding command groups to queues never blocks, and the [code]#sycl::event# + returned by the queue's submit function contains event information related + to the specific command group. + * _Queue operations_: The user can manually use queue operations, such as + [code]#sycl::queue::wait()# to block execution of the calling thread until + all the command groups submitted to the queue have finished execution. + Note that this will also affect the dependencies of those command groups in other queues. - * _SYCL event objects_: SYCL provides [code]#sycl::event# - objects which can be used to track and specify dependencies. The - <> must ensure that these objects can be used to - enforce dependencies that span SYCL contexts from different <>. + * _SYCL event objects_: SYCL provides [code]#sycl::event# objects which can be + used to track and specify dependencies. + The <> must ensure that these objects can be used to enforce + dependencies that span SYCL contexts from different <>. The specification for each of these blocking functions defines some set of -operations that cause the function to unblock. These operations always happen -before the blocking function returns (using the definition of "happens before" -from the C++ specification). 
- -Note that the destructors of other SYCL objects -([code]#sycl::queue#, [code]#sycl::context#,{ldots}) do -not block.
Only a [code]#sycl::buffer#, [code]#sycl::sampled_image# or -[code]#sycl::unsampled_image# destructor might block. The rationale is -that an object without any side effect on the host does not need to -block on destruction as it would impact the performance. So it is up -to the programmer to use a member function to wait for completion in some -cases if this does not fit the goal. -See <> for more information -on object life time. +operations that cause the function to unblock. +These operations always happen before the blocking function returns (using the +definition of "happens before" from the C++ specification). + +Note that the destructors of other SYCL objects ([code]#sycl::queue#, +[code]#sycl::context#,{ldots}) do not block. +Only a [code]#sycl::buffer#, [code]#sycl::sampled_image# or +[code]#sycl::unsampled_image# destructor might block. +The rationale is that an object without any side effect on the host does not +need to block on destruction as it would impact the performance. +So it is up to the programmer to use a member function to wait for completion in +some cases if this does not fit the goal. +See <> for more information on object lifetime. ==== Work-item Coordination A <> provides a mechanism to coordinate all work-items in the -same group. All work-items in a group must execute the barrier before any are -allowed to continue execution beyond the barrier. Note that the group barrier -must be encountered by all work-items of a group executing the kernel or by -none at all. <> and <> functionality is -exposed via the [code]#group_barrier# function. +same group. +All work-items in a group must execute the barrier before any are allowed to +continue execution beyond the barrier. +Note that the group barrier must be encountered by all work-items of a group +executing the kernel or by none at all. +<> and <> functionality is exposed via +the [code]#group_barrier# function.
Coordination between work-items in different work-groups must take place via atomic operations, and is possible only on SYCL device with certain @@ -1401,17 +1469,17 @@ capabilities, as described in <>. === Error handling -In SYCL, there are two types of errors: synchronous errors that can be -detected immediately when an API call is made, and <> +In SYCL, there are two types of errors: synchronous errors that can be detected +immediately when an API call is made, and <> that can only be detected later after an API call has returned. -Synchronous errors, such as failure to construct an -object, are reported immediately by the runtime throwing an -exception. <>, such as an error occurring during -execution of a kernel on a device, are reported via an asynchronous -error-handler mechanism. - -<> are not reported immediately as they occur. The -asynchronous error handler for a context or queue is called with a +Synchronous errors, such as failure to construct an object, are reported +immediately by the runtime throwing an exception. +<>, such as an error occurring during execution +of a kernel on a device, are reported via an asynchronous error-handler +mechanism. + +<> are not reported immediately as they occur. +The asynchronous error handler for a context or queue is called with a [code]#sycl::exception_list# object, which contains a list of asynchronously-generated exception objects, on the conditions described by <> and <>. @@ -1420,117 +1488,121 @@ Asynchronous errors may be generated regardless of whether the user has specified any asynchronous error handler(s), as described in <>. -Some <> can report errors that are specific to the platform -they are targeting, or that are more concrete than the errors provided +Some <> can report errors that are specific to the +platform they are targeting, or that are more concrete than the errors provided by the SYCL API. Any error reported by a <> must derive from the base [code]#sycl::exception#. 
-When a user wishes to capture specifically an error thrown by a <>, -she must include the <>-specific headers for said <>. +When a user wishes to capture specifically an error thrown by a <>, she +must include the <>-specific headers for said <>. === Fallback mechanism -A <> can be submitted either to a single queue -to be executed on, or to a secondary queue. If a -<> fails to be enqueued to the primary queue, then -the system will attempt to enqueue it to the secondary queue, if given as a -parameter to the submit function. If the <> fails to be -queued to both of these queues, then a synchronous SYCL exception will be thrown. - -It is possible that a command group may be successfully enqueued, -but then asynchronously fail to run, for some reason. In this case, it may be -possible for the runtime system to execute the <> -on the secondary queue, instead of the primary queue. The situations where a SYCL -runtime may be able to achieve this asynchronous fall-back is implementation-defined. +A <> can be submitted either to a single queue to +be executed on, or to a secondary queue. +If a <> fails to be enqueued to the primary +queue, then the system will attempt to enqueue it to the secondary queue, if +given as a parameter to the submit function. +If the <> fails to be queued to both of these +queues, then a synchronous SYCL exception will be thrown. + +It is possible that a command group may be successfully enqueued, but then +asynchronously fail to run, for some reason. +In this case, it may be possible for the runtime system to execute the +<> on the secondary queue, instead of the primary +queue. +The situations where a SYCL runtime may be able to achieve this asynchronous +fall-back are implementation-defined. === Scheduling of kernels and data movement A <> takes a reference to a command group -[code]#handler# as a parameter and anything within that scope is -immediately executed and takes the handler object as a parameter.
The -intention is that a user will perform calls to SYCL functions, member functions, -destructors and constructors inside that scope. These calls will be non-blocking -on the host, but enqueue operations to the queue that the command group is submitted -to. All user functions within the command group scope will be called on the host -as the <> is executed, but any <> it invokes will be added to the SYCL <>. All commands added -to the <> will be executed out-of-order from each other, according to -their data dependencies. +[code]#handler# as a parameter and anything within that scope is immediately +executed and takes the handler object as a parameter. +The intention is that a user will perform calls to SYCL functions, member +functions, destructors and constructors inside that scope. +These calls will be non-blocking on the host, but enqueue operations to the +queue that the command group is submitted to. +All user functions within the command group scope will be called on the host as +the <> is executed, but any <> +it invokes will be added to the SYCL <>. +All commands added to the <> will be executed out-of-order from each +other, according to their data dependencies. [[sec:managing-object-lifetimes]] === Managing object lifetimes A SYCL application does not initialize any <> features until a -[code]#sycl::context# object is created. A user does not need to -explicitly create a [code]#sycl::context# object, but they do need to -explicitly create a [code]#sycl::queue# object, for which a -[code]#sycl::context# object will be implicitly created if not provided -by the user. - -All <> objects encapsulated in SYCL objects are reference-counted and will -be destroyed once all references have been released. This means that a user needs -only create a SYCL <> (which will automatically create an SYCL context) for -the lifetime of their application to initialize and release any <> objects -safely. 
- -There is no global state specified to be required in SYCL implementations. This -means, for example, that if the user creates two queues without explicitly +[code]#sycl::context# object is created. +A user does not need to explicitly create a [code]#sycl::context# object, but +they do need to explicitly create a [code]#sycl::queue# object, for which a +[code]#sycl::context# object will be implicitly created if not provided by the +user. + +All <> objects encapsulated in SYCL objects are reference-counted and +will be destroyed once all references have been released. +This means that a user needs only create a SYCL <> (which will +automatically create a SYCL context) for the lifetime of their application to +initialize and release any <> objects safely. + +There is no global state specified to be required in SYCL implementations. +This means, for example, that if the user creates two queues without explicitly constructing a common context, then a SYCL implementation does not have to -create a shared context for the two queues. Implementations are free to share or -cache state globally for performance, but it is not required. - -Memory objects can be constructed with or without attached host memory. If no -host memory is attached at the point of construction, then destruction of that -memory object is non-blocking. The user may use {cpp} standard pointer classes -for sharing the host data with the user application and for defining blocking, -or non-blocking behavior of the buffers and images. -If host memory is attached by using a raw pointer, then the default behavior is +create a shared context for the two queues. +Implementations are free to share or cache state globally for performance, but +it is not required. + +Memory objects can be constructed with or without attached host memory. +If no host memory is attached at the point of construction, then destruction of +that memory object is non-blocking.
+The user may use {cpp} standard pointer classes for sharing the host data with +the user application and for defining blocking, or non-blocking behavior of the +buffers and images. +If host memory is attached by using a raw pointer, then the default behavior is followed, which is that the destructor will block until any command groups operating on the memory object have completed, then, if the contents of the memory object is modified on a device those contents are copied back to host and only then does the destructor return. -In the case where host memory is shared -between the user application and the <> with a -[code]#std::shared_ptr#, then the reference counter -of the [code]#std::shared_ptr# determines whether the buffer needs to copy -data back on destruction, and in that case the blocking or non-blocking behavior +In the case where host memory is shared between the user application and the +<> with a [code]#std::shared_ptr#, then the reference counter of +the [code]#std::shared_ptr# determines whether the buffer needs to copy data +back on destruction, and in that case the blocking or non-blocking behavior depends on the user application. -Instead of a [code]#std::shared_ptr#, a [code]#std::unique_ptr# may be -provided, which uses move semantics for initializing and using the -associated host memory. In this case, the behavior of the buffer in -relation to the user application will be non-blocking on destruction. +Instead of a [code]#std::shared_ptr#, a [code]#std::unique_ptr# may be provided, +which uses move semantics for initializing and using the associated host memory. +In this case, the behavior of the buffer in relation to the user application +will be non-blocking on destruction. 
-As said in <>, the only blocking -operations in SYCL (apart from explicit wait operations) are: +As said in <>, the only blocking operations in SYCL (apart +from explicit wait operations) are: - * host accessor constructor, which waits for any kernels enqueued before - its creation that write to the corresponding object to finish and be - copied back to host memory before it starts processing. The host - accessor does not necessarily copy back to the same host memory as + * host accessor constructor, which waits for any kernels enqueued before its + creation that write to the corresponding object to finish and be copied back + to host memory before it starts processing. + The host accessor does not necessarily copy back to the same host memory as initially given by the user; - * memory object destruction, in the case where copies back to host memory - have to be done or when the host memory is used as a backing-store. + * memory object destruction, in the case where copies back to host memory have + to be done or when the host memory is used as a backing-store. === Device discovery and selection -A user specifies which queue to submit a -<> and each <> is -targeted to run on a specific <> (and <>). A user -can specify the actual device on queue creation, or they can specify a -<> which causes the <> to choose a -device based on the user's provided preferences. Specifying a -<> causes the <> to perform device -discovery. No device discovery is performed until a SYCL -<> is passed to a queue constructor. Device -topology may be cached by the <>, but this is not -required. - -Device discovery will return all <> from all <> exposed -by all the supported <>. +A user specifies which queue to submit a <> to, and +each <> is targeted to run on a specific <> (and <>). +A user can specify the actual device on queue creation, or they can specify a +<> which causes the <> to choose a device based +on the user's provided preferences.
+Specifying a <> causes the <> to perform device +discovery. +No device discovery is performed until a SYCL <> is passed to a +queue constructor. +Device topology may be cached by the <>, but this is not required. + +Device discovery will return all <> from all +<> exposed by all the supported <>. === Interfacing with the SYCL backend API @@ -1539,22 +1611,22 @@ There are two styles of developing a SYCL application: . writing a pure SYCL generic application; . writing a SYCL application that relies on some <> specific behavior. -When users follow 1., there is no assumption about what <> will be used during -compilation or execution of the SYCL application. Therefore, the <> -is not assumed to be available to the developer. -Only standard {cpp} types and interfaces are assumed to be available, -as described in <>. -Users only need to include the [code]## header to write a -SYCL generic application. +When users follow 1., there is no assumption about what <> will be used +during compilation or execution of the SYCL application. +Therefore, the <> is not assumed to be available to the developer. +Only standard {cpp} types and interfaces are assumed to be available, as +described in <>. +Users only need to include the [code]## header to write a SYCL +generic application. On the other hand, when users follow 2., they must know what <>s -they are using. In this case, any header required for the normal -programmability of the <> is assumed to be available to the user. -In addition to the [code]## header, users must also -include the <>-specific header as defined in -<>. The <>-specific header -provides the interoperability interface for the SYCL API to interact with -<>. +they are using. +In this case, any header required for the normal programmability of the +<> is assumed to be available to the user. +In addition to the [code]## header, users must also include the +<>-specific header as defined in <>. 
+The <>-specific header provides the interoperability interface for the +SYCL API to interact with <>. The interoperability API is defined in <>. @@ -1563,91 +1635,96 @@ The interoperability API is defined in <>. SYCL memory objects represent data that is handled by the <> and can represent allocations in one or multiple <> at any time. Memory objects, both buffers and images, may have one or more underlying -<> to ensure that <> objects -can use data in any device. A SYCL implementation may have multiple -<> for the same device. -The <> is responsible for ensuring the different copies are up-to-date -whenever necessary, using whatever mechanism is available in the system -to update the copies of the underlying <>. +<> to ensure that <> +objects can use data in any device. +A SYCL implementation may have multiple <> for the same device. +The <> is responsible for ensuring the different copies are +up-to-date whenever necessary, using whatever mechanism is available in the +system to update the copies of the underlying <>. [NOTE] .Implementation note ==== -A valid mechanism for this update is to transfer the data from one -<> into the system memory using the <>-specific -mechanism available, and then transfer it to a different device -using the mechanism exposed by the new <>. +A valid mechanism for this update is to transfer the data from one <> +into the system memory using the <>-specific mechanism available, and +then transfer it to a different device using the mechanism exposed by the new +<>. ==== -Memory objects in SYCL fall into one of two categories: <> objects -and <> objects. A buffer object stores a one-, two- or -three-dimensional collection of elements that are stored linearly directly back -to back in the same way C or {cpp} stores arrays. 
An image object is used to store -a one-, two- or three-dimensional texture, frame-buffer or image data that may be -stored in an optimized and device-specific format in memory and must be accessed -through specialized operations. - -Elements of a buffer object can be a scalar data type (such as an -[code]#int# or [code]#float#), vector data type, or a user-defined -structure. In SYCL, a <> object is a templated type -([code]#sycl::buffer#), parameterized by the element type and number of -dimensions. An <> object is stored in one of a limited number of -formats. The elements of an image object are selected from a list of -predefined image formats which are provided by an underlying <> -implementation. Images are encapsulated in the -[code]#sycl::unsampled_image# or [code]#sycl::sampled_image# -types, which are templated by the number of dimensions in the image. The -minimum number of elements in an image object is one. The minimum number -of elements in a buffer object is zero. +Memory objects in SYCL fall into one of two categories: <> objects and +<> objects. +A buffer object stores a one-, two- or three-dimensional collection of elements +that are stored linearly directly back to back in the same way C or {cpp} stores +arrays. +An image object is used to store a one-, two- or three-dimensional texture, +frame-buffer or image data that may be stored in an optimized and +device-specific format in memory and must be accessed through specialized +operations. + +Elements of a buffer object can be a scalar data type (such as an [code]#int# or +[code]#float#), vector data type, or a user-defined structure. +In SYCL, a <> object is a templated type ([code]#sycl::buffer#), +parameterized by the element type and number of dimensions. +An <> object is stored in one of a limited number of formats. +The elements of an image object are selected from a list of predefined image +formats which are provided by an underlying <> implementation. 
+Images are encapsulated in the [code]#sycl::unsampled_image# or +[code]#sycl::sampled_image# types, which are templated by the number of +dimensions in the image. +The minimum number of elements in an image object is one. +The minimum number of elements in a buffer object is zero. The fundamental differences between a buffer and an image object are: - * elements in a buffer are stored in an array of 1, 2 or 3 dimensions and - can be accessed using an accessor by a kernel executing on a device. The - accessors for kernels provide a member function to get {cpp} pointer types, or the - [code]#sycl::global_ptr# class; - * elements of an image are stored in a format that is opaque to the user - and cannot be directly accessed using a pointer. SYCL provides image - accessors and samplers to allow a kernel to read from or write to an - image; - * for a buffer object the data is accessed within a kernel in the same - format as it is stored in memory, but in the case of an image object the - data is not necessarily accessed within a kernel in the same format as - it is stored in memory; + * elements in a buffer are stored in an array of 1, 2 or 3 dimensions and can + be accessed using an accessor by a kernel executing on a device. + The accessors for kernels provide a member function to get {cpp} pointer + types, or the [code]#sycl::global_ptr# class; + * elements of an image are stored in a format that is opaque to the user and + cannot be directly accessed using a pointer. + SYCL provides image accessors and samplers to allow a kernel to read from or + write to an image; + * for a buffer object the data is accessed within a kernel in the same format + as it is stored in memory, but in the case of an image object the data is + not necessarily accessed within a kernel in the same format as it is stored + in memory; * image elements are always a 4-component vector (each component can be a - float or signed/unsigned integer) in a kernel. 
Accessors that read an - image convert image elements from their storage format into a 4-component - vector. + float or signed/unsigned integer) in a kernel. + Accessors that read an image convert image elements from their storage + format into a 4-component vector. + -- -Similarly, the SYCL accessor member functions provided to write to an -image convert the image element from a 4-component vector to -the appropriate image format specified such as four 8-bit -elements, for example. +Similarly, the SYCL accessor member functions provided to write to an image +convert the image element from a 4-component vector to the appropriate image +format specified such as four 8-bit elements, for example. -- Users may want fine-grained control of the memory management and storage -semantics of SYCL image or buffer objects. For example, a user may wish to -specify the host memory for a memory object to use, but may not want the memory -object to block on destruction. +semantics of SYCL image or buffer objects. +For example, a user may wish to specify the host memory for a memory object to +use, but may not want the memory object to block on destruction. -Depending on the control and the use cases of the SYCL applications, -well established {cpp} classes and patterns can be used for reference counting and -sharing data between user applications and the <>. For control over -memory allocation on the host and mapping between host and device memory, -pre-defined or user-defined {cpp} [code]#std::allocator# classes are -used. To avoid data races when sharing data between SYCL and non-SYCL -applications, [code]#std::shared_ptr# and [code]#std::mutex# classes are used. +Depending on the control and the use cases of the SYCL applications, well +established {cpp} classes and patterns can be used for reference counting and +sharing data between user applications and the <>. 
+For control over memory allocation on the host and mapping between host and +device memory, pre-defined or user-defined {cpp} [code]#std::allocator# classes +are used. +To avoid data races when sharing data between SYCL and non-SYCL applications, +[code]#std::shared_ptr# and [code]#std::mutex# classes are used. == Multi-dimensional objects and linearization SYCL defines a number of multi-dimensional objects such as buffers and -accessors. The iteration space of work-items in a kernel may also be -multi-dimensional. The size of each dimension is defined by a [code]#range# -object of one, two or three dimensions, and an element in the multi-dimensional -space can be identified using an [code]#id# object with the same number of -dimensions as the corresponding [code]#range#. +accessors. +The iteration space of work-items in a kernel may also be multi-dimensional. +The size of each dimension is defined by a [code]#range# object of one, two or +three dimensions, and an element in the multi-dimensional space can be +identified using an [code]#id# object with the same number of dimensions as the +corresponding [code]#range#. If the size of any dimension is zero, there are zero elements in the multi-dimensional range. @@ -1655,13 +1732,12 @@ multi-dimensional range. [[sec:multi-dim-linearization]] === Linearization -Some multi-dimensional objects can be viewed in a linear form. When this -happens, the right-most term in the object's range varies fastest in the -linearization. +Some multi-dimensional objects can be viewed in a linear form. +When this happens, the right-most term in the object's range varies fastest in +the linearization. 
-A three-dimensional element [code]#id{id0, id1, id2}# within a -three-dimensional object of range [code]#range{r0, r1, r2}# has a linear -position defined by: +A three-dimensional element [code]#id{id0, id1, id2}# within a three-dimensional +object of range [code]#range{r0, r1, r2}# has a linear position defined by: [latexmath] ++++ id2 + (id1 \cdot r2) + (id0 \cdot r1 \cdot r2) @@ -1681,61 +1757,67 @@ A one-dimensional element [code]#id{id0}# within a one-dimensional range [[sec:multi-dim-subscript]] === Multi-dimensional subscript operators -Some multi-dimensional objects can be indexed using the subscript operator -where consecutive subscript operators correspond to each dimension. The -right-most operator varies fastest, as with standard {cpp} arrays. Formally, a -three-dimensional subscript access [code]#a[id0][id1][id2]# references the element -at [code]#id{id0, id1, id2}#. A two-dimensional subscript access -[code]#a[id0][id1]# references the element at [code]#id{id0, id1}#. A -one-dimensional subscript access [code]#a[id0]# references the element at +Some multi-dimensional objects can be indexed using the subscript operator where +consecutive subscript operators correspond to each dimension. +The right-most operator varies fastest, as with standard {cpp} arrays. +Formally, a three-dimensional subscript access [code]#a[id0][id1][id2]# +references the element at [code]#id{id0, id1, id2}#. +A two-dimensional subscript access [code]#a[id0][id1]# references the element at +[code]#id{id0, id1}#. +A one-dimensional subscript access [code]#a[id0]# references the element at [code]#id{id0}#. == Implementation options The SYCL language is designed to allow several different possible -implementations. The contents of this section are non-normative, so -implementations need not follow the guidelines listed here. However, this -section is intended to help readers understand the possible strategies that can -be used to implement SYCL. +implementations. 
+The contents of this section are non-normative, so implementations need not +follow the guidelines listed here. +However, this section is intended to help readers understand the possible +strategies that can be used to implement SYCL. [[subsec:smcp]] === Single source multiple compiler passes With this technique, known as <>, there are separate host and device -compilers. Each SYCL source file is compiled two times: once by the host -compiler and once by the device compiler. An implementation could support more -than one device compiler, in which case each SYCL source file is compiled -more than two times. The host compiler in this technique could be an -off-the-shelf compiler with no special knowledge of SYCL, but the device -compiler must be SYCL aware. The device compiler parses the source file to -identify each <> and any <> it calls. SYCL is designed so that this analysis can be -done statically. The device compiler then generates code only for the -<> and the <>. +compilers. +Each SYCL source file is compiled two times: once by the host compiler and once +by the device compiler. +An implementation could support more than one device compiler, in which case +each SYCL source file is compiled more than two times. +The host compiler in this technique could be an off-the-shelf compiler with no +special knowledge of SYCL, but the device compiler must be SYCL aware. +The device compiler parses the source file to identify each +<> and any <> it calls. +SYCL is designed so that this analysis can be done statically. +The device compiler then generates code only for the <> and the <>. Typically, the device compilers generate header files which interface between -the host compiler and the <>. Therefore, the device compiler -runs first, and then the host compiler consumes these header files when -generating the host code. +the host compiler and the <>. 
+Therefore, the device compiler runs first, and then the host compiler consumes +these header files when generating the host code. The device compilers in this technique generate one or more <> for the <>, which -can be read by the <>. Each <> could either -contain native ISA for a device or it could contain an intermediate language -such as SPIR-V. In the later case, the <> must translate the -intermediate language into native device ISA when the <> -is submitted to a device. +can be read by the <>. +Each <> could either contain native ISA for a device or it could +contain an intermediate language such as SPIR-V. +In the latter case, the <> must translate the intermediate language +into native device ISA when the <> is submitted to a +device. Since this technique has separate host and device compilers, there needs to be some way to associate a <> (which is compiled by the device compiler) with the code that invokes it (which is compiled by the host -compiler). Implementations conformant to the reduced feature set +compiler). +Implementations conformant to the reduced feature set (<>) can do this by using the {cpp} type of the -<>. This type is specified via the <> -template parameter if the <> is a lambda function, or it -is obtained from the class type if the <> is an object. +<>. +This type is specified via the <> template parameter if the +<> is a lambda function, or it is obtained from the class +type if the <> is an object. Implementations conformant to the full feature set (<>) do not require a <> at the invocation site, so they must implement some other way to make the association. @@ -1744,69 +1826,71 @@ some other way to make the association. [[subsec:sscp]] === Single source single compiler pass -With this technique, known as <>, the vendor implements a custom -compiler that reads each SYCL source file only once, and that compiler -generates the host code as well as the <> -for the <>.
As in the -<> case, each <> could either contain native +With this technique, known as <>, the vendor implements a custom compiler +that reads each SYCL source file only once, and that compiler generates the host +code as well as the <> for the +<>. +As in the <> case, each <> could either contain native device ISA or an intermediate language. === Library-only implementation It is also possible to implement SYCL purely as a library, using an -off-the-shelf host compiler with no special support for SYCL. In such an -implementation, each <> may run on the host system. +off-the-shelf host compiler with no special support for SYCL. +In such an implementation, each <> may run on the host system. == Language restrictions in kernels The SYCL <> are executed on SYCL devices and all of the -functions called from a SYCL kernel are going to be compiled for the device -by a SYCL <>. Due to restrictions of the heterogeneous -devices where the SYCL kernel will execute, there are certain restrictions -on the base {cpp} language features that can be used inside kernel code. For -details on language restrictions please refer -to <>. +functions called from a SYCL kernel are going to be compiled for the device by a +SYCL <>. +Due to restrictions of the heterogeneous devices where the SYCL kernel will +execute, there are certain restrictions on the base {cpp} language features that +can be used inside kernel code. +For details on language restrictions please refer to +<>. SYCL kernels use arguments that are captured by value in the <> or are passed from the host to the device using -<>. Sharing data structures between host and device code -imposes certain restrictions, such as using only objects that are -<>, and in general, no pointers -initialized for the host can be used on the device. SYCL memory objects, -such as [code]#sycl::buffer#, [code]#sycl::unsampled_image#, and -[code]#sycl::sampled_image#, cannot be passed to a kernel. 
Instead, a kernel -must interact with these objects through <>. -No hierarchical structures of -these memory object classes are supported and any other data containers need to be -converted to the SYCL data management classes using the SYCL interface. For -more details on the rules for kernel parameter passing, please refer -to <>. - -Pointers to <> allocations -may be passed to a kernel either directly as arguments or indirectly -inside of other objects. Pointers to <> allocations that are -passed as kernel arguments are treated as being in the global -address space. +<>. +Sharing data structures between host and device code imposes certain +restrictions, such as using only objects that are <>, and in +general, no pointers initialized for the host can be used on the device. +SYCL memory objects, such as [code]#sycl::buffer#, +[code]#sycl::unsampled_image#, and [code]#sycl::sampled_image#, cannot be passed +to a kernel. +Instead, a kernel must interact with these objects through +<>. +No hierarchical structures of these memory object classes are supported and any +other data containers need to be converted to the SYCL data management classes +using the SYCL interface. +For more details on the rules for kernel parameter passing, please refer to +<>. + +Pointers to <> allocations may be passed to a kernel either directly as +arguments or indirectly inside of other objects. +Pointers to <> allocations that are passed as kernel arguments are treated +as being in the global address space. [[sec::device.copyable]] === Device copyable -The SYCL implementation may need to copy data between the host and a device -or between two devices. For example, this may occur when a <> -has a requirement for the contents of a buffer or when the application passes -certain arguments to a <> (as described in -<>). Such data must have a type that is -<> as defined below. +The SYCL implementation may need to copy data between the host and a device or +between two devices. 
+For example, this may occur when a <> has a requirement for the +contents of a buffer or when the application passes certain arguments to a +<> (as described in <>). +Such data must have a type that is <> as defined below. Any type that is trivially copyable (as defined by the {cpp} core language) is implicitly device copyable. Although implementations are not required to support device code that calls -library functions from the {cpp} core language, some implementations may -provide device support for some of these functions. If the implementation -provides device support for one of the following classes, that type is also -implicitly device copyable: +library functions from the {cpp} core language, some implementations may provide +device support for some of these functions. +If the implementation provides device support for one of the following classes, +that type is also implicitly device copyable: * [code]#std::array#; * [code]#std::array# if [code]#T# is device copyable; @@ -1819,8 +1903,8 @@ implicitly device copyable: * [code]#+std::variant+# if all the types in the parameter pack [code]#Types# are device copyable; * [code]#std::basic_string_view#; - * [code]#std::span# (the [code]#std::span# type has - been introduced in {cpp}20); + * [code]#std::span# (the [code]#std::span# type has been + introduced in {cpp}20); * [code]#sycl::span#. If the implementation provides device support for one of the classes listed @@ -1831,25 +1915,27 @@ device copyable. ==== The types [code]#std::basic_string_view# and [code]#std::span# are both view types, which reference -underlying data that is not contained within their type. Although these view -types are device copyable, the implementation copies just the view and not -the contained data when doing an inter-device copy. 
In order to reference the -contained data after such a copy, the application must allocate the contained -data in unified shared memory (USM) that is accessible on both the host and -device (or on both devices in the case of a device-to-device copy). +underlying data that is not contained within their type. +Although these view types are device copyable, the implementation copies just +the view and not the contained data when doing an inter-device copy. +In order to reference the contained data after such a copy, the application must +allocate the contained data in unified shared memory (USM) that is accessible on +both the host and device (or on both devices in the case of a device-to-device +copy). ==== In addition, the implementation may allow the application to explicitly declare -certain class types as device copyable. If the implementation has this support, -it must predefine the preprocessor macro [code]#SYCL_DEVICE_COPYABLE# to -[code]#1#, and it must not predefine this preprocessor macro if it does not -have this support. When the implementation has this support, a class type -[code]#T# is device copyable if all of the following statements are true: +certain class types as device copyable. +If the implementation has this support, it must predefine the preprocessor macro +[code]#SYCL_DEVICE_COPYABLE# to [code]#1#, and it must not predefine this +preprocessor macro if it does not have this support. 
+When the implementation has this support, a class type [code]#T# is device +copyable if all of the following statements are true: * The application defines the trait [code]#is_device_copyable_v# to [code]#true#; - * Type [code]#T# has at least one eligible copy constructor, move - constructor, copy assignment operator, or move assignment operator; + * Type [code]#T# has at least one eligible copy constructor, move constructor, + copy assignment operator, or move assignment operator; * Each eligible copy constructor, move constructor, copy assignment operator, and move assignment operator is [code]#public#; * When doing an inter-device transfer of an object of type [code]#T#, the @@ -1869,8 +1955,9 @@ copyable, and the implementation sets the [code]#is_device_copyable_v# trait to It is unspecified whether the implementation actually calls the copy constructor, move constructor, copy assignment operator, or move assignment operator of a class declared as [code]#is_device_copyable_v# when doing an -inter-device copy. Since these operations must all be the same as a bitwise -copy, the implementation may simply copy the memory where the object resides. +inter-device copy. +Since these operations must all be the same as a bitwise copy, the +implementation may simply copy the memory where the object resides. Likewise, it is unspecified whether the implementation actually calls the destructor for such a class on the device since the destructor must have no effect on the device. @@ -1879,9 +1966,10 @@ effect on the device. == Endianness support -SYCL does not mandate any particular byte order, but the byte order of the -host always matches the byte order of the devices. This allows data to be -copied between the host and the devices without any byte swapping. +SYCL does not mandate any particular byte order, but the byte order of the host +always matches the byte order of the devices. 
+This allows data to be copied between the host and the devices without any byte +swapping. == Example SYCL application diff --git a/adoc/chapters/copyright-spec.adoc b/adoc/chapters/copyright-spec.adoc index 2a679aa9..5fe3423a 100644 --- a/adoc/chapters/copyright-spec.adoc +++ b/adoc/chapters/copyright-spec.adoc @@ -1,35 +1,35 @@ Copyright (c) 2011-2023 The Khronos Group, Inc. This Specification is protected by copyright laws and contains material -proprietary to Khronos. Except as described by these terms, it or any -components may not be reproduced, republished, distributed, transmitted, -displayed, broadcast or otherwise exploited in any manner without the -express prior written permission of Khronos. +proprietary to Khronos. +Except as described by these terms, it or any components may not be reproduced, +republished, distributed, transmitted, displayed, broadcast or otherwise +exploited in any manner without the express prior written permission of Khronos. Khronos grants a conditional copyright license to use and reproduce the unmodified Specification for any purpose, without fee or royalty, EXCEPT no licenses to any patent, trademark or other intellectual property rights are granted under these terms. -Khronos makes no, and expressly disclaims any, representations or -warranties, express or implied, regarding this Specification, including, -without limitation: merchantability, fitness for a particular purpose, -non-infringement of any intellectual property, correctness, accuracy, -completeness, timeliness, and reliability. -Under no circumstances will Khronos, or any of its Promoters, Contributors -or Members, or their respective partners, officers, directors, employees, -agents or representatives be liable for any damages, whether direct, -indirect, special or consequential damages for lost revenues, lost profits, -or otherwise, arising from or in connection with these materials. 
+Khronos makes no, and expressly disclaims any, representations or warranties, +express or implied, regarding this Specification, including, without limitation: +merchantability, fitness for a particular purpose, non-infringement of any +intellectual property, correctness, accuracy, completeness, timeliness, and +reliability. +Under no circumstances will Khronos, or any of its Promoters, Contributors or +Members, or their respective partners, officers, directors, employees, agents or +representatives be liable for any damages, whether direct, indirect, special or +consequential damages for lost revenues, lost profits, or otherwise, arising +from or in connection with these materials. This Specification has been created under the Khronos Intellectual Property -Rights Policy, which is Attachment A of the Khronos Group Membership -Agreement available at https://www.khronos.org/files/member_agreement.pdf, and which +Rights Policy, which is Attachment A of the Khronos Group Membership Agreement +available at https://www.khronos.org/files/member_agreement.pdf, and which defines the terms 'Scope', 'Compliant Portion', and 'Necessary Patent Claims'. -Parties desiring to implement the Specification and make use of Khronos trademarks -in relation to that implementation, and receive reciprocal patent license protection -under the Khronos Intellectual Property Rights Policy must become Adopters and -confirm the implementation as conformant under the process defined by Khronos for -this Specification; see https://www.khronos.org/adopters. +Parties desiring to implement the Specification and make use of Khronos +trademarks in relation to that implementation, and receive reciprocal patent +license protection under the Khronos Intellectual Property Rights Policy must +become Adopters and confirm the implementation as conformant under the process +defined by Khronos for this Specification; see https://www.khronos.org/adopters. 
Some parts of this Specification are purely informative and so are EXCLUDED from the Scope of this Specification. @@ -39,42 +39,43 @@ the Scope of this Specification. // The <> section of the // <> defines how these parts of the Specification are identified. -Where this Specification uses technical -terminology, defined in the <> or otherwise, that refer to -enabling technologies that are not expressly set forth in this -Specification, those enabling technologies are EXCLUDED from the Scope of -this Specification. +Where this Specification uses technical terminology, defined in the <> +or otherwise, that refer to enabling technologies that are not expressly set +forth in this Specification, those enabling technologies are EXCLUDED from the +Scope of this Specification. For clarity, enabling technologies not disclosed with particularity in this Specification (e.g. semiconductor manufacturing technology, hardware -architecture, processor architecture or microarchitecture, memory -architecture, compiler technology, object oriented technology, basic -operating system technology, compression technology, algorithms, and so on) -are NOT to be considered expressly set forth; only those application program -interfaces and data structures disclosed with particularity are included in -the Scope of this Specification. +architecture, processor architecture or microarchitecture, memory architecture, +compiler technology, object oriented technology, basic operating system +technology, compression technology, algorithms, and so on) are NOT to be +considered expressly set forth; only those application program interfaces and +data structures disclosed with particularity are included in the Scope of this +Specification. 
-For purposes of the Khronos Intellectual Property Rights Policy as it relates -to the definition of Necessary Patent Claims, all recommended or optional -features, behaviors and functionality set forth in this Specification, if -implemented, are considered to be included as Compliant Portions. +For purposes of the Khronos Intellectual Property Rights Policy as it relates to +the definition of Necessary Patent Claims, all recommended or optional features, +behaviors and functionality set forth in this Specification, if implemented, are +considered to be included as Compliant Portions. -Where this Specification includes -normative references to external documents, only the specifically -identified sections of those external documents are INCLUDED in the Scope of -this Specification. If not created by Khronos, those external documents may -contain contributions from non-members of Khronos not covered by the Khronos -Intellectual Property Rights Policy. +Where this Specification includes normative references to external documents, +only the specifically identified sections of those external documents are +INCLUDED in the Scope of this Specification. +If not created by Khronos, those external documents may contain contributions +from non-members of Khronos not covered by the Khronos Intellectual Property +Rights Policy. ifndef::ratified_core_spec[] -This document contains extensions which are not ratified by Khronos, and as -such is not a ratified Specification, though it contains text from (and is a -superset of) the ratified SYCL Specification. The ratified version of the -SYCL Specification can be found at +This document contains extensions which are not ratified by Khronos, and as such +is not a ratified Specification, though it contains text from (and is a superset +of) the ratified SYCL Specification. +The ratified version of the SYCL Specification can be found at https://www.khronos.org/registry/SYCL . 
endif::ratified_core_spec[] -Khronos and Vulkan are registered trademarks, and SPIR-V is a trademark of -The Khronos Group Inc. OpenCL is a trademark of Apple Inc. and OpenGL is a -registered trademarks of Hewlett Packard Enterprise, all used under license -by Khronos. All other product names, trademarks, and/or company names are -used solely for identification and belong to their respective owners. +Khronos and Vulkan are registered trademarks, and SPIR-V is a trademark of The +Khronos Group Inc. +OpenCL is a trademark of Apple Inc. +and OpenGL is a registered trademarks of Hewlett Packard Enterprise, all used +under license by Khronos. +All other product names, trademarks, and/or company names are used solely for +identification and belong to their respective owners. diff --git a/adoc/chapters/device_compiler.adoc b/adoc/chapters/device_compiler.adoc index 17a0a8ee..ac13cfaa 100644 --- a/adoc/chapters/device_compiler.adoc +++ b/adoc/chapters/device_compiler.adoc @@ -5,103 +5,106 @@ This section specifies the requirements of the SYCL device compiler. Most features described in this section relate to underlying <> -capabilities of target devices and limiting the requirements of device -code to ensure portability. +capabilities of target devices and limiting the requirements of device code to +ensure portability. == Offline compilation of SYCL source files There are two alternatives for a SYCL <>: a -[keyword]#single-source device compiler# and a device compiler that supports -the technique of <>. +[keyword]#single-source device compiler# and a device compiler that supports the +technique of <>. A SYCL device compiler takes in a {cpp} source file, extracts only the SYCL kernels and outputs the device code in a form that can be enqueued from host -code by the associated <>. How the <> -invokes the kernels is implementation-defined, but a typical approach is for -a device compiler to produce a header file with the compiled kernel -contained within it. 
By providing a command-line option to the host -compiler, it would cause the implementation's SYCL header files to -[code]#{hash}include# the generated header file. The SYCL specification has -been written to allow this as an implementation approach in order to allow -<>. However, any of the mechanisms needed from the SYCL compiler, -the <> and build system are implementation-defined, as they -can vary depending on the platform and approach. +code by the associated <>. +How the <> invokes the kernels is implementation-defined, but a +typical approach is for a device compiler to produce a header file with the +compiled kernel contained within it. +By providing a command-line option to the host compiler, it would cause the +implementation's SYCL header files to [code]#{hash}include# the generated header +file. +The SYCL specification has been written to allow this as an implementation +approach in order to allow <>. +However, any of the mechanisms needed from the SYCL compiler, the +<> and build system are implementation-defined, as they can vary +depending on the platform and approach. A SYCL single-source device compiler takes in a {cpp} source file and compiles -both host and device code at the same time. This specification specifies how -a SYCL single-source device compiler sees and outputs device code for kernels, -but does not specify the host compilation. +both host and device code at the same time. +This specification specifies how a SYCL single-source device compiler sees and +outputs device code for kernels, but does not specify the host compilation. [[sec:naming.kernels]] == Naming of kernels SYCL kernels are extracted from {cpp} source files and stored in an -implementation-defined format. In the case of the shared-source compilation model, the kernels -have to be uniquely identified by both host and device compiler. This is -required in order for the host runtime to be able to load the kernel by using -a backend-specific host runtime interface. 
+implementation-defined format. +In the case of the shared-source compilation model, the kernels have to be +uniquely identified by both host and device compiler. +This is required in order for the host runtime to be able to load the kernel by +using a backend-specific host runtime interface. From this requirement the following rules apply for naming the kernels: * The kernel name is a [keyword]#{cpp} typename#. - * The kernel name must be forward declarable at namespace scope - (including global namespace scope) and may not be forward declared other - than at namespace scope. If it isn't forward declared - but is specified as a template argument in a kernel invoking interface, - as described in <>, then it may not conflict - with a name in any enclosing namespace scope. + * The kernel name must be forward declarable at namespace scope (including + global namespace scope) and may not be forward declared other than at + namespace scope. + If it isn't forward declared but is specified as a template argument in a + kernel invoking interface, as described in <>, then + it may not conflict with a name in any enclosing namespace scope. [NOTE] ==== -The requirement that a kernel name be forward declarable makes some types -for kernel names illegal, such as anything declared in the [code]#std# -namespace (adding a declaration to namespace [code]#std# leads to undefined -behavior). +The requirement that a kernel name be forward declarable makes some types for +kernel names illegal, such as anything declared in the [code]#std# namespace +(adding a declaration to namespace [code]#std# leads to undefined behavior). 
==== - * If the kernel is defined as a named function object type, the name can - be the typename of the function object as long as it is either declared - at namespace scope, or does not conflict with any name in an enclosing + * If the kernel is defined as a named function object type, the name can be + the typename of the function object as long as it is either declared at + namespace scope, or does not conflict with any name in an enclosing namespace scope. - * If the kernel is defined as a lambda, a typename can optionally be - provided to the kernel invoking interface as described - in <>, so that the developer can control the - kernel name for purposes such as debugging or referring to the kernel - when applying build options. + * If the kernel is defined as a lambda, a typename can optionally be provided + to the kernel invoking interface as described in <>, + so that the developer can control the kernel name for purposes such as + debugging or referring to the kernel when applying build options. * If a kernel function relies on template parameters, then those template - parameters must be contained by the kernel name. If such a kernel name - is specified as a template argument in a kernel invoking interface, then - the template parameters on which the kernel depends must be forward - declarable at namespace scope. - -In both single-source and shared-source implementations, a device compiler should -detect the kernel invocations (e.g. [code]#parallel_for)# -in the source code and compile the enclosed kernels, storing them with their + parameters must be contained by the kernel name. + If such a kernel name is specified as a template argument in a kernel + invoking interface, then the template parameters on which the kernel depends + must be forward declarable at namespace scope. + +In both single-source and shared-source implementations, a device compiler +should detect the kernel invocations (e.g. 
[code]#parallel_for#) in +the source code and compile the enclosed kernels, storing them with their associated type name. The format of the kernel and the compilation techniques are details of an -implementation and not specified. The interface between the compiler and the -runtime for extracting and executing SYCL kernels on the device is a detail of -an implementation and not specified. +implementation and not specified. +The interface between the compiler and the runtime for extracting and executing +SYCL kernels on the device is a detail of an implementation and not specified. == Compilation of functions The SYCL device compiler parses an entire {cpp} source file supplied by the user, including any header files referenced via [code]#{hash}include# -directives. From this source file, the SYCL device compiler must compile -kernels for the device, as well as any functions that the kernels call. +directives. +From this source file, the SYCL device compiler must compile kernels for the +device, as well as any functions that the kernels call. The device compiler identifies kernels by looking for calls to -<> such as [code]#parallel_for#. One of -the parameters is a function object which is known as a -<>, and this function must always return -[code]#void#. Any function called by the <> is -also compiled for the device, and these functions together with the -<> are known as <>. The device -compiler searches recursively for any functions called from a +<> such as +[code]#parallel_for#. +One of the parameters is a function object which is known as a +<>, and this function must always return [code]#void#. +Any function called by the <> is also compiled for the +device, and these functions together with the <> are known as <>. +The device compiler searches recursively for any functions called from a <>, and these functions are also compiled for the device and known as <>.
@@ -129,96 +132,93 @@ void h() { } ---- -In order for the SYCL device compiler to correctly compile <>, all -functions in the source file, whether <> or not, must be -syntactically correct functions according to this specification. A syntactically -correct function adheres to at least the minimum required {cpp} version -defined in <>. +In order for the SYCL device compiler to correctly compile <>, all functions in the source file, whether <> or not, must be syntactically correct functions according to +this specification. +A syntactically correct function adheres to at least the minimum required {cpp} +version defined in <>. [[sec:language.restrictions.kernels]] == Language restrictions for device functions -<> must abide by certain restrictions. The full set of -{cpp} features are not available to these functions. Following is a list of -these restrictions: - - * Pointers and objects containing pointers may be shared. However, when a pointer is - passed between SYCL devices or between the host and a SYCL device, - dereferencing that pointer on the device produces undefined behavior unless - the device supports <> and the pointer is an address within a - <> memory region (see <>). - * Memory storage allocation is not allowed in kernels. All memory allocation - for the device is done on the host using accessor classes or using - <> as explained in <>. - Consequently, the default allocation [code]#operator new# overloads - that allocate storage are disallowed in a SYCL kernel. The placement - [code]#new# operator and any user-defined overloads that do not - allocate storage are permitted. - * Kernel functions must always have a [code]#void# return type. A - kernel lambda trailing-return-type that is not [code]#void# is - therefore illegal, as is a return statement (that would return from the - kernel function) with an expression that does not convert to - [code]#void#. - * The odr-use of polymorphic classes and classes with virtual - inheritance is allowed. 
However, no virtual member functions are allowed - to be called in a <>. +<> must abide by certain restrictions. +The full set of {cpp} features are not available to these functions. +Following is a list of these restrictions: + + * Pointers and objects containing pointers may be shared. + However, when a pointer is passed between SYCL devices or between the host + and a SYCL device, dereferencing that pointer on the device produces + undefined behavior unless the device supports <> and the pointer is an + address within a <> memory region (see <>). + * Memory storage allocation is not allowed in kernels. + All memory allocation for the device is done on the host using accessor + classes or using <> as explained in <>. + Consequently, the default allocation [code]#operator new# overloads that + allocate storage are disallowed in a SYCL kernel. + The placement [code]#new# operator and any user-defined overloads that do + not allocate storage are permitted. + * Kernel functions must always have a [code]#void# return type. + A kernel lambda trailing-return-type that is not [code]#void# is therefore + illegal, as is a return statement (that would return from the kernel + function) with an expression that does not convert to [code]#void#. + * The odr-use of polymorphic classes and classes with virtual inheritance is + allowed. + However, no virtual member functions are allowed to be called in a + <>. * No function pointers or references are allowed to be called in a <>. * RTTI is disabled inside <>. - * No variadic functions are allowed to be called in a - <>. - * Exception-handling cannot be used inside a - <>. + * No variadic functions are allowed to be called in a <>. + * Exception-handling cannot be used inside a <>. [code]#noexcept# is allowed. * Recursion is not allowed in a <>. - * Variables with thread storage duration ([code]#thread_local# - storage class specifier) are not allowed to be odr-used in a - <>. 
+ * Variables with thread storage duration ([code]#thread_local# storage class + specifier) are not allowed to be odr-used in a <>. * Variables with static storage duration that are odr-used inside a - <>, must be either [code]#const# - or [code]#constexpr#, and must also be either - zero-initialized or constant-initialized. + <> must be either [code]#const# or [code]#constexpr#, and + must also be either zero-initialized or constant-initialized. [NOTE] ==== Amongst other things, this restriction makes it illegal for a -<> to access a global variable that isn't [code]#const# -or [code]#constexpr#. +<> to access a global variable that isn't [code]#const# or +[code]#constexpr#. ==== - * The rules for kernels apply to both the kernel function objects - themselves and all functions, operators, member functions, constructors - and destructors called by the kernel. This means that kernels can only - use library functions that have been adapted to work with SYCL. - Implementations are not required to support any library routines in - kernels beyond those explicitly mentioned as usable in kernels in this - spec. Developers should refer to the SYCL built-in functions - in <> to find functions that are specified to be usable - in kernels. + * The rules for kernels apply to both the kernel function objects themselves + and all functions, operators, member functions, constructors and destructors + called by the kernel. + This means that kernels can only use library functions that have been + adapted to work with SYCL. + Implementations are not required to support any library routines in kernels + beyond those explicitly mentioned as usable in kernels in this spec. + Developers should refer to the SYCL built-in functions in <> + to find functions that are specified to be usable in kernels. * Interacting with a special <> class (e.g. SYCL - [code]#accessor# or [code]#stream#) that is stored within a {cpp} union - is undefined behavior.
- * Any variable or function that is odr-used from a <> must - be defined in the same translation unit as that use. However, a function - may be defined in another translation unit if the implementation defines - the [code]#SYCL_EXTERNAL# macro as described in <>. + [code]#accessor# or [code]#stream#) that is stored within a {cpp} union is + undefined behavior. + * Any variable or function that is odr-used from a <> must be + defined in the same translation unit as that use. + However, a function may be defined in another translation unit if the + implementation defines the [code]#SYCL_EXTERNAL# macro as described in + <>. [[subsec:scalartypes]] == Built-in scalar data types In a SYCL device compiler, the device definition of all standard {cpp} -fundamental types from <> must match the -host definition of those types, in both size and alignment. A device -compiler may have this preconfigured so that it can match them based on the -definitions of those types on the platform, or there may be a necessity for +fundamental types from <> must match the host +definition of those types, in both size and alignment. +A device compiler may have this preconfigured so that it can match them based on +the definitions of those types on the platform, or there may be a necessity for a device compiler command-line option to ensure the types are the same. -The standard {cpp} fixed width types, e.g. [code]#int8_t#, -[code]#int16_t#, [code]#int32_t#,[code]#int64_t#, -should have the same size as defined by the {cpp} standard for host and -device. +The standard {cpp} fixed width types, e.g. [code]#int8_t#, [code]#int16_t#, +[code]#int32_t#, [code]#int64_t#, should have the same size as defined by the +{cpp} standard for host and device. [[table.types.fundamental]] @@ -339,31 +339,32 @@ The standard {cpp} preprocessing directives and macros are supported.
The following preprocessor macros must be defined by all conformant implementations: - * [code]#SYCL_LANGUAGE_VERSION# substitutes an integer reflecting - the version number and revision of the SYCL language being supported - by the implementation. The version of SYCL defined in this document - will have [code]#SYCL_LANGUAGE_VERSION# substitute the integer + * [code]#SYCL_LANGUAGE_VERSION# substitutes an integer reflecting the version + number and revision of the SYCL language being supported by the + implementation. + The version of SYCL defined in this document will have + [code]#SYCL_LANGUAGE_VERSION# substitute the integer [code]#{SYCL_LANGUAGE_VERSION}#, composed with the general SYCL version followed by 2 digits representing the revision number; - * [code]#SYCL_DEVICE_COPYABLE# is defined to 1 if the implementation - supports explicitly specified <> types as described in - <>. Otherwise, the implementation's definition of - device copyable falls back to {cpp} trivially copyable and - [code]#sycl::is_device_copyable# is ignored; - * [code]#+__SYCL_DEVICE_ONLY__+# is defined to 1 if the source file is - being compiled with a SYCL device compiler which does not produce host - binary; - * [code]#+__SYCL_SINGLE_SOURCE__+# is defined to 1 if the source file - is being compiled with a SYCL single-source compiler which produces host - as well as device binary; + * [code]#SYCL_DEVICE_COPYABLE# is defined to 1 if the implementation supports + explicitly specified <> types as described in + <>. 
+ Otherwise, the implementation's definition of device copyable falls back to + {cpp} trivially copyable and [code]#sycl::is_device_copyable# is ignored; + * [code]#+__SYCL_DEVICE_ONLY__+# is defined to 1 if the source file is being + compiled with a SYCL device compiler which does not produce host binary; + * [code]#+__SYCL_SINGLE_SOURCE__+# is defined to 1 if the source file is being + compiled with a SYCL single-source compiler which produces host as well as + device binary; * [code]#SYCL_FEATURE_SET_FULL# is defined to 1 if the SYCL implementation - supports the full feature set and is not defined otherwise. For more details - see <>; + supports the full feature set and is not defined otherwise. + For more details see <>; * [code]#SYCL_FEATURE_SET_REDUCED# is defined to 1 if the SYCL implementation supports the reduced feature set and not the full feature set, otherwise it - is not defined. For more details see <>; - * [code]#SYCL_EXTERNAL# is an optional macro which enables external - linkage of SYCL functions and member functions to be included in a SYCL kernel. + is not defined. + For more details see <>; + * [code]#SYCL_EXTERNAL# is an optional macro which enables external linkage of + SYCL functions and member functions to be included in a SYCL kernel. The macro is only defined if the implementation supports external linkage. For more details see <>. @@ -375,10 +376,10 @@ in <> must be defined by all conformant implementations. == Optional kernel features A number of kernel features defined by this SYCL specification are optional; -they may be supported on some devices but not on other devices. As described -in <>, an application can test whether a device supports -these features by testing whether the device has an associated aspect. The -following aspects are those that correspond to optional kernel features: +they may be supported on some devices but not on other devices. 
+As described in <>, an application can test whether a device +supports these features by testing whether the device has an associated aspect. +The following aspects are those that correspond to optional kernel features: * [code]#fp16# * [code]#fp64# @@ -398,12 +399,12 @@ implementation supports the features on any of its devices. Of course, applications that make use of optional kernel features should ensure that a kernel using such a feature is submitted only to a device that supports -the feature. If the application submits a <> using a secondary -queue, then any kernel submitted from the <> should use only -features that are supported by both the primary queue's device and the -secondary queue's device. If an application fails to do this, the -implementation must throw a synchronous exception with the -[code]#errc::kernel_not_supported# error code from the +the feature. +If the application submits a <> using a secondary queue, then any +kernel submitted from the <> should use only features that are +supported by both the primary queue's device and the secondary queue's device. +If an application fails to do this, the implementation must throw a synchronous +exception with the [code]#errc::kernel_not_supported# error code from the <> (e.g. [code]#parallel_for()#). It is legal for a SYCL application to define several kernels in the same @@ -417,21 +418,23 @@ include::{code_dir}/twoOptionalFeatures.cpp[lines=4..-1] An implementation may not raise a compile time diagnostic or a run time exception merely due to speculative compilation of a kernel for a device when -the application does not actually submit the kernel to that device. To -illustrate using the example above, assume that device [code]#dev1# does not +the application does not actually submit the kernel to that device. +To illustrate using the example above, assume that device [code]#dev1# does not have [code]#aspect::atomic64# and device [code]#dev2# doe not have -[code]#aspect::fp16#. 
An implementation cannot raise a diagnostic due to -compilation of [code]#KernelA# for device [code]#dev2# or for compilation of -[code]#KernelB# for device [code]#dev1# because the application does not submit -these kernels to those devices. +[code]#aspect::fp16#. +An implementation cannot raise a diagnostic due to compilation of +[code]#KernelA# for device [code]#dev2# or for compilation of [code]#KernelB# +for device [code]#dev1# because the application does not submit these kernels to +those devices. [NOTE] ==== It is expected that this requirement will have an impact on the way an -implementation bundles kernels into device images. For example, naively -bundling [code]#KernelA# and [code]#KernelB# into the same device image could -run afoul of this requirement if the implementation compiles the entire device -image when [code]#KernelA# is submitted to device [code]#dev1#. +implementation bundles kernels into device images. +For example, naively bundling [code]#KernelA# and [code]#KernelB# into the same +device image could run afoul of this requirement if the implementation compiles +the entire device image when [code]#KernelA# is submitted to device +[code]#dev1#. ==== @@ -439,24 +442,27 @@ image when [code]#KernelA# is submitted to device [code]#dev1#. == Attributes for device code {cpp} attributes may be used to decorate kernels and device functions in order -to influence the code generated by the device compiler. These attributes are -all defined in the [code]#+[[sycl::]]+# namespace. +to influence the code generated by the device compiler. +These attributes are all defined in the [code]#+[[sycl::]]+# namespace. If one of the attributes defined in this section is applied to a kernel or -device function, it must be applied to the first declaration of that kernel -or device function in the translation unit. Programs which fail to do this are -ill formed and the compiler must issue a diagnostic. 
Redeclarations of the -kernel or device function in the same translation unit may optionally have the -same attribute applied (so long as the attribute arguments are the same between -the declarations), but this is not required. The attribute remains in effect -regardless of whether it appears in the redeclaration. +device function, it must be applied to the first declaration of that kernel or +device function in the translation unit. +Programs which fail to do this are ill formed and the compiler must issue a +diagnostic. +Redeclarations of the kernel or device function in the same translation unit may +optionally have the same attribute applied (so long as the attribute arguments +are the same between the declarations), but this is not required. +The attribute remains in effect regardless of whether it appears in the +redeclaration. Unless an attribute's description specifically allows it, a kernel or device -function may not be declared with the more than one instance of the same +function may not be declared with more than one instance of the same -attribute unless all instances have the same attribute arguments. The compiler -must issue a diagnostic for programs which violate this requirement. When two -or more instances of the same attribute appear on the declaration of a kernel -or device function, the effect is as though a single instance appeared +attribute unless all instances have the same attribute arguments. +The compiler must issue a diagnostic for programs which violate this +requirement. +When two or more instances of the same attribute appear on the declaration of a +kernel or device function, the effect is as though a single instance appeared (assuming that all instances have the same attribute arguments). If a kernel or device function is declared with an attribute in one translation @@ -465,8 +471,8 @@ attribute (and its same attribute arguments) in another translation unit, the
If any of these attributes are applied to a device function that is also -compiled for the host, they have no effect when the function is compiled for -the host. +compiled for the host, they have no effect when the function is compiled for the +host. Applying these attributes to any language construct other than those specified in this section has implementation-defined effect. @@ -477,23 +483,26 @@ in this section has implementation-defined effect. The attributes listed in <> have a different position depending on whether the kernel is defined as a lambda function or as a named -function object. If the kernel is a named function object, the attribute is -applied to the declarator-id in the function declaration. However, if the -kernel is a lambda function, the attribute is applied to the lambda declarator. +function object. +If the kernel is a named function object, the attribute is applied to the +declarator-id in the function declaration. +However, if the kernel is a lambda function, the attribute is applied to the +lambda declarator. [NOTE] ==== The reason for the different positions is because the {cpp} core language does not currently define a position for attributes to appertain to the lambda's corresponding function operator or operator template, only to the corresponding -_type_ of the function operator or operator template. This is expected to be -remedied in a future version of the {cpp} core language specification. +_type_ of the function operator or operator template. +This is expected to be remedied in a future version of the {cpp} core language +specification. ==== The example below demonstrates these attribute positions using the -[code]#[[sycl::reqd_work_group_size(16)]]# attribute. Note that the {cpp} core -language allows two possible positions for kernels that are defined as a named -function object. +[code]#[[sycl::reqd_work_group_size(16)]]# attribute. 
+Note that the {cpp} core language allows two possible positions for kernels that +are defined as a named function object. [source,,linenums] ---- @@ -620,10 +629,10 @@ include::{code_dir}/deviceHas.cpp[lines=4..-1] === Device function attributes -The attributes in <> are applied to the declaration -of a non-kernel device function. The position of the attribute is the same -as for the kernel function attributes defined above in -<>. +The attributes in <> are applied to the declaration of +a non-kernel device function. +The position of the attribute is the same as for the kernel function attributes +defined above in <>. [[table.device.attributes]] .Attributes for non-kernel device functions @@ -667,54 +676,57 @@ associated with an aspect that is not listed in the attribute. == Address-space deduction -{cpp} has no type-level support to represent address spaces. As a consequence, -the SYCL generic programming model does not directly affect the {cpp} type of -unannotated pointers and references. +{cpp} has no type-level support to represent address spaces. +As a consequence, the SYCL generic programming model does not directly affect +the {cpp} type of unannotated pointers and references. -Source level guarantees about address spaces in the SYCL generic -programming model can only be achieved using pointer classes (instances of -[code]#multi_ptr#), which are regular classes that represent pointers -to data stored in the corresponding address spaces. +Source level guarantees about address spaces in the SYCL generic programming +model can only be achieved using pointer classes (instances of +[code]#multi_ptr#), which are regular classes that represent pointers to data +stored in the corresponding address spaces. -In SYCL, the address space of pointer and references are derived from: +In SYCL, the address spaces of pointers and references are derived from: - * Accessors that give access to shared data. They can be bound to a memory - object in a command group and passed into a kernel.
Accessors are used - in scheduling of kernels to define ordering. Accessors to buffers have a - compile-time address space based on their access mode. - * Explicit pointer classes (e.g. [code]#global_ptr#) holds a pointer - which is known to be addressing the address space represented by the - [code]#access::address_space#. This allows the compiler to - determine whether the pointer references global, local, constant or - private memory and generate code accordingly. - * Raw {cpp} pointer and reference types (e.g. [code]#int*#) are allowed - within SYCL kernels. They can be constructed from the address of local - variables, explicit pointer classes, or accessors. + * Accessors that give access to shared data. + They can be bound to a memory object in a command group and passed into a + kernel. + Accessors are used in scheduling of kernels to define ordering. + Accessors to buffers have a compile-time address space based on their access + mode. + * Explicit pointer classes (e.g. [code]#global_ptr#) hold a pointer which is + known to be addressing the address space represented by the + [code]#access::address_space#. + This allows the compiler to determine whether the pointer references global, + local, constant or private memory and generate code accordingly. + * Raw {cpp} pointer and reference types (e.g. [code]#int*#) are allowed within + SYCL kernels. + They can be constructed from the address of local variables, explicit + pointer classes, or accessors. [[subsec:addrspaceAssignment]] === Address space assignment -In order to understand where data lives, the device compiler is -expected to assign address spaces while lowering types for the -underlying target based on the context. Depending on the <> -and mode, address space deducing rules differ slightly. +In order to understand where data lives, the device compiler is expected to +assign address spaces while lowering types for the underlying target based on +the context.
+Depending on the <> and mode, address space deduction +rules differ slightly. -If the target of the SYCL backend can represent the generic address space, -then the "common address space deduction rules" in -<> and the "generic as default address space rules" -in <> apply. If the target of the SYCL backend -cannot represent the generic address space, then the "common address space -deduction rules" in <> and the "inferred address -space rules" in <> apply. +If the target of the SYCL backend can represent the generic address space, then +the "common address space deduction rules" in <> and +the "generic as default address space rules" in <> +apply. +If the target of the SYCL backend cannot represent the generic address space, +then the "common address space deduction rules" in <> +and the "inferred address space rules" in <> apply. [NOTE] ==== -SYCL address space does not affect the type, address space shall be -understood as memory segment in which data is allocated. For -instance, if [code]#int i;# is allocated to the global address -space, then [code]#decltype(&i)# shall evaluate to -[code]#int*#. +SYCL address space does not affect the type; the address space shall be +understood as the memory segment in which data is allocated. +For instance, if [code]#int i;# is allocated to the global address space, then +[code]#decltype(&i)# shall evaluate to [code]#int*#. ==== @@ -725,40 +737,41 @@ The variable declarations get assigned to an address space depending on their scope and storage class: * Namespace scope - ** If the type is [code]#const#, the address space the declaration is assigned to - is implementation-defined. If the target of the SYCL backend can represent the - generic address space, then the assigned address space must be compatible with - the generic address space. + ** If the type is [code]#const#, the address space the declaration is assigned + to is implementation-defined.
+ If the target of the SYCL backend can represent the generic address space, + then the assigned address space must be compatible with the generic address + space. [NOTE] ==== Namespace scope non-[code]#const# declarations cannot be used within a kernel, -as restricted in <>. This means that -non-[code]#const# global variables cannot be accessed by any device kernel or -code called by the device kernel. +as restricted in <>. +This means that non-[code]#const# global variables cannot be accessed by any +device kernel or code called by the device kernel. ==== * Block scope and function parameter scope - ** Declarations with static storage duration are treated the same way as variables - in namespace scope + ** Declarations with static storage duration are treated the same way as + variables in namespace scope ** Otherwise the declaration is assigned to the local address space if declared in a hierarchical context ** Otherwise the declaration is assigned to the private address space * Class scope - ** Static data members are treated the same way as for variable in - namespace scope + ** Static data members are treated the same way as variables in namespace + scope -The result of a prvalue-to-xvalue conversion is assigned to the local -address space if it happens in a hierarchical context or to the private -address space otherwise. +The result of a prvalue-to-xvalue conversion is assigned to the local address +space if it happens in a hierarchical context or to the private address space +otherwise. [[subsec:genericAddressSpace]] === Generic as default address space -For SYCL backends that can represent the generic address space -(see <>), unannotated pointers and -references are considered to be pointing to the generic address space. +For SYCL backends that can represent the generic address space (see +<>), unannotated pointers and references are +considered to be pointing to the generic address space.
[[subsec:inferredAddressSpace]] @@ -767,17 +780,16 @@ references are considered to be pointing to the generic address space. [NOTE] .Note for this version ==== -The address space deduction feature described next is inherited from -the SYCL 1.2.1 specifications. This section will be changed in a future version -to better align with addition of generic address space and generic -as default address space. +The address space deduction feature described next is inherited from the SYCL +1.2.1 specification. +This section will be changed in a future version to better align with the +addition of the generic address space and generic as default address space. ==== -For SYCL backends that cannot represent the generic address space -(see <>), inside kernels the SYCL device -compiler will need to auto-deduce the memory region -of unannotated pointer and reference types during the lowering of types -from {cpp} to the underlying representation. +For SYCL backends that cannot represent the generic address space (see +<>), inside kernels the SYCL device compiler will +need to auto-deduce the memory region of unannotated pointer and reference types +during the lowering of types from {cpp} to the underlying representation. If a kernel function or device function contains a pointer or reference type, then the address space deduction must be attempted using the following rules: @@ -786,29 +798,30 @@ then the address space deduction must be attempted using the following rules: -   the {cpp} pointer value will point to same address space as the one +   the {cpp} pointer value will point to the same address space as the one represented by the explicit pointer class. * If a variable is declared as a pointer type, but initialized in its - declaration to a pointer value with an already-deduced address space, - then that variable will have the same address space as its initializer.
- * If a function parameter is declared as a pointer type, and the argument - is a pointer value with a deduced address space, then the function will - be compiled as if the parameter had the same address space as its - argument. It is legal for a function to be called in different places - with different address spaces for its arguments: in this case the - function is said to be "`duplicated`" and compiled multiple times. Each - duplicated instance of the function must compile legally in order to + declaration to a pointer value with an already-deduced address space, then + that variable will have the same address space as its initializer. + * If a function parameter is declared as a pointer type, and the argument is a + pointer value with a deduced address space, then the function will be + compiled as if the parameter had the same address space as its argument. + It is legal for a function to be called in different places with different + address spaces for its arguments: in this case the function is said to be + "`duplicated`" and compiled multiple times. + Each duplicated instance of the function must compile legally in order to have defined behavior. * If a function return type is declared as a pointer type and return - statements use address space deduced expressions, then the function will - be compiled as if the return type had the same address space. To compile - legally, all return expressions must deduce to the same address space. - * The rules for pointer types also apply to reference types. i.e. a - reference variable takes its address space from its initializer. A - function with a reference parameter takes its address space from its + statements use address space deduced expressions, then the function will be + compiled as if the return type had the same address space. + To compile legally, all return expressions must deduce to the same address + space. + * The rules for pointer types also apply to reference types. + i.e. 
a reference variable takes its address space from its initializer. + A function with a reference parameter takes its address space from its argument. - * If no other rule above can be applied to a declaration of a pointer, - then it is assumed to be in the private address space. + * If no other rule above can be applied to a declaration of a pointer, then it + is assumed to be in the private address space. -It is illegal to assign a pointer value addressing one address space to a pointer -variable addressing a different address space. +It is illegal to assign a pointer value addressing one address space to a +pointer variable addressing a different address space. == SYCL offline linking @@ -820,8 +833,9 @@ variable addressing a different address space. === SYCL functions and member functions linkage By default, any function that is odr-used from a <> must be -defined in the same translation unit as that use. However, this restriction is -relaxed if both of the following conditions are met: +defined in the same translation unit as that use. +However, this restriction is relaxed if both of the following conditions are +met: * The implementation defines the [code]#SYCL_EXTERNAL# macro; * The translation unit that calls the function declares the function with @@ -840,15 +854,15 @@ A function may only be declared with [code]#SYCL_EXTERNAL# if it has external linkage by normal C++ rules. A function declared with [code]#SYCL_EXTERNAL# may be called from both host and -device code. The macro has no effect when the function is called from host -code. +device code. +The macro has no effect when the function is called from host code. In order to declare a function with [code]#SYCL_EXTERNAL#, the macro name -[code]#SYCL_EXTERNAL# must appear before the function declaration. If the -function is also decorated with {cpp} attributes that appear before the +[code]#SYCL_EXTERNAL# must appear before the function declaration. 
+If the function is also decorated with {cpp} attributes that appear before the declaration, the [code]#SYCL_EXTERNAL# may appear before, after, or between -these attributes. The following example demonstrates the use of -[code]#SYCL_EXTERNAL#. +these attributes. +The following example demonstrates the use of [code]#SYCL_EXTERNAL#. [source,,linenums] ---- @@ -859,8 +873,8 @@ Functions that are declared using [code]#SYCL_EXTERNAL# have the following additional restrictions beyond those imposed on other device functions: * If the SYCL backend does not support the generic address space then the - function cannot use raw pointers as parameter or return types. Explicit - pointer classes must be used instead; + function cannot use raw pointers as parameter or return types. + Explicit pointer classes must be used instead; * The function cannot call [code]#group::parallel_for_work_item#; diff --git a/adoc/chapters/extensions.adoc b/adoc/chapters/extensions.adoc index 846c7237..cf39e929 100644 --- a/adoc/chapters/extensions.adoc +++ b/adoc/chapters/extensions.adoc @@ -3,44 +3,48 @@ [[chapter.extensions]] = SYCL Extensions -This chapter describes the mechanism by which the <> can be -extended. Some parts of this chapter are requirements that all implementations -must follow if they extend the <>, while other parts of the chapter -are merely guidelines. Unless a requirement is specifically stated as -normative, all content in this chapter is a non-normative guideline. +This chapter describes the mechanism by which the <> can be extended. +Some parts of this chapter are requirements that all implementations must follow +if they extend the <>, while other parts of the chapter are merely +guidelines. +Unless a requirement is specifically stated as normative, all content in this +chapter is a non-normative guideline. An extension can be either of two flavors: an extension ratified by the Khronos -SYCL group or a vendor supplied extension. 
In both cases, an extension is an -optional feature set which an implementation need not implement in order to be -conformant with the <>. - -Vendors may choose to define extensions in order to expose custom features or -to gather feedback on an API that is not yet ready for inclusion in the -<>. Once a vendor extension has stabilized, the vendor is -encouraged to promote it to a future version of the <> or to a -ratified Khronos extension. Thus, vendor extensions can be viewed as a -pipeline of features for consideration in future SYCL versions. - -The Khronos SYCL group may define extensions for features that are not yet -ready for the <> but are implemented by more than one vendor. -These extensions also may be considered for inclusion in a future version of -the <>. - -This chapter does not describe any particular extension to SYCL. Rather, it -describes the _mechanism_ for defining an extension. Each extension is defined -by its own separate document. If an extension is ratified by the Khronos SYCL -group, that group will release a document describing the extension. If a -vendor defines an extension, the vendor is responsible for releasing its +SYCL group or a vendor supplied extension. +In both cases, an extension is an optional feature set which an implementation +need not implement in order to be conformant with the <>. + +Vendors may choose to define extensions in order to expose custom features or to +gather feedback on an API that is not yet ready for inclusion in the +<>. +Once a vendor extension has stabilized, the vendor is encouraged to promote it +to a future version of the <> or to a ratified Khronos extension. +Thus, vendor extensions can be viewed as a pipeline of features for +consideration in future SYCL versions. + +The Khronos SYCL group may define extensions for features that are not yet ready +for the <> but are implemented by more than one vendor. +These extensions also may be considered for inclusion in a future version of the +<>. 
+ +This chapter does not describe any particular extension to SYCL. +Rather, it describes the _mechanism_ for defining an extension. +Each extension is defined by its own separate document. +If an extension is ratified by the Khronos SYCL group, that group will release a +document describing the extension. +If a vendor defines an extension, the vendor is responsible for releasing its documentation. == Definition of an extension -An extension can take many possible forms. Some examples include: +An extension can take many possible forms. +Some examples include: * adding new types or free functions to the SYCL runtime; - * modifying existing SYCL classes, structs, or enumeration types by - adding new members, member functions, or enumerated values; + * modifying existing SYCL classes, structs, or enumeration types by adding new + members, member functions, or enumerated values; * adding new overloads for existing free functions or member functions; * defining new specializations for existing SYCL templates; * adding new {cpp} attributes; @@ -55,27 +59,29 @@ the <>. == Requirements for an extension -This section is normative. All vendors which provide an extension must abide -by the requirements described here. +This section is normative. +All vendors which provide an extension must abide by the requirements described +here. An extension may not change the definition of existing functions defined by the -<> in a way that changes their specified behavior. Also, an -extension may not remove any feature defined by the <>. +<> in a way that changes their specified behavior. +Also, an extension may not remove any feature defined by the <>. The vendor must choose at least one [code]## which uniquely -identifies the vendor's SYCL implementation. The Khronos SYCL group does not -provide any registry of the strings, so each vendor is responsible for choosing -its own. 
One way to choose a unique string is to use the vendor's company name -or a marketing name that is associated with the vendor's implementation. +identifies the vendor's SYCL implementation. +The Khronos SYCL group does not provide any registry of the strings, so each +vendor is responsible for choosing its own. +One way to choose a unique string is to use the vendor's company name or a +marketing name that is associated with the vendor's implementation. Ultimately, it is each vendor's responsibility to choose a string that is -unique. The strings "khr" and "KHR" are reserved for the Khronos SYCL group -for its own extensions, so vendors may not use these as a -[code]##. +unique. +The strings "khr" and "KHR" are reserved for the Khronos SYCL group for its own +extensions, so vendors may not use these as a [code]##. The implementation must predefine at least one macro of the form [code]#SYCL_IMPLEMENTATION_# which allows applications to test -whether they are being compiled with that vendor's implementation. For -example, the Acme vendor could predefine a macro whose name is +whether they are being compiled with that vendor's implementation. +For example, the Acme vendor could predefine a macro whose name is [code]#SYCL_IMPLEMENTATION_ACME#. @@ -83,122 +89,133 @@ example, the Acme vendor could predefine a macro whose name is Vendors who want to ensure that their extension does not collide with other vendors' extensions or with future versions of the <> should follow -the additional rules specified in this section. However, this is not a -requirement for conformance. +the additional rules specified in this section. +However, this is not a requirement for conformance. === Extension namespace If an extension adds new types or free functions, it should avoid adding these directly in the [code]#sycl::# namespace since future versions of the -<> may also add new identifiers in this namespace. The namespace -[code]#sycl::ext::# is reserved for use by extensions. 
For -example, the Acme vendor could define extended types and free functions in the -namespace [code]#sycl::ext::acme#, and this would guarantee that they will not -collide with definitions in other vendors' extensions or with future versions -of the <>. +<> may also add new identifiers in this namespace. +The namespace [code]#sycl::ext::# is reserved for use by +extensions. +For example, the Acme vendor could define extended types and free functions in +the namespace [code]#sycl::ext::acme#, and this would guarantee that they will +not collide with definitions in other vendors' extensions or with future +versions of the <>. === Names for extensions to existing classes or enumerations -An extension may add new members or member functions to existing SYCL classes -or new values to existing SYCL enumeration types. To ensure these extensions -do not collide, vendors are encouraged to name them with the prefix -[code]#ext__#. For example, the Acme vendor could add a new -member function to the [code]#sycl::device# class named -[code]#device::ext_acme_fancy()# or a new value to the [code]#sycl::aspect# -enumeration named [code]#aspect::ext_acme_fancier#. +An extension may add new members or member functions to existing SYCL classes or +new values to existing SYCL enumeration types. +To ensure these extensions do not collide, vendors are encouraged to name them +with the prefix [code]#ext__#. +For example, the Acme vendor could add a new member function to the +[code]#sycl::device# class named [code]#device::ext_acme_fancy()# or a new value +to the [code]#sycl::aspect# enumeration named [code]#aspect::ext_acme_fancier#. In some cases, an extension does not have the freedom to choose a specific -function name. For example, this could happen if the extension adds a new -constructor overload for an existing SYCL class. In cases like this, the -extension should ensure that one of the function parameters has a type that is -defined in the extension's namespace. 
For example, the Acme vendor could add -a new constructor for [code]#sycl::context# with the signature -[code]#context(ext::acme::frobber&)#. +function name. +For example, this could happen if the extension adds a new constructor overload +for an existing SYCL class. +In cases like this, the extension should ensure that one of the function +parameters has a type that is defined in the extension's namespace. +For example, the Acme vendor could add a new constructor for +[code]#sycl::context# with the signature [code]#context(ext::acme::frobber&)#. A similar situation can occur if an existing SYCL template is specialized with an extended enumerated value. -Obviously, the extension cannot rename the template in this case. Instead, -it is sufficient that the template is specialized with an extended enumerated -value, and this guarantees that the extended specialization will not collide. +Obviously, the extension cannot rename the template in this case. +Instead, it is sufficient that the template is specialized with an extended +enumerated value, and this guarantees that the extended specialization will not +collide. [NOTE] ==== Vendors are encouraged to use the [code]#ext__# prefix form when possible for additions to existing SYCL classes because this form makes the -extension's vendor name apparent. People reading application code will -immediately know that a member function is an extension, and they will -immediately know which vendor's documentation to consult. +extension's vendor name apparent. +People reading application code will immediately know that a member function is +an extension, and they will immediately know which vendor's documentation to +consult. ==== === Feature test macros Vendors are encouraged to group a related set of extensions together into a -"feature" and to predefine a feature-test macro when the implementation -supports the extensions in that feature. 
The feature-test macro should have -the following form to ensure it is unique: -[code]#SYCL_EXT__#. For example, the Acme vendor -might define a feature-test macro named [code]#SYCL_EXT_ACME_FANCYFEATURE#. +"feature" and to predefine a feature-test macro when the implementation supports +the extensions in that feature. +The feature-test macro should have the following form to ensure it is unique: +[code]#SYCL_EXT__#. +For example, the Acme vendor might define a feature-test macro named +[code]#SYCL_EXT_ACME_FANCYFEATURE#. This allows applications to protect code using the extension with -[code]##ifdef#, so that the code is skipped when compiled with an -implementation that doesn't support the feature. +[code]##ifdef#, so that the code is skipped when compiled with an implementation +that doesn't support the feature. Since the interface to an extension might change from one release to another, -vendors are also encouraged to predefine the macro's value to the version of -the extension. Vendors should use a numerical value that monotonically -increases for each revision of the extension API. - -Of course, an extension may also predefine other macros. In order to ensure -that these macro names do not collide with other extensions or future versions -of the <>, the name should start with the prefix +vendors are also encouraged to predefine the macro's value to the version of the +extension. +Vendors should use a numerical value that monotonically increases for each +revision of the extension API. + +Of course, an extension may also predefine other macros. +In order to ensure that these macro names do not collide with other extensions +or future versions of the <>, the name should start with the prefix [code]#SYCL_EXT_# or [code]#SYCL_IMPLEMENTATION_#. === Attribute namespace -An extension may define new {cpp} attributes. The attribute namespace -[code]#sycl::# is reserved for the <>, so vendors should choose a -different namespace for any attributes they add. 
+An extension may define new {cpp} attributes. +The attribute namespace [code]#sycl::# is reserved for the <>, so +vendors should choose a different namespace for any attributes they add. === Include file paths An extension may define new [code]##include# files under the [code]#"sycl"# -path. The path prefix [code]#"sycl/ext/"# is reserved for this -purpose. For example, the Acme vendor could add a header file +path. +The path prefix [code]#"sycl/ext/"# is reserved for this purpose. +For example, the Acme vendor could add a header file [code]#"sycl/ext/acme/fancy.h"# and be guaranteed that it would not conflict with other extensions or with future versions of the <>. === Optional kernel features An extension may also add new optional kernel features -- features which are -supported on some devices but not on others. Vendors are encouraged to follow -the same mechanism outlined in <>. Therefore, -an extended optional kernel feature should have a matching extension to the -[code]#sycl::aspect# enumerated type. +supported on some devices but not on others. +Vendors are encouraged to follow the same mechanism outlined in +<>. +Therefore, an extended optional kernel feature should have a matching extension +to the [code]#sycl::aspect# enumerated type. === Adding a backend -An extension may also add a new backend. If it does, the naming of the -backend APIs follows the normal guidelines for extensions and also follows -the naming pattern for backends that are defined in the <>. To -illustrate: +An extension may also add a new backend. +If it does, the naming of the backend APIs follows the normal guidelines for +extensions and also follows the naming pattern for backends that are defined in +the <>. +To illustrate: * The extension should add a new value to the [code]#sycl::backend# enumeration - type using a naming scheme like [code]#ext__#. For - example, if the Acme vendor adds a backend named "foo", it would add an + type using a naming scheme like [code]#ext__#. 
+ For example, if the Acme vendor adds a backend named "foo", it would add an enumerated value named [code]#sycl::backend::ext_acme_foo#. * The extension should define the backend's interop API in a namespace named - [code]#sycl::ext::::#. For our hypothetical Acme - example, this would be a namespace named [code]#sycl::ext::acme::foo#. + [code]#sycl::ext::::#. + For our hypothetical Acme example, this would be a namespace named + [code]#sycl::ext::acme::foo#. * If the backend interop API is available through a separate header file, that header should be named - [code]#"sycl/ext//backend/.hpp"#. For our - hypothetical Acme example this would be + [code]#"sycl/ext//backend/.hpp"#. + For our hypothetical Acme example this would be [code]#"sycl/ext/acme/backend/foo.hpp"#. * The extension should predefine a macro for the backend when it is "active". The name of this macro should be - [code]#SYCL_EXT__BACKEND_#. For our hypothetical - Acme example this would be [code]#SYCL_EXT_ACME_BACKEND_FOO#. + [code]#SYCL_EXT__BACKEND_#. + For our hypothetical Acme example this would be + [code]#SYCL_EXT_ACME_BACKEND_FOO#. // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end extensions %%%%%%%%%%%%%%%%%%%%%%%%%%%% diff --git a/adoc/chapters/feature_sets.adoc b/adoc/chapters/feature_sets.adoc index 94acb198..da9dfd55 100644 --- a/adoc/chapters/feature_sets.adoc +++ b/adoc/chapters/feature_sets.adoc @@ -24,22 +24,24 @@ no exceptions. == Reduced feature set The reduced feature set makes certain features optional or restricted to -specific forms. The following list defines all the differences between the -reduced feature set and the full feature set. - - . *Un-named SYCL kernel functions:* <> - which are defined using a lambda expression and therefore have no standard - name are required to be provided a name via the kernel name template parameter - of kernel invocation functions such as [code]#parallel_for#. This overrides - the <> rules for <> naming as specified in - <>. +specific forms. 
+The following list defines all the differences between the reduced feature set +and the full feature set. + + . *Un-named SYCL kernel functions:* <> which are defined using a lambda expression and therefore have + no standard name are required to be provided a name via the kernel name + template parameter of kernel invocation functions such as + [code]#parallel_for#. + This overrides the <> rules for <> naming + as specified in <>. . *Address space mode:* The <> mode used in the reduced feature set is not required to be <>, regardless of SYCL backend in use. - Instead the <> mode - may always be used. + Instead the <> mode may + always be used. . *Declarations:* In addition to the requirements specified in <>, the reduced feature set does not require @@ -51,10 +53,11 @@ reduced feature set and the full feature set. == Compatibility In order to avoid introducing any kind of divergence the reduced and full -feature sets are defined such that the full feature set is a subsumption of -the reduced feature set. This means that any applications which are -developed for the reduced feature set will be compatible with both a SYCL -reduced implementation and a SYCL full implementation. +feature sets are defined such that the full feature set is a subsumption of the +reduced feature set. +This means that any applications which are developed for the reduced feature set +will be compatible with both a SYCL reduced implementation and a SYCL full +implementation. [[sec:feature-sets.conformance]] @@ -62,13 +65,14 @@ reduced implementation and a SYCL full implementation. One of the reasons for having this be defined in the specification is that hardware vendors which wish to support SYCL on their platform(s) want to be able -to demonstrate their support for it by passing conformance. However, if passing -conformance means adopting features which they do not believe to be necessary at -an additional development effort then this may deter them. 
+to demonstrate their support for it by passing conformance. +However, if passing conformance means adopting features which they do not +believe to be necessary at an additional development effort then this may deter +them. Each feature set has its own route for passing conformance allowing adopters of -SYCL to specify the feature set they wish to test conformance against. The -conformance test suite would then alter or disable the tests within the test +SYCL to specify the feature set they wish to test conformance against. +The conformance test suite would then alter or disable the tests within the test suite according to how the feature sets are differentiated above. // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end feature_sets %%%%%%%%%%%%%%%%%%%%%%%%%%%% diff --git a/adoc/chapters/glossary.adoc b/adoc/chapters/glossary.adoc index a83050a9..f036f4ae 100644 --- a/adoc/chapters/glossary.adoc +++ b/adoc/chapters/glossary.adoc @@ -14,102 +14,107 @@ [glossary] [[accessor]]accessor:: - An accessor is a class which allows a <> to access data managed - by a <> or <> class or allows a <> - to access local memory on a <>. Accessors are also used to express - the dependencies among the different <>. + An accessor is a class which allows a <> to access data managed by + a <> or <> class or allows a <> to + access local memory on a <>. + Accessors are also used to express the dependencies among the different + <>. For the full description please refer to <> [[application-scope]]application scope:: - The application scope starts with the construction first - <> class object and finishes with the destruction of the - last one. Application refers to the {cpp} <> and not - the <>. + The application scope starts with the construction of the first <> + class object and finishes with the destruction of the last one. + Application refers to the {cpp} <> and not the + <>. [[aspect]]aspect:: - A characteristic of a <> which determines whether it supports - some optional feature.
Aspects are always boolean, so a <> - either has or does not have an aspect. + A characteristic of a <> which determines whether it supports some + optional feature. + Aspects are always boolean, so a <> either has or does not have an + aspect. [[async-error]]asynchronous error:: A SYCL asynchronous error is an error occurring after the host API call - invoking the error causing action has returned, such that the error - cannot be thrown as a typical {cpp} exception from a host API call. Such - errors are typically generated from device kernel invocations which are + invoking the error causing action has returned, such that the error cannot + be thrown as a typical {cpp} exception from a host API call. + Such errors are typically generated from device kernel invocations which are executed when SYCL task graph dependencies are satisfied, which occur - asynchronously from host code execution. For the full description and - associated asynchronous error handling mechanisms, please refer to - <>. + asynchronously from host code execution. + For the full description and associated asynchronous error handling + mechanisms, please refer to <>. [[async-handler]]async_handler:: - An asynchronous error handler object is a function class instance - providing necessary code for handling all the asynchronous errors - triggered from the execution of command groups on a queue, within a - context or an associated event. For the full description please refer to - <>. + An asynchronous error handler object is a function class instance providing + necessary code for handling all the asynchronous errors triggered from the + execution of command groups on a queue, within a context or an associated + event. + For the full description please refer to <>. [[barrier]]barrier:: A barrier may refer to either a <> used for host-device - coordination, or a <> used to coordinate work-items in - a kernel. + coordination, or a <> used to coordinate work-items in a + kernel. 
[[blocking-accessor]]blocking accessor:: - A blocking accessor is an <> which provides immediate access - and continues to provide access until it is destroyed. For the full - description please refer to <> + A blocking accessor is an <> which provides immediate access and + continues to provide access until it is destroyed. + For the full description please refer to <> [[buffer]]buffer:: + -- -The buffer class manages data for the SYCL {cpp} host application and the -SYCL device kernels. The buffer class may acquire ownership of some host -pointers passed to its constructors according to the constructor kind. - -The buffer class, together with the accessor class, is responsible for -tracking memory transfers and guaranteeing data consistency among the -different kernels. The <> manages the memory allocations -on both the host and the <> within the lifetime of the buffer -object. For the full description please refer to <>. +The buffer class manages data for the SYCL {cpp} host application and the SYCL +device kernels. +The buffer class may acquire ownership of some host pointers passed to its +constructors according to the constructor kind. + +The buffer class, together with the accessor class, is responsible for tracking +memory transfers and guaranteeing data consistency among the different kernels. +The <> manages the memory allocations on both the host and the +<> within the lifetime of the buffer object. +For the full description please refer to <>. -- [[bundle-state]]bundle state:: A SYCL bundle state represents the state of a <> and - therefore its capabilities in the SYCL programming API. Possible states - are <>, <> or <>. + therefore its capabilities in the SYCL programming API. + Possible states are <>, <> or <>. [[command]]command:: A request to execute work that is submitted to a <> such as the - invocation of a <>, the invocation of a - <> or an asynchronous copy. + invocation of a <>, the invocation of a <> + or an asynchronous copy. 
[[command-group]]command group:: In SYCL, the operations required to process data on a <> are - represented using a <>. Each - <> is given a unique <> - object to perform all the necessary work required to correctly process - data on a <> using a kernel. In this way, the group of - commands for transferring and processing data is enqueued as a command - group on a <> for execution. A command group is submitted - atomically to a SYCL queue. + represented using a <>. + Each <> is given a unique <> object + to perform all the necessary work required to correctly process data on a + <> using a kernel. + In this way, the group of commands for transferring and processing data is + enqueued as a command group on a <> for execution. + A command group is submitted atomically to a SYCL queue. [[command-group-function-object]]command group function object:: - A type which is callable with [code]#operator()# that takes a - reference to a <>, that defines a <> which - can be submitted by a <>. The function object can be a named - type, lambda function or [code]#std::function#. + A type which is callable with [code]#operator()# that takes a reference to a + <>, that defines a <> which can be submitted by a + <>. + The function object can be a named type, lambda function or + [code]#std::function#. [[handler]]command group handler:: - The command group handler class provides the interface for the commands - that can be executed inside the <>. It is - provided as a scoped object to all of the data access requests within - the command group scope. For the full description please refer to - <>. + The command group handler class provides the interface for the commands that + can be executed inside the <>. + It is provided as a scoped object to all of the data access requests within + the command group scope. + For the full description please refer to <>. [[command-group-scope]]command group scope:: The command group scope is the function scope defined by the - <>. 
The command group <> - object lifetime is restricted to the command group scope. For more - details see <>. + <>. + The command group <> object lifetime is restricted to the command + group scope. + For more details see <>. [[queue-barrier]]command queue barrier:: The [code]#sycl::queue::wait()# and [code]#sycl::queue::wait_and_throw()# @@ -117,28 +122,28 @@ object. For the full description please refer to <>. <> completes. [[constant-memory]]constant memory:: - A region of memory that remains constant during the execution of - a kernel. The <> allocates and initializes memory - objects placed into constant memory. + A region of memory that remains constant during the execution of a kernel. + The <> allocates and initializes memory objects placed into + constant memory. [[context]]context:: - A <> represents the runtime data structures and state - required by a <> API to interact with a group of <> - associated with a <>. The context is defined as the - [code]#sycl::context# class, for further details please see - <>. + A <> represents the runtime data structures and state required by a + <> API to interact with a group of <> associated + with a <>. + The context is defined as the [code]#sycl::context# class, for further + details please see <>. [[control-flow]]control flow:: When all <> in a <> are executing the same sequence of statements, they are said to be executing under _converged_ - control flow. Control flow _diverges_ when different work-items in a - group execute a different sequence of statements, typically as a result - of evaluating conditions differently (e.g. in selection statements or - loops). + control flow. + Control flow _diverges_ when different work-items in a group execute a + different sequence of statements, typically as a result of evaluating + conditions differently (e.g. in selection statements or loops). 
[[core-spec]]core SYCL specification:: - The text of the SYCL language specification (this document), excluding - the text of any backend specifications and excluding the text for any + The text of the SYCL language specification (this document), excluding the + text of any backend specifications and excluding the text for any extensions. [[descendent-device]]descendent device:: @@ -150,36 +155,38 @@ object. For the full description please refer to <>. <>. [[device-compiler]]device compiler:: - A SYCL device compiler is a compiler that produces <> - binaries from a valid <>. For the full description - please refer to <>. + A SYCL device compiler is a compiler that produces <> binaries from + a valid <>. + For the full description please refer to <>. [[device-copyable]]device copyable:: - Data that is shared between the host and the devices must generally - have a type that abides by the restrictions listed in - <> for a device copyable type. + Data that is shared between the host and the devices must generally have a + type that abides by the restrictions listed in <> for + a device copyable type. [[device-function]]device function:: - A device function is any function in a <> - that can be run on a <>. This includes - <> and, recursively, functions - they call. + A device function is any function in a <> that can be run + on a <>. + This includes <> and, + recursively, functions they call. [[device-image]]device image:: A device image is a representation of one or more <> in an - implementation-defined format. A device image could be a compiled version - of the kernels in an intermediate language representation which needs to be - translated at runtime into a form that can be invoked on a <>, it - could be a compiled version of the kernels in a native code format that is - ready to be invoked without further translation, or it could be a source - code representation which needs to be compiled before it can be invoked. + implementation-defined format. 
+ A device image could be a compiled version of the kernels in an intermediate + language representation which needs to be translated at runtime into a form + that can be invoked on a <>, it could be a compiled version of the + kernels in a native code format that is ready to be invoked without further + translation, or it could be a source code representation which needs to be + compiled before it can be invoked. Other representations are possible too. [[device-selector]]device selector:: - A way to select a device used in various places. This is a callable - object taking a <> reference and returning an integer rank. - One of the device with the highest non-negative value is selected. See - <> for more details. + A way to select a device used in various places. + This is a callable object taking a <> reference and returning an + integer rank. + One of the devices with the highest non-negative value is selected. + See <> for more details. [[event]]event:: A SYCL object that represents the status of an operation that is being @@ -194,10 +201,11 @@ object. For the full description please refer to <>. <>, <> and <> region. [[global-id]]global id:: - As in OpenCL, a global ID is used to uniquely identify a <> - and is derived from the number of global <> specified - when executing a kernel. A global ID is a one, two or three-dimensional - value that starts at 0 per dimension. + As in OpenCL, a global ID is used to uniquely identify a <> and + is derived from the number of global <> specified when + executing a kernel. + A global ID is a one-, two- or three-dimensional value that starts at 0 per + dimension. [[global-memory]]global memory:: Global memory is a memory region accessible to all <> @@ -212,52 +220,55 @@ object. For the full description please refer to <>. See the definition of the [code]#group_barrier# function. [[h-item]]h-item:: - A unique identifier representing a single <> within the - index space of a SYCL kernel hierarchical execution.
Can be one, two or - three dimensional. In the SYCL interface a <> is represented - by the [code]#h_item# class (see <>). + A unique identifier representing a single <> within the index + space of a SYCL kernel hierarchical execution. + It can be one-, two- or three-dimensional. + In the SYCL interface a <> is represented by the [code]#h_item# + class (see <>). [[host]]host:: Host is the system that executes the {cpp} application including the SYCL API. [[host-pointer]]host pointer:: - A pointer to memory on the host. Cannot be accessed directly from a - <>. + A pointer to memory on the host. + It cannot be accessed directly from a <>. [[host-task]]host task:: - A <> which invokes a native {cpp} callable, scheduled - conforming to SYCL dependency rules. + A <> which invokes a native {cpp} callable, scheduled conforming to + SYCL dependency rules. [[host-task-command]]host task command:: - A type of command that can be used inside a <> in order - to schedule a native {cpp} function. + A type of command that can be used inside a <> in order to + schedule a native {cpp} function. [[id]]id:: - It is a unique identifier of an item in an index space. It can be one, - two or three dimensional index space, since the SYCL kernel execution - model is an <>. It is one of the index space classes. For - the full description please refer to <>. + It is a unique identifier of an item in an index space. + It can be a one-, two- or three-dimensional index space, since the SYCL kernel + execution model is an <>. + It is one of the index space classes. + For the full description please refer to <>.
[[implementation-defined]]implementation-defined:: Behavior that is explicitly allowed to vary between conforming - implementations of SYCL. A SYCL implementer is required to document the - implementation-defined behavior. + implementations of SYCL. + A SYCL implementer is required to document the implementation-defined + behavior. [[index-space-classes]]index space classes:: - Like in OpenCL, the kernel execution model defines an - <> index space. + Like in OpenCL, the kernel execution model defines an <> index + space. The <> class that defines an <> is the - [code]#sycl::nd_range#, which takes as input the sizes of global - and local work-items, represented using the [code]#sycl::range# - class. The kernel library classes for indexing in the defined - <> are the following classes: + [code]#sycl::nd_range#, which takes as input the sizes of global and local + work-items, represented using the [code]#sycl::range# class. + The kernel library classes for indexing in the defined <> are the + following classes: + * [code]#sycl::id# : The basic index class representing an <>; * [code]#sycl::item# : The <> index class that contains the @@ -269,28 +280,30 @@ object. For the full description please refer to <>. [[input]]input:: A state which a <> can be in, representing - <> as a source or intermediate representation + <> as a source or intermediate + representation [[item]]item:: An item id is an interface used to retrieve the <>, - <> and <>. For further details see - <>. + <> and <>. + For further details see <>. [[kernel]]kernel:: A kernel represents a <> that has been compiled for a device, including all of the <> it calls. A kernel is implicitly created when a <> is submitted - to a device via a <>. However, a kernel can - also be created manually by pre-compiling a <> (see - <>). + to a device via a <>. + However, a kernel can also be created manually by pre-compiling a + <> (see <>). 
[[kernel-bundle]]kernel bundle:: A kernel bundle is a collection of <> that are associated with the same <> and with a set of <>. Kernel bundles have one of three states: <>, <> or - <>. Kernel bundles in the executable state are ready to be - invoked on a device, whereas bundles in the other states need to be - translated into the executable state before they can be invoked. + <>. + Kernel bundles in the executable state are ready to be invoked on a device, + whereas bundles in the other states need to be translated into the + executable state before they can be invoked. [[kernel-handler]]kernel handler:: A representation of a <> being invoked that is @@ -299,24 +312,24 @@ object. For the full description please refer to <>. // May conflict with host_task MR [[kernel-invocation-command]]kernel invocation command:: - A type of command that can be used inside a <> in order - to schedule a <>, includes - [code]#single_task#, all variants of [code]#parallel_for# and - [code]#parallel_for_workgroup#. + A type of command that can be used inside a <> in order to + schedule a <>, includes [code]#single_task#, all + variants of [code]#parallel_for# and [code]#parallel_for_workgroup#. [[kernel-name]]kernel name:: - A kernel name is a class type that is used to assign a name to the - kernel function, used to link the host system with the kernel object - output by the device compiler. For details on naming kernels please see - <>. + A kernel name is a class type that is used to assign a name to the kernel + function, used to link the host system with the kernel object output by the + device compiler. + For details on naming kernels please see <>. [[kernel-scope]]kernel scope:: - The function scope of the [code]#operator()# on a - <>. Note that any function or member function called from - the kernel is also compiled in kernel scope. The kernel scope allows {cpp} - language extensions as well as restrictions to reflect the capabilities - of devices. 
The extensions and restrictions are defined in the - SYCL device compiler specification. + The function scope of the [code]#operator()# on a <>. + Note that any function or member function called from the kernel is also + compiled in kernel scope. + The kernel scope allows {cpp} language extensions as well as restrictions to + reflect the capabilities of devices. + The extensions and restrictions are defined in the SYCL device compiler + specification. [[local-id]]local id:: A unique identifier of a <> among other work-items of a @@ -327,9 +340,9 @@ object. For the full description please refer to <>. accessible only by <> in that <>. [[native-backend-object]]native backend object:: - An opaque object defined by a specific backend that represents a - high-level SYCL object on said backend. There is no guarantee of having - native backend objects for all SYCL types. + An opaque object defined by a specific backend that represents a high-level + SYCL object on said backend. + There is no guarantee of having native backend objects for all SYCL types. [[native-specialization-constant]]native-specialization constant:: A <> in a device image whose value can be used by @@ -337,28 +350,29 @@ object. For the full description please refer to <>. [[nd-item]]nd-item:: - A unique identifier representing a single <> and - <> within the index space of a SYCL kernel execution. Can - be one, two or three dimensional. In the SYCL interface an <> - is represented by the [code]#nd_item# class (see - <>). + A unique identifier representing a single <> and <> + within the index space of a SYCL kernel execution. + Can be one, two or three dimensional. + In the SYCL interface an <> is represented by the [code]#nd_item# + class (see <>). [[nd-range]]nd-range:: A representation of the index space of a SYCL kernel execution, the - distribution of <> within into <>. + distribution of <> within into + <>. 
Contains a <> specifying the number of global <>, a <> specifying the number of local - <> and a <> specifying the global offset. Can be - one, two or three dimensional. The minimum size of <> - within the <> is 0 per dimension; where any dimension is set to zero, - the index space in all dimensions will be zero. - In the SYCL interface an - <> is represented by the [code]#nd_range# class (see - <>). + <> and a <> specifying the global offset. + Can be one, two or three dimensional. + The minimum size of <> within the <> is 0 per dimension; + where any dimension is set to zero, the index space in all dimensions will + be zero. + In the SYCL interface an <> is represented by the [code]#nd_range# + class (see <>). [[mem-fence]]mem-fence:: - A memory fence provides control over re-ordering of memory load - and store operations when coupled with an atomic operation. + A memory fence provides control over re-ordering of memory load and store + operations when coupled with an atomic operation. See the definition of the [code]#sycl::atomic_fence# function. [[object]]object:: @@ -366,45 +380,48 @@ object. For the full description please refer to <>. <> as a non-executable object. [[platform]]platform:: - A collection of <> managed by a single - <>. + A collection of <> managed by a single <>. [[private-memory]]private memory:: - A region of memory private to a <>. Variables defined in one - work-item's private memory are not visible to another work-item. - The [code]#sycl::private_memory# class provides - access to the work-item's private memory for the hierarchical API as it - is described at <>. + A region of memory private to a <>. + Variables defined in one work-item's private memory are not visible to + another work-item. + The [code]#sycl::private_memory# class provides access to the work-item's + private memory for the hierarchical API as it is described at + <>. 
[[queue]]queue:: - A SYCL command queue is an object that holds command groups to be - executed on a SYCL <>. SYCL provides a heterogeneous platform - integration using device queue, which is the minimum requirement for a - SYCL application to run on a SYCL <>. For the full description - please refer to <>. + A SYCL command queue is an object that holds command groups to be executed + on a SYCL <>. + SYCL provides a heterogeneous platform integration using device queue, which + is the minimum requirement for a SYCL application to run on a SYCL + <>. + For the full description please refer to <>. [[range]]range:: A representation of a number of <> or <> within the index space of a SYCL kernel - execution. Can be one, two or three dimensional. In the SYCL interface a - <> is represented by the [code]#range# class + execution. + Can be one, two or three dimensional. + In the SYCL interface a <> is represented by the [code]#range# class (see <>). [[ranged-accessor]]ranged accessor:: - A ranged accessor is a host or buffer <> that was constructed - with a non-zero offset into the data buffer or with an access range smaller - than the range of the data buffer, or both. Please refer to - <> for more info. + A ranged accessor is a host or buffer <> that was constructed with + a non-zero offset into the data buffer or with an access range smaller than + the range of the data buffer, or both. + Please refer to <> for more info. [[reduction]]reduction:: - An operation that produces a single value by combining multiple values - in an unspecified order using a binary operator. If the operator is - non-associative or non-commutative, the behavior of a reduction may be - non-deterministic. + An operation that produces a single value by combining multiple values in an + unspecified order using a binary operator. + If the operator is non-associative or non-commutative, the behavior of a + reduction may be non-deterministic. 
[[root-device]]root device:: - A device that is not a sub-device. The function - [code]#device::get_devices()# returns a vector of all the root devices. + A device that is not a sub-device. + The function [code]#device::get_devices()# returns a vector of all the root + devices. [[rule-of-five]]rule of five:: For a given class, if at least one of the copy constructor, move @@ -414,53 +431,52 @@ object. For the full description please refer to <>. [[rule-of-zero]]rule of zero:: For a given class, if the copy constructor, move constructor, copy - assignment operator, move assignment operator and destructor would all - be inlined, public and defaulted, none of them should be explicitly - declared. + assignment operator, move assignment operator and destructor would all be + inlined, public and defaulted, none of them should be explicitly declared. [[smcp]]SMCP:: The single-source multiple compiler-passes (SMCP) - technique allows a single-source file to be parsed by multiple - compilers for building native programs per compilation target. For - example, a standard {cpp} CPU compiler for targeting <> will - parse the <> to create the {cpp} <> - which offloads parts of the computation to other - <>. A SYCL device compiler will parse the same - source file and target only SYCL kernels. For the full description - please refer to <>. See <> for another - approach. + technique allows a single-source file to be parsed by multiple compilers for + building native programs per compilation target. + For example, a standard {cpp} CPU compiler for targeting <> will parse + the <> to create the {cpp} <> which offloads + parts of the computation to other <>. + A SYCL device compiler will parse the same source file and target only SYCL + kernels. + For the full description please refer to <>. + See <> for another approach. [[specialization-constant]]specialization constant:: - A constant variable where the value is not known until compilation of - the <>. 
+ A constant variable where the value is not known until compilation of the + <>. [[specialization-id]]specialization id:: - An identifier which represents a reference to a - <> both in the <> for setting - the value prior to the compilation of a <> and in a - <> for retrieving the value during invocation. + An identifier which represents a reference to a <> + both in the <> for setting the value prior to the + compilation of a <> and in a <> for + retrieving the value during invocation. [[sscp]]SSCP:: - The single-source single compiler-pass (SSCP) technique - allows a single-source file to be parsed only once by a single - compiler. For example, the SYCL compiler will parse the - <> once. Then, from this single intermediate - representation, for each kind of device architecture a compilation - flow will generate the binary for each kernel and another - compilation flow will generate the <> code of the {cpp} - <>. For the full description please refer to - <>. See <> for another approach. + The single-source single compiler-pass (SSCP) technique allows a + single-source file to be parsed only once by a single compiler. + For example, the SYCL compiler will parse the <> once. + Then, from this single intermediate representation, for each kind of device + architecture a compilation flow will generate the binary for each kernel and + another compilation flow will generate the <> code of the {cpp} + <>. + For the full description please refer to <>. + See <> for another approach. [[string-kernel-name]]string kernel name:: - The name of a <> in string form, this can be the - name of a kernel function created via interop or a string form of a + The name of a <> in string form, this can be the name + of a kernel function created via interop or a string form of a <>. [[sub-group]]sub-group:: - The SYCL sub-group ([code]#sycl::sub_group# class) is a - representation of a collection of related work-items within a - <>. 
For further details for the [code]#sycl::sub_group# class - see <>. + The SYCL sub-group ([code]#sycl::sub_group# class) is a representation of a + collection of related work-items within a <>. + For further details for the [code]#sycl::sub_group# class see + <>. [[sub-group-barrier]]sub-group barrier:: A <> for all <> in a <>. @@ -474,9 +490,10 @@ object. For the full description please refer to <>. [[backend]]SYCL backend:: An implementation of the SYCL programming model using an heterogeneous - programming API. A SYCL backend exposes one or multiple SYCL - <>. For example, the OpenCL backend, via the ICD loader, - can expose multiple OpenCL <>. + programming API. + A SYCL backend exposes one or multiple SYCL <>. + For example, the OpenCL backend, via the ICD loader, can expose multiple + OpenCL <>. [[backend-api]]SYCL backend API:: The exposed API for writing SYCL code against a given <>. @@ -489,35 +506,37 @@ object. For the full description please refer to <>. A SYCL {cpp} source file that contains SYCL API calls. [[sycl-kernel-function]]SYCL kernel function:: - A type which is callable with [code]#operator()# that takes an - <>, <>, <> or <>, and an optional - [code]#kernel_handler# as its last parameter. This type can be passed to - kernel enqueue member functions of the <>. A - <> defines an entry point to a <>. The - function object can be a named <> type or lambda + A type which is callable with [code]#operator()# that takes an <>, + <>, <> or <>, and an optional + [code]#kernel_handler# as its last parameter. + This type can be passed to kernel enqueue member functions of the + <>. + A <> defines an entry point to a <>. + The function object can be a named <> type or lambda function. [[sycl-runtime]]SYCL runtime:: - A SYCL runtime is an implementation of the SYCL API specification. The - SYCL runtime manages the different <>, + A SYCL runtime is an implementation of the SYCL API specification. 
+ The SYCL runtime manages the different <>, <>, <> as well as memory - handling of data between host and <> <> - to enable semantically correct execution of SYCL programs. + handling of data between host and <> <> to enable + semantically correct execution of SYCL programs. [[type-kernel-name]]type kernel name:: - The name of a <> in type form, this can be either - a <> provided to a <> or the - type of a function object use as a <>. + The name of a <> in type form, this can be either a + <> provided to a <> or the type of a + function object use as a <>. [[usm]]USM:: + -- Unified Shared Memory (USM) provides a pointer-based alternative to the -<> programming model. USM enables: +<> programming model. +USM enables: - * easier integration into existing code bases by representing allocations - as pointers rather than buffers, with full support for pointer - arithmetic into allocations; + * easier integration into existing code bases by representing allocations as + pointers rather than buffers, with full support for pointer arithmetic into + allocations; * fine-grain control over ownership and accessibility of allocations, to optimally choose between performance and programmer convenience; * a simpler programming model, by automatically migrating some allocations @@ -527,12 +546,12 @@ See <> -- [[work-group]]work-group:: - The SYCL work-group ([code]#sycl::group# class) is a representation - of a collection of related <> that execute on a single - compute unit. The <> in the group execute the same - kernel-instance and <>. - For further details for the [code]#sycl::group# - class see <>. + The SYCL work-group ([code]#sycl::group# class) is a representation of a + collection of related <> that execute on a single + compute unit. + The <> in the group execute the same kernel-instance + and <>. + For further details for the [code]#sycl::group# class see <>. [[work-group-barrier]]work-group barrier:: A <> for all <> in a <>. 
@@ -541,20 +560,21 @@ See <> A <> for all <> in a <>. [[work-group-id]]work-group id:: - As in OpenCL, SYCL kernels execute in <>. The group ID - is the ID of the <> that a <> is executing - within. A group ID is an one, two or three dimensional value that starts - at 0 per dimension. + As in OpenCL, SYCL kernels execute in <>. + The group ID is the ID of the <> that a <> is + executing within. + A group ID is an one, two or three dimensional value that starts at 0 per + dimension. [[work-group-range]]work-group range:: A group range is the size of the <> for every dimension. [[work-item]]work-item:: - The SYCL work-item is a representation of a <> among a - collection of parallel executions of a kernel invoked on a <> - by a <>. A <> is executed by one or more processing - elements as part of a <> executing on a compute unit. A - <> is distinguished from other <> by its + The SYCL work-item is a representation of a <> among a collection + of parallel executions of a kernel invoked on a <> by a <>. + A <> is executed by one or more processing elements as + part of a <> executing on a compute unit. + A <> is distinguished from other <> by its <> or the combination of its <> and its <> within a <>. diff --git a/adoc/chapters/host_backend.adoc b/adoc/chapters/host_backend.adoc index 1c514546..24a871b1 100644 --- a/adoc/chapters/host_backend.adoc +++ b/adoc/chapters/host_backend.adoc @@ -22,8 +22,8 @@ = Host backend specification This chapter describes how SYCL is mapped on the <>. -The <> exposes the host where the SYCL application is executing -as a platform to dispatch SYCL kernels. +The <> exposes the host where the SYCL application is executing as +a platform to dispatch SYCL kernels. The <> exposes at least one <>. @@ -31,33 +31,37 @@ The <> exposes at least one <>. // From Glossary, reworded to match backend -The SYCL host device implements all functionality required to execute the -SYCL kernels directly on the host, without relying on a third party API. 
-It has full SYCL capabilities and reports them through the SYCL information retrieval -interface. At least one SYCL host device must be exposed in the SYCL host -backend in all SYCL implementations, and it must always be available. -Any {cpp} application debugger, if available on the system, -can be used for debugging SYCL kernels executing on a SYCL host device. +The SYCL host device implements all functionality required to execute the SYCL +kernels directly on the host, without relying on a third party API. +It has full SYCL capabilities and reports them through the SYCL information +retrieval interface. +At least one SYCL host device must be exposed in the SYCL host backend in all +SYCL implementations, and it must always be available. +Any {cpp} application debugger, if available on the system, can be used for +debugging SYCL kernels executing on a SYCL host device. // From Architecture, Section 3.3 -When a SYCL implementation executes kernels on the host device, -it is free to use whatever parallel execution facilities available on the -host, as long as it executes within the semantics of the kernel execution model -defined by the SYCL kernel execution model. +When a SYCL implementation executes kernels on the host device, it is free to +use whatever parallel execution facilities available on the host, as long as it +executes within the semantics of the kernel execution model defined by the SYCL +kernel execution model. Kernel math library functions on the host must conform to OpenCL math precision -requirements. The SYCL host device needs to be queried for the capabilities it -provides. This ensures consistency when executing any SYCL general application. +requirements. +The SYCL host device needs to be queried for the capabilities it provides. +This ensures consistency when executing any SYCL general application. -The <> must report as supporting images and therefore support -the minimum image formats. 
+The <> must report as supporting images and therefore support the +minimum image formats. -The range of image formats supported by the host device is implementation-defined, -but must match the minimum requirements of the OpenCL specification. +The range of image formats supported by the host device is +implementation-defined, but must match the minimum requirements of the OpenCL +specification. SYCL implementors can provide extensions on the host-device to match any other -backend-specific extension. This allows developers to rely on the host device -to execute their programs when said backend is not available. +backend-specific extension. +This allows developers to rely on the host device to execute their programs when +said backend is not available. === SYCL memory model on the host @@ -81,15 +85,15 @@ All SYCL device memories are available on devices from the host backend. The host backend must ensure all functionality of the SYCL generic programming model is always available to developers. However, since there is no heterogeneous API behind the host backend (it -directly targets the host platform), there are no native types for SYCL -objects to map to in the SYCL application. - -Inside SYCL kernels, the host backend must ensure interoperability with -existing host code, so that existing host libraries can be used inside -SYCL kernels executing on the host. -In particular, when retrieving a raw pointer from a multi pointer object, -the pointer returned must be usable by any library accessible by the -SYCL application. +directly targets the host platform), there are no native types for SYCL objects +to map to in the SYCL application. + +Inside SYCL kernels, the host backend must ensure interoperability with existing +host code, so that existing host libraries can be used inside SYCL kernels +executing on the host. 
+In particular, when retrieving a raw pointer from a multi pointer object, the +pointer returned must be usable by any library accessible by the SYCL +application. // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end host_backend %%%%%%%%%%%%%%%%%%%%%%%%%%%% diff --git a/adoc/chapters/information_descriptors.adoc b/adoc/chapters/information_descriptors.adoc index a24d967f..2cfc189a 100644 --- a/adoc/chapters/information_descriptors.adoc +++ b/adoc/chapters/information_descriptors.adoc @@ -54,8 +54,8 @@ include::{header_dir}/queueInfo.h[lines=4..-1] [[appendix.kernel.descriptors]] == Kernel information descriptors -The following interface includes all the information descriptors -that apply to kernels as described in <>. +The following interface includes all the information descriptors that apply to +kernels as described in <>. [source,,linenums] ---- include::{header_dir}/kernelInfo.h[lines=4..-1] @@ -65,9 +65,9 @@ include::{header_dir}/kernelInfo.h[lines=4..-1] [[appendix.event.descriptors]] == Event information descriptors -The following interface includes all the information descriptors -for the [code]#event# class as described in <> -and <>. +The following interface includes all the information descriptors for the +[code]#event# class as described in <> and +<>. [source,,linenums] ---- include::{header_dir}/eventInfo.h[lines=4..-1] diff --git a/adoc/chapters/introduction.adoc b/adoc/chapters/introduction.adoc index f75fb1b1..4d8050a5 100644 --- a/adoc/chapters/introduction.adoc +++ b/adoc/chapters/introduction.adoc @@ -3,144 +3,153 @@ [[introduction]] = Introduction -SYCL (pronounced "`sickle`") is a royalty-free, cross-platform -abstraction {cpp} programming model for heterogeneous computing. SYCL -builds on the underlying concepts, portability and efficiency of -parallel API or standards like OpenCL while adding much of the ease of -use and flexibility of single-source {cpp}. 
+SYCL (pronounced "`sickle`") is a royalty-free, cross-platform abstraction {cpp} +programming model for heterogeneous computing. +SYCL builds on the underlying concepts, portability and efficiency of parallel +API or standards like OpenCL while adding much of the ease of use and +flexibility of single-source {cpp}. -Developers using SYCL are able to write standard modern {cpp} code, with -many of the techniques they are accustomed to, such as inheritance and -templates. At the same time, developers have access to the full range -of capabilities of the underlying implementation (such as OpenCL) both -through the features of the SYCL libraries and, where necessary, -through interoperation with code written directly using the underneath -implementation, via their APIs. +Developers using SYCL are able to write standard modern {cpp} code, with many of +the techniques they are accustomed to, such as inheritance and templates. +At the same time, developers have access to the full range of capabilities of +the underlying implementation (such as OpenCL) both through the features of the +SYCL libraries and, where necessary, through interoperation with code written +directly using the underneath implementation, via their APIs. -To reduce programming effort and increase the flexibility with which -developers can write code, SYCL extends the concepts found in -standards like OpenCL model in a few ways beyond the general use of {cpp} -features: +To reduce programming effort and increase the flexibility with which developers +can write code, SYCL extends the concepts found in standards like OpenCL model +in a few ways beyond the general use of {cpp} features: * execution of parallel kernels on a heterogeneous device is made - simultaneously convenient and flexible. 
Common parallel patterns are - prioritized with simple syntax, which through a series {cpp} types allow - the programmer to express additional requirements, such as dependencies, - if needed; - * when using buffers and accessors, data access in SYCL is separated from - data storage. By relying on the {cpp}-style resource acquisition is - initialization (RAII) idiom to capture data dependencies between device - code blocks, the runtime library can track data movement and provide - correct behavior without the complexity of manually managing event - dependencies between kernel instances and without the programmer having to - explicitly move data. This approach enables the data-parallel task-graphs - that might be already part of the execution model to be built up easily - and safely by SYCL programmers; + simultaneously convenient and flexible. + Common parallel patterns are prioritized with simple syntax, which through a + series {cpp} types allow the programmer to express additional requirements, + such as dependencies, if needed; + * when using buffers and accessors, data access in SYCL is separated from data + storage. + By relying on the {cpp}-style resource acquisition is initialization (RAII) + idiom to capture data dependencies between device code blocks, the runtime + library can track data movement and provide correct behavior without the + complexity of manually managing event dependencies between kernel instances + and without the programmer having to explicitly move data. + This approach enables the data-parallel task-graphs that might be already + part of the execution model to be built up easily and safely by SYCL + programmers; * Unified Shared Memory (<>) provides a mechanism for explicit data - allocation and movement. 
This approach enables the use of pointer-based - algorithms and data structures on heterogeneous devices, and allows for - increased re-use of code across host and device; - * the hierarchical parallelism syntax offers a way of expressing - data parallelism similar to the OpenCL device or OpenMP target - device execution model in an easy-to-understand modern {cpp} form. It - more cleanly layers parallel loops to avoid fragmentation of code and to + allocation and movement. + This approach enables the use of pointer-based algorithms and data + structures on heterogeneous devices, and allows for increased re-use of code + across host and device; + * the hierarchical parallelism syntax offers a way of expressing data + parallelism similar to the OpenCL device or OpenMP target device execution + model in an easy-to-understand modern {cpp} form. + It more cleanly layers parallel loops to avoid fragmentation of code and to more efficiently map to CPU-style architectures. -SYCL retains the execution model, runtime feature set and device -capabilities inspired by the OpenCL standard. This standard imposes -some limitations on the full range of {cpp} features that SYCL is able -to support. This ensures portability of device code across as wide a -range of devices as possible. As a result, while the code can be -written in standard {cpp} syntax with interoperability with standard {cpp} -programs, the entire set of {cpp} features is not available in SYCL -device code. In particular, SYCL device code, as defined by this -specification, does not support virtual function calls, function -pointers in general, exceptions, runtime type information or the full -set of {cpp} libraries that may depend on these features or on features -of a particular host compiler. Nevertheless, these basic restrictions -can be relieved by some specific Khronos or vendor extensions. +SYCL retains the execution model, runtime feature set and device capabilities +inspired by the OpenCL standard. 
+This standard imposes some limitations on the full range of {cpp} features that +SYCL is able to support. +This ensures portability of device code across as wide a range of devices as +possible. +As a result, while the code can be written in standard {cpp} syntax with +interoperability with standard {cpp} programs, the entire set of {cpp} features +is not available in SYCL device code. +In particular, SYCL device code, as defined by this specification, does not +support virtual function calls, function pointers in general, exceptions, +runtime type information or the full set of {cpp} libraries that may depend on +these features or on features of a particular host compiler. +Nevertheless, these basic restrictions can be relieved by some specific Khronos +or vendor extensions. -SYCL implements an <> design which offers the power of source -integration while allowing toolchains to remain flexible. The <> -design supports embedding of code intended to be compiled for a device, -for example a GPU, inline with host code. This embedding of code offers three -primary benefits: +SYCL implements an <> design which offers the power of source integration +while allowing toolchains to remain flexible. +The <> design supports embedding of code intended to be compiled for a +device, for example a GPU, inline with host code. +This embedding of code offers three primary benefits: Simplicity:: - For novice programmers using frameworks like OpenCL, the separation of - host and device source code in OpenCL can become complicated to deal - with, particularly when similar kernel code is used for multiple - different operations on different data types. A single compiler flow and - integrated tool chain combined with libraries that perform a lot of - simple tasks simplifies initial OpenCL programs to a minimum complexity. - This reduces the learning curve for programmers new to heterogeneous programming and allows - them to concentrate on parallelization techniques rather than syntax. 
+ For novice programmers using frameworks like OpenCL, the separation of host + and device source code in OpenCL can become complicated to deal with, + particularly when similar kernel code is used for multiple different + operations on different data types. + A single compiler flow and integrated tool chain combined with libraries + that perform a lot of simple tasks simplifies initial OpenCL programs to a + minimum complexity. + This reduces the learning curve for programmers new to heterogeneous + programming and allows them to concentrate on parallelization techniques + rather than syntax. Reuse:: {cpp}'s type system allows for complex interactions between different code - units and supports efficient abstract interface design and reuse of - library code. For example, a [keyword]#transform# or [keyword]#map# - operation applied to an array of data may allow specialization on both - the operation applied to each element of the array and on the type of - the data. The <> design of SYCL enables this interaction to - bridge the host code/device code boundary such that the device code to - be specialized on both of these factors directly from the host code. + units and supports efficient abstract interface design and reuse of library + code. + For example, a [keyword]#transform# or [keyword]#map# operation applied to + an array of data may allow specialization on both the operation applied to + each element of the array and on the type of the data. + The <> design of SYCL enables this interaction to bridge the host + code/device code boundary such that the device code to be specialized on + both of these factors directly from the host code. Efficiency:: - Tight integration with the type system and reuse of library code enables - a compiler to perform inlining of code and to produce efficient - specialized device code based on decisions made in the host code without - having to generate kernel source strings dynamically. 
+ Tight integration with the type system and reuse of library code enables a + compiler to perform inlining of code and to produce efficient specialized + device code based on decisions made in the host code without having to + generate kernel source strings dynamically. The use of {cpp} features such as generic programming, templated code, -functional programming and inheritance on top of existing -heterogeneous execution model opens a wide scope for innovation in -software design for heterogeneous systems. Clean integration of device -and host code within a single {cpp} type system enables the development -of modern, templated generic and adaptable libraries that build -simple, yet efficient, interfaces to offer more developers access to -heterogeneous computing capabilities and devices. SYCL is intended to -serve as a foundation for innovation in programming models for -heterogeneous systems, that builds on open and widely implemented -standard foundation like OpenCL or Vulkan. +functional programming and inheritance on top of existing heterogeneous +execution models opens a wide scope for innovation in software design for +heterogeneous systems. +Clean integration of device and host code within a single {cpp} type system +enables the development of modern, templated generic and adaptable libraries +that build simple, yet efficient, interfaces to offer more developers access to +heterogeneous computing capabilities and devices. +SYCL is intended to serve as a foundation for innovation in programming models +for heterogeneous systems that builds on open and widely implemented standard +foundations such as OpenCL or Vulkan. -SYCL is designed to be as close to standard {cpp} as possible. In -practice, this means that as long as no dependence is created on -SYCL's integration with the underlying implementation, a -standard {cpp} compiler can compile SYCL programs and they will run -correctly on a host CPU.
Any use of specialized low-level features can -be masked using the C preprocessor in the same way that -compiler-specific intrinsics may be hidden to ensure portability -between different host compilers. +SYCL is designed to be as close to standard {cpp} as possible. +In practice, this means that as long as no dependence is created on SYCL's +integration with the underlying implementation, a standard {cpp} compiler can +compile SYCL programs and they will run correctly on a host CPU. +Any use of specialized low-level features can be masked using the C preprocessor +in the same way that compiler-specific intrinsics may be hidden to ensure +portability between different host compilers. SYCL is designed to allow a compilation flow where the source file is passed -through multiple different compilers, including a standard {cpp} host compiler of -the developer's choice, and where the resulting application combines the results -of these compilation passes. This is distinct from a single-source flow that -might use language extensions that preclude the use of a standard host compiler. +through multiple different compilers, including a standard {cpp} host compiler +of the developer's choice, and where the resulting application combines the +results of these compilation passes. +This is distinct from a single-source flow that might use language extensions +that preclude the use of a standard host compiler. The SYCL standard does not preclude the use of a single compiler flow, but is -designed to not require it. SYCL can also be implemented purely as a library, -in which case no special compiler support is required at all. +designed to not require it. +SYCL can also be implemented purely as a library, in which case no special +compiler support is required at all. -The advantages of this design are two-fold. First, it offers better integration -with existing tool chains. An application that already builds using a chosen -compiler can continue to do so when SYCL code is added. 
Using the SYCL tools on -a source file within a project will both compile for a device and let -the same source file be compiled using the same host compiler that the rest of -the project is compiled with. Linking and library relationships are unaffected. -This design simplifies porting of pre-existing applications to SYCL. Second, the -design allows the optimal compiler to be chosen for each device where different -vendors may provide optimized tool-chains. +The advantages of this design are two-fold. +First, it offers better integration with existing tool chains. +An application that already builds using a chosen compiler can continue to do so +when SYCL code is added. +Using the SYCL tools on a source file within a project will both compile for a +device and let the same source file be compiled using the same host compiler +that the rest of the project is compiled with. +Linking and library relationships are unaffected. +This design simplifies porting of pre-existing applications to SYCL. +Second, the design allows the optimal compiler to be chosen for each device +where different vendors may provide optimized tool-chains. -To summarize, SYCL enables computational kernels to be written inside -{cpp} source files as normal {cpp} code, leading to the concept of -"`single-source`" programming. This means that software developers can -develop and use generic algorithms and data structures using standard -{cpp} template techniques, while still supporting multi-platform, -multi-device heterogeneous execution. Access to the low level APIs of -an underlying implementation (such as OpenCL) is also supported. -The specification has been designed to enable implementation -across as wide a variety of platforms as possible as well as ease of -integration with other platform-specific technologies, thereby letting -both users and implementers build on top of SYCL as an open platform -for system-wide heterogeneous processing innovation. 
+To summarize, SYCL enables computational kernels to be written inside {cpp} +source files as normal {cpp} code, leading to the concept of "`single-source`" +programming. +This means that software developers can develop and use generic algorithms and +data structures using standard {cpp} template techniques, while still supporting +multi-platform, multi-device heterogeneous execution. +Access to the low level APIs of an underlying implementation (such as OpenCL) is +also supported. +The specification has been designed to enable implementation across as wide a +variety of platforms as possible as well as ease of integration with other +platform-specific technologies, thereby letting both users and implementers +build on top of SYCL as an open platform for system-wide heterogeneous +processing innovation. // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end introduction %%%%%%%%%%%%%%%%%%%%%%%%%%%% diff --git a/adoc/chapters/opencl_backend.adoc b/adoc/chapters/opencl_backend.adoc index 6ed5edd1..dd1bd0fe 100644 --- a/adoc/chapters/opencl_backend.adoc +++ b/adoc/chapters/opencl_backend.adoc @@ -13,11 +13,11 @@ applications written for the OpenCL backend are interoperable. [[sec:opencl:native-interop-application]] == SYCL application interoperability native backend objects -For each <> class which supports <> interoperability, -specializations of [code]#backend_traits::input_type# -and [code]#backend_traits::return_type# must be defined as the -type of <> interoperability <> -associated with [code]#SyclType# for the <>. +For each <> class which supports <> +interoperability, specializations of [code]#backend_traits::input_type# and +[code]#backend_traits::return_type# must be defined as the type of +<> interoperability <> associated with +[code]#SyclType# for the <>. The types of the native backend objects for <> interoperability are described in <>. @@ -26,9 +26,9 @@ interoperability are described in <>. 
== Kernel function interoperability native backend objects For each <> class which supports kernel function interoperability, -a specialization of [code]#backend_traits::return_type# must be defined as the type of kernel -function interoperability <> associated with [code]#SyclType# -for the <>. +a specialization of [code]#backend_traits::return_type# must be defined as the +type of kernel function interoperability <> associated +with [code]#SyclType# for the <>. The types of the native backend objects for kernel function interoperability are described in <>. @@ -94,61 +94,60 @@ is described in <>. // From 3.8 SYCL for OpenCL Framework == SYCL for OpenCL framework -The SYCL framework allows applications to -use a host and one or more OpenCL devices as a single heterogeneous parallel -computer system. The framework contains the following components: +The SYCL framework allows applications to use a host and one or more OpenCL +devices as a single heterogeneous parallel computer system. +The framework contains the following components: - * <>: The template library provides a set of {cpp} templates - and classes which provide the programming model to the user. It enables - the creation of runtime classes such as SYCL queues, buffers and images, - as well as access to some underlying OpenCL runtime object, such as + * <>: The template library provides a set of {cpp} templates and + classes which provide the programming model to the user. + It enables the creation of runtime classes such as SYCL queues, buffers and + images, as well as access to some underlying OpenCL runtime object, such as contexts, platforms, devices and program objects. - * <>: The <> interfaces with the - underlying OpenCL implementations and handles scheduling of commands in - queues, moving of data between host and devices, manages contexts, - programs, kernel compilation and memory management. 
- * [keyword]#OpenCL Implementation(s)#: The SYCL system assumes the - existence of one or more OpenCL implementations available on the host - machine. - * SYCL <>: The SYCL <> compile - SYCL {cpp} kernels into a format which can be executed on an OpenCL device - at runtime. There may be more than one SYCL device compiler in a SYCL - implementation. The format of the compiled SYCL kernels is not defined. - A SYCL device compiler may, or may not, also compile the host parts of - the program. - -The OpenCL backend is enabled using the [code]#sycl::backend::opencl# -value of [code]#enum class backend#. That means that when the OpenCL -backend is active, the value of + * <>: The <> interfaces with the underlying OpenCL + implementations and handles scheduling of commands in queues, moving of data + between host and devices, manages contexts, programs, kernel compilation and + memory management. + * [keyword]#OpenCL Implementation(s)#: The SYCL system assumes the existence + of one or more OpenCL implementations available on the host machine. + * SYCL <>: The SYCL <> compile SYCL {cpp} kernels into a format which can be executed + on an OpenCL device at runtime. + There may be more than one SYCL device compiler in a SYCL implementation. + The format of the compiled SYCL kernels is not defined. + A SYCL device compiler may, or may not, also compile the host parts of the + program. + +The OpenCL backend is enabled using the [code]#sycl::backend::opencl# value of +[code]#enum class backend#. +That means that when the OpenCL backend is active, the value of [code]#sycl::is_backend_active::value# will be [code]#true#. == Mapping of SYCL programming model on top of OpenCL -The SYCL programming model was originally designed as a high-level model -for the OpenCL API, hence the mapping of SYCL on the OpenCL API is -mostly straightforward. 
+The SYCL programming model was originally designed as a high-level model for the +OpenCL API, hence the mapping of SYCL on the OpenCL API is mostly +straightforward. -When the OpenCL backend is active on a SYCL application, all visible -OpenCL platforms are exported as SYCL platforms. +When the OpenCL backend is active on a SYCL application, all visible OpenCL +platforms are exported as SYCL platforms. // From Architecture, Section 3.3 -When a SYCL implementation executes kernels on an OpenCL -device, it achieves this by enqueuing OpenCL *commands* to -execute computations on the processing elements within a device. The -processing elements within an OpenCL compute unit may execute a single -stream of instructions as ALUs within a SIMD unit (which execute in -lockstep with a single stream of instructions), as independent SPMD -units (where each PE maintains its own program counter) or as some -combination of the two. +When a SYCL implementation executes kernels on an OpenCL device, it achieves +this by enqueuing OpenCL *commands* to execute computations on the processing +elements within a device. +The processing elements within an OpenCL compute unit may execute a single +stream of instructions as ALUs within a SIMD unit (which execute in lockstep +with a single stream of instructions), as independent SPMD units (where each PE +maintains its own program counter) or as some combination of the two. === Backend specific information descriptors -Some of the SYCL information descriptors are backend-defined. For the OpenCL -backend these information descriptors map directly to OpenCL properties as -described in the table below: +Some of the SYCL information descriptors are backend-defined. 
+For the OpenCL backend these information descriptors map directly to OpenCL +properties as described in the table below: [[table.opencl.info]] .Mapping of SYCL information descriptors to OpenCL properties @@ -161,15 +160,16 @@ described in the table below: === OpenCL memory model -The memory model for SYCL devices running on OpenCL platforms follows the -memory model of the OpenCL version they conform to. +The memory model for SYCL devices running on OpenCL platforms follows the memory +model of the OpenCL version they conform to. -In addition to <> , <> and <> memory, -the OpenCL backend permits the use of <> space in SYCL: +In addition to <> , <> and <> +memory, the OpenCL backend permits the use of <> space in SYCL: - * <> is a region of memory that remains constant - during the execution of a kernel. A pointer to the generic address space cannot - represent an address to this memory region. + * <> is a region of memory that remains + constant during the execution of a kernel. + A pointer to the generic address space cannot represent an address to this + memory region. Work-items executing in a kernel have access to four distinct memory regions, with the mapping between SYCL and OpenCL described in <>. @@ -188,47 +188,57 @@ with the mapping between SYCL and OpenCL described in <>. === OpenCL interface for buffer command accessors The enumerator [code]#target::constant_buffer# is deprecated, but will remain a -part of the OpenCL backend as an extension. This enables SYCL kernel functions -to access the contents of a buffer through the OpenCL device’s constant memory. +part of the OpenCL backend as an extension. +This enables SYCL kernel functions to access the contents of a buffer through +the OpenCL device’s constant memory. // From 3.4.1.1 OpenCL resources managed by SYCL Application === OpenCL resources managed by SYCL application -In OpenCL, a developer must create a <> to be able to execute -commands on a device. 
Creating a context involves choosing a <> -and a list of <>. In SYCL, contexts, platforms and devices all -exist, but the user can choose whether to specify them or have the SYCL -implementation create them automatically. The minimum required object for -submitting work to devices in SYCL is the <>, which contains -references to a platform, device and context internally. +In OpenCL, a developer must create a <> to be able to execute commands +on a device. +Creating a context involves choosing a <> and a list of +<>. +In SYCL, contexts, platforms and devices all exist, but the user can choose +whether to specify them or have the SYCL implementation create them +automatically. +The minimum required object for submitting work to devices in SYCL is the +<>, which contains references to a platform, device and context +internally. The resources managed by SYCL are: - . <>: all features of OpenCL are implemented by platforms. A - platform can be viewed as a given hardware vendor's runtime and the - devices accessible through it. Some devices will only be accessible to - one vendor's runtime and hence multiple platforms may be present. SYCL - manages the different platforms for the user. In SYCL, a platform - resource is accessible through a [code]#sycl::platform# object. + . <>: all features of OpenCL are implemented by platforms. + A platform can be viewed as a given hardware vendor's runtime and the + devices accessible through it. + Some devices will only be accessible to one vendor's runtime and hence + multiple platforms may be present. + SYCL manages the different platforms for the user. + In SYCL, a platform resource is accessible through a [code]#sycl::platform# + object. . <>: any OpenCL resource that is acquired by the user is - attached to a context. A context contains a collection of devices that - the host can use and manages memory objects that can be shared between - the devices. 
Data movement between devices within a context may be - efficient and hidden by the underlying OpenCL runtime while data - movement between contexts may involve the host. A given context can only - wrap devices owned by a single platform. In SYCL, a context resource is - accessible through a [code]#sycl::context# object. + attached to a context. + A context contains a collection of devices that the host can use and manages + memory objects that can be shared between the devices. + Data movement between devices within a context may be efficient and hidden + by the underlying OpenCL runtime while data movement between contexts may + involve the host. + A given context can only wrap devices owned by a single platform. + In SYCL, a context resource is accessible through a [code]#sycl::context# + object. . <>: platforms provide one or more devices for executing - kernels. In SYCL, a device is accessible through a - [code]#sycl::device# object. + kernels. + In SYCL, a device is accessible through a [code]#sycl::device# object. . <>: OpenCL objects that store implementation - data for the SYCL kernels. These objects are only required for advanced use - in SYCL and are encapsulated in the [code]#sycl::kernel_bundle# class. - . <>: SYCL kernels execute in command queues. The user must - create a queue, which references an associated context, platform and - device. The context, platform and device may be chosen automatically, or - specified by the user. In SYCL, command queues are accessible through - [code]#sycl::queue# objects. + data for the SYCL kernels. + These objects are only required for advanced use in SYCL and are + encapsulated in the [code]#sycl::kernel_bundle# class. + . <>: SYCL kernels execute in command queues. + The user must create a queue, which references an associated context, + platform and device. + The context, platform and device may be chosen automatically, or specified + by the user. 
+ In SYCL, command queues are accessible through [code]#sycl::queue# objects. // Removed from OpenCL Spec document // In OpenCL, queues can operate using in-order execution or out-of-order @@ -247,60 +257,60 @@ The resources managed by SYCL are: // a program that mixes standard OpenCL C kernels and OpenCL API code with // SYCL code and expect fully compatible interoperability. -The OpenCL backend for SYCL ensures maximum compatibility between SYCL -and OpenCL kernels and API. This includes supporting devices with -different capabilities and support for different versions of the -OpenCL C language, in addition to supporting SYCL kernels written in {cpp}. +The OpenCL backend for SYCL ensures maximum compatibility between SYCL and +OpenCL kernels and API. +This includes supporting devices with different capabilities and support for +different versions of the OpenCL C language, in addition to supporting SYCL +kernels written in {cpp}. // Original from 3.6.11, Interfacing with OpenCL // https://cvs.khronos.org/bugzilla/show_bug.cgi?id=10426 -<> classes which encapsulate an OpenCL opaque type such as -SYCL [code]#context# or SYCL [code]#queue# must provide an -interoperability constructor taking an instance of the OpenCL opaque type. +<> classes which encapsulate an OpenCL opaque type such as SYCL +[code]#context# or SYCL [code]#queue# must provide an interoperability +constructor taking an instance of the OpenCL opaque type. When the OpenCL object supports reference counting, these constructors must retain that instance to increase the reference count of the OpenCL resource. Likewise, the destructor for the <> classes which encapsulate a reference counted OpenCL opaque type must release that instance to decrease the -reference count of the OpenCL resource. Since the OpenCL [code]#platform_id# -is not reference counted, the encapsulating SYCL [code]#platform# class neither -retains nor releases this OpenCL resource. 
- -Note that an instance of a <> class which encapsulates an -OpenCL opaque type can encapsulate any number of instances of the OpenCL -type, unless it was constructed via the interoperability constructor, in -which case it can encapsulate only a single instance of the OpenCL type. - -The lifetime of a <> class that encapsulates an OpenCL -opaque type and the instance of that opaque type retrieved via the -[code]#get_native()# free function are not tied in either direction given -correct usage of OpenCL reference counting. For example if a user were to -retrieve a [code]#cl_command_queue# instance from a SYCL -[code]#queue# instance and then immediately destroy the SYCL -[code]#queue# instance, the [code]#cl_command_queue# instance is -still valid. Or if a user were to construct a SYCL [code]#queue# -instance from a [code]#cl_command_queue# instance and then immediately -release the [code]#cl_command_queue# instance, the SYCL -[code]#queue# instance is still valid. - -Note that a <> class that encapsulates an OpenCL opaque type -is not responsible for any incorrect use of OpenCL reference counting -outside of the <>. For example if a user were to retrieve a -[code]#cl_command_queue# instance from a SYCL [code]#queue# -instance and then release the [code]#cl_command_queue# instance more -than once without any prior retain then the SYCL [code]#queue# instance -that the [code]#cl_command_queue# instance was retrieved from is now +reference count of the OpenCL resource. +Since the OpenCL [code]#platform_id# is not reference counted, the encapsulating +SYCL [code]#platform# class neither retains nor releases this OpenCL resource. + +Note that an instance of a <> class which encapsulates an OpenCL +opaque type can encapsulate any number of instances of the OpenCL type, unless +it was constructed via the interoperability constructor, in which case it can +encapsulate only a single instance of the OpenCL type. 
+ +The lifetime of a <> class that encapsulates an OpenCL opaque type +and the instance of that opaque type retrieved via the [code]#get_native()# free +function are not tied in either direction given correct usage of OpenCL +reference counting. +For example if a user were to retrieve a [code]#cl_command_queue# instance from +a SYCL [code]#queue# instance and then immediately destroy the SYCL +[code]#queue# instance, the [code]#cl_command_queue# instance is still valid. +Or if a user were to construct a SYCL [code]#queue# instance from a +[code]#cl_command_queue# instance and then immediately release the +[code]#cl_command_queue# instance, the SYCL [code]#queue# instance is still +valid. + +Note that a <> class that encapsulates an OpenCL opaque type is +not responsible for any incorrect use of OpenCL reference counting outside of +the <>. +For example if a user were to retrieve a [code]#cl_command_queue# instance from +a SYCL [code]#queue# instance and then release the [code]#cl_command_queue# +instance more than once without any prior retain then the SYCL [code]#queue# +instance that the [code]#cl_command_queue# instance was retrieved from is now undefined. -Note that an instance of the SYCL [code]#buffer# or SYCL -[code]#image# class templates constructed via the interoperability -constructor is free to copy from the [code]#cl_mem# into another memory -allocation within the <> to achieve normal SYCL semantics, -for as long as the SYCL [code]#buffer# or SYCL [code]#image# -instance is alive. +Note that an instance of the SYCL [code]#buffer# or SYCL [code]#image# class +templates constructed via the interoperability constructor is free to copy from +the [code]#cl_mem# into another memory allocation within the <> to +achieve normal SYCL semantics, for as long as the SYCL [code]#buffer# or SYCL +[code]#image# instance is alive. -<> relates SYCL objects -to their OpenCL native type in the SYCL application. 
+<> relates SYCL objects to their OpenCL native type in the +SYCL application. [[table.opencl.interop]] .List of native types per SYCL object in the OpenCL backend @@ -406,7 +416,8 @@ unsampled_image The interoperability interface will return a list of active images in the SYCL runtime. |==== -Inside the SYCL kernel, the SYCL API offers interoperability with OpenCL device types. +Inside the SYCL kernel, the SYCL API offers interoperability with OpenCL device +types. <> describes the mapping of kernel types. [[table.opencl.kerneltypes]] @@ -434,27 +445,27 @@ multi_ptr::get_decorated() // From 3.7 memory object -When a buffer or image is allocated on more than -one OpenCL device, if these devices are on separate contexts then multiple -[code]#cl_mem# objects may be allocated for the memory object, depending on -whether the object has actively been used on these devices yet or not. +When a buffer or image is allocated on more than one OpenCL device, if these +devices are on separate contexts then multiple [code]#cl_mem# objects may be +allocated for the memory object, depending on whether the object has actively +been used on these devices yet or not. // From 3.10 Language restrictions in kernels -The OpenCL C function qualifier [code]#+__kernel+# and the access -qualifiers: [code]#+__read_only+#, [code]#+__write_only+# and [code]#+__read_write+# -are not exposed in SYCL via keywords, but are instead encapsulated in -SYCL's parameter passing system inside accessors. Users wishing to -achieve the OpenCL equivalent of these qualifiers in SYCL should -instead use SYCL accessors with equivalent semantics. +The OpenCL C function qualifier [code]#+__kernel+# and the access qualifiers: +[code]#+__read_only+#, [code]#+__write_only+# and [code]#+__read_write+# are not +exposed in SYCL via keywords, but are instead encapsulated in SYCL's parameter +passing system inside accessors. 
+Users wishing to achieve the OpenCL equivalent of these qualifiers in SYCL +should instead use SYCL accessors with equivalent semantics. // From 3.10.1 SYCL Linker -Any OpenCL C function included in a pre-built OpenCL library can be -defined as an [code]#extern "C"# function and the OpenCL program -has to be linked against any SYCL program that contains kernels using -the external function. In this case, the data types used have to comply with -the interoperability aliases defined in <>. +Any OpenCL C function included in a pre-built OpenCL library can be defined as +an [code]#extern "C"# function and the OpenCL program has to be linked against +any SYCL program that contains kernels using the external function. +In this case, the data types used have to comply with the interoperability +aliases defined in <>. == Programming interface @@ -465,8 +476,8 @@ The following section describes the OpenCL-specific API. The OpenCL backend provides the following specializations of the [code]#make_{sycl_class}# template functions which are defined in -<>. These functions are in the -[code]#sycl# namespace. +<>. +These functions are in the [code]#sycl# namespace. [width="100%",options="header",separator="@",cols="40%,60%"] |==== @@ -657,9 +668,10 @@ bool has_extension(const sycl::device& syclDevice, const std::string& extension) === Reference counting -Most OpenCL objects are reference counted. The SYCL general programming model -doesn't require that native objects are reference counted. However, for -convenience, the following function is provided in the +Most OpenCL objects are reference counted. +The SYCL general programming model doesn't require that native objects are +reference counted. +However, for convenience, the following function is provided in the [code]#sycl::opencl# namespace. 
[width="100%",options="header",separator="@",cols="35%,65%"] @@ -677,9 +689,10 @@ template cl_uint get_reference_count(openCLT obj) === Errors and limitations If there is an OpenCL error associated with an exception triggered, then the -OpenCL error code can be obtained by the free function [code]#cl_int sycl::opencl::get_error_code(sycl::exception&)#. In the case where there is -no OpenCL error associated with the exception triggered, the OpenCL error -code will be [code]#CL_SUCCESS#. +OpenCL error code can be obtained by the free function [code]#cl_int +sycl::opencl::get_error_code(sycl::exception&)#. +In the case where there is no OpenCL error associated with the exception +triggered, the OpenCL error code will be [code]#CL_SUCCESS#. // TODO: Errors and limitations @@ -703,18 +716,19 @@ code will be [code]#CL_SUCCESS#. [[sec:opencl:interop-kernel-bundle]] === Interoperability with kernel bundles -In <> any kernel function that is enqueued over an nd-range -is represented by a [code]#cl_kernel# and must be compiled and linked via a -[code]#cl_program# using [code]#clBuildProgram#, -[code]#clCompileProgram# and [code]#clLinkProgram#. +In <> any kernel function that is enqueued over an nd-range is +represented by a [code]#cl_kernel# and must be compiled and linked via a +[code]#cl_program# using [code]#clBuildProgram#, [code]#clCompileProgram# and +[code]#clLinkProgram#. -For OpenCL <> this detail is abstracted away by <> and -a [code]#kernel_bundle# object containing all <> -is retrieved by calling the free function [code]#get_kernel_bundle#. +For OpenCL <> this detail is abstracted away by <> and a [code]#kernel_bundle# object containing all +<> is retrieved by calling the free +function [code]#get_kernel_bundle#. -The OpenCL <> specification provides additional free functions -which provide convenience functions for constructing kernel bundles -from OpenCL specific objects. 
+The OpenCL <> specification provides additional free functions which +provide convenience functions for constructing kernel bundles from OpenCL +specific objects. [source,,linenums] ---- @@ -728,24 +742,26 @@ kernel_bundle create_bundle(const context& ctxt, const std::vector& devs, const std::vector& clPrograms) ---- - . _Preconditions:_ The <> specified by [code]#ctxt# - must be associated with the OpenCL <>. + . _Preconditions:_ The <> specified by [code]#ctxt# must be + associated with the OpenCL <>. All devices in [code]#devs# must be associated with [code]#ctxt#. - All OpenCL programs in [code]#clPrograms# must be associated with [code]#ctxt#. + All OpenCL programs in [code]#clPrograms# must be associated with + [code]#ctxt#. + -- _Effects:_ Constructs a <> in the specified [code]#bundle_state# -from the provided list of OpenCL programs and associated with the -<> specified by [code]#syclContext# by invoking the necessary OpenCL APIs. -Follows the same rules as calling [code]#make_kernel_bundle# on a single OpenCL program, -except that the rules apply to all OpenCL programs in [code]#clPrograms#. -Multiple programs will be linked together into a single one -if required by the requested [code]#State#. +from the provided list of OpenCL programs and associated with the <> +specified by [code]#syclContext# by invoking the necessary OpenCL APIs. +Follows the same rules as calling [code]#make_kernel_bundle# on a single OpenCL +program, except that the rules apply to all OpenCL programs in +[code]#clPrograms#. +Multiple programs will be linked together into a single one if required by the +requested [code]#State#. The constructed [code]#kernel_bundle# will retain all provided OpenCL programs and will also release them on destruction. -_Throws:_ An [code]#exception# with the [code]#errc::build# error code if any error is produced -by invoking the OpenCL APIs. 
+_Throws:_ An [code]#exception# with the [code]#errc::build# error code if any +error is produced by invoking the OpenCL APIs. -- [source,,linenums] @@ -754,43 +770,45 @@ kernel_bundle create_bundle(const context& ctxt, const std::vector& devs, const std::vector& clKernels) ---- - . _Preconditions:_ The <> specified by [code]#ctxt# - must be associated with the OpenCL <>. + . _Preconditions:_ The <> specified by [code]#ctxt# must be + associated with the OpenCL <>. All devices in [code]#devs# must be associated with [code]#ctxt#. - All OpenCL kernels in [code]#clKernels# must be associated with [code]#ctxt#. + All OpenCL kernels in [code]#clKernels# must be associated with + [code]#ctxt#. + -- -_Effects:_ Constructs an executable <> -from the provided list of OpenCL kernels and associated with the -<> specified by [code]#syclContext# by invoking the necessary OpenCL APIs. -[code]#cl_kernel# objects might be associated with different [code]#cl_program# objects, -the kernel bundle will encapsulate all of them. - -_Throws:_ An [code]#exception# with the [code]#errc::build# error code if any error is produced -by invoking the OpenCL APIs. +_Effects:_ Constructs an executable <> from the provided list of +OpenCL kernels and associated with the <> specified by +[code]#syclContext# by invoking the necessary OpenCL APIs. +[code]#cl_kernel# objects might be associated with different [code]#cl_program# +objects, the kernel bundle will encapsulate all of them. + +_Throws:_ An [code]#exception# with the [code]#errc::build# error code if any +error is produced by invoking the OpenCL APIs. -- === Interoperability with kernels -A [code]#kernel_bundle# object contains one or multiple OpenCL programs -and one or multiple OpenCL kernels. -Calling [code]#kernel_bundle::get_kernel# returns a [code]#kernel# object - which can be invoked by any of -<> such as [code]#parallel_for# which take -a [code]#kernel# but not <>. 
+A [code]#kernel_bundle# object contains one or multiple OpenCL programs and one + or multiple OpenCL kernels. + Calling [code]#kernel_bundle::get_kernel# returns a [code]#kernel# object which + can be invoked by any of +<> such as +[code]#parallel_for# which take a [code]#kernel# but not +<>. -Calling [code]#make_kernel# must trigger a call to [code]#clRetainKernel# -and the resulting [code]#kernel# object must call -[code]#clReleaseKernel# on destruction. +Calling [code]#make_kernel# must trigger a call to [code]#clRetainKernel# and +the resulting [code]#kernel# object must call [code]#clReleaseKernel# on +destruction. -It is also possible to construct a <> from previously created OpenCL -[code]#cl_kernel# objects by calling the free function [code]#create_bundle# -as described in <>. +It is also possible to construct a <> from previously created +OpenCL [code]#cl_kernel# objects by calling the free function +[code]#create_bundle# as described in <>. The kernel arguments for the OpenCL C kernel kernel can either be set prior to -creating the [code]#kernel# object or by calling [code]#set_arg# or [code]#set_args# -member functions of the [code]#handler# class. +creating the [code]#kernel# object or by calling [code]#set_arg# or +[code]#set_args# member functions of the [code]#handler# class. If kernel arguments are set prior to creating the [code]#kernel# object the <> is not responsible for managing the data of these arguments. @@ -799,24 +817,24 @@ If kernel arguments are set prior to creating the [code]#kernel# object the [[sec:opencl:kernel-conventions-sycl]] === OpenCL kernel conventions and SYCL -OpenCL and SYCL use opposite conventions for the unit stride dimension. SYCL -aligns with {cpp} conventions, which is important to understand from a -performance perspective when porting code to SYCL. The unit stride -dimension, at least for data, is implicit in the linearization equations in -SYCL (<>) and OpenCL. 
SYCL aligns with -{cpp} array subscript ordering [code]#arr[a][b][c]#, in that range -constructor dimension ordering used to launch a kernel (e.g. -[code]#range<3> R{a,b,c}#) and range and ID queries within a kernel, -are ordered in the same way as the {cpp} multi-dimensional subscript operators -(unit stride on the right). - -When specifying a [code]#range# as the global or local size -in a [code]#parallel_for# that invokes an OpenCL interop kernel (through -[code]#cl_kernel# interop), -the highest dimension of the range in SYCL will map to the -lowest dimension within the OpenCL kernel. That statement applies to both -an underlying enqueue operation such as [code]#clEnqueueNDRangeKernel# -in OpenCL, and also ID and size queries within the OpenCL kernel. +OpenCL and SYCL use opposite conventions for the unit stride dimension. +SYCL aligns with {cpp} conventions, which is important to understand from a +performance perspective when porting code to SYCL. +The unit stride dimension, at least for data, is implicit in the linearization +equations in SYCL (<>) and OpenCL. +SYCL aligns with {cpp} array subscript ordering [code]#arr[a][b][c]#, in that +range constructor dimension ordering used to launch a kernel (e.g. +[code]#range<3> R{a,b,c}#) and range and ID queries within a kernel, are ordered +in the same way as the {cpp} multi-dimensional subscript operators (unit stride +on the right). + +When specifying a [code]#range# as the global or local size in a +[code]#parallel_for# that invokes an OpenCL interop kernel (through +[code]#cl_kernel# interop), the highest dimension of the range in SYCL will map +to the lowest dimension within the OpenCL kernel. +That statement applies to both an underlying enqueue operation such as +[code]#clEnqueueNDRangeKernel# in OpenCL, and also ID and size queries within +the OpenCL kernel. 
For example, a 3D global range specified in SYCL as: [source] @@ -824,8 +842,7 @@ For example, a 3D global range specified in SYCL as: range<3> R { r0, r1, r2 }; ---- -maps to an [code]#clEnqueueNDRangeKernel# [code]#global_work_size# argument -of: +maps to an [code]#clEnqueueNDRangeKernel# [code]#global_work_size# argument of: [source] ---- @@ -839,25 +856,25 @@ Likewise, a 2D global range specified in SYCL as: range<2> R { r0, r1 }; ---- -maps to an [code]#clEnqueueNDRangeKernel# [code]#global_work_size# argument -of: +maps to an [code]#clEnqueueNDRangeKernel# [code]#global_work_size# argument of: [source] ---- size_t cl_interop_range[2] = { r1, r0 }; ---- -The mapping of highest dimension in SYCL to lowest dimension in OpenCL applies to all -operations where a multi-dimensional construct must be mapped, such as when mapping SYCL -explicit memory operations to OpenCL APIs like [code]#clEnqueueCopyBufferRect#. +The mapping of highest dimension in SYCL to lowest dimension in OpenCL applies +to all operations where a multi-dimensional construct must be mapped, such as +when mapping SYCL explicit memory operations to OpenCL APIs like +[code]#clEnqueueCopyBufferRect#. -Work-item and work-group ID and range queries have the same reversed -convention for unit stride dimension between SYCL and OpenCL. For example, -with three, two, or one dimensional SYCL global ranges, OpenCL and SYCL -kernel code queries relate to the range as shown in -<>. The "SYCL kernel query" column -applies for SYCL-defined kernels, and the "OpenCL kernel query" column -applies for kernels defined through OpenCL interop. +Work-item and work-group ID and range queries have the same reversed convention +for unit stride dimension between SYCL and OpenCL. +For example, with three, two, or one dimensional SYCL global ranges, OpenCL and +SYCL kernel code queries relate to the range as shown in +<>. 
+The "SYCL kernel query" column applies for SYCL-defined kernels, and the "OpenCL +kernel query" column applies for kernels defined through OpenCL interop. // Jon: Need to code-format most of these cells and use gray backgrounds on // column-spanning sub-titles. @@ -915,12 +932,13 @@ applies for kernels defined through OpenCL interop. === Data types -The OpenCL C language standard <> defines its own built-in -scalar data types, and these have additional requirements in terms of size and -signedness on top of what is guaranteed by ISO {cpp}. For the purpose of -interoperability and portability, SYCL defines a set of aliases to {cpp} types -within the [code]#sycl::opencl# namespace using the [code]#cl_# -prefix. These aliases are described in <>. +The OpenCL C language standard <> defines its own +built-in scalar data types, and these have additional requirements in terms of +size and signedness on top of what is guaranteed by ISO {cpp}. +For the purpose of interoperability and portability, SYCL defines a set of +aliases to {cpp} types within the [code]#sycl::opencl# namespace using the +[code]#cl_# prefix. +These aliases are described in <>. [[table.types.aliases]] @@ -1025,29 +1043,28 @@ cl_half == Preprocessor directives and macros - * [code]#SYCL_BACKEND_OPENCL# substitutes to [code]#1# if the OpenCL <> - is active while building the SYCL application. + * [code]#SYCL_BACKEND_OPENCL# substitutes to [code]#1# if the OpenCL + <> is active while building the SYCL application. === Offline linking with OpenCL C libraries -SYCL supports linking <> with OpenCL C libraries -during offline compilation or during online compilation by the +SYCL supports linking <> with OpenCL +C libraries during offline compilation or during online compilation by the <> within a SYCL application. -Linking with OpenCL C kernel functions offline is an optional feature -and is unspecified. 
Linking with OpenCL C kernel functions online is -performed by using the SYCL [code]#kernel_bundle# class to compile and -link an OpenCL C source; using the [code]#compile_with_source# or -[code]#build_with_source# member functions. +Linking with OpenCL C kernel functions offline is an optional feature and is +unspecified. +Linking with OpenCL C kernel functions online is performed by using the SYCL +[code]#kernel_bundle# class to compile and link an OpenCL C source; using the +[code]#compile_with_source# or [code]#build_with_source# member functions. OpenCL C functions that are linked with, using either offline or online -compilation, must be declared as [code]#extern "C"# function -declarations. The function parameters of these function declarations must be -defined as the OpenCL C interoperability aliases; [code]#pointer# of -the [code]#multi_ptr# class template, [code]#vector_t# of the -[code]#vec# class template and scalar data type aliases described in -<>. +compilation, must be declared as [code]#extern "C"# function declarations. +The function parameters of these function declarations must be defined as the +OpenCL C interoperability aliases; [code]#pointer# of the [code]#multi_ptr# +class template, [code]#vector_t# of the [code]#vec# class template and scalar +data type aliases described in <>. // \include{opencl_extensions} // %%%%%%%%%%%%%%%%%%%%%%%%%%%% begin opencl_extensions %%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -1058,24 +1075,23 @@ In addition to the OpenCL core features, SYCL also provides support for OpenCL extensions which provide features in OpenCL via khr extensions. Some extensions are natively supported within the SYCL interface, however some -can only be used via the OpenCL interoperability interface. The SYCL interface -required for native extensions must be available. 
However if the respective -extension is not supported by the executing SYCL [code]#device#, the -<> must throw an [code]#exception# with the +can only be used via the OpenCL interoperability interface. +The SYCL interface required for native extensions must be available. +However if the respective extension is not supported by the executing SYCL +[code]#device#, the <> must throw an [code]#exception# with the [code]#errc::feature_not_supported# or [code]#errc::kernel_not_supported# error codes. The OpenCL backend exposes some khr extensions to SYCL applications through the -[code]#sycl::aspect# enumerated type. Therefore, applications can query -for the existence of these khr extensions by calling the [code]#device::has()# -or [code]#platform::has()# member functions. +[code]#sycl::aspect# enumerated type. +Therefore, applications can query for the existence of these khr extensions by +calling the [code]#device::has()# or [code]#platform::has()# member functions. All OpenCL extensions are available through the OpenCL interoperability interface, but some can also be used through core SYCL APIs. <> shows which these are. -<> also shows the mapping from each OpenCL -extension name to its associated SYCL device [code]#aspect# when one is -available. +<> also shows the mapping from each OpenCL extension +name to its associated SYCL device [code]#aspect# when one is available. [[table.extensionsupport]] @@ -1100,72 +1116,74 @@ available. === Half precision floating-point The half scalar data type: [code]#half# and the half vector data types: -[code]#half1#, [code]#half2#, [code]#half3#, -[code]#half4#, [code]#half8# and [code]#half16# must be -available at compile-time. However a kernel using these types is only -supported on devices that have [code]#aspect::fp16#, as described in -<>. +[code]#half1#, [code]#half2#, [code]#half3#, [code]#half4#, [code]#half8# and +[code]#half16# must be available at compile-time. 
+However a kernel using these types is only supported on devices that have +[code]#aspect::fp16#, as described in <>. -The conversion rules for half precision types follow the same rules as in -the OpenCL 1.2 extensions specification <>. +The conversion rules for half precision types follow the same rules as in the +OpenCL 1.2 extensions specification <>. The math functions for half precision types follow the same rules as in the -OpenCL 1.2 extensions specification <>. The allowed error in ULP(Unit in the Last Place) is -less than 8192, corresponding to <>. +OpenCL 1.2 extensions specification <>. +The allowed error in ULP(Unit in the Last Place) is less than 8192, +corresponding to <>. === Writing to 3D image memory objects -The [code]#unsampled_image_accessor# class -in SYCL supports member functions for writing -3D image memory objects, but this functionality is only allowed on a device -if the extension [code]#cl_khr_3d_image_writes# is -supported on that <>. +The [code]#unsampled_image_accessor# class in SYCL supports member functions for +writing 3D image memory objects, but this functionality is only allowed on a +device if the extension [code]#cl_khr_3d_image_writes# is supported on that +<>. // TODO: Should opencl::aspect::3d_image_writes be promoted to a core SYCL aspect? === Interoperability with OpenGL -Interoperability between SYCL and OpenGL is not directly provided by the SYCL interface, -however can be achieved via the SYCL OpenCL interoperability interface. +Interoperability between SYCL and OpenGL is not directly provided by the SYCL +interface, however can be achieved via the SYCL OpenCL interoperability +interface. == Correspondence of some OpenCL features to SYCL This section describes the correspondence between some OpenCL features and -features in the <> that provide similar functionality. All content -in this section is non-normative. +features in the <> that provide similar functionality. +All content in this section is non-normative. 
=== Work-item functions -The OpenCL 1.2 specification document <> -defines work-item functions that tell various information about the currently -executing work-item in an OpenCL kernel. SYCL provides equivalent -functionality through the item and group classes that are defined in -<>, <> and <>. +The OpenCL 1.2 specification document <> defines work-item functions that tell various information +about the currently executing work-item in an OpenCL kernel. +SYCL provides equivalent functionality through the item and group classes that +are defined in <>, <> and <>. === Vector data load and store functions The functionality from the OpenCL functions as defined in the OpenCL 1.2 -specification document <> is available in SYCL through -the [code]#vec# class in <>. +specification document <> is available in SYCL through the [code]#vec# class in +<>. === Synchronization functions In SYCL the OpenCL [keyword]#synchronization functions# are available through -the [code]#nd_item# class (<>), as they are applied to -work-items for local or global address spaces. Please -see <>. +the [code]#nd_item# class (<>), as they are applied to work-items +for local or global address spaces. +Please see <>. === [code]#printf# function The functionality of the [code]#printf# function is covered by the -[code]#stream# class (<>), which has the -capability to print to standard output all of the SYCL classes and primitives, -and covers the capabilities defined in the OpenCL 1.2 specification -document <>. +[code]#stream# class (<>), which has the capability to print to +standard output all of the SYCL classes and primitives, and covers the +capabilities defined in the OpenCL 1.2 specification document <>. 
// %%%%%%%%%%%%%%%%%%%%%%%%%%%% end opencl_extensions %%%%%%%%%%%%%%%%%%%%%%%%%%%% diff --git a/adoc/chapters/programming_interface.adoc b/adoc/chapters/programming_interface.adoc index e0e9c20c..8a35db17 100644 --- a/adoc/chapters/programming_interface.adoc +++ b/adoc/chapters/programming_interface.adoc @@ -1,29 +1,31 @@ [[chapter:sycl-programming-interface]] = SYCL programming interface -The SYCL programming interface provides a common abstracted feature set to -one or more <> APIs. This section describes the {cpp} library -interface to the <> which executes across those <>. +The SYCL programming interface provides a common abstracted feature set to one +or more <> APIs. +This section describes the {cpp} library interface to the <> which +executes across those <>. The entirety of the SYCL interface defined in this section is required to be -available for any <>, with the exception of the interoperability -interface, which is described in general terms in this document, not -pertaining to any particular <>. +available for any <>, with the exception of the +interoperability interface, which is described in general terms in this +document, not pertaining to any particular <>. SYCL guarantees that all the member functions and special member functions of the SYCL classes described are thread safe. The underlying types for all enumerations defined in this specification are -implementation-defined. In addition, all enumerators within an enumeration -have some implementation-defined unique value unless the specification -specifically indicates a values for the enumerator. +implementation-defined. +In addition, all enumerators within an enumeration have some +implementation-defined unique value unless the specification specifically +indicates a values for the enumerator. [[sec:backends]] == Backends -The <> that can be supported by a SYCL implementation are identified -using the [code]#enum class backend#. 
+The <> that can be supported by a SYCL implementation +are identified using the [code]#enum class backend#. [source,,linenums] ---- @@ -32,26 +34,28 @@ include::{header_dir}/backends.h[lines=4..-1] The [code]#enum class backend# is implementation-defined and must be populated with a unique identifier for each <> that the SYCL implementation can -support. Note that the <> listed in the [code]#enum -class backend# are not guaranteed to be available in a given installation. +support. +Note that the <> listed in the [code]#enum class +backend# are not guaranteed to be available in a given installation. -Each named <> enumerated in the [code]#enum class backend# -must be associated with a <> specification. -Many sections of this specification -will refer to the associated <> specification. +Each named <> enumerated in the [code]#enum class backend# must be +associated with a <> specification. +Many sections of this specification will refer to the associated <> +specification. [[sec:backend-macros]] === Backend macros As the identifiers defined in [code]#enum class backend# are -implementation-defined, and the associated backends not guaranteed to be available, -a SYCL implementation must also define a preprocessor macro for each of -these identifiers. If the <> is defined by the Khronos SYCL group, the -name of the macro has the form [code]#SYCL_BACKEND_#, where -_backend_name_ is the associated identifier from [code]#backend# in -all upper-case. See <> for the name of the macro -if the vendor defines the <> outside of the Khronos SYCL group. +implementation-defined, and the associated backends not guaranteed to be +available, a SYCL implementation must also define a preprocessor macro for each +of these identifiers. +If the <> is defined by the Khronos SYCL group, the name of the macro +has the form [code]#SYCL_BACKEND_#, where _backend_name_ is the +associated identifier from [code]#backend# in all upper-case. 
+See <> for the name of the macro if the vendor defines the +<> outside of the Khronos SYCL group. If a backend listed in the [code]#enum class backend# is not available, the associated macro must be left undefined. @@ -60,17 +64,17 @@ associated macro must be left undefined. == Generic vs non-generic SYCL The SYCL programming API is split into two categories; generic SYCL and -non-generic SYCL. Almost everything in the SYCL programming API is considered -generic SYCL. However any usage of the [code]#enum class backend# is -considered non-generic SYCL and should only be used for <> specialized -code paths, as the identifiers defined in [code]#backend# are -implementation-defined. +non-generic SYCL. +Almost everything in the SYCL programming API is considered generic SYCL. +However any usage of the [code]#enum class backend# is considered non-generic +SYCL and should only be used for <> specialized code paths, as the +identifiers defined in [code]#backend# are implementation-defined. -In any non-generic SYCL application code where the [code]#backend# enum -class is used, the expression must be guarded with a preprocessor -[code]#{hash}ifdef# guard using the associated preprocessor macro to ensure that -the SYCL application will compile even if the SYCL implementation does not -support that <> being specialized for. +In any non-generic SYCL application code where the [code]#backend# enum class is +used, the expression must be guarded with a preprocessor [code]#{hash}ifdef# +guard using the associated preprocessor macro to ensure that the SYCL +application will compile even if the SYCL implementation does not support that +<> being specialized for. [[sec:headers-and-namespaces]] @@ -79,111 +83,80 @@ support that <> being specialized for. SYCL provides one standard header file: [code]##, which needs to be included in every translation unit that uses the SYCL programming API. 
-All SYCL classes, constants, types and functions defined by this -specification should exist within the [code]#::sycl# namespace. +All SYCL classes, constants, types and functions defined by this specification +should exist within the [code]#::sycl# namespace. -For compatibility with SYCL 1.2.1, SYCL provides another standard -header file: [code]##, which can be included in -place of [code]##. In that case, all SYCL classes, constants, -types and functions defined by this specification should exist within the -[code]#::cl::sycl# {cpp} namespace. +For compatibility with SYCL 1.2.1, SYCL provides another standard header file: +[code]##, which can be included in place of +[code]##. +In that case, all SYCL classes, constants, types and functions defined by this +specification should exist within the [code]#::cl::sycl# {cpp} namespace. For consistency, the programming API will only refer to the -[code]## header and the [code]#::sycl# namespace, but this -should be considered synonymous with the SYCL 1.2.1 header and namespace. +[code]## header and the [code]#::sycl# namespace, but this should +be considered synonymous with the SYCL 1.2.1 header and namespace. Include paths starting with [code]#"sycl/ext/"# and [code]#"sycl/backend/"# are reserved for extensions to SYCL and for backend interop headers respectively. Other include paths starting with [code]#"sycl/"# and the [code]#sycl::detail# namespace are reserved for implementation details. -When a <> is defined by the Khronos SYCL group, functionality -for that <> is available via the header +When a <> is defined by the Khronos SYCL group, functionality for that +<> is available via the header [code]#"sycl/backend/.hpp"#, and all <>-specific functionality is made available in the namespace [code]#sycl::# where [code]## is the name of the <> as defined in the <> specification. 
-<> defines the allowable header files and -namespaces for any extensions that a vendor may provide, including any -<> that the vendor may define outside of the Khronos SYCL group. +<> defines the allowable header files and namespaces for any +extensions that a vendor may provide, including any <> that the vendor +may define outside of the Khronos SYCL group. -Unless otherwise specified, the behavior of a SYCL program is undefined -if it adds any entity to namespace [code]#sycl# or to a -namespace within namespace [code]#sycl#. +Unless otherwise specified, the behavior of a SYCL program is undefined if it +adds any entity to namespace [code]#sycl# or to a namespace within namespace +[code]#sycl#. == Class availability In SYCL some <> classes are available to the SYCL application, -some are available within a <> and some are available -on both and can be passed as arguments to a <>. - -Each of the following <> classes: -[code]#buffer#, -[code]#buffer_allocator#, -[code]#context#, -[code]#device#, -[code]#device_image#, -[code]#event#, -[code]#exception#, -[code]#handler#, -[code]#host_accessor#, -[code]#host_sampled_image_accessor#, -[code]#host_unsampled_image_accessor#, -[code]#id#, -[code]#image_allocator#, -[code]#kernel#, -[code]#kernel_id#, -[code]#marray#, -[code]#kernel_bundle#, -[code]#nd_range#, -[code]#platform#, -[code]#queue#, -[code]#range#, -[code]#sampled_image#, -[code]#image_sampler#, -[code]#stream#, -[code]#unsampled_image# and -[code]#vec# -must be available to the host application. - -Each of the following <> classes: -[code]#accessor#, -[code]#atomic_ref#, -[code]#device_event#, -[code]#group#, -[code]#h_item#, -[code]#id#, -[code]#item#, -[code]#local_accessor#, -[code]#marray#, -[code]#multi_ptr#, -[code]#nd_item#, -[code]#range#, -[code]#reducer#, -[code]#sampled_image_accessor#, -[code]#stream#, -[code]#sub_group#, -[code]#unsampled_image_accessor# and -[code]#vec# -must be available within a <>. 
+some are available within a <> and some are available on +both and can be passed as arguments to a <>. + +Each of the following <> classes: [code]#buffer#, +[code]#buffer_allocator#, [code]#context#, [code]#device#, [code]#device_image#, +[code]#event#, [code]#exception#, [code]#handler#, [code]#host_accessor#, +[code]#host_sampled_image_accessor#, [code]#host_unsampled_image_accessor#, +[code]#id#, [code]#image_allocator#, [code]#kernel#, [code]#kernel_id#, +[code]#marray#, [code]#kernel_bundle#, [code]#nd_range#, [code]#platform#, +[code]#queue#, [code]#range#, [code]#sampled_image#, [code]#image_sampler#, +[code]#stream#, [code]#unsampled_image# and [code]#vec# must be available to the +host application. + +Each of the following <> classes: [code]#accessor#, +[code]#atomic_ref#, [code]#device_event#, [code]#group#, [code]#h_item#, +[code]#id#, [code]#item#, [code]#local_accessor#, [code]#marray#, +[code]#multi_ptr#, [code]#nd_item#, [code]#range#, [code]#reducer#, +[code]#sampled_image_accessor#, [code]#stream#, [code]#sub_group#, +[code]#unsampled_image_accessor# and [code]#vec# must be available within a +<>. == Common interface -When a dimension template parameter is used in SYCL classes, it is -defaulted as 1 in most cases. +When a dimension template parameter is used in SYCL classes, it is defaulted as +1 in most cases. [[sec:backend-interoperability]] === Backend interoperability Many of the <> classes may be implemented such that they -encapsulate an object unique to the <> that underpins the -functionality of that class. Where appropriate, these classes may provide an -interface for interoperating between the <> object and the -<> in order to support interoperability within an -application between SYCL and the associated <>. +encapsulate an object unique to the <> that underpins the functionality +of that class. 
+Where appropriate, these classes may provide an interface for interoperating +between the <> object and the <> in order +to support interoperability within an application between SYCL and the +associated <>. There are three forms of interoperability with <> classes: interoperability on the <> with the <>, @@ -191,45 +164,30 @@ interoperability within a <> with the equivalent kernel language types of the <>, and interoperability within a <> with the [code]#interop_handle#. -<> interoperability, -<> interoperability and <> interoperability -are provided via different interfaces and may have different behavior for the -same SYCL object. +<> interoperability, <> +interoperability and <> interoperability are provided via different +interfaces and may have different behavior for the same SYCL object. <> interoperability may be provided for -[code]#buffer#, -[code]#context#, -[code]#device#, -[code]#device_image#, -[code]#event#, -[code]#kernel#, -[code]#kernel_bundle#, -[code]#platform#, -[code]#queue#, -[code]#sampled_image#, and -[code]#unsampled_image#. - -<> interoperability may be provided for -[code]#accessor#, -[code]#device_event#, -[code]#local_accessor#, -[code]#sampled_image_accessor#, -[code]#stream# and -[code]#unsampled_image_accessor# -inside <> only and is not available outside of that scope. - -<> interoperability may be provided for -[code]#accessor#, -[code]#sampled_image_accessor#, -[code]#unsampled_image_accessor#, -[code]#queue#, -[code]#device#, -[code]#context# -inside the scope of a <> only, see <>. +[code]#buffer#, [code]#context#, [code]#device#, [code]#device_image#, +[code]#event#, [code]#kernel#, [code]#kernel_bundle#, [code]#platform#, +[code]#queue#, [code]#sampled_image#, and [code]#unsampled_image#. 
+ +<> interoperability may be provided +for [code]#accessor#, [code]#device_event#, [code]#local_accessor#, +[code]#sampled_image_accessor#, [code]#stream# and +[code]#unsampled_image_accessor# inside <> only and is not +available outside of that scope. + +<> interoperability may be provided for [code]#accessor#, +[code]#sampled_image_accessor#, [code]#unsampled_image_accessor#, [code]#queue#, +[code]#device#, [code]#context# inside the scope of a <> only, see +<>. Support for <> interoperability is optional and therefore not required -to be provided by a SYCL implementation. A SYCL application using <> -interoperability is considered to be non-generic SYCL. +to be provided by a SYCL implementation. +A SYCL application using <> interoperability is considered to be +non-generic SYCL. Details on the interoperability for a given <> are available on the <> specification document for that <>. @@ -241,45 +199,43 @@ Details on the interoperability for a given <> are available on the include::{header_dir}/interop/typeTraitsBackendTraits.h[lines=4..-1] ---- -A series of type traits are provided for <> interoperability, -defined in the [code]#backend_traits# class. +A series of type traits are provided for <> interoperability, defined +in the [code]#backend_traits# class. A specialization of [code]#backend_traits# must be provided for each named <> enumerated in the enum class [code]#backend# that is available at compile time. * For each <> class [code]#T# which supports - <> interoperability with the <>, a - specialization of [code]#input_type# must be defined as the type - of <> interoperability <> - associated with [code]#T# for the <>, specified in the - <> specification. - [code]#input_type# is used when constructing SYCL objects - from backend specific native objects. 
+ <> interoperability with the <>, a specialization + of [code]#input_type# must be defined as the type of <> + interoperability <> associated with [code]#T# for the + <>, specified in the <> specification. + [code]#input_type# is used when constructing SYCL objects from backend + specific native objects. See the relevant backend specification for details. * For each <> class [code]#T# which supports - <> interoperability with the <>, a - specialization of [code]#return_type# must be defined as the type - of <> interoperability <> - associated with [code]#T# for the <>, specified in the - <> specification. - [code]#return_type# is used when retrieving - the backend specific native object from a SYCL object. + <> interoperability with the <>, a specialization + of [code]#return_type# must be defined as the type of <> + interoperability <> associated with [code]#T# for the + <>, specified in the <> specification. + [code]#return_type# is used when retrieving the backend specific native + object from a SYCL object. See the relevant backend specification for details. - * For each <> class [code]#T# which supports kernel - function interoperability with the <>, a specialization of - [code]#return_type# within [code]#backend_traits# must be - defined as the type of the kernel function interoperability - <> associated with [code]#T# for the - <>, specified in the backend specification. + * For each <> class [code]#T# which supports kernel function + interoperability with the <>, a specialization of + [code]#return_type# within [code]#backend_traits# must be defined as the + type of the kernel function interoperability <> + associated with [code]#T# for the <>, specified in the backend + specification. See the relevant backend specification for details. -The type alias [code]#backend_input_t# is provided -to enable less verbose access to the [code]#input_type# type -within [code]#backend_traits# for a specific SYCL object of type [code]#T#. 
-The type alias [code]#backend_return_t# is provided -to enable less verbose access to the [code]#return_type# type -within [code]#backend_traits# for a specific SYCL object of type [code]#T#. +The type alias [code]#backend_input_t# is provided to enable less verbose access +to the [code]#input_type# type within [code]#backend_traits# for a specific SYCL +object of type [code]#T#. +The type alias [code]#backend_return_t# is provided to enable less verbose +access to the [code]#return_type# type within [code]#backend_traits# for a +specific SYCL object of type [code]#T#. ==== Template function [code]#get_native# @@ -288,27 +244,24 @@ within [code]#backend_traits# for a specific SYCL object of type [code]#T#. include::{header_dir}/interop/templateFunctionGetNative.h[lines=4..-1] ---- -For each <> class [code]#T# which supports -<> interoperability, a specialization of -[code]#get_native# must be defined, which takes an instance of -[code]#T# and returns a <> interoperability -<> associated with [code]#syclObject# which -can be used for <> interoperability. The lifetime of the -object returned are backend-defined and specified in the backend -specification. - -For each <> class [code]#T# which supports kernel -function interoperability, a specialization of [code]#get_native# must -be defined, which takes an instance of [code]#T# and returns the kernel -function interoperability <> associated with -[code]#syclObject# which can be used for kernel function -interoperability. The availability and behavior of these template -functions is defined by the <> specification document. 
- -The [code]#get_native# function -must throw an [code]#exception# with the -[code]#errc::backend_mismatch# error code -if the backend of the SYCL object +For each <> class [code]#T# which supports <> +interoperability, a specialization of [code]#get_native# must be defined, which +takes an instance of [code]#T# and returns a <> +interoperability <> associated with [code]#syclObject# +which can be used for <> interoperability. +The lifetimes of the objects returned are backend-defined and specified in the +backend specification. + +For each <> class [code]#T# which supports kernel function +interoperability, a specialization of [code]#get_native# must be defined, which +takes an instance of [code]#T# and returns the kernel function interoperability +<> associated with [code]#syclObject# which can be used +for kernel function interoperability. +The availability and behavior of these template functions is defined by the +<> specification document. + +The [code]#get_native# function must throw an [code]#exception# with the +[code]#errc::backend_mismatch# error code if the backend of the SYCL object doesn't match the target backend. [[sec:backend-interoperability-make]] @@ -319,107 +272,91 @@ doesn't match the target backend. include::{header_dir}/interop/templateFunctionMakeX.h[lines=4..-1] ---- -For each <> class [code]#T# which supports -<> interoperability, a specialization of the appropriate -template function [code]#make_{sycl_class}# where -[code]#{sycl_class}# is the class name of [code]#T#, must be -defined, which takes a <> interoperability -<> and constructs and returns an instance of -[code]#T#. The availability and behavior of these template -functions is defined by the <> specification document. - -Overloads of the [code]#make_{sycl_class}# function -which take a SYCL <> object as an argument -must throw an [code]#exception# with the -[code]#errc::backend_mismatch# error code -if the backend of the provided SYCL context -doesn't match the target backend.
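The backend check required of [code]#get_native# can be sketched in standard {cpp}17 as follows. Everything here is hypothetical. [code]#mock_queue#, [code]#mock_exception# and [code]#mock_errc# only imitate a SYCL object, the SYCL [code]#exception# and [code]#errc::backend_mismatch#, and only a single backend is modelled.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; SYCL itself throws sycl::exception with
// errc::backend_mismatch, which this sketch only imitates.
enum class mock_backend { alpha, beta };
enum class mock_errc { backend_mismatch };

struct mock_exception : std::runtime_error {
  mock_exception(mock_errc c, const std::string& what)
      : std::runtime_error(what), code(c) {}
  mock_errc code;
};

struct alpha_queue_handle { int id; };  // hypothetical native object

// A SYCL-like object that remembers which backend it was created on.
class mock_queue {
 public:
  mock_queue(mock_backend b, int native_id) : backend_(b), native_id_(native_id) {}
  mock_backend get_backend() const { return backend_; }
  int native_id() const { return native_id_; }

 private:
  mock_backend backend_;
  int native_id_;
};

// Only the alpha backend is modelled, so only that target is accepted.
template <mock_backend Target>
alpha_queue_handle get_native(const mock_queue& q) {
  static_assert(Target == mock_backend::alpha, "only alpha is modelled");
  if (q.get_backend() != Target) {
    throw mock_exception(mock_errc::backend_mismatch,
                         "queue does not belong to the requested backend");
  }
  return alpha_queue_handle{q.native_id()};
}
```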
+For each <> class [code]#T# which supports <> +interoperability, a specialization of the appropriate template function +[code]#make_{sycl_class}# where [code]#{sycl_class}# is the class name of +[code]#T#, must be defined, which takes a <> interoperability +<> and constructs and returns an instance of [code]#T#. +The availability and behavior of these template functions is defined by the +<> specification document. + +Overloads of the [code]#make_{sycl_class}# function which take a SYCL +<> object as an argument must throw an [code]#exception# with the +[code]#errc::backend_mismatch# error code if the backend of the provided SYCL +context doesn't match the target backend. [[sec:reference-semantics]] === Common reference semantics -Each of the following <> classes: -[code]#accessor#, -[code]#buffer#, -[code]#context#, -[code]#device#, -[code]#device_image#, -[code]#event#, -[code]#host_accessor#, -[code]#host_sampled_image_accessor#, -[code]#host_unsampled_image_accessor#, -[code]#kernel#, -[code]#kernel_id#, -[code]#kernel_bundle#, -[code]#local_accessor#, -[code]#platform#, -[code]#queue#, -[code]#sampled_image#, -[code]#sampled_image_accessor#, -[code]#stream#, -[code]#unsampled_image# and -[code]#unsampled_image_accessor# -must obey the following statements, where [code]#T# is the runtime class type: - - * [code]#T# must be copy constructible and copy assignable in the - host application and within SYCL kernel functions in the case that - [code]#T# is a valid kernel argument. 
Any instance of - [code]#T# that is constructed as a copy of another instance, via - either the copy constructor or copy assignment operator, must behave +Each of the following <> classes: [code]#accessor#, +[code]#buffer#, [code]#context#, [code]#device#, [code]#device_image#, +[code]#event#, [code]#host_accessor#, [code]#host_sampled_image_accessor#, +[code]#host_unsampled_image_accessor#, [code]#kernel#, [code]#kernel_id#, +[code]#kernel_bundle#, [code]#local_accessor#, [code]#platform#, [code]#queue#, +[code]#sampled_image#, [code]#sampled_image_accessor#, [code]#stream#, +[code]#unsampled_image# and [code]#unsampled_image_accessor# must obey the +following statements, where [code]#T# is the runtime class type: + + * [code]#T# must be copy constructible and copy assignable in the host + application and within SYCL kernel functions in the case that [code]#T# is a + valid kernel argument. + Any instance of [code]#T# that is constructed as a copy of another instance, + via either the copy constructor or copy assignment operator, must behave as-if it were the original instance and as-if any action performed on it were also performed on the original instance and must represent the same underlying <> as the original instance where applicable. - * [code]#T# must be destructible in the host application and within - SYCL kernel functions in the case that [code]#T# is a valid kernel - argument. When any instance of [code]#T# is destroyed, including as - a result of the copy assignment operator, any behavior specific to - [code]#T# that is specified as performed on destruction is only - performed if this instance is the last remaining host copy, in - accordance with the above definition of a copy. - * [code]#T# must be move constructible and move assignable in the - host application and within SYCL kernel functions in the case that T is - a valid kernel argument. 
Any instance of T that is constructed as a move - of another instance, via either the move constructor or move assignment - operator, must replace the original instance rendering said instance - invalid and must represent the same underlying <> as - the original instance where applicable. + * [code]#T# must be destructible in the host application and within SYCL + kernel functions in the case that [code]#T# is a valid kernel argument. + When any instance of [code]#T# is destroyed, including as a result of the + copy assignment operator, any behavior specific to [code]#T# that is + specified as performed on destruction is only performed if this instance is + the last remaining host copy, in accordance with the above definition of a + copy. + * [code]#T# must be move constructible and move assignable in the host + application and within SYCL kernel functions in the case that T is a valid + kernel argument. + Any instance of T that is constructed as a move of another instance, via + either the move constructor or move assignment operator, must replace the + original instance rendering said instance invalid and must represent the + same underlying <> as the original instance where + applicable. * [code]#T# must be equality comparable in the host application. - Equality between two instances of [code]#T# (i.e. [code]#a == b#) must be true if one instance is a copy of the other and non-equality - between two instances of [code]#T# (i.e. [code]#a != b#) must - be true if neither instance is a copy of the other, in accordance with - the above definition of a copy, unless either instance has become - invalidated by a move operation. By extension of the requirements above, - equality on [code]#T# must guarantee to be reflexive (i.e. [code]#a == a#), - symmetric (i.e. [code]#a == b# implies [code]#b == a# and [code]#a != b# - implies [code]#b != a#) and transitive (i.e. [code]#a == b && b == c# - implies [code]#c == a#). 
- * A specialization of [code]#std::hash# for [code]#T# must exist - in the host application that returns a unique value such that if two - instances of T are equal, in accordance with the above definition, then - their resulting hash values are also equal and subsequently if two hash - values are not equal, then their corresponding instances are also not - equal, in accordance with the above definition. - -Some <> classes will have additional behavior associated -with copy, movement, assignment or destruction semantics. If these are -specified they are in addition to those specified above unless stated -otherwise. - -Each of the runtime classes mentioned above must provide a common -interface of special member functions in order to fulfill the copy, -move, destruction requirements and hidden friend functions in order to -fulfill the equality requirements. - -A hidden friend function is a function first declared via a -[code]#friend# declaration with no additional out of class or namespace -scope declarations. Hidden friend functions are only visible to ADL -(Argument Dependent Lookup) and are hidden from qualified and unqualified -lookup. Hidden friend functions have the benefits of avoiding accidental -implicit conversions and faster compilation. - -These common special member functions and hidden friend functions are -described in <> and + Equality between two instances of [code]#T# (i.e. [code]#a == b#) must be + true if one instance is a copy of the other and non-equality between two + instances of [code]#T# (i.e. [code]#a != b#) must be true if neither + instance is a copy of the other, in accordance with the above definition of + a copy, unless either instance has become invalidated by a move operation. + By extension of the requirements above, equality on [code]#T# must guarantee + to be reflexive (i.e. [code]#a == a#), symmetric (i.e. [code]#a == b# + implies [code]#b == a# and [code]#a != b# implies [code]#b != a#) and + transitive (i.e. 
[code]#a == b && b == c# implies [code]#c == a#). + * A specialization of [code]#std::hash# for [code]#T# must exist in the host + application that returns a unique value such that if two instances of T are + equal, in accordance with the above definition, then their resulting hash + values are also equal and subsequently if two hash values are not equal, + then their corresponding instances are also not equal, in accordance with + the above definition. + +Some <> classes will have additional behavior associated with +copy, movement, assignment or destruction semantics. +If these are specified they are in addition to those specified above unless +stated otherwise. + +Each of the runtime classes mentioned above must provide a common interface of +special member functions in order to fulfill the copy, move, destruction +requirements and hidden friend functions in order to fulfill the equality +requirements. + +A hidden friend function is a function first declared via a [code]#friend# +declaration with no additional out of class or namespace scope declarations. +Hidden friend functions are only visible to ADL (Argument Dependent Lookup) and +are hidden from qualified and unqualified lookup. +Hidden friend functions have the benefits of avoiding accidental implicit +conversions and faster compilation. + +These common special member functions and hidden friend functions are described +in <> and <> respectively. 
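A minimal sketch of how a class can meet these common reference semantics, assuming a shared-pointer-based implementation (one possible design, not a requirement of this specification). Copies share one underlying object, equality and inequality are hidden friends, and [code]#std::hash# hashes the identity of the shared object. [code]#mock_context# is a hypothetical stand-in, not the SYCL [code]#context# class.

```cpp
#include <cstddef>
#include <functional>
#include <memory>

// Hypothetical class with common reference semantics. Every copy shares one
// underlying object, so "b is a copy of a" can be decided by pointer identity.
class mock_context {
 public:
  mock_context() : impl_(std::make_shared<impl>()) {}

  // Copy, move and destruction are the compiler-generated defaults: a copy
  // shares impl_, a move leaves the source with a null impl_ (invalid), and
  // the shared object is destroyed together with the last remaining copy.

  // Hidden friends: defined only inside the class, so they are found solely
  // through argument-dependent lookup.
  friend bool operator==(const mock_context& lhs, const mock_context& rhs) {
    return lhs.impl_ == rhs.impl_;
  }
  friend bool operator!=(const mock_context& lhs, const mock_context& rhs) {
    return !(lhs == rhs);
  }

  long use_count() const { return impl_.use_count(); }

 private:
  friend struct std::hash<mock_context>;
  struct impl {};  // the underlying object shared by all copies
  std::shared_ptr<impl> impl_;
};

namespace std {
// Equal objects share impl_, so they always hash to the same value.
template <>
struct hash<mock_context> {
  size_t operator()(const mock_context& c) const {
    return hash<const void*>{}(c.impl_.get());
  }
};
}  // namespace std
```

Because [code]#operator==# is declared only as a friend, [code]#a == b# finds it through ADL on [code]#mock_context#, whereas a qualified call such as [code]#::operator==(a, b)# does not.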
[source,,linenums] @@ -510,46 +447,43 @@ bool operator!=(const T& lhs, const T& rhs) [[sec:byval-semantics]] === Common by-value semantics -Each of the following <> classes: [code]#id#, -[code]#range#, [code]#item#, [code]#nd_item#, -[code]#h_item#, [code]#group#, [code]#sub_group# and -[code]#nd_range# must follow the following statements, where -[code]#T# is the runtime class type: - - * [code]#T# must be default copy constructible and copy assignable in +Each of the following <> classes: [code]#id#, [code]#range#, +[code]#item#, [code]#nd_item#, [code]#h_item#, [code]#group#, [code]#sub_group# +and [code]#nd_range# must follow the following statements, where [code]#T# is +the runtime class type: + + * [code]#T# must be default copy constructible and copy assignable in the host + application (in the case where T is available on the host) and within SYCL + kernel functions. + * [code]#T# must be default destructible in the host application (in the case + where T is available on the host) and within SYCL kernel functions. + * [code]#T# must be default move constructible and default move assignable in the host application (in the case where T is available on the host) and within SYCL kernel functions. - * [code]#T# must be default destructible in the host application (in - the case where T is available on the host) and within SYCL kernel - functions. - * [code]#T# must be default move constructible and default move - assignable in the host application (in the case where T is available on - the host) and within SYCL kernel functions. - * [code]#T# must be equality comparable in the host application (in - the case where T is available on the host) and within SYCL kernel - functions. Equality between two instances of [code]#T# (i.e. - [code]#a == b#) must be true if the value of all members are equal - and non-equality between two instances of [code]#T# (i.e. 
- [code]#a != b#) must be true if the value of any members are not - equal, unless either instance has become invalidated by a move - operation. By extension of the requirements above, equality on - [code]#T# must guarantee to be reflexive (i.e. [code]#a == a#), - symmetric (i.e. [code]#a == b# implies [code]#b == a# and [code]#a != b# - implies [code]#b != a#) and transitive (i.e. [code]#a == b && b == c# - implies [code]#c == a#). - -Some <> classes will have additional behavior associated -with copy, movement, assignment or destruction semantics. If these are -specified they are in addition to those specified above unless stated -otherwise. - -Each of the runtime classes mentioned above must provide a common -interface of special member functions and member functions in order to -fulfill the copy, move, destruction and equality requirements, -following the <> and the <>. - -These common special member functions and hidden friend functions are -described in <> and + * [code]#T# must be equality comparable in the host application (in the case + where T is available on the host) and within SYCL kernel functions. + Equality between two instances of [code]#T# (i.e. [code]#a == b#) must be + true if the values of all members are equal and non-equality between two + instances of [code]#T# (i.e. [code]#a != b#) must be true if the value of + any member is not equal, unless either instance has become invalidated by + a move operation. + By extension of the requirements above, equality on [code]#T# must guarantee + to be reflexive (i.e. [code]#a == a#), symmetric (i.e. [code]#a == b# + implies [code]#b == a# and [code]#a != b# implies [code]#b != a#) and + transitive (i.e. [code]#a == b && b == c# implies [code]#c == a#). + +Some <> classes will have additional behavior associated with +copy, movement, assignment or destruction semantics. +If these are specified they are in addition to those specified above unless +stated otherwise.
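By-value classes compare their contents rather than their identity. The hypothetical [code]#mock_id# below, loosely modelled on [code]#id#, shows the intended behavior in standard {cpp}17. It is a sketch, not the SYCL definition: copies are independent values, and equality holds only when every member compares equal.

```cpp
#include <array>
#include <cstddef>

// Hypothetical by-value class modelled loosely on id<Dims>.
template <int Dims>
class mock_id {
 public:
  mock_id() = default;  // all components zero
  explicit mock_id(const std::array<std::size_t, Dims>& values) : values_(values) {}

  std::size_t operator[](int dim) const { return values_[static_cast<std::size_t>(dim)]; }
  std::size_t& operator[](int dim) { return values_[static_cast<std::size_t>(dim)]; }

  // Hidden friends comparing by value: equal only if every component is equal.
  friend bool operator==(const mock_id& lhs, const mock_id& rhs) {
    return lhs.values_ == rhs.values_;
  }
  friend bool operator!=(const mock_id& lhs, const mock_id& rhs) {
    return !(lhs == rhs);
  }

 private:
  std::array<std::size_t, Dims> values_{};  // value-initialized to zeros
};
```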
+ +Each of the runtime classes mentioned above must provide a common interface of +special member functions and member functions in order to fulfill the copy, +move, destruction and equality requirements, following the <> and +the <>. + +These common special member functions and hidden friend functions are described +in <> and <> respectively. [source,,linenums] @@ -622,70 +556,59 @@ bool operator!=(const T& lhs, const T& rhs) === Properties -Each of the following <> classes: -[code]#accessor#, -[code]#buffer#, -[code]#host_accessor#, -[code]#host_sampled_image_accessor#, -[code]#host_unsampled_image_accessor#, -[code]#context#, -[code]#local_accessor#, -[code]#queue#, -[code]#sampled_image#, -[code]#sampled_image_accessor#, -[code]#stream#, -[code]#unsampled_image#, -[code]#unsampled_image_accessor# and -[code]#usm_allocator# -provide an optional parameter in each of -their constructors to provide a [code]#property_list# which -contains zero or more properties. Each of those properties augments -the semantics of the class with a particular feature. Each of those -classes must also provide [code]#has_property# and -[code]#get_property# member functions for querying for a -particular property. - -The listing below illustrates the usage of various buffer properties, -described in <>. - -The example illustrates how using properties does not affect the type -of the object, thus, does not prevent the usage of SYCL objects in -containers. +Each of the following <> classes: [code]#accessor#, +[code]#buffer#, [code]#host_accessor#, [code]#host_sampled_image_accessor#, +[code]#host_unsampled_image_accessor#, [code]#context#, [code]#local_accessor#, +[code]#queue#, [code]#sampled_image#, [code]#sampled_image_accessor#, +[code]#stream#, [code]#unsampled_image#, [code]#unsampled_image_accessor# and +[code]#usm_allocator# provide an optional parameter in each of their +constructors to provide a [code]#property_list# which contains zero or more +properties. 
+Each of those properties augments the semantics of the class with a particular +feature. +Each of those classes must also provide [code]#has_property# and +[code]#get_property# member functions for querying for a particular property. + +The listing below illustrates the usage of various buffer properties, described +in <>. + +The example illustrates how using properties does not affect the type of the +object, thus, does not prevent the usage of SYCL objects in containers. [source,,linenums] ---- include::{code_dir}/propertyExample.cpp[lines=4..-1] ---- -Each property is represented by a unique class and an instance of a property -is an instance of that type. Some properties can be default constructed -while others will require an argument on construction. A property may be -applicable to more than one class, however some properties may not be -compatible with each other. See the requirements for the properties of the -SYCL [code]#buffer# class, SYCL [code]#unsampled_image# class and -SYCL [code]#sampled_image# class in <> -and <> respectively. +Each property is represented by a unique class and an instance of a property is +an instance of that type. +Some properties can be default constructed while others will require an argument +on construction. +A property may be applicable to more than one class, however some properties may +not be compatible with each other. +See the requirements for the properties of the SYCL [code]#buffer# class, SYCL +[code]#unsampled_image# class and SYCL [code]#sampled_image# class in +<> and <> respectively. -Properties can be passed to a <> class -via an instance of [code]#property_list#. -These properties get tied to the <> class instance -and copies of the object will contain the same properties. +Properties can be passed to a <> class via an instance of +[code]#property_list#. +These properties get tied to the <> class instance and copies of +the object will contain the same properties. 
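The claim that properties do not change an object's type can be illustrated with a type-erased list. In this {cpp}17 sketch, [code]#mock_property_list#, [code]#mock_buffer#, [code]#use_host_ptr_prop# and [code]#bound_to_prop# are hypothetical. Storing properties as [code]#std::any# is an assumption made for brevity, and the exception thrown by [code]#get_property# for a missing property is this sketch's own choice.

```cpp
#include <any>
#include <initializer_list>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical property classes: one default-constructible, one that needs an
// argument on construction.
struct use_host_ptr_prop {};
struct bound_to_prop {
  explicit bound_to_prop(int id) : ctx_id(id) {}
  int ctx_id;
};

// Type-erased list, so the properties chosen never appear in the owner's type.
class mock_property_list {
 public:
  mock_property_list() = default;
  mock_property_list(std::initializer_list<std::any> props) : props_(props) {}

  template <class Prop>
  bool has_property() const {
    for (const std::any& p : props_)
      if (std::any_cast<Prop>(&p) != nullptr) return true;
    return false;
  }

  template <class Prop>
  Prop get_property() const {
    for (const std::any& p : props_)
      if (const Prop* found = std::any_cast<Prop>(&p)) return *found;
    throw std::invalid_argument("property not present");  // this sketch's choice
  }

 private:
  std::vector<std::any> props_;
};

// Hypothetical buffer-like class: the property list is a constructor argument,
// not a template parameter, so every mock_buffer has the same type.
class mock_buffer {
 public:
  explicit mock_buffer(int size, mock_property_list props = {})
      : size_(size), props_(std::move(props)) {}

  int size() const { return size_; }
  template <class Prop> bool has_property() const { return props_.has_property<Prop>(); }
  template <class Prop> Prop get_property() const { return props_.get_property<Prop>(); }

 private:
  int size_;
  mock_property_list props_;
};
```

Since the properties live in a constructor argument, buffers built with and without properties can be stored together in one [code]#std::vector<mock_buffer>#.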
-A SYCL implementation or a <> may provide additional properties -other than those defined here, provided they are defined in accordance with -the requirements described in <>. +A SYCL implementation or a <> may provide additional properties other +than those defined here, provided they are defined in accordance with the +requirements described in <>. ==== Properties interface -Each of the runtime classes mentioned above must provide a common -interface of member functions in order to fulfill the property -interface requirements. +Each of the runtime classes mentioned above must provide a common interface of +member functions in order to fulfill the property interface requirements. -A synopsis of the common properties interface, the SYCL -[code]#property_list# class and the SYCL property classes is provided -below. The member functions of the common properties interface are listed in -<>. The constructors of the SYCL -[code]#property_list# class are listed in +A synopsis of the common properties interface, the SYCL [code]#property_list# +class and the SYCL property classes is provided below. +The member functions of the common properties interface are listed in +<>. +The constructors of the SYCL [code]#property_list# class are listed in <>. [source,,linenums] @@ -801,17 +724,16 @@ Construct a SYCL [code]#property_list# with zero or more properties. [[sec:device-selection]] === Device selection -Since a system can have several SYCL-compatible devices attached, it -is useful to have a way to select a specific device or a set of -devices to construct a specific object such as a -[code]#device# (see <>) or a -[code]#queue# (see <>), or -perform some operations on a device subset. +Since a system can have several SYCL-compatible devices attached, it is useful +to have a way to select a specific device or a set of devices to construct a +specific object such as a [code]#device# (see <>) or a +[code]#queue# (see <>), or perform some operations on +a device subset. 
-Device selection is done either by already having a specific instance -of a [code]#device# (see <>) or by -providing a <> which is a ranking function that will give -an integer ranking value to all the devices on the system. +Device selection is done either by already having a specific instance of a +[code]#device# (see <>) or by providing a <> +which is a ranking function that will give an integer ranking value to all the +devices on the system. [[sec:device-selector]] @@ -822,20 +744,20 @@ requirement [code]#Callable#, taking a parameter of type [code]#const device &# and returning a value that is implicitly convertible to [code]#int#. At any point where the <> needs to select a SYCL [code]#device# -using a <>, the system queries all -<> from all <> in the -system, calls the <> on each device and selects the one -which returns the highest score. If the highest value is strictly negative no -device is selected. +using a <>, the system queries all <> from all <> in the system, calls the +<> on each device and selects the one which returns the highest +score. +If the highest value is strictly negative no device is selected. -In places where only one device has to be picked and the high score is -obtained by more than one device, then one of the tied devices will be -returned, but which one is not defined and may depend on enumeration -order, for example, outside the control of the SYCL runtime. +In places where only one device has to be picked and the high score is obtained +by more than one device, then one of the tied devices will be returned, but +which one is not defined and may depend on enumeration order, for example, +outside the control of the SYCL runtime. 
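The selection rule described here (score every device, keep the highest score, select nothing if that score is strictly negative) can be sketched in standard {cpp}17. [code]#mock_device#, [code]#select_device# and [code]#prefer_gpu# are hypothetical; a real device selector is called with a [code]#const device &#. The sketch breaks ties in favor of the device enumerated first, which is only one of the outcomes the text permits.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical device description; a real selector is called with a
// const sycl::device& instead.
struct mock_device {
  std::string name;
  bool is_gpu;
};

// Scores every device with `selector` and returns the highest scorer, or
// std::nullopt when the highest score is strictly negative (or there are no
// devices). Ties go to the device enumerated first in this sketch.
template <class Selector>
std::optional<mock_device> select_device(const std::vector<mock_device>& devices,
                                         Selector&& selector) {
  std::optional<mock_device> best;
  int best_score = 0;
  for (const mock_device& d : devices) {
    const int score = selector(d);  // the result only has to convert to int
    if (!best || score > best_score) {
      best = d;
      best_score = score;
    }
  }
  if (!best || best_score < 0) return std::nullopt;
  return best;
}

// A hypothetical user-provided selector: prefer GPUs, but accept any device.
inline int prefer_gpu(const mock_device& d) { return d.is_gpu ? 100 : 1; }
```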
-Some predefined <> are provided by the system as -described on <> in a header file with -some definition similar to the following: +Some predefined <> are provided by the system +as described on <> in a header file with some definition +similar to the following: [[table.device.selectors]] @@ -856,8 +778,8 @@ default_selector_v [NOTE] ==== Implementations may choose to return an emulated device (with -[code]#aspect::emulated#) as a fallback if there is no physical device -available on the system. +[code]#aspect::emulated#) as a fallback if there is no physical device available +on the system. ==== a@ @@ -944,7 +866,8 @@ and to <> for examples. include::{header_dir}/deviceSelector.h[lines=4..-1] ---- -Typical examples of default and user-provided <> could be: +Typical examples of default and user-provided <> could be: [source,,linenums] ---- @@ -1006,58 +929,57 @@ auto dev4 = device{aspect_selector( [NOTE] ==== -In SYCL 1.2.1 the predefined device selectors were actually types -that had to be instantiated to be used. Now they are just -instances. To simplify porting code using the old type -instantiations, a backward-compatible API is still provided, such as -[code]#sycl::default_selector#. The new predefined device -selectors have their new names appended with "_v" to avoid -conflicts, thus following the naming style used by traits in the {cpp} -standard library. There is no requirement for the implementation to -have for example [code]#sycl::gpu_selector_v# being an instance -of [code]#sycl::gpu_selector#. +In SYCL 1.2.1 the predefined device selectors were actually types that had to be +instantiated to be used. +Now they are just instances. +To simplify porting code using the old type instantiations, a +backward-compatible API is still provided, such as +[code]#sycl::default_selector#. +The new predefined device selectors have their new names appended with "_v" to +avoid conflicts, thus following the naming style used by traits in the {cpp} +standard library. 
+There is no requirement for the implementation to have for example +[code]#sycl::gpu_selector_v# being an instance of [code]#sycl::gpu_selector#. ==== -NOTE: Implementation note: the SYCL API might rely on SFINAE or {cpp20} -concepts to resolve some ambiguity in constructors with default -parameters. +NOTE: Implementation note: the SYCL API might rely on SFINAE or {cpp20} concepts +to resolve some ambiguity in constructors with default parameters. [[sec:platform-class]] === Platform class -The SYCL [code]#platform# class encapsulates a single SYCL platform on -which SYCL kernel functions may be executed. A SYCL platform must be -associated with a single <>. +The SYCL [code]#platform# class encapsulates a single SYCL platform on which +SYCL kernel functions may be executed. +A SYCL platform must be associated with a single <>. A SYCL [code]#platform# is also associated with any number of SYCL -[code]#devices# associated with the same <>. A platform may -contain no devices. +[code]#devices# associated with the same <>. +A platform may contain no devices. -All member functions of the [code]#platform# class are synchronous and -errors are handled by throwing synchronous SYCL exceptions. +All member functions of the [code]#platform# class are synchronous and errors +are handled by throwing synchronous SYCL exceptions. -The execution environment for a SYCL application has a fixed number of -platforms which does not vary as the application executes. The application -can get a list of all these platforms via [code]#platform::get_platforms()#, -and the order of the platform objects is the same each time the application -calls that function. The [code]#platform# class also provides constructors, -but constructing a new [code]#platform# instance merely creates a new object -that is a copy of one of the objects returned by -[code]#platform::get_platforms()#. - -The SYCL [code]#platform# class provides the common reference semantics -(see <>). 
+The execution environment for a SYCL application has a fixed number of platforms +which does not vary as the application executes. +The application can get a list of all these platforms via +[code]#platform::get_platforms()#, and the order of the platform objects is the +same each time the application calls that function. +The [code]#platform# class also provides constructors, but constructing a new +[code]#platform# instance merely creates a new object that is a copy of one of +the objects returned by [code]#platform::get_platforms()#. + +The SYCL [code]#platform# class provides the common reference semantics (see +<>). ==== Platform interface -A synopsis of the SYCL [code]#platform# class is provided below. The -constructors, member functions and static member functions of the SYCL -[code]#platform# class are listed in -<>, <> and -<> respectively. The additional common -special member functions and common member functions are listed in -<> in +A synopsis of the SYCL [code]#platform# class is provided below. +The constructors, member functions and static member functions of the SYCL +[code]#platform# class are listed in <>, +<> and <> respectively. +The additional common special member functions and common member functions are +listed in <> in <> and <> respectively. @@ -1185,14 +1107,14 @@ static std::vector get_platforms() ==== Platform information descriptors -A <> can be queried for information using the [code]#get_info# -member function of the [code]#platform# class, specifying one of the info -parameters in [code]#info::platform#. The possible values for each info -parameter and any restrictions are defined in the specification of the -<> associated with the <>. All info parameters in -[code]#info::platform# are specified in <> and the -synopsis for [code]#info::platform# is described in -<>. 
+A <> can be queried for information using the [code]#get_info# member +function of the [code]#platform# class, specifying one of the info parameters in +[code]#info::platform#. +The possible values for each info parameter and any restrictions are defined in +the specification of the <> associated with the <>. +All info parameters in [code]#info::platform# are specified in +<> and the synopsis for [code]#info::platform# is described +in <>. [[table.platform.info]] @@ -1247,37 +1169,36 @@ Returns the extensions supported by this [code]#platform#. Returns an empty list [[sec:interface.context.class]] === Context class -The <> class represents a SYCL <>. A <> -represents the runtime data structures and state required by a <> -API to interact with a group of devices associated with a platform. +The <> class represents a SYCL <>. +A <> represents the runtime data structures and state required by a +<> API to interact with a group of devices associated with a platform. -The SYCL [code]#context# class provides the common reference semantics -(see <>). +The SYCL [code]#context# class provides the common reference semantics (see +<>). ==== Context interface -The constructors and member functions of the SYCL [code]#context# class -are listed in <> and -<>, respectively. The additional common special -member functions and common member functions are listed in -<> in +The constructors and member functions of the SYCL [code]#context# class are +listed in <> and <>, +respectively. +The additional common special member functions and common member functions are +listed in <> in <> and <>, respectively. -All member functions of the <> class are synchronous and errors -are handled by throwing synchronous SYCL exceptions. +All member functions of the <> class are synchronous and errors are +handled by throwing synchronous SYCL exceptions. 
-All constructors of the SYCL <> class will construct an -instance associated with a particular <>, determined by the -constructor parameters or, in the case of the default constructor, the -SYCL [code]#device# produced by the -[code]#default_selector_v#. +All constructors of the SYCL <> class will construct an instance +associated with a particular <>, determined by the constructor +parameters or, in the case of the default constructor, the SYCL [code]#device# +produced by the [code]#default_selector_v#. A SYCL [code]#context# can optionally be constructed with an -[code]#async_handler# parameter. In this case the -[code]#async_handler# is used to report asynchronous SYCL exceptions, -as described in <>. +[code]#async_handler# parameter. +In this case the [code]#async_handler# is used to report asynchronous SYCL +exceptions, as described in <>. Information about a SYCL <> may be queried through the [code]#get_info()# member function. @@ -1384,13 +1305,14 @@ std::vector get_devices() const ==== Context information descriptors -A <> can be queried for information using the [code]#get_info# -member function of the [code]#context# class, specifying one of the info -parameters in [code]#info::context#. The possible values for each info -parameter and any restrictions are defined in the specification of the -<> associated with the <>. All info parameters in -[code]#info::context# are specified in <> and the synopsis -for [code]#info::context# is described in <>. +A <> can be queried for information using the [code]#get_info# member +function of the [code]#context# class, specifying one of the info parameters in +[code]#info::context#. +The possible values for each info parameter and any restrictions are defined in +the specification of the <> associated with the <>. +All info parameters in [code]#info::context# are specified in +<> and the synopsis for [code]#info::context# is described +in <>. 
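A descriptor-driven [code]#get_info# query can be sketched as a member function template whose return type is taken from the descriptor type. The namespace [code]#mock_info::context#, its two descriptors and [code]#mock_context# are hypothetical and only model the pattern in standard {cpp}17. The real descriptors and their return types are those listed in the context information descriptors table.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical descriptors: empty tag types that each name their result type.
namespace mock_info::context {
struct devices { using return_type = std::vector<std::string>; };
struct device_count { using return_type = std::size_t; };
}  // namespace mock_info::context

// Hypothetical context holding device names in place of real device objects.
class mock_context {
 public:
  explicit mock_context(std::vector<std::string> devices)
      : devices_(std::move(devices)) {}

  // The result type comes from the descriptor, so every query is statically
  // typed. Descriptors without a specialization below fail to link.
  template <class Param>
  typename Param::return_type get_info() const;

 private:
  std::vector<std::string> devices_;
};

template <>
inline std::vector<std::string>
mock_context::get_info<mock_info::context::devices>() const {
  return devices_;
}

template <>
inline std::size_t
mock_context::get_info<mock_info::context::device_count>() const {
  return devices_.size();
}
```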
[[table.context.info]] @@ -1506,41 +1428,40 @@ The [code]#property_list# constructor parameters are present for extensibility. [[sec:device-class]] === Device class -The SYCL [code]#device# class encapsulates a single SYCL device on -which <> can be executed. +The SYCL [code]#device# class encapsulates a single SYCL device on which +<> can be executed. -All member functions of the [code]#device# class are synchronous and -errors are handled by throwing synchronous SYCL exceptions. +All member functions of the [code]#device# class are synchronous and errors are +handled by throwing synchronous SYCL exceptions. The execution environment for a SYCL application has a fixed number of <> which does not vary as the application executes. The application can get a list of all these devices via [code]#device::get_devices()#, and the order of the device objects is the same each time the application calls that function (assuming the parameter to that -function is the same for each call). The [code]#device# class also provides -constructors, but constructing a new [code]#device# instance merely creates a -new object that is a copy of one of the objects returned by -[code]#device::get_devices()#. - -A SYCL [code]#device# can be partitioned into multiple SYCL devices, by -calling the [code]#create_sub_devices()# member function template. The -resulting SYCL [code]#devices# are considered sub devices, and it is -valid to partition these sub devices further. The range of support for this -feature is <> and device specific and can be queried for through -[code]#get_info()#. - -The SYCL [code]#device# class provides the common reference semantics -(see <>). +function is the same for each call). +The [code]#device# class also provides constructors, but constructing a new +[code]#device# instance merely creates a new object that is a copy of one of the +objects returned by [code]#device::get_devices()#. 
+ +A SYCL [code]#device# can be partitioned into multiple SYCL devices, by calling +the [code]#create_sub_devices()# member function template. +The resulting SYCL [code]#devices# are considered sub devices, and it is valid +to partition these sub devices further. +The range of support for this feature is <> and device specific and can +be queried for through [code]#get_info()#. + +The SYCL [code]#device# class provides the common reference semantics (see +<>). ==== Device interface -A synopsis of the SYCL [code]#device# class is provided below. The -constructors, member functions and static member functions of the SYCL -[code]#device# class are listed in -<>, <> and -<> respectively. The additional common special -member functions and common member functions are listed in -<> in +A synopsis of the SYCL [code]#device# class is provided below. +The constructors, member functions and static member functions of the SYCL +[code]#device# class are listed in <>, +<> and <> respectively. +The additional common special member functions and common member functions are +listed in <> in <> and <>, respectively. @@ -1781,13 +1702,14 @@ get_devices(info::device_type deviceType = info::device_type::all) ==== Device information descriptors -A <> can be queried for information using the [code]#get_info# -member function of the [code]#device# class, specifying one of the info -parameters in [code]#info::device#. The possible values for each info -parameter and any restriction are defined in the specification of the -<> associated with the <>. All info parameters in -[code]#info::device# are specified in <> and the synopsis -for [code]#info::device# is described in <>. +A <> can be queried for information using the [code]#get_info# member +function of the [code]#device# class, specifying one of the info parameters in +[code]#info::device#. +The possible values for each info parameter and any restriction are defined in +the specification of the <> associated with the <>. 
+All info parameters in [code]#info::device# are specified in +<> and the synopsis for [code]#info::device# is described in +<>. [[table.device.info]] @@ -2081,25 +2003,23 @@ info::device::half_fp_config -- * [code]#info::fp_config::denorm:# denorms are supported. * [code]#info::fp_config::inf_nan:# INF and quiet NaNs are supported. - * [code]#info::fp_config::round_to_nearest:# round to nearest even - rounding mode is supported. - * [code]#info::fp_config::round_to_zero:# round to zero rounding mode - is supported. - * [code]#info::fp_config::round_to_inf:# round to positive and - negative infinity rounding modes are supported. - * [code]#info::fp_config::fma:# IEEE754-2008 fused multiply add is + * [code]#info::fp_config::round_to_nearest:# round to nearest even rounding + mode is supported. + * [code]#info::fp_config::round_to_zero:# round to zero rounding mode is supported. - * [code]#info::fp_config::correctly_rounded_divide_sqrt:# divide and - sqrt are correctly rounded as defined by the IEEE754 specification. + * [code]#info::fp_config::round_to_inf:# round to positive and negative + infinity rounding modes are supported. + * [code]#info::fp_config::fma:# IEEE754-2008 fused multiply add is supported. + * [code]#info::fp_config::correctly_rounded_divide_sqrt:# divide and sqrt are + correctly rounded as defined by the IEEE754 specification. This property is deprecated. - * [code]#info::fp_config::soft_float:# basic floating-point - operations (such as addition, subtraction, multiplication) are - implemented in software. + * [code]#info::fp_config::soft_float:# basic floating-point operations (such + as addition, subtraction, multiplication) are implemented in software. If half precision is supported by this SYCL [code]#device# (i.e. the -[code]#device# has [code]#aspect::fp16# there is no minimum -floating-point capability. If half support is not supported the returned -[code]#std::vector# must be empty. 
+[code]#device# has [code]#aspect::fp16#), there is no minimum floating-point +capability. +If half support is not supported the returned [code]#std::vector# must be empty. -- a@ @@ -2116,25 +2036,22 @@ info::device::single_fp_config -- * [code]#info::fp_config::denorm:# denorms are supported. * [code]#info::fp_config::inf_nan:# INF and quiet NaNs are supported. - * [code]#info::fp_config::round_to_nearest:# round to nearest even - rounding mode is supported. - * [code]#info::fp_config::round_to_zero:# round to zero rounding mode - is supported. - * [code]#info::fp_config::round_to_inf:# round to positive and - negative infinity rounding modes are supported. - * [code]#info::fp_config::fma:# IEEE754-2008 fused multiply add is + * [code]#info::fp_config::round_to_nearest:# round to nearest even rounding + mode is supported. + * [code]#info::fp_config::round_to_zero:# round to zero rounding mode is supported. - * [code]#info::fp_config::correctly_rounded_divide_sqrt:# divide and - sqrt are correctly rounded as defined by the IEEE754 specification. + * [code]#info::fp_config::round_to_inf:# round to positive and negative + infinity rounding modes are supported. + * [code]#info::fp_config::fma:# IEEE754-2008 fused multiply add is supported. + * [code]#info::fp_config::correctly_rounded_divide_sqrt:# divide and sqrt are + correctly rounded as defined by the IEEE754 specification. This property is deprecated. - * [code]#info::fp_config::soft_float:# basic floating-point - operations (such as addition, subtraction, multiplication) are - implemented in software. - -If this SYCL [code]#device# is not of type -[code]#info::device_type::custom# then the minimum floating-point -capability must be: [code]#info::fp_config::round_to_nearest# and -[code]#info::fp_config::inf_nan#. + * [code]#info::fp_config::soft_float:# basic floating-point operations (such + as addition, subtraction, multiplication) are implemented in software.
+ +If this SYCL [code]#device# is not of type [code]#info::device_type::custom# +then the minimum floating-point capability must be: +[code]#info::fp_config::round_to_nearest# and [code]#info::fp_config::inf_nan#. -- a@ @@ -2151,29 +2068,25 @@ info::device::double_fp_config -- * [code]#info::fp_config::denorm:# denorms are supported. * [code]#info::fp_config::inf_nan:# INF and NaNs are supported. - * [code]#info::fp_config::round_to_nearest:# round to nearest even - rounding mode is supported. - * [code]#info::fp_config::round_to_zero:# round to zero rounding mode - is supported. - * [code]#info::fp_config::round_to_inf:# round to positive and - negative infinity rounding modes are supported. - * [code]#info::fp_config::fma:# IEEE754-2008 fused multiply-add is + * [code]#info::fp_config::round_to_nearest:# round to nearest even rounding + mode is supported. + * [code]#info::fp_config::round_to_zero:# round to zero rounding mode is supported. - * [code]#info::fp_config::soft_float:# basic floating-point - operations (such as addition, subtraction, multiplication) are - implemented in software. + * [code]#info::fp_config::round_to_inf:# round to positive and negative + infinity rounding modes are supported. + * [code]#info::fp_config::fma:# IEEE754-2008 fused multiply-add is supported. + * [code]#info::fp_config::soft_float:# basic floating-point operations (such + as addition, subtraction, multiplication) are implemented in software. If double precision is supported by this SYCL [code]#device# (i.e. 
the -[code]#device# has [code]#aspect::fp64# and this SYCL -[code]#device# is not of type [code]#info::device_type::custom# -then the minimum floating-point capability must be: -[code]#info::fp_config::fma#, +[code]#device# has [code]#aspect::fp64#) and this SYCL [code]#device# is not of +type [code]#info::device_type::custom# then the minimum floating-point +capability must be: [code]#info::fp_config::fma#, [code]#info::fp_config::round_to_nearest#, -[code]#info::fp_config::round_to_zero#, -[code]#info::fp_config::round_to_inf#, -[code]#info::fp_config::inf_nan# and -[code]#info::fp_config::denorm#. If double support is not supported the -returned [code]#std::vector# must be empty. +[code]#info::fp_config::round_to_zero#, [code]#info::fp_config::round_to_inf#, +[code]#info::fp_config::inf_nan# and [code]#info::fp_config::denorm#. +If double support is not supported the returned [code]#std::vector# must be +empty. -- a@ @@ -2528,9 +2441,9 @@ info::device::extensions @ [.code]#std::vector# a@ Deprecated, use [code]#info::device::aspects# instead. -- -Returns a [code]#std::vector# of extension names (the extension names -do not contain any spaces) supported by this SYCL [code]#device#. The -extension names returned can be vendor supported extension names and one or +Returns a [code]#std::vector# of extension names (the extension names do not +contain any spaces) supported by this SYCL [code]#device#.
+The extension names returned can be vendor supported extension names and one or more of the following Khronos approved extension names: * [code]#cl_khr_int64_base_atomics# @@ -2550,17 +2463,16 @@ more of the following Khronos approved extension names: * [code]#cl_khr_context_abort# * [code]#cl_khr_spir# -If this SYCL [code]#device# is an OpenCL device then following approved -Khronos extension names must be returned by all device that support OpenCL C -1.2: +If this SYCL [code]#device# is an OpenCL device then the following approved +Khronos extension names must be returned by all devices supporting OpenCL C 1.2: * [code]#cl_khr_global_int32_base_atomics# * [code]#cl_khr_global_int32_extended_atomics# * [code]#cl_khr_local_int32_base_atomics# * [code]#cl_khr_local_int32_extended_atomics# * [code]#cl_khr_byte_addressable_store# - * [code]#cl_khr_fp64# (for backward compatibility if double precision - is supported) + * [code]#cl_khr_fp64# (for backward compatibility if double precision is + supported) Please refer to the OpenCL 1.2 Extension Specification for a detailed description of these extensions. @@ -2678,9 +2590,9 @@ info::device::partition_type_affinity_domain [[sec:device-aspects]] ==== Device aspects -Every SYCL <> has an associated set of <> which -identify characteristics of the [code]#device#. Aspects are defined via -the [code]#enum class aspect# enumeration: +Every SYCL <> has an associated set of <> which identify +characteristics of the [code]#device#. +Aspects are defined via the [code]#enum class aspect# enumeration: [source,,linenums] ---- @@ -2688,12 +2600,14 @@ include::{header_dir}/deviceEnumClassAspect.h[lines=4..-1] ---- SYCL applications can query the aspects for a [code]#device# via -[code]#device::has()# in order to determine whether the [code]#device# -supports any optional features. <> lists the aspects that -are defined in the <> and tells which optional features correspond -to each.
Backends and extensions may provide additional aspects and additional -optional device features. If so, the <> specification document or the -extension document describes them. +[code]#device::has()# in order to determine whether the [code]#device# supports +any optional features. +<> lists the aspects that are defined in the <> +and tells which optional features correspond to each. +Backends and extensions may provide additional aspects and additional optional +device features. +If so, the <> specification document or the extension document +describes them. [[table.device.aspect]] .Device aspects defined by the <> @@ -2748,11 +2662,12 @@ aspect::emulated [NOTE] ==== -As an example, a vendor might support both a hardware FPGA device and a -software emulated FPGA, where the emulated FPGA has all the same features -as the hardware one but runs more slowly and can provide additional profiling -or diagnostic information. In such a case, an application's -<> can use [code]#aspect::emulated# to distinguish the two. +As an example, a vendor might support both a hardware FPGA device and a software +emulated FPGA, where the emulated FPGA has all the same features as the hardware +one but runs more slowly and can provide additional profiling or diagnostic +information. +In such a case, an application's <> can use +[code]#aspect::emulated# to distinguish the two. ==== a@ @@ -2886,37 +2801,40 @@ aspect::usm_system_allocations |==== The implementation also provides two traits that the application can use to -query aspects at compilation time. The traits [code]#any_device_has# -and [code]#all_devices_have# are set according to the collection of -devices _D_ that can possibly execute device code, as determined by the -compilation environment. The trait [code]#any_device_has# inherits -from [code]#std::true_type# only if at least one device in _D_ has the -specified aspect. 
The trait [code]#all_devices_have# inherits from -[code]#std::true_type# only if all devices in _D_ have the specified aspect. +query aspects at compilation time. +The traits [code]#any_device_has# and [code]#all_devices_have# +are set according to the collection of devices _D_ that can possibly execute +device code, as determined by the compilation environment. +The trait [code]#any_device_has# inherits from [code]#std::true_type# +only if at least one device in _D_ has the specified aspect. +The trait [code]#all_devices_have# inherits from [code]#std::true_type# +only if all devices in _D_ have the specified aspect. [source,,linenums] ---- include::{header_dir}/aspectTraits.h[lines=4..-1] ---- -Applications can use these traits to reduce their code size. The following -example demonstrates one way to use these traits to avoid instantiating a -templated kernel for device features that are not supported by any device. +Applications can use these traits to reduce their code size. +The following example demonstrates one way to use these traits to avoid +instantiating a templated kernel for device features that are not supported by +any device. [source,,linenums] ---- include::{code_dir}/aspectTraitExample.cpp[lines=4..-1] ---- -The kernel function [code]#MyKernel# is templated to use a different -algorithm depending on whether the device has the aspect [code]#aspect::fp16#, -and the call to [code]#dev.has()# chooses the kernel function instantiation -that matches the device's capabilities. However, the use of -[code]#any_device_has_v# and [code]#all_devices_have_v# entirely avoid -useless instantiations of the kernel function. For example, when the -compilation environment does not support any devices with [code]#aspect::fp16#, -[code]#any_device_has_v# is [code]#false#, and the kernel -function is never instantiated with support for the [code]#sycl::half# type. 
+The kernel function [code]#MyKernel# is templated to use a different algorithm +depending on whether the device has the aspect [code]#aspect::fp16#, and the +call to [code]#dev.has()# chooses the kernel function instantiation that matches +the device's capabilities. +However, the use of [code]#any_device_has_v# and [code]#all_devices_have_v# +entirely avoids useless instantiations of the kernel function. +For example, when the compilation environment does not support any devices with +[code]#aspect::fp16#, [code]#any_device_has_v# is [code]#false#, +and the kernel function is never instantiated with support for the +[code]#sycl::half# type. [NOTE] ==== @@ -2924,29 +2842,32 @@ Like any trait, the definitions of [code]#any_device_has# and [code]#all_devices_have# are uniform across all parts of a SYCL application. If an implementation uses <>, all compiler passes define a particular aspect's specialization of the traits the same way, regardless of whether that -compiler pass' device supports the aspect. Thus, [code]#any_device_has# and -[code]#all_devices_have# cannot be used to determine whether any particular -device supports an aspect. Instead, applications must use -[code]#device::has()# or [code]#platform::has()# for this. +compiler pass' device supports the aspect. +Thus, [code]#any_device_has# and [code]#all_devices_have# cannot be used to +determine whether any particular device supports an aspect. +Instead, applications must use [code]#device::has()# or [code]#platform::has()# +for this. ==== [NOTE] ==== An implementation could choose to provide command line options which affect the -set of devices that it supports. If so, those command line options would also -affect these traits. For example, if an implementation provides a command line -option that disables [code]#aspect::accelerator# devices, the trait +set of devices that it supports. +If so, those command line options would also affect these traits.
+For example, if an implementation provides a command line option that disables +[code]#aspect::accelerator# devices, the trait [code]#any_device_has# would inherit from [code]#std::false_type# when that command line option was specified. ==== [NOTE] ==== -These traits only reflect the supported devices at the time the SYCL -application is compiled. It's possible that unsupported devices are still -visible to the application when it runs. However, if a device _D_ is not -supported when the application is compiled, the application will not be able -to submit kernels to that device _D_. +These traits only reflect the supported devices at the time the SYCL application +is compiled. +It's possible that unsupported devices are still visible to the application when +it runs. +However, if a device _D_ is not supported when the application is compiled, the +application will not be able to submit kernels to that device _D_. ==== // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end device_class %%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -2958,62 +2879,66 @@ to submit kernels to that device _D_. // \input{queue_class} // %%%%%%%%%%%%%%%%%%%%%%%%%%%% begin queue_class %%%%%%%%%%%%%%%%%%%%%%%%%%%% -The SYCL [code]#queue# class encapsulates a single SYCL queue which -schedules kernels on a SYCL device. +The SYCL [code]#queue# class encapsulates a single SYCL queue which schedules +kernels on a SYCL device. -A SYCL [code]#queue# can be used to submit <> to be -executed by the <> using the [code]#submit# member -function. +A SYCL [code]#queue# can be used to submit <> to +be executed by the <> using the [code]#submit# member function. -All member functions of the [code]#queue# class are synchronous and errors -are handled by throwing synchronous SYCL exceptions. The [code]#submit# -member function synchronously invokes the provided +All member functions of the [code]#queue# class are synchronous and errors are +handled by throwing synchronous SYCL exceptions. 
+The [code]#submit# member function synchronously invokes the provided <> (as described in <>) in the calling thread, thereby scheduling a -<> for asynchronous execution. Any error in the submission of a -<> is handled by throwing a synchronous SYCL exception. +<> for asynchronous execution. +Any error in the submission of a <> is handled by throwing a +synchronous SYCL exception. Any errors from the <> after it has been submitted are handled by passing <> at specific times to an <>, as described in <>. -A SYCL [code]#queue# can wait for all <> that it has -submitted by calling [code]#wait# or [code]#wait_and_throw#. - -The default constructor of the SYCL [code]#queue# class will -construct a queue based on the SYCL [code]#device# returned from -the [code]#default_selector_v# (see <>). -All other constructors construct a queue as determined by the -parameters provided. All constructors will implicitly construct a SYCL -[code]#platform#, [code]#device# and [code]#context# in order to -facilitate the construction of the queue. - -Each constructor takes as the last -parameter an optional SYCL [code]#property_list# to provide properties to -the SYCL [code]#queue#. - -A SYCL [code]#queue# may be destroyed even when there are uncompleted -<> that have been submitted to the queue. Doing so does not -block. Instead, any commands that have been submitted to the queue begin -execution when their requisites are satisfied, just as they would had the queue -not been destroyed. Any event objects for those commands are signaled in the -normal manner when the command completes. Resources associated with the queue -will be freed by the time the last command completes. - -The SYCL [code]#queue# class provides the common reference semantics -(see <>). +A SYCL [code]#queue# can wait for all <> that it +has submitted by calling [code]#wait# or [code]#wait_and_throw#. 
+ +The default constructor of the SYCL [code]#queue# class will construct a queue +based on the SYCL [code]#device# returned from the [code]#default_selector_v# +(see <>). +All other constructors construct a queue as determined by the parameters +provided. +All constructors will implicitly construct a SYCL [code]#platform#, +[code]#device# and [code]#context# in order to facilitate the construction of +the queue. + +Each constructor takes as the last parameter an optional SYCL +[code]#property_list# to provide properties to the SYCL [code]#queue#. + +A SYCL [code]#queue# may be destroyed even when there are uncompleted <> that have been submitted to the queue. +Doing so does not block. +Instead, any commands that have been submitted to the queue begin execution when +their requisites are satisfied, just as they would had the queue not been +destroyed. +Any event objects for those commands are signaled in the normal manner when the +command completes. +Resources associated with the queue will be freed by the time the last command +completes. + +The SYCL [code]#queue# class provides the common reference semantics (see +<>). ==== Queue interface -A synopsis of the SYCL [code]#queue# class is provided below. The -constructors and member functions of the SYCL [code]#queue# class are -listed in <> and <> -respectively. The additional common special member functions and common member -functions are listed in <> in +A synopsis of the SYCL [code]#queue# class is provided below. +The constructors and member functions of the SYCL [code]#queue# class are listed +in <> and <> respectively. +The additional common special member functions and common member functions are +listed in <> in <> and <>, respectively. -Some queue member functions are shortcuts to member functions of the [code]#handler# class. +Some queue member functions are shortcuts to member functions of the +[code]#handler# class. These are listed in <>. 
// Interface for class: queue @@ -3291,31 +3216,32 @@ requested by the template parameter [code]#Param#. [[sec:queue-shortcuts]] ==== Queue shortcut functions -Queue shortcut functions are member functions of the [code]#queue# class -that implicitly create a command group with an implicit command group [code]#handler# -consisting of a single command, -a call to the member function of the handler object with the same signature -(e.g. [code]#queue::single_task# will call [code]#handler::single_task# with the same arguments), -and submit the command group. -The main signature difference comes from the return type: -member functions of the [code]#handler# return [code]#void#, -whereas corresponding queue shortcut functions return an [code]#event# object -that represents the submitted command group. -Queue shortcuts can additionally take a list of events to wait on, -as if passing the event list to [code]#handler::depends_on# for the implicit command group. +Queue shortcut functions are member functions of the [code]#queue# class that +implicitly create a command group with an implicit command group [code]#handler# +consisting of a single command, a call to the member function of the handler +object with the same signature (e.g. [code]#queue::single_task# will call +[code]#handler::single_task# with the same arguments), and submit the command +group. +The main signature difference comes from the return type: member functions of +the [code]#handler# return [code]#void#, whereas corresponding queue shortcut +functions return an [code]#event# object that represents the submitted command +group. +Queue shortcuts can additionally take a list of events to wait on, as if passing +the event list to [code]#handler::depends_on# for the implicit command group. The full list of queue shortcuts is defined in <>. -The list of handler member functions is defined in <>. - -It is not allowed to capture accessors into the implicitly created command group. 
-If a queue shortcut function launches a kernel -(via [code]#single_task# or [code]#parallel_for#), -only USM pointers are allowed inside such kernels. -However, queue shortcuts that perform non-kernel operations -can be provided with a valid placeholder accessor as an argument. -In that case there is an additional step performed: -the implicit command group [code]#handler# calls [code]#handler::require# -on each accessor passed in as a function argument. +The list of handler member functions is defined in +<>. + +It is not allowed to capture accessors into the implicitly created command +group. +If a queue shortcut function launches a kernel (via [code]#single_task# or +[code]#parallel_for#), only USM pointers are allowed inside such kernels. +However, queue shortcuts that perform non-kernel operations can be provided with +a valid placeholder accessor as an argument. +In that case there is an additional step performed: the implicit command group +[code]#handler# calls [code]#handler::require# on each accessor passed in as a +function argument. An example of using queue shortcuts is shown below. @@ -3678,13 +3604,14 @@ a@ Equivalent to submitting a command-group containing ==== Queue information descriptors -A <> can be queried for information using the [code]#get_info# -member function of the [code]#queue# class, specifying one of the info -parameters in [code]#info::queue#. The possible values for each info parameter -and any restriction are defined in the specification of the <> -associated with the <>. All info parameters in [code]#info::queue# are -specified in <> and the synopsis for [code]#info::queue# is -described in <>. +A <> can be queried for information using the [code]#get_info# member +function of the [code]#queue# class, specifying one of the info parameters in +[code]#info::queue#. +The possible values for each info parameter and any restriction are defined in +the specification of the <> associated with the <>. 
+All info parameters in [code]#info::queue# are specified in <> +and the synopsis for [code]#info::queue# is described in +<>. [[table.queue.info]] .Queue information descriptors @@ -3715,9 +3642,8 @@ info::queue::device [[sec:queue-properties]] ==== Queue properties -The properties that can be provided when constructing the SYCL -[code]#queue# class are describe in -<>. +The properties that can be provided when constructing the SYCL [code]#queue# +class are described in <>. [[table.properties.queue]] @@ -3759,8 +3685,8 @@ property::queue::in_order |==== -The constructors of the [code]#queue# [code]#property# -classes are listed in <>. +The constructors of the [code]#queue# [code]#property# classes are listed in +<>. [[table.constructors.properties.queue]] @@ -3791,22 +3717,21 @@ property::queue::in_order::in_order() Queue errors come in two forms: - * *Synchronous Errors* are those that we would expect to be - reported directly at the point of waiting on an event, and hence waiting - for a queue to complete, as well as any immediate errors reported by - enqueuing work onto a queue. Such errors are reported through {cpp} - exceptions. - * <> are those that are produced or detected after - associated host API calls have returned (so can't be thrown as - exceptions by the API call), and that are handled by an - <> through which the errors are reported. Handling of - asynchronous errors from a queue occurs at specific times, as described - by <>. + * *Synchronous Errors* are those that we would expect to be reported directly + at the point of waiting on an event, and hence waiting for a queue to + complete, as well as any immediate errors reported by enqueuing work onto a + queue. + Such errors are reported through {cpp} exceptions. + * <> are those that are produced or detected + after associated host API calls have returned (so can't be thrown as + exceptions by the API call), and that are handled by an <> + through which the errors are reported.
+ Handling of asynchronous errors from a queue occurs at specific times, as + described by <>. -Note that if there are <> to be processed when a queue -is destructed, the handler is called and -this might delay or block the destruction, according to the behavior -of the handler. +Note that if there are <> to be processed when +a queue is destructed, the handler is called and this might delay or block the +destruction, according to the behavior of the handler. // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end queue_class %%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -3818,8 +3743,9 @@ An <> in SYCL is an object that represents the status of an operation that is being executed by the SYCL runtime. Typically in SYCL, data dependency and execution order is handled implicitly by -the SYCL runtime. However, in some circumstances developers want fine grain control -of the execution, or want to retrieve properties of a command that is running. +the SYCL runtime. +However, in some circumstances developers want fine grain control of the +execution, or want to retrieve properties of a command that is running. Note that, although an event represents the status of a particular operation, the dependencies of a certain event can be used to keep track of multiple steps @@ -3830,14 +3756,13 @@ The dependencies of the event returned via the submission of the command group are the implementation-defined commands associated with the <> execution. -The SYCL [code]#event# class provides the common reference semantics -(see <>). +The SYCL [code]#event# class provides the common reference semantics (see +<>). -The constructors and member functions of the SYCL [code]#event# class -are listed in <> and -<>, respectively. The additional common special -member functions and common member functions are listed in -<> and +The constructors and member functions of the SYCL [code]#event# class are listed +in <> and <>, respectively. 
+The additional common special member functions and common member functions are +listed in <> and <>, respectively. // Interface for class: event.h @@ -3993,13 +3918,14 @@ template typename Param::return_type get_profiling_info() const ==== Event information and profiling descriptors -An <> can be queried for information using the [code]#get_info# -member function of the [code]#event# class, specifying one of the info -parameters in [code]#info::event#. The possible values for each info parameter -and any restrictions are defined in the specification of the <> -associated with the <>. All info parameters in [code]#info::event# are -specified in <> and the synopsis for [code]#info::event# is -described in <>. +An <> can be queried for information using the [code]#get_info# member +function of the [code]#event# class, specifying one of the info parameters in +[code]#info::event#. +The possible values for each info parameter and any restrictions are defined in +the specification of the <> associated with the <>. +All info parameters in [code]#info::event# are specified in <> +and the synopsis for [code]#info::event# is described in +<>. [[table.event.info]] .Event class information descriptors @@ -4056,12 +3982,12 @@ info::event_command_status::complete An <> can be queried for profiling information using the [code]#get_profiling_info# member function of the [code]#event# class, specifying one of the profiling info parameters enumerated in -[code]#info::event_profiling#. The possible values for each info parameter and -any restrictions are defined in the specification of the <> -associated with the <>. All info parameters in -[code]#info::event_profiling# are specified in <> -and the synopsis for [code]#info::event_profiling# is described in -<>. +[code]#info::event_profiling#. +The possible values for each info parameter and any restrictions are defined in +the specification of the <> associated with the <>. 
+All info parameters in [code]#info::event_profiling# are specified in +<> and the synopsis for [code]#info::event_profiling# +is described in <>. Each profiling descriptor returns a 64-bit timestamp that represents the number of nanoseconds that have elapsed since some implementation-defined timebase. @@ -4126,37 +4052,36 @@ info::event_profiling::command_end [[sec:data.access.and.storage]] == Data access and storage in SYCL -In SYCL, when using <> and <>, -data storage and access are handled by separate classes. -<> and <> handle -storage and ownership of the data, whereas <> handle access to -the data. +In SYCL, when using <> and <>, data storage and +access are handled by separate classes. +<> and <> handle storage and ownership of the +data, whereas <> handle access to the data. Buffers and images in SYCL can be bound to more than one device or context, including across different <>. -They also handle ownership of the -data, while allowing exception handling for blocking -and non-blocking data transfers. Accessors manage data transfers between the host -and all of the devices in the system, as well as tracking of data dependencies. +They also handle ownership of the data, while allowing exception handling for +blocking and non-blocking data transfers. +Accessors manage data transfers between the host and all of the devices in the +system, as well as tracking of data dependencies. Zero-sized buffers and accessors are permitted, but attempting to access data -within them produces undefined behavior, similar to dereferencing a null -pointer in {cpp}. Note that zero-sized accessors can be created in several -ways: by creating an accessor from a zero-sized buffer, by creating an accessor -with a zero-sized buffer sub-range, or by creating an accessor with its default -constructor. +within them produces undefined behavior, similar to dereferencing a null pointer +in {cpp}. 
+Note that zero-sized accessors can be created in several ways: by creating an +accessor from a zero-sized buffer, by creating an accessor with a zero-sized +buffer sub-range, or by creating an accessor with its default constructor. -When using <> allocations, data storage is managed by USM allocation functions, and -data access is via pointers. See <> for greater detail. +When using <> allocations, data storage is managed by USM allocation +functions, and data access is via pointers. +See <> for greater detail. === Host allocation -A <> may need to allocate temporary objects on the host -to handle some operations (such as copying data from one context to -another). +A <> may need to allocate temporary objects on the host to handle +some operations (such as copying data from one context to another). Allocation on the host is managed using an allocator object, following the standard {cpp} allocator class definition. -The default allocator for memory objects is implementation-defined, -but the user can supply their own allocator class. +The default allocator for memory objects is implementation-defined, but the user +can supply their own allocator class. [source,,linenums] ---- @@ -4165,46 +4090,46 @@ but the user can supply their own allocator class. } ---- -When an allocator returns a [code]#nullptr#, the runtime cannot allocate data on the -host. Note that in this case the runtime will raise an error if it requires -host memory but it is not available (e.g when moving data across <> +When an allocator returns a [code]#nullptr#, the runtime cannot allocate data on +the host. +Note that in this case the runtime will raise an error if it requires host +memory but it is not available (e.g. when moving data across <> contexts). -In some cases, the implementation may retain a copy of the allocator object -even after the buffer is destroyed. For example, this can happen when the -buffer object is destroyed before commands using accessors to the buffer have -completed.
Therefore, the application must be prepared for calls to the -allocator even after the buffer is destroyed. +In some cases, the implementation may retain a copy of the allocator object even +after the buffer is destroyed. +For example, this can happen when the buffer object is destroyed before commands +using accessors to the buffer have completed. +Therefore, the application must be prepared for calls to the allocator even +after the buffer is destroyed. [NOTE] ==== If the application needs to know when the implementation has destroyed all -copies of the allocator, it can maintain a reference count within the -allocator. +copies of the allocator, it can maintain a reference count within the allocator. ==== -The definition of allocators extends the current functionality of SYCL, -ensuring that users can define allocator functions for specific hardware or -certain complex shared memory mechanisms (e.g. NUMA), and improves -interoperability with STL-based libraries (e.g, Intel's TBB provides an -allocator). +The definition of allocators extends the current functionality of SYCL, ensuring +that users can define allocator functions for specific hardware or certain +complex shared memory mechanisms (e.g. NUMA), and improves interoperability with +STL-based libraries (e.g. Intel's TBB provides an allocator). [[subsec:default.allocators]] ==== Default allocators -A default allocator is always defined by the implementation. For allocations -greater than size zero, it is guaranteed to return non-[code]#nullptr# and -new memory positions every call. -The default allocator for const buffers will remove the const-ness of the -type (therefore, the default allocator for a buffer of type [code]#const int# -will be an [code]#Allocator)#. +A default allocator is always defined by the implementation. +For allocations greater than size zero, it is guaranteed to return +non-[code]#nullptr# and new memory positions every call.
+The default allocator for const buffers will remove the const-ness of the type +(therefore, the default allocator for a buffer of type [code]#const int# will be +an [code]#Allocator#). This implies that host <> will not share memory with the pointer given by the user in the buffer/image constructor, but will use the memory returned by the [code]#Allocator# itself for that purpose. -The user can implement an allocator that returns the same address as the -one passed in the buffer constructor, but it is the responsibility of the -user to handle the potential race conditions. +The user can implement an allocator that returns the same address as the one +passed in the buffer constructor, but it is the responsibility of the user to +handle the potential race conditions. [[table.default.allocators]] @@ -4233,84 +4158,87 @@ image_allocator |==== -See <> for details of using manual synchronization to avoid -data races between host and device. +See <> for details of using manual synchronization to avoid data +races between host and device. [[subsec:buffers]] === Buffers -The [code]#buffer# class defines a shared array of one, two or three -dimensions that can be used by the SYCL <> and has to be accessed using -<> classes. Buffers are templated on both the type of their data, -and the number of dimensions that the data is stored and accessed through. +The [code]#buffer# class defines a shared array of one, two or three dimensions +that can be used by the SYCL <> and has to be accessed using +<> classes. +Buffers are templated on both the type of their data, and the number of +dimensions that the data is stored and accessed through. -A [code]#buffer# does not map to only one underlying backend -object, and all <> memory objects may be temporary for use -within a command group on a specific device. +A [code]#buffer# does not map to only one underlying backend object, and all +<> memory objects may be temporary for use within a command group on a +specific device.
The underlying data type of a buffer [code]#T# must be <> as -defined in <>. Some overloads of the [code]#buffer# -constructor initialize the buffer contents by copying objects from host memory -while other overloads construct the buffer without copying objects from the -host. For the overloads that do not copy host objects, the initial state of -the objects in the buffer depends on whether [code]#T# is an implicit-lifetime -type (as defined in the {cpp} core language). If [code]#T# is an -implicit-lifetime type, objects of that type are implicitly created in the -buffer with indeterminate values. For other types, these constructor overloads -merely allocate uninitialized memory, and the application is responsible for -constructing objects by calling placement-new and for destroying them later -by manually calling the object's destructor. +defined in <>. +Some overloads of the [code]#buffer# constructor initialize the buffer contents +by copying objects from host memory while other overloads construct the buffer +without copying objects from the host. +For the overloads that do not copy host objects, the initial state of the +objects in the buffer depends on whether [code]#T# is an implicit-lifetime type +(as defined in the {cpp} core language). +If [code]#T# is an implicit-lifetime type, objects of that type are implicitly +created in the buffer with indeterminate values. +For other types, these constructor overloads merely allocate uninitialized +memory, and the application is responsible for constructing objects by calling +placement-new and for destroying them later by manually calling the object's +destructor. For the overloads that do copy objects from host memory, the [code]#hostData# -pointer must point to at least _N_ bytes of memory where _N_ is -[code]#sizeof(T) * bufferRange.size()#. If _N_ is zero, [code]#hostData# is -permitted to be a null pointer. 
- -A SYCL [code]#buffer# can construct an instance of a SYCL [code]#buffer# -that reinterprets the original SYCL [code]#buffer# with a different -type, dimensionality and range using the member function -[code]#reinterpret#. The reinterpreted SYCL [code]#buffer# that is -constructed must behave as though it were a copy of the SYCL [code]#buffer# -that constructed it (see <>) with the exception -that the type, dimensionality and range of the reinterpreted SYCL -[code]#buffer# must reflect the type, dimensionality and range specified -when calling the [code]#reinterpret# member function. By extension of this, -the class member types [code]#value_type#, [code]#reference# and -[code]#const_reference#, and the member functions [code]#get_range()# -and [code]#size()# of the reinterpreted SYCL [code]#buffer# must -reflect the new type, dimensionality and range. The data that the original SYCL -[code]#buffer# and the reinterpreted SYCL [code]#buffer# manage -remains unaffected, though the representation of the data when accessed through -the reinterpreted SYCL [code]#buffer# may alter to reflect the new type, -dimensionality and range. It is important to note that a reinterpreted SYCL -[code]#buffer# is a copy of the original SYCL [code]#buffer# only, -and not a new SYCL [code]#buffer#. Constructing more than one SYCL -[code]#buffer# managing the same host pointer is still undefined behavior. - -The SYCL [code]#buffer# class template provides the common reference -semantics (see <>). +pointer must point to at least _N_ bytes of memory where _N_ is [code]#sizeof(T) * +bufferRange.size()#. +If _N_ is zero, [code]#hostData# is permitted to be a null pointer. + +A SYCL [code]#buffer# can construct an instance of a SYCL [code]#buffer# that +reinterprets the original SYCL [code]#buffer# with a different type, +dimensionality and range using the member function [code]#reinterpret#. 
+The reinterpreted SYCL [code]#buffer# that is constructed must behave as though +it were a copy of the SYCL [code]#buffer# that constructed it (see +<>) with the exception that the type, dimensionality +and range of the reinterpreted SYCL [code]#buffer# must reflect the type, +dimensionality and range specified when calling the [code]#reinterpret# member +function. +By extension of this, the class member types [code]#value_type#, +[code]#reference# and [code]#const_reference#, and the member functions +[code]#get_range()# and [code]#size()# of the reinterpreted SYCL [code]#buffer# +must reflect the new type, dimensionality and range. +The data that the original SYCL [code]#buffer# and the reinterpreted SYCL +[code]#buffer# manage remains unaffected, though the representation of the data +when accessed through the reinterpreted SYCL [code]#buffer# may alter to reflect +the new type, dimensionality and range. +It is important to note that a reinterpreted SYCL [code]#buffer# is a copy of +the original SYCL [code]#buffer# only, and not a new SYCL [code]#buffer#. +Constructing more than one SYCL [code]#buffer# managing the same host pointer is +still undefined behavior. + +The SYCL [code]#buffer# class template provides the common reference semantics +(see <>). ==== Buffer interface -The constructors and member functions of the SYCL [code]#buffer# class -template are listed in <> and -<>, respectively. The additional common special -member functions and common member functions are listed in -<> and +The constructors and member functions of the SYCL [code]#buffer# class template +are listed in <> and <>, +respectively. +The additional common special member functions and common member functions are +listed in <> and <>, respectively. Each constructor takes as the last parameter an optional SYCL -[code]#property_list# to provide properties to the SYCL -[code]#buffer#. +[code]#property_list# to provide properties to the SYCL [code]#buffer#. 
The SYCL [code]#buffer# class template takes a template parameter -[code]#AllocatorT# for specifying an allocator which is used by -the <> when allocating temporary memory on the -host. If no template argument is provided, then the default allocator -for the SYCL [code]#buffer# class [code]#buffer_allocator# -will be used (see <>). +[code]#AllocatorT# for specifying an allocator which is used by the +<> when allocating temporary memory on the host. +If no template argument is provided, then the default allocator for the SYCL +[code]#buffer# class [code]#buffer_allocator# will be used (see +<>). // Interface for class: buffer @@ -4914,9 +4842,8 @@ reinterpret() const [[sec:buffer-properties]] ==== Buffer properties -The properties that can be provided when constructing the SYCL -[code]#buffer# class are describe in -<>. +The properties that can be provided when constructing the SYCL [code]#buffer# +class are described in <>. [[table.properties.buffer]] @@ -4963,9 +4890,8 @@ property::buffer::context_bound |==== -The constructors and special member functions of the buffer property -classes are listed in -<> and +The constructors and special member functions of the buffer property classes are +listed in <> and <> respectively. @@ -5027,122 +4953,126 @@ context property::buffer::context_bound::get_context() const [[sec:buf-sync-rules]] ==== Buffer destruction rules -Buffers are reference-counted. When a buffer value is constructed -from another buffer, the two values reference the same buffer and a -reference count is incremented. When a buffer value is destroyed, -the reference count is decremented. Only when there are no more -buffer values that reference a specific buffer is the actual -buffer destroyed and the buffer destruction behavior defined -below is followed. +Buffers are reference-counted. +When a buffer value is constructed from another buffer, the two values reference +the same buffer and a reference count is incremented.
+When a buffer value is destroyed, the reference count is decremented. +Only when there are no more buffer values that reference a specific buffer is +the actual buffer destroyed and the buffer destruction behavior defined below is +followed. -If any error occurs on buffer destruction, it is reported -via the associated queue's asynchronous error handling mechanism. +If any error occurs on buffer destruction, it is reported via the associated +queue's asynchronous error handling mechanism. -The basic rule for the blocking behavior of a buffer destructor is -that it blocks if there is some data to write back because a -write accessor on it has been created, or if the buffer was constructed -with attached host memory and is still in use. +The basic rule for the blocking behavior of a buffer destructor is that it +blocks if there is some data to write back because a write accessor on it has +been created, or if the buffer was constructed with attached host memory and is +still in use. More precisely: . A buffer can be constructed from a [code]#range# (and without a - [code]#hostData# pointer). The memory management for this type of buffer - is entirely handled by the SYCL system. The destructor for this type of - buffer does not need to block, even if work on the buffer has not - completed. Instead, the SYCL system frees any storage required for the - buffer asynchronously when it is no longer in use in queues. The initial - contents of the buffer are unspecified. - . A buffer can be constructed from a [code]#hostData# pointer. The buffer - will use this host memory for its full lifetime, but the contents of this - host memory are unspecified for the lifetime of the buffer. If the host - memory is modified on the host or if it is used to construct another - buffer or image during the lifetime of this buffer, then the results are - undefined. The initial contents of the buffer will be the contents of the - host memory at the time of construction. 
+ [code]#hostData# pointer). + The memory management for this type of buffer is entirely handled by the + SYCL system. + The destructor for this type of buffer does not need to block, even if work + on the buffer has not completed. + Instead, the SYCL system frees any storage required for the buffer + asynchronously when it is no longer in use in queues. + The initial contents of the buffer are unspecified. + . A buffer can be constructed from a [code]#hostData# pointer. + The buffer will use this host memory for its full lifetime, but the contents + of this host memory are unspecified for the lifetime of the buffer. + If the host memory is modified on the host or if it is used to construct + another buffer or image during the lifetime of this buffer, then the results + are undefined. + The initial contents of the buffer will be the contents of the host memory + at the time of construction. + -- -When the buffer is destroyed, the destructor will block until all -work in queues on the buffer have completed, then copy the contents -of the buffer back to the host memory (if required) and then -return. +When the buffer is destroyed, the destructor will block until all work in queues +on the buffer have completed, then copy the contents of the buffer back to the +host memory (if required) and then return. .. If the type of the host data is [code]#const#, then the buffer is - read-only; only read accessors are allowed on the buffer and - no-copy-back to host memory is performed (although the host memory must - still be kept available for use by SYCL). When using the default buffer - allocator, the const-ness of the type will be removed in order to allow - host allocation of memory, which will allow temporary host copies of the - data by the <>, for example for speeding up host - accesses. + read-only; only read accessors are allowed on the buffer and no-copy-back to + host memory is performed (although the host memory must still be kept + available for use by SYCL). 
+ When using the default buffer allocator, the const-ness of the type will be + removed in order to allow host allocation of memory, which will allow + temporary host copies of the data by the <>, for example for + speeding up host accesses. + -When the buffer is destroyed, the destructor will block until all work -in queues on the buffer have completed and then return, as there is no -copy of data back to host. - .. If the type of the host data is not [code]#const# but the pointer - to host data is [code]#const#, then the read-only restriction - applies only on host and not on device accesses. +When the buffer is destroyed, the destructor will block until all work in queues +on the buffer have completed and then return, as there is no copy of data back +to host. + .. If the type of the host data is not [code]#const# but the pointer to host + data is [code]#const#, then the read-only restriction applies only on host + and not on device accesses. + -When the buffer is destroyed, the destructor will block until all work -in queues on the buffer have completed. +When the buffer is destroyed, the destructor will block until all work in queues +on the buffer have completed. -- - . A buffer can be constructed using a [code]#shared_ptr# to host - data. This pointer is shared between the SYCL application and the - runtime. In order to allow synchronization between the application and - the runtime a [code]#mutex# is used which will be locked by the - runtime whenever the data is in use, and unlocked when it is no longer - needed. + . A buffer can be constructed using a [code]#shared_ptr# to host data. + This pointer is shared between the SYCL application and the runtime. + In order to allow synchronization between the application and the runtime a + [code]#mutex# is used which will be locked by the runtime whenever the data + is in use, and unlocked when it is no longer needed. 
+ -- -The [code]#shared_ptr# reference counting is used in order to prevent -destroying the buffer host data prematurely. If the [code]#shared_ptr# -is deleted from the user application before buffer destruction, the buffer -can continue securely because the pointer hasn't been destroyed yet. It will -not copy data back to the host before destruction, however, as the +The [code]#shared_ptr# reference counting is used in order to prevent destroying +the buffer host data prematurely. +If the [code]#shared_ptr# is deleted from the user application before buffer +destruction, the buffer can continue securely because the pointer hasn't been +destroyed yet. +It will not copy data back to the host before destruction, however, as the application side has already deleted its copy. -Note that since there is an implicit conversion of a -[code]#std::unique_ptr# to a [code]#std::shared_ptr#, a -[code]#std::unique_ptr# can also be used to pass the ownership to the -<>. +Note that since there is an implicit conversion of a [code]#std::unique_ptr# to +a [code]#std::shared_ptr#, a [code]#std::unique_ptr# can also be used to pass +the ownership to the <>. -- - . A buffer can be constructed from a pair of iterator values. In this - case, the buffer construction will copy the data from the data range - defined by the iterator pair. The destructor will not copy back any data - and does not need to block. + . A buffer can be constructed from a pair of iterator values. + In this case, the buffer construction will copy the data from the data range + defined by the iterator pair. + The destructor will not copy back any data and does not need to block. . A buffer can be constructed from a container on which - [code]#std::data(container)# and [code]#std::size(container)# - are well-formed. The initial contents of the buffer will - be the contents of the container at the time of construction. + [code]#std::data(container)# and [code]#std::size(container)# are + well-formed. 
+ The initial contents of the buffer will be the contents of the container at + the time of construction. + -- -The buffer may use the memory within the container for its full -lifetime, and the contents of this memory are unspecified for the -lifetime of the buffer. If the container memory is modified by the host -during the lifetime of this buffer, then the results are undefined. - -When the buffer is destroyed, the destructor will block until all work in -queues on the buffer have completed. If the return type of -[code]#std::data(container)# is not [code]#const# then the destructor will also -copy the contents of the buffer to the container (if required). +The buffer may use the memory within the container for its full lifetime, and +the contents of this memory are unspecified for the lifetime of the buffer. +If the container memory is modified by the host during the lifetime of this +buffer, then the results are undefined. + +When the buffer is destroyed, the destructor will block until all work in queues +on the buffer have completed. +If the return type of [code]#std::data(container)# is not [code]#const# then the +destructor will also copy the contents of the buffer to the container (if +required). -- -If [code]#set_final_data()# is used to change where to write the -data back to, then the destructor of the buffer will block if a -write accessor on it has been created. +If [code]#set_final_data()# is used to change where to write the data back to, +then the destructor of the buffer will block if a write accessor on it has been +created. -A sub-buffer object can be created which is a sub-range reference to a -base buffer. This sub-buffer can be used to create accessors to the -base buffer, which have access to the range specified at time -of construction of the sub-buffer. Sub-buffers cannot be created from -sub-buffers, but only from a base buffer which is not already a sub-buffer. 
+A sub-buffer object can be created which is a sub-range reference to a base +buffer. +This sub-buffer can be used to create accessors to the base buffer, which have +access to the range specified at time of construction of the sub-buffer. +Sub-buffers cannot be created from sub-buffers, but only from a base buffer +which is not already a sub-buffer. -Sub-buffers must be constructed from a contiguous region of memory in a -buffer. This requirement is potentially non-intuitive when working with -buffers that have dimensionality larger than one, but maps to -one-dimensional <> native allocations without performance cost due -to index mapping computation. For example: +Sub-buffers must be constructed from a contiguous region of memory in a buffer. +This requirement is potentially non-intuitive when working with buffers that +have dimensionality larger than one, but maps to one-dimensional <> +native allocations without performance cost due to index mapping computation. +For example: [source,,linenums] ---- @@ -5153,18 +5083,18 @@ include::{code_dir}/subbuffer.cpp[lines=4..-1] [[subsec:images]] === Images -The classes [code]#unsampled_image# -(<>) and [code]#sampled_image# -(<>) define shared image data of one, -two or three dimensions, that can be used by kernels in queues and have to be -accessed using the image <> classes. +The classes [code]#unsampled_image# (<>) and +[code]#sampled_image# (<>) define shared image +data of one, two or three dimensions, that can be used by kernels in queues and +have to be accessed using the image <> classes. -The constructors and member functions of the SYCL [code]#unsampled_image# -and [code]#sampled_image# class templates are listed in +The constructors and member functions of the SYCL [code]#unsampled_image# and +[code]#sampled_image# class templates are listed in <>, <>, <> and <>, -respectively. The additional common special member functions and common member -functions are listed in <> and +respectively. 
+The additional common special member functions and common member functions are +listed in <> and <>, respectively. Where relevant, it is the responsibility of the user to ensure that the format @@ -5174,9 +5104,9 @@ The allocator template parameter of the SYCL [code]#unsampled_image# and [code]#sampled_image# classes can be any allocator type including a custom allocator, however it must allocate in units of [code]#std::byte#. -For any image that is constructed with the range latexmath:[(r1,r2,r3)] with an element -type size in bytes of _s_, the image row pitch and image slice pitch should be -calculated as follows: +For any image that is constructed with the range latexmath:[(r1,r2,r3)] with an +element type size in bytes of _s_, the image row pitch and image slice pitch +should be calculated as follows: [[image-row-pitch]] [latexmath] @@ -5190,26 +5120,24 @@ r1 \cdot s r1 \cdot r2 \cdot s ++++ -The SYCL [code]#unsampled_image# and [code]#sampled_image# class -templates provide the common reference semantics -(see <>). +The SYCL [code]#unsampled_image# and [code]#sampled_image# class templates +provide the common reference semantics (see <>). ==== Unsampled image interface -Each constructor of the [code]#unsampled_image# takes an -[code]#image_format# to describe the data layout of the image data. +Each constructor of the [code]#unsampled_image# takes an [code]#image_format# to +describe the data layout of the image data. Each constructor additionally takes as the last parameter an optional SYCL -[code]#property_list# to provide properties to the SYCL -[code]#unsampled_image#. +[code]#property_list# to provide properties to the SYCL [code]#unsampled_image#. The SYCL [code]#unsampled_image# class template takes a template parameter [code]#AllocatorT# for specifying an allocator which is used by the -<> when allocating temporary memory on the host. 
If no template -argument is provided, the default allocator for the SYCL -[code]#unsampled_image# class [code]#image_allocator# is used -(see <>). +<> when allocating temporary memory on the host. +If no template argument is provided, the default allocator for the SYCL +[code]#unsampled_image# class [code]#image_allocator# is used (see +<>). // Interface for class: unsampled image [source,,linenums] @@ -5705,15 +5633,13 @@ have any effect. ==== Sampled image interface -Each constructor of the [code]#sampled_image# class requires a -pointer to the host data the image will sample, an -[code]#image_format# to describe the data layout and an -[code]#image_sampler# (<>) to describe -how to sample the image data. +Each constructor of the [code]#sampled_image# class requires a pointer to the +host data the image will sample, an [code]#image_format# to describe the data +layout and an [code]#image_sampler# (<>) to describe how to +sample the image data. Each constructor additionally takes as the last parameter an optional SYCL -[code]#property_list# to provide properties to the SYCL -[code]#sampled_image#. +[code]#property_list# to provide properties to the SYCL [code]#sampled_image#. // Interface for class: sampled image [source,,linenums] @@ -5931,8 +5857,8 @@ host_sampled_image_accessor get_host_access() ==== Image properties The properties that can be provided when constructing the SYCL -[code]#unsampled_image# and [code]#sampled_image# classes are -describe in <>. +[code]#unsampled_image# and [code]#sampled_image# classes are describe in +<>. 
// Interface for image properties [source,,linenums] @@ -5977,8 +5903,8 @@ property::image::context_bound |==== -The constructors and member functions of the image [code]#property# classes -are listed in <> and +The constructors and member functions of the image [code]#property# classes are +listed in <> and <> @@ -6042,65 +5968,67 @@ context property::image::context_bound::get_context() const The rules are similar to those described in <>. -For the lifetime of the image object, the associated host memory must -be left available to the <> and the contents of the associated -host memory is unspecified until the image object is destroyed. If an -image object value is copied, then only a reference to the underlying -image object is copied. The underlying image object is reference-counted. -Only after all image value references to the underlying image object -have been destroyed is the actual image object itself destroyed. - -If an image object is constructed with associated host memory, then -its destructor blocks until all operations in all SYCL queues on -that image object have completed. Any modifications to the image data -will be copied back, if necessary, to the associated host memory. -Any errors occurring during destruction are reported to any associated -context's asynchronous error handler. If an image object is constructed -with a storage object, then the storage object defines what -blocking or copying behavior occurs on image object destruction. +For the lifetime of the image object, the associated host memory must be left +available to the <> and the contents of the associated host memory +are unspecified until the image object is destroyed. +If an image object value is copied, then only a reference to the underlying +image object is copied. +The underlying image object is reference-counted. +Only after all image value references to the underlying image object have been +destroyed is the actual image object itself destroyed.
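The reference semantics described above can be illustrated with a small standard {cpp} model that uses no SYCL API. In this sketch [code]#image_handle# and [code]#image_impl# are hypothetical names, and [code]#std::shared_ptr# merely stands in for whatever reference counting an implementation actually uses: copying a handle copies only the reference, and the underlying object is destroyed when the last handle goes away.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an underlying image object. It only tracks how
// many instances are alive so that the reference counting is observable.
struct image_impl {
  static int live;
  image_impl() { ++live; }
  ~image_impl() { --live; }
  image_impl(const image_impl&) = delete;
  image_impl& operator=(const image_impl&) = delete;
};
int image_impl::live = 0;

// Hypothetical handle with common reference semantics: copying the handle
// copies a reference to the same image_impl, never the object itself.
class image_handle {
 public:
  image_handle() : impl_(std::make_shared<image_impl>()) {}

  // Handles compare equal when they refer to the same underlying object.
  bool operator==(const image_handle& other) const {
    return impl_ == other.impl_;
  }

  // Number of handles currently sharing the underlying object.
  long references() const { return impl_.use_count(); }

 private:
  std::shared_ptr<image_impl> impl_;
};
```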
+ +If an image object is constructed with associated host memory, then its +destructor blocks until all operations in all SYCL queues on that image object +have completed. +Any modifications to the image data will be copied back, if necessary, to the +associated host memory. +Any errors occurring during destruction are reported to any associated context's +asynchronous error handler. +If an image object is constructed with a storage object, then the storage object +defines what blocking or copying behavior occurs on image object destruction. [[sec:sharing-host-memory-with-dm]] === Sharing host memory with the SYCL data management classes -In order to allow the <> to do memory management and allow -for data dependencies, there are two classes defined, buffer and image. The -default behavior for them is that a "`raw`" pointer is given during the +In order to allow the <> to do memory management and allow for +data dependencies, there are two classes defined, buffer and image. +The default behavior for them is that a "`raw`" pointer is given during the construction of the data management class, with full ownership to use it until the destruction of the SYCL object. -In this section we go in greater detail on sharing or explicitly not -sharing host memory with the SYCL data classes, and we will use the buffer -class as an example. The same rules will apply to images as well. +In this section we go in greater detail on sharing or explicitly not sharing +host memory with the SYCL data classes, and we will use the buffer class as an +example. +The same rules will apply to images as well. ==== Default behavior When using a SYCL buffer, the ownership of the pointer passed to the constructor -of the class is, by default, passed to <>, and that pointer cannot be used -on the host side until the buffer or image is destroyed. -A SYCL application can access the contents of the memory managed by a SYCL buffer -by using a [code]#host_accessor# as defined in <>. 
-However, there is no guarantee that the host accessor will copy data back to -the original host address used in its constructor. +of the class is, by default, passed to <>, and that pointer cannot +be used on the host side until the buffer or image is destroyed. +A SYCL application can access the contents of the memory managed by a SYCL +buffer by using a [code]#host_accessor# as defined in <>. +However, there is no guarantee that the host accessor will copy data back to the +original host address used in its constructor. The pointer passed in is the one used to copy data back to the host, if needed, -before buffer destruction. The memory pointed by <> -will not be de-allocated by the runtime, -and the data is copied back from the device if there is -a need for it. +before buffer destruction. +The memory pointed by <> will not be de-allocated by the runtime, +and the data is copied back from the device if there is a need for it. ==== SYCL ownership of the host memory -In the case where there is host memory to be used for initialization of data -but there is no intention of using that host memory after the buffer is -destroyed, then the buffer can take full ownership of that host memory. +In the case where there is host memory to be used for initialization of data but +there is no intention of using that host memory after the buffer is destroyed, +then the buffer can take full ownership of that host memory. -When a buffer owns the <> there is no copy back, by -default. In this situation, the SYCL application may pass a unique -pointer to the host data, which will be then used by the runtime -internally to initialize the data in the device. +When a buffer owns the <> there is no copy back, by default. +In this situation, the SYCL application may pass a unique pointer to the host +data, which will be then used by the runtime internally to initialize the data +in the device. 
For example, the following could be used: @@ -6114,10 +6042,9 @@ For example, the following could be used: } ---- -However, optionally the [code]#buffer::set_final_data()# can be -set to a [code]#std::weak_ptr# to enable copying data -back, to another host memory address that is going to be valid after -buffer construction. +However, [code]#buffer::set_final_data()# can optionally be called with a +[code]#std::weak_ptr# to enable copying data back to another host memory +address that is going to be valid after buffer destruction. [source,,linenums] ---- @@ -6134,25 +6061,26 @@ buffer construction. ==== Shared SYCL ownership of the host memory -When an instance of [code]#std::shared_ptr# is passed to the buffer -constructor, then the buffer object and the developer's application share -the memory region. If the shared pointer is still used on the application's -side then the data will be copied back from the buffer or image and will be -available to the application after the buffer or image is destroyed. - -If the [code]#shared_ptr# is not empty, the contents of the referenced -memory are used to initialize the buffer. If the [code]#shared_ptr# is -empty, then the buffer is created with uninitialized memory. - -When the buffer is destroyed and the data have potentially been updated, if -the number of copies of the shared pointer outside the runtime is 0, there -is no user-side shared pointer to read the data. Therefore the data is not -copied out, and the buffer destructor does not need to wait for the data -processes to be finished, as the outcome is not needed on the application's -side. - -This behavior can be overridden using the [code]#set_final_data()# -member function of the buffer class, which will by any means force the buffer +When an instance of [code]#std::shared_ptr# is passed to the buffer constructor, +then the buffer object and the developer's application share the memory region.
+If the shared pointer is still used on the application's side then the data will +be copied back from the buffer or image and will be available to the application +after the buffer or image is destroyed. + +If the [code]#shared_ptr# is not empty, the contents of the referenced memory +are used to initialize the buffer. +If the [code]#shared_ptr# is empty, then the buffer is created with +uninitialized memory. + +When the buffer is destroyed and the data have potentially been updated, if the +number of copies of the shared pointer outside the runtime is 0, there is no +user-side shared pointer to read the data. +Therefore the data is not copied out, and the buffer destructor does not need to +wait for the data processes to be finished, as the outcome is not needed on the +application's side. + +This behavior can be overridden using the [code]#set_final_data()# member +function of the buffer class, which will in any case force the buffer destructor to wait until the data is copied to wherever the [code]#set_final_data()# member function has put the data (or neither wait nor copy if the final data is set to [code]#nullptr#). @@ -6186,14 +6114,14 @@ if set final data is [code]#nullptr)#. [[subsec:mutex]] === Synchronization primitives -When the user wants to use the [code]#buffer# simultaneously in -the <> and their own code (e.g. a multi-threaded -mechanism) and wants to use manual synchronization without using a -[code]#host_accessor#, a [code]#std::mutex# can be passed to the -[code]#buffer# constructor via the right [code]#property#. +When the user wants to use the [code]#buffer# simultaneously in the +<> and their own code (e.g. a multi-threaded mechanism) and wants +to use manual synchronization without using a [code]#host_accessor#, a +[code]#std::mutex# can be passed to the [code]#buffer# constructor via the right +[code]#property#. -The runtime promises to lock the mutex whenever the data is in use and -unlock it when it no longer needs it. 
+The runtime promises to lock the mutex whenever the data is in use and unlock it +when it no longer needs it.
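The locking contract described above can be modeled in standard {cpp} without any SYCL code. This is a sketch under stated assumptions: the hypothetical [code]#runtime_worker# thread stands in for the SYCL runtime and holds the user-supplied [code]#std::mutex# for every access to the shared data, while user code synchronizes on the same mutex instead of using a [code]#host_accessor#. In SYCL 2020 the mutex is supplied through the [code]#property::buffer::use_mutex# buffer property, which this model does not use.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the runtime: it holds the user-supplied mutex
// for every access to the shared data and releases it afterwards.
void runtime_worker(std::vector<int>& data, std::mutex& m, int iterations) {
  for (int i = 0; i < iterations; ++i) {
    std::lock_guard<std::mutex> lock(m);
    data[0] += 1;
  }
}

// User code shares the same data and synchronizes on the same mutex
// instead of going through a host accessor.
int run_shared(int iterations) {
  std::vector<int> data(1, 0);
  std::mutex m;
  std::thread runtime(runtime_worker, std::ref(data), std::ref(m), iterations);
  for (int i = 0; i < iterations; ++i) {
    std::lock_guard<std::mutex> lock(m);
    data[0] += 1;
  }
  runtime.join();
  // Safe to read without the lock: the worker thread has finished.
  return data[0];
}
```

Because both sides take the same lock for every update, no increment is lost and the result is deterministic even though the two threads interleave.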
+The runtime promises to lock the mutex whenever the data is in use and unlock it +when it no longer needs it. [source,,linenums] ---- @@ -6222,15 +6150,15 @@ changed using the member function [code]#set_final_data()#. // \input{accessors} // %%%%%%%%%%%%%%%%%%%%%%%%%%%% begin accessors %%%%%%%%%%%%%%%%%%%%%%%%%%%% -<> provide three different capabilities: they provide -access to the data managed by a <> or <>, they provide access -to local memory on a <>, and they define the *requirements* to memory -objects which determine the scheduling of <> (see +<> provide three different capabilities: they provide access +to the data managed by a <> or <>, they provide access to local +memory on a <>, and they define the *requirements* to memory objects +which determine the scheduling of <> (see <>). A memory object requirement is created when an accessor is constructed, unless -the accessor is a placeholder in which case the requirement is created when -the accessor is bound to a <> by calling [code]#handler::require()#. +the accessor is a placeholder in which case the requirement is created when the +accessor is bound to a <> by calling [code]#handler::require()#. There are several different {cpp} classes that implement accessors: @@ -6238,8 +6166,8 @@ There are several different {cpp} classes that implement accessors: within a <>. * The [code]#host_accessor# class provides access to data in a [code]#buffer# - from host code that is outside of a <>. These accessors are - typically used in <>. + from host code that is outside of a <>. + These accessors are typically used in <>. * The [code]#local_accessor# class provides access to device local memory from within a <>. @@ -6251,31 +6179,35 @@ There are several different {cpp} classes that implement accessors: * The [code]#host_unsampled_image_accessor# and [code]#host_sampled_image_accessor# classes provide access to data in an [code]#unsampled_image# and [code]#sampled_image# from host code that is - outside of a <>. 
These accessors are typically used in - <>. + outside of a <>. + These accessors are typically used in <>. Accessor objects must always be constructed in host code, either in -<> or in <>. Whether the constructor -blocks until data is available depends on the type of accessor. Those -accessors which provide access to data within a <> do not block. -Instead, these accessors define a requirement which influences the scheduling -of the <>. Those accessors which provide access to data from host -code do block until the data is available on the host. +<> or in <>. +Whether the constructor blocks until data is available depends on the type of +accessor. +Those accessors which provide access to data within a <> do not block. +Instead, these accessors define a requirement which influences the scheduling of +the <>. +Those accessors which provide access to data from host code do block until the +data is available on the host. For those accessors which provide access to data within a <>, the member functions which access data should only be called from within the -<>. Programs which call these member functions from outside of the -<> are ill formed. The sections below describe exactly which member -functions fall into this category. +<>. +Programs which call these member functions from outside of the <> are +ill formed. +The sections below describe exactly which member functions fall into this +category. ==== Data type All accessors have a [code]#DataT# template parameter which specifies the type -of each element that the accessor accesses. For [code]#accessor# and -[code]#host_accessor#, this type must either match the type of each element in -the underlying [code]#buffer#, or it must be a [code]#const# qualified version -of that type. +of each element that the accessor accesses. 
+For [code]#accessor# and [code]#host_accessor#, this type must either match the +type of each element in the underlying [code]#buffer#, or it must be a +[code]#const# qualified version of that type. For the image accessors ([code]#unsampled_image_accessor#, [code]#sampled_image_accessor#, [code]#host_unsampled_image_accessor#, and @@ -6293,15 +6225,16 @@ For [code]#local_accessor# see <> for the allowable ==== Access modes Most accessors have an [code]#AccessMode# template parameter which specifies -whether the accessor can read or write the underlying data. This information -is used by the runtime when defining the requirements for the associated -<>, and it tells the runtime whether data needs to be transferred to -or from a device before data can be accessed through the accessor. +whether the accessor can read or write the underlying data. +This information is used by the runtime when defining the requirements for the +associated <>, and it tells the runtime whether data needs to be +transferred to or from a device before data can be accessed through the +accessor. The [code]#access_mode# enumeration, shown in <>, -describes the potential modes of an accessor. However, not all accessor -classes support all modes, so see the description of each class for more -details. +describes the potential modes of an accessor. +However, not all accessor classes support all modes, so see the description of +each class for more details. [source,,linenums] ---- @@ -6340,28 +6273,29 @@ access_mode::read_write ==== Deduction tags Some accessor constructors take a [code]#TagT# parameter, which is used to -deduce template arguments for the constructor's class. Each of the access -modes in <> has an associated tag, but there are -additional tags which set other template parameters in addition to the access -mode. The synopsis below shows the namespace scope variables that the -implementation provides as possible values for the [code]#TagT# parameter. 
+deduce template arguments for the constructor's class. +Each of the access modes in <> has an associated +tag, but there are additional tags which set other template parameters in +addition to the access mode. +The synopsis below shows the namespace scope variables that the implementation +provides as possible values for the [code]#TagT# parameter. [source,,linenums] ---- include::{header_dir}/accessTags.h[lines=4..-1] ---- -The precise meaning of these tags depends on the specific accessor class -that is being constructed, so they are described more fully below in the -section that pertains to each of the accessor types. +The precise meaning of these tags depends on the specific accessor class that is +being constructed, so they are described more fully below in the section that +pertains to each of the accessor types. ==== Properties All accessor constructors accept a [code]#property_list# parameter, which -affects the semantics of the accessor. <> shows -the set of all possible accessor properties and tells which properties are -allowed when constructing each accessor class. +affects the semantics of the accessor. +<> shows the set of all possible accessor properties +and tells which properties are allowed when constructing each accessor class. [source,,linenums] ---- @@ -6412,15 +6346,16 @@ this range is preserved. [NOTE] ==== -As stated above, the [code]#property::no_init# property requires the -application to construct an object for each accessor element when the element's -type is not an implicit-lifetime type (except in the case when the -corresponding buffer element did not previously contain an object). The reason -for this requirement is to avoid the possibility of overwriting a valid object -with indeterminate bytes, for example, when a <> using the accessor -completes. 
This means that the implementation can unconditionally copy memory -from the device back to the host when the <> completes, regardless of -whether the [code]#DataT# type is an implicit-lifetime type. +As stated above, the [code]#property::no_init# property requires the application +to construct an object for each accessor element when the element's type is not +an implicit-lifetime type (except in the case when the corresponding buffer +element did not previously contain an object). +The reason for this requirement is to avoid the possibility of overwriting a +valid object with indeterminate bytes, for example, when a <> using the +accessor completes. +This means that the implementation can unconditionally copy memory from the +device back to the host when the <> completes, regardless of whether +the [code]#DataT# type is an implicit-lifetime type. ==== The constructors of the accessor property classes are listed in @@ -6443,79 +6378,86 @@ property::no_init::no_init() ==== Read only accessors -Accessors which have an [code]#AccessMode# template parameter can be declared -as read-only by specifying [code]#access_mode::read# for the template -parameter. A read-only accessor provides read-only access to the underlying -data and provides a "read" requirement for the memory object when it is -constructed. +Accessors which have an [code]#AccessMode# template parameter can be declared as +read-only by specifying [code]#access_mode::read# for the template parameter. +A read-only accessor provides read-only access to the underlying data and +provides a "read" requirement for the memory object when it is constructed. -The [code]#DataT# template parameter for a read-only accessor can optionally -be [code]#const# qualified, and the semantics of the accessor are unchanged. +The [code]#DataT# template parameter for a read-only accessor can optionally be +[code]#const# qualified, and the semantics of the accessor are unchanged. 
For example, an accessor declared with [code]#const DataT# and [code]#access_mode::read# has the same semantics as an accessor declared with [code]#DataT# and [code]#access_mode::read#. -As detailed in the sections below, some accessor types have a default value -for [code]#AccessMode#, which depends on whether the [code]#DataT# parameter -is [code]#const# qualified. This provides a convenient way to declare a -read-only accessor without explicitly specifying the access mode. +As detailed in the sections below, some accessor types have a default value for +[code]#AccessMode#, which depends on whether the [code]#DataT# parameter is +[code]#const# qualified. +This provides a convenient way to declare a read-only accessor without +explicitly specifying the access mode. A [code]#const# qualified [code]#DataT# is only allowed for a read-only -accessor. Programs which specify a [code]#const# qualified [code]#DataT# and -any access mode other than [code]#access_mode::read# are ill formed, and the -implementation must issue a diagnostic in this case. +accessor. +Programs which specify a [code]#const# qualified [code]#DataT# and any access +mode other than [code]#access_mode::read# are ill formed, and the implementation +must issue a diagnostic in this case. -Each accessor class also provides implicit conversions between the two forms -of read-only accessors. This makes it possible, for example, to assign an -accessor whose type has [code]#const DataT# and [code]#access_mode::read# to an -accessor whose type has [code]#DataT# and [code]#access_mode::read#, so long as -the other template parameters are the same. There is also an implicit -conversion from a read-write accessor to either of the forms of a read-only -accessor. These implicit conversions are described in detail for each accessor -class in the sections that follow. +Each accessor class also provides implicit conversions between the two forms of +read-only accessors. 
+This makes it possible, for example, to assign an accessor whose type has +[code]#const DataT# and [code]#access_mode::read# to an accessor whose type has +[code]#DataT# and [code]#access_mode::read#, so long as the other template +parameters are the same. +There is also an implicit conversion from a read-write accessor to either of the +forms of a read-only accessor. +These implicit conversions are described in detail for each accessor class in +the sections that follow. ==== Accessing elements of an accessor Accessors of type [code]#accessor#, [code]#host_accessor#, and -[code]#local_accessor# can have zero, one, two, or three Dimensions. A zero -dimension accessor provides access to a single scalar element via an implicit -conversion operator to the underlying type of that element and via an overloaded -copy/move assignment operators from the underlying type of the element. - -One, two, or three dimensional specializations of these accessors provide -access to the elements they contain in two ways. The first way is through a -subscript operator that takes an instance of an [code]#id# class which has the -same dimensionality as the accessor. The second way is by passing a single -[code]#size_t# value to multiple consecutive subscript operators as specified -in <>. +[code]#local_accessor# can have zero, one, two, or three dimensions. +A zero-dimensional accessor provides access to a single scalar element via an +implicit conversion operator to the underlying type of that element and via +overloaded copy/move assignment operators from the underlying type of the +element. + +One-, two-, or three-dimensional specializations of these accessors provide access +to the elements they contain in two ways. +The first way is through a subscript operator that takes an instance of an +[code]#id# class which has the same dimensionality as the accessor. +The second way is by passing a single [code]#size_t# value to multiple +consecutive subscript operators as specified in <>.
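The two subscript forms can be sketched with a small standard {cpp} view over linear storage. The sketch assumes the row-major linearization that SYCL uses for multi-dimensional data, in which the rightmost index varies fastest; [code]#index3#, [code]#linearize# and [code]#view3# are hypothetical names, not SYCL types. Both forms resolve to the same element.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 3-D index type standing in for an id with three dimensions.
using index3 = std::array<std::size_t, 3>;

// Row-major linearization, rightmost index fastest:
// linear = i2 + i1 * r2 + i0 * r1 * r2
std::size_t linearize(const index3& i, const index3& r) {
  return i[2] + i[1] * r[2] + i[0] * r[1] * r[2];
}

// Hypothetical non-owning 3-D view modeling the two subscript forms.
class view3 {
 public:
  view3(std::vector<int>& storage, index3 range)
      : data_(storage.data()), range_(range) {}

  // Form 1: one subscript that takes a full 3-D index.
  int& operator[](const index3& i) const { return data_[linearize(i, range_)]; }

  // Form 2: chained size_t subscripts, v[i0][i1][i2], built from proxies.
  struct row {
    int* base;
    int& operator[](std::size_t i2) const { return base[i2]; }
  };
  struct plane {
    int* base;
    std::size_t r2;
    row operator[](std::size_t i1) const { return row{base + i1 * r2}; }
  };
  plane operator[](std::size_t i0) const {
    return plane{data_ + i0 * range_[1] * range_[2], range_[2]};
  }

 private:
  int* data_;
  index3 range_;
};
```

Returning lightweight proxy objects from each [code]#operator[]# is one common way to support chained subscripts; an implementation is free to use a different technique.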
In all these cases, the reference to the contained element is of type [code]#const DataT&# for read-only accessors and of type [code]#DataT&# for other accessors. Accessors of all types have a range that defines the set of indices that may be -used to access elements. For buffer accessors, this is the range of the -underlying buffer, unless it is a <> in which case the range -comes from the accessor's constructor. For image accessors, this is the range -of the underlying image. Local accessors specify the range when the accessor -is constructed. Any attempt to access an element via an index that is outside -of this range produces undefined behavior. +used to access elements. +For buffer accessors, this is the range of the underlying buffer, unless it is a +<> in which case the range comes from the accessor's +constructor. +For image accessors, this is the range of the underlying image. +Local accessors specify the range when the accessor is constructed. +Any attempt to access an element via an index that is outside of this range +produces undefined behavior. ==== Container interface Accessors of type [code]#accessor#, [code]#host_accessor#, and [code]#local_accessor# meet the {cpp} requirement of -[code]#ReversibleContainer#. The exception to this is that only -[code]#local_accessor# owns the underlying data, meaning that its destructor -destroys elements and frees the memory. The [code]#accessor# and -[code]#host_accessor# types don't destroy any elements or free the memory on -destruction. The iterator for the container interface meets the {cpp} -requirement of [code]#LegacyRandomAccessIterator# and the underlying -pointers/references correspond to the address space specified by the accessor -type. For multidimensional accessors the iterator linearizes the data -according to <>. +[code]#ReversibleContainer#. +The exception to this is that only [code]#local_accessor# owns the underlying +data, meaning that its destructor destroys elements and frees the memory. 
+The [code]#accessor# and [code]#host_accessor# types don't destroy any elements +or free the memory on destruction. +The iterator for the container interface meets the {cpp} requirement of +[code]#LegacyRandomAccessIterator# and the underlying pointers/references +correspond to the address space specified by the accessor type. +For multidimensional accessors the iterator linearizes the data according to +<>. [[sec:accessors.ranged]] @@ -6523,56 +6465,57 @@ according to <>. Accessors of type [code]#accessor# and [code]#host_accessor# can be constructed from a sub-range of a [code]#buffer# by providing a range and offset to the -constructor. This limits the elements that can be accessed to the specified -sub-range, which allows the implementation to perform certain optimizations such -as reducing the amount of memory that needs to be copied to or from a device. +constructor. +This limits the elements that can be accessed to the specified sub-range, which +allows the implementation to perform certain optimizations such as reducing the +amount of memory that needs to be copied to or from a device. If the ranged accessor is multi-dimensional, the sub-range is allowed to -describe a region of memory in the underlying buffer that is not contiguous -in the linear address space. It is also legal to construct several ranged -accessors for the same underlying buffer, either overlapping or -non-overlapping. +describe a region of memory in the underlying buffer that is not contiguous in +the linear address space. +It is also legal to construct several ranged accessors for the same underlying +buffer, either overlapping or non-overlapping. A ranged accessor still creates a requisite for the entire underlying buffer, -even for the portions not within the range. 
For example, if one command writes -through a ranged accessor to one region of a buffer and a second command reads -through a ranged accessor from a non-overlapping region of the same buffer, the -second command must still be scheduled after the first because the requisites -for the two commands are on the entire buffer, not on the sub-ranges of the -ranged accessors. +even for the portions not within the range. +For example, if one command writes through a ranged accessor to one region of a +buffer and a second command reads through a ranged accessor from a +non-overlapping region of the same buffer, the second command must still be +scheduled after the first because the requisites for the two commands are on the +entire buffer, not on the sub-ranges of the ranged accessors. Most of the accessor member functions which provide a reference to the -underlying buffer elements are affected by a ranged accessor's offset and -range. For example, calling [code]#operator[](0)# on a one-dimensional ranged -accessor returns a reference to the element at the position specified by the -accessor's offset, which is not necessarily the first element in the buffer. +underlying buffer elements are affected by a ranged accessor's offset and range. +For example, calling [code]#operator[](0)# on a one-dimensional ranged accessor +returns a reference to the element at the position specified by the accessor's +offset, which is not necessarily the first element in the buffer. In addition, the accessor's iterator functions iterate only over the elements that are within the sub-range. -The only exceptions are the [code]#get_pointer# and [code]#get_multi_ptr# -member functions, which return a pointer to the beginning of the underlying -buffer regardless of the accessor's offset. Applications using these functions -must take care to manually add the offset before dereferencing the pointer -because accessing an element that is outside of the accessor's range results -in undefined behavior. 
+The only exceptions are the [code]#get_pointer# and [code]#get_multi_ptr# member +functions, which return a pointer to the beginning of the underlying buffer +regardless of the accessor's offset. +Applications using these functions must take care to manually add the offset +before dereferencing the pointer because accessing an element that is outside of +the accessor's range results in undefined behavior. [NOTE] ==== There is no change in behavior for ranged accessors with a range of zero. -It still creates a requisite for the entire underlying buffer, and -an attempt to access an element produces undefined behaviour. +It still creates a requisite for the entire underlying buffer, and an attempt to +access an element produces undefined behavior. ==== ==== Buffer accessor for commands The [code]#accessor# class provides access to data in a [code]#buffer# from -within a <> or from within a <>. When used in -a <>, it accesses the contents of the buffer via the -device's <>. These two forms of the accessor are distinguished -by the [code]#AccessTarget# template parameter as shown in -<>. Both forms support the -following values for the [code]#AccessMode# template parameter: -[code]#access_mode::read#, [code]#access_mode::write# and +within a <> or from within a <>. +When used in a <>, it accesses the contents of the buffer +via the device's <>. +These two forms of the accessor are distinguished by the [code]#AccessTarget# +template parameter as shown in <>. +Both forms support the following values for the [code]#AccessMode# template +parameter: [code]#access_mode::read#, [code]#access_mode::write# and [code]#access_mode::read_write#. [[table.accessors.command.buffer.capabilities]] @@ -6596,19 +6539,21 @@ use the [code]#accessor# from a <> result in undefined behavior. The dimensionality of the accessor must match the underlying buffer, however, -there is a special case if the buffer is one-dimensional.
In this case, the -accessor may either be one-dimensional or it may be zero-dimensional. A -zero-dimensional accessor has access to just the first element of the buffer, +there is a special case if the buffer is one-dimensional. +In this case, the accessor may either be one-dimensional or it may be +zero-dimensional. +A zero-dimensional accessor has access to just the first element of the buffer, whereas a one-dimensional accessor has access to the entire buffer. -Certain [code]#accessor# constructors create a "placeholder" accessor. Such -an accessor is bound to a [code]#buffer# and its semantics such as access -target and access mode are defined. However, a placeholder accessor is not -yet bound to a <>. Before such an accessor can be used in a -<>, it must be bound by calling [code]#handler::require()#. Passing a -placeholder accessor as an argument to a <> without first being bound -to a <> with [code]#handler::require()# will result in undefined -behavior. +Certain [code]#accessor# constructors create a "placeholder" accessor. +Such an accessor is bound to a [code]#buffer# and its semantics such as access +target and access mode are defined. +However, a placeholder accessor is not yet bound to a <>. +Before such an accessor can be used in a <>, it must be bound by +calling [code]#handler::require()#. +Passing a placeholder accessor as an argument to a <> without first +being bound to a <> with [code]#handler::require()# will result +in undefined behavior. [NOTE] ==== @@ -6623,23 +6568,24 @@ passed as an argument to or is used inside a <>. A synopsis of the [code]#accessor# class is provided below, showing the interface when it is specialized with [code]#target::device# or -[code]#target::host_task#. Since some of the class types and member functions -have the same name and meaning as other accessors, the common types and -functions are described in <>. The member types -are listed in <> and -<>. 
The constructors are listed in -<>, and the member functions are -listed in <> and +[code]#target::host_task#. +Since some of the class types and member functions have the same name and +meaning as other accessors, the common types and functions are described in +<>. +The member types are listed in <> and +<>. +The constructors are listed in <>, +and the member functions are listed in <> and <>. The additional common special member functions and common member functions are listed in <> in <> and -<>, respectively. For valid implicit -conversions between accessor types refer to -<>. Additionally, accessors of the -same type must be equality comparable both in the host application and also in -<>. +<>, respectively. +For valid implicit conversions between accessor types refer to +<>. +Additionally, accessors of the same type must be equality comparable both in the +host application and also in <>. [source,,linenums] ---- @@ -7041,9 +6987,10 @@ This function may only be called from within a <>. ===== Deduction tags for buffer command accessors Some [code]#accessor# constructors take a [code]#TagT# parameter, which is used -to deduce template arguments. The permissible values for this parameter are -listed in <> along with the access mode and -accessor target that they imply. +to deduce template arguments. +The permissible values for this parameter are listed in +<> along with the access mode and accessor +target that they imply. [[table.accessors.command.buffer.tags]] .Enumeration of tags available for [code]#accessor# construction @@ -7075,10 +7022,10 @@ accessor target that they imply. ===== Read only buffer command accessors and implicit conversions <> shows the specializations of -[code]#accessor# with [code]#target::device# or -[code]#target::host_task# that are read-only accessors. There is an implicit -conversion between any of these specializations, provided that all other -template parameters are the same. 
+[code]#accessor# with [code]#target::device# or [code]#target::host_task# that
+are read-only accessors.
+There is an implicit conversion between any of these specializations, provided
+that all other template parameters are the same.

[[table.accessors.command.buffer.read-only]]
.Specializations of [code]#accessor# that are read-only
@@ -7089,8 +7036,8 @@ template parameters are the same.
| const-qualified | [code]#access_mode::read#
|====

-There is also an implicit conversion from the read-write specialization shown
-in <> to any of the read-only
+There is also an implicit conversion from the read-write specialization shown in
+<> to any of the read-only
specializations shown in <>, provided that all
other template parameters are the same.

@@ -7111,7 +7058,8 @@ removed from a future version of the specification.

====== Aliased names

-The enumerated value [code]#target::global_buffer# is an alias for [code]#target:::device#.
+The enumerated value [code]#target::global_buffer# is an alias for
+[code]#target::device#.
It has the same type and value as its alias.

The enumerated type [code]#access::target# is an alias for [code]#target#, and
@@ -7135,8 +7083,8 @@ constructed with the property [code]#property::no_init#.

The [code]#accessor# template parameter [code]#IsPlaceholder# is allowed to be
specified, but it has no bearing on whether the [code]#accessor# instance is a
-placeholder. This is determined solely by the constructor used to create the
-instance.
+placeholder.
+This is determined solely by the constructor used to create the instance.

The associated type [code]#access::placeholder# is also deprecated.

@@ -7172,31 +7120,34 @@ size_t get_count() const

The [code]#accessor# class may be specialized with target
[code]#target::constant_buffer#, which results in an accessor that can be used
-within a <> to access the contents of a buffer through
-the device's <>.
+within a <> to access the contents of a buffer through the
+device's <>.
As with other [code]#accessor# specializations, the dimensionality must match the underlying buffer, however there is a special case if the buffer is -one-dimensional. In this case, the accessor may either be one-dimensional or -it may be zero-dimensional. A zero-dimensional accessor has access to just the -first element of the buffer, whereas a one-dimensional accessor has access to the -entire buffer. +one-dimensional. +In this case, the accessor may either be one-dimensional or it may be +zero-dimensional. +A zero-dimensional accessor has access to just the first element of the buffer, +whereas a one-dimensional accessor has access to the entire buffer. This specialization of [code]#accessor# is available only for the access mode [code]#access_mode::read#. -This accessor type can be constructed as a "placeholder" accessor. As with -other [code]#accessor# specializations that are placeholders, +This accessor type can be constructed as a "placeholder" accessor. +As with other [code]#accessor# specializations that are placeholders, [code]#handler::require()# must be called before passing a placeholder accessor -to a <>. Passing a placeholder accessor as an argument to a -<> without first being bound to a <> with -[code]#handler::require()# will result in undefined behavior. +to a <>. +Passing a placeholder accessor as an argument to a <> without first +being bound to a <> with [code]#handler::require()# will result +in undefined behavior. A synopsis for this specialization of [code]#accessor# is provided below. Since some of the class types and member functions have the same name and meaning as other accessors, the common types and functions are described in -<>. The member types are listed in -<>. The constructors are listed in +<>. +The member types are listed in <>. +The constructors are listed in <>, and the member functions are listed in <> and <>. 
@@ -7204,8 +7155,8 @@ are listed in <> and The additional common special member functions and common member functions are listed in <> in <> and -<>, respectively. Additionally, -accessors of the same type must be equality comparable. +<>, respectively. +Additionally, accessors of the same type must be equality comparable. [source,,linenums] ---- @@ -7391,35 +7342,37 @@ This function may only be called from within a <>. The [code]#accessor# class may be specialized with target [code]#target::host_buffer#, which results in a host accessor similar to -[code]#host_accessor#. This specialization provides access to data in a -[code]#buffer# from host code that is outside of a <>, and -constructors of this specialization block until the requested data is available -on the host. +[code]#host_accessor#. +This specialization provides access to data in a [code]#buffer# from host code +that is outside of a <>, and constructors of this specialization block +until the requested data is available on the host. As with other [code]#accessor# specializations, the dimensionality must match the underlying buffer, however there is a special case if the buffer is -one-dimensional. In this case, the accessor may either be one-dimensional or -it may be zero-dimensional. A zero-dimensional accessor has access to just the -first element of the buffer, whereas a one-dimensional accessor has access to the -entire buffer. +one-dimensional. +In this case, the accessor may either be one-dimensional or it may be +zero-dimensional. +A zero-dimensional accessor has access to just the first element of the buffer, +whereas a one-dimensional accessor has access to the entire buffer. -This specialization of [code]#accessor# is available for all access modes -except for [code]#access_mode::atomic#. +This specialization of [code]#accessor# is available for all access modes except +for [code]#access_mode::atomic#. A synopsis for this specialization of [code]#accessor# is provided below. 
Since some of the class types and member functions have the same name and meaning as other accessors, the common types and functions are described in -<>. The member types are listed in -<>. The constructors are listed in -<>, and the member functions are -listed in <> and +<>. +The member types are listed in <>. +The constructors are listed in <>, +and the member functions are listed in +<> and <>. The additional common special member functions and common member functions are listed in <> in <> and -<>, respectively. Additionally, -accessors of the same type must be equality comparable. +<>, respectively. +Additionally, accessors of the same type must be equality comparable. [source,,linenums] ---- @@ -7535,9 +7488,9 @@ std::add_pointer_t get_pointer() const noexcept ====== Accessor specialization with [code]#target::local# -The [code]#accessor# class may be specialized with target -[code]#target::local#, which results in a local accessor that has the same -semantics and restrictions as [code]#local_accessor#. +The [code]#accessor# class may be specialized with target [code]#target::local#, +which results in a local accessor that has the same semantics and restrictions +as [code]#local_accessor#. This specialization of [code]#accessor# is only available for access modes [code]#access_mode::read_write# and [code]#access_mode::atomic#. @@ -7545,17 +7498,18 @@ This specialization of [code]#accessor# is only available for access modes A synopsis for this specialization of [code]#accessor# is provided below. Since some of the class types and member functions have the same name and meaning as other accessors, the common types and functions are described in -<>. The member types are listed in -<>. The constructors are listed in -<>, and the member functions -are listed in <> and +<>. +The member types are listed in <>. +The constructors are listed in +<>, and the member functions are +listed in <> and <>. 
The additional common special member functions and common member functions are listed in <> in <> and -<>, respectively. Additionally, -accessors of the same type must be equality comparable. +<>, respectively. +Additionally, accessors of the same type must be equality comparable. [source,,linenums] ---- @@ -7665,9 +7619,10 @@ This function may only be called from within a <>. Specializations of the [code]#accessor# class with [code]#target::constant_buffer#, [code]#target::host_buffer# and [code]#target::local# have many member types and member functions with the same -name and meaning. <> describes these -common types and <> describes the -common member functions. +name and meaning. +<> describes these common types and +<> describes the common member +functions. [[table.accessors.deprecated.common.types]] @@ -7838,11 +7793,11 @@ When [code]#AccessTarget# is [code]#target::local# or ====== Accessor specialization with [code]#access_mode::atomic# -The [code]#accessor# class may be specialized with target -[code]#target::device# and access mode [code]#access_mode::atomic#. +The [code]#accessor# class may be specialized with target [code]#target::device# +and access mode [code]#access_mode::atomic#. This specialization provides additional member functions beyond those that are -provided for other [code]#target::device# specializations as described -in <>. +provided for other [code]#target::device# specializations as described in +<>. [[table.accessors.deprecated.atomic.members]] @@ -7900,16 +7855,17 @@ the accessor's offset to [code]#index#. ==== Buffer accessor for host code -The [code]#host_accessor# class provides access to data in a [code]#buffer# -from host code that is outside of a <> (i.e. do not use this class to -access a buffer inside a host task). +The [code]#host_accessor# class provides access to data in a [code]#buffer# from +host code that is outside of a <> (i.e. do not use this class to access +a buffer inside a host task). 
-As with [code]#accessor#, the dimensionality of [code]#host_accessor# must -match the underlying buffer, however, there is a special case if the buffer is -one-dimensional. In this case, the accessor may either be one-dimensional or -it may be zero-dimensional. A zero-dimensional accessor has access to just the -first element of the buffer, whereas a one-dimensional accessor has access to the -entire buffer. +As with [code]#accessor#, the dimensionality of [code]#host_accessor# must match +the underlying buffer, however, there is a special case if the buffer is +one-dimensional. +In this case, the accessor may either be one-dimensional or it may be +zero-dimensional. +A zero-dimensional accessor has access to just the first element of the buffer, +whereas a one-dimensional accessor has access to the entire buffer. The [code]#host_accessor# class supports the following access modes: [code]#access_mode::read#, [code]#access_mode::write# and @@ -7918,22 +7874,22 @@ The [code]#host_accessor# class supports the following access modes: ===== Interface for buffer host accessors -A synopsis of the [code]#host_accessor# class is provided below. Since some of -the class types and member functions have the same name and meaning as other -accessors, the common types and functions are described in -<>. The member types are listed in -<>. -The constructors are listed in <>, -and the member functions are listed in <> and +A synopsis of the [code]#host_accessor# class is provided below. +Since some of the class types and member functions have the same name and +meaning as other accessors, the common types and functions are described in +<>. +The member types are listed in <>. +The constructors are listed in <>, and +the member functions are listed in <> and <>. The additional common special member functions and common member functions are listed in <> in <> and -<>, respectively. For valid implicit -conversions between accessor types refer to -<>. 
Additionally, accessors of the same -type must be equality comparable. +<>, respectively. +For valid implicit conversions between accessor types refer to +<>. +Additionally, accessors of the same type must be equality comparable. [source,,linenums] ---- @@ -8141,9 +8097,9 @@ Assignment to the single element that is accessed by this accessor. ===== Deduction tags for buffer host accessors Some [code]#host_accessor# constructors take a [code]#TagT# parameter, which is -used to deduce template arguments. The permissible values for this parameter -are listed in <> along with the access mode -that they imply. +used to deduce template arguments. +The permissible values for this parameter are listed in +<> along with the access mode that they imply. [[table.accessors.host.buffer.tags]] .Enumeration of tags available for [code]#host_accessor# construction @@ -8163,9 +8119,9 @@ that they imply. ===== Read only buffer host accessors and implicit conversions <> shows the specializations of -[code]#host_accessor# that are read-only accessors. There is an implicit -conversion between any of these specializations, provided that all other -template parameters are the same. +[code]#host_accessor# that are read-only accessors. +There is an implicit conversion between any of these specializations, provided +that all other template parameters are the same. [[table.accessors.host.buffer.read-only]] .Specializations of [code]#host_accessor# that are read-only @@ -8194,42 +8150,46 @@ template parameters are the same. ==== Local accessor The [code]#local_accessor# class allocates device local memory and provides -access to this memory from within a <>. The -<> that is allocated is shared between all -<> of a <>. If multiple work-groups execute -simultaneously in an implementation, each work-group receives its own -independent copy of the allocated local memory. +access to this memory from within a <>. +The <> that is allocated is shared between all +<> of a <>. 
+If multiple work-groups execute simultaneously in an implementation, each +work-group receives its own independent copy of the allocated local memory. The underlying [code]#DataT# type can be any {cpp} type that the device -supports. If [code]#DataT# is an implicit-lifetime type (as defined in the -{cpp} core language), the local accessor implicitly creates objects of that -type with indeterminate values. For other types, the local accessor merely -allocates uninitialized memory, and the application is responsible for -constructing objects in that memory (e.g. by calling placement-new). +supports. +If [code]#DataT# is an implicit-lifetime type (as defined in the {cpp} core +language), the local accessor implicitly creates objects of that type with +indeterminate values. +For other types, the local accessor merely allocates uninitialized memory, and +the application is responsible for constructing objects in that memory (e.g. by +calling placement-new). A local accessor must not be used in a <> that is invoked via [code]#single_task# or via the simple form of [code]#parallel_for# that -takes a [code]#range# parameter. In these cases submitting the kernel to -a queue must throw a synchronous [code]#exception# with the -[code]#errc::kernel_argument# error code. +takes a [code]#range# parameter. +In these cases submitting the kernel to a queue must throw a synchronous +[code]#exception# with the [code]#errc::kernel_argument# error code. ===== Interface for local accessors -A synopsis of the [code]#local_accessor# class is provided below. Since some -of the class types and member functions have the same name and meaning as other -accessors, the common types and functions are described in -<>. The member types are listed in -<> and <>. -The constructors are listed in <>, -and the member functions are listed in <> and +A synopsis of the [code]#local_accessor# class is provided below. 
+Since some of the class types and member functions have the same name and +meaning as other accessors, the common types and functions are described in +<>. +The member types are listed in <> and +<>. +The constructors are listed in <>, and the +member functions are listed in <> and <>. The additional common special member functions and common member functions are listed in <> in <> and -<>, respectively. For valid implicit -conversions between accessor types refer to <>. +<>, respectively. +For valid implicit conversions between accessor types refer to +<>. Additionally, accessors of the same type must be equality comparable. [source,,linenums] @@ -8376,10 +8336,12 @@ This function may only be called from within a <>. Since [code]#local_accessor# has no template parameter for the access mode, the only specialization for a read-only local accessor is by providing a -[code]#const# qualified [code]#DataT# parameter. Specializations with a -non-[code]#const# qualified [code]#DataT# parameter are read-write. There is -an implicit conversion from the read-write specialization to the read-only -specialization, provided that all other template parameters are the same. +[code]#const# qualified [code]#DataT# parameter. +Specializations with a non-[code]#const# qualified [code]#DataT# parameter are +read-write. +There is an implicit conversion from the read-write specialization to the +read-only specialization, provided that all other template parameters are the +same. [[sec:accessor.common.members]] @@ -8740,50 +8702,52 @@ called from within a <>. There are two classes which implement accessors for unsampled images, [code]#unsampled_image_accessor# and [code]#host_unsampled_image_accessor#. -The former provides access from within a <> or from -within a <>. The latter provides access from host code that is -outside of a <>. +The former provides access from within a <> or from within +a <>. +The latter provides access from host code that is outside of a <>. 
The dimensionality of an unsampled image accessor must match the dimensionality -of the underlying image to which it provides access. Both unsampled image -accessor classes support the [code]#access_mode::read# and -[code]#access_mode::write# access modes. In addition, the -[code]#host_unsampled_image_accessor# class supports +of the underlying image to which it provides access. +Both unsampled image accessor classes support the [code]#access_mode::read# and +[code]#access_mode::write# access modes. +In addition, the [code]#host_unsampled_image_accessor# class supports [code]#access_mode::read_write#. The [code]#AccessTarget# template parameter dictates how the -[code]#unsampled_image_accessor# can be used: [code]#image_target::device# -means the accessor can be used in a <> while +[code]#unsampled_image_accessor# can be used: [code]#image_target::device# means +the accessor can be used in a <> while [code]#image_target::host_task# means the accessor can be used in a -<>. Programs which specify this template parameter as -[code]#image_target::device# and then use the [code]#unsampled_image_accessor# -from a <> are ill formed. Likewise, programs which specify this -template parameter as [code]#image_target::host_task# and then use the -[code]#unsampled_image_accessor# from a <> are ill +<>. +Programs which specify this template parameter as [code]#image_target::device# +and then use the [code]#unsampled_image_accessor# from a <> are ill formed. +Likewise, programs which specify this template parameter as +[code]#image_target::host_task# and then use the +[code]#unsampled_image_accessor# from a <> are ill formed. ===== Interface for unsampled image accessors -A synopsis of the two unsampled image accessor classes is provided below. Both -classes have member types with the same name, which are described in -<>. The constructors for the two -classes are described in <> and -<>. Both classes also have -member functions with the same name, which are described in -<>. 
+A synopsis of the two unsampled image accessor classes is provided below. +Both classes have member types with the same name, which are described in +<>. +The constructors for the two classes are described in +<> and +<>. +Both classes also have member functions with the same name, which are described +in <>. The additional common special member functions and common member functions are listed in <> in <> and -<>, respectively. For valid implicit -conversions between unsampled accessor types refer to +<>, respectively. +For valid implicit conversions between unsampled accessor types refer to <>. Two [code]#unsampled_image_accessor# objects of the same type must be equality -comparable in both the host code and in SYCL kernel functions. Two -[code]#host_unsampled_image_accessor# objects of the same type must be equality -comparable in the host code. +comparable in both the host code and in SYCL kernel functions. +Two [code]#host_unsampled_image_accessor# objects of the same type must be +equality comparable in the host code. [source,,linenums] ---- @@ -8932,47 +8896,50 @@ parameters are the same. There are two classes which implement accessors for sampled images, [code]#sampled_image_accessor# and [code]#host_sampled_image_accessor#. -The former provides access from within a <> or from -within a <>. The latter provides access from host code that is -outside of a <>. +The former provides access from within a <> or from within +a <>. +The latter provides access from host code that is outside of a <>. -The dimensionality of a sampled image accessor must match the dimensionality -of the underlying image to which it provides access. Sampled image accessors -are always read-only. +The dimensionality of a sampled image accessor must match the dimensionality of +the underlying image to which it provides access. +Sampled image accessors are always read-only. 
The [code]#AccessTarget# template parameter dictates how the [code]#sampled_image_accessor# can be used: [code]#image_target::device# means the accessor can be used in a <> while [code]#image_target::host_task# means the accessor can be used in a -<>. Programs which specify this template parameter as -[code]#image_target::device# and then use the [code]#sampled_image_accessor# -from a <> are ill formed. Likewise, programs which specify this -template parameter as [code]#image_target::host_task# and then use the -[code]#sampled_image_accessor# from a <> are ill formed. +<>. +Programs which specify this template parameter as [code]#image_target::device# +and then use the [code]#sampled_image_accessor# from a <> are ill +formed. +Likewise, programs which specify this template parameter as +[code]#image_target::host_task# and then use the [code]#sampled_image_accessor# +from a <> are ill formed. ===== Interface for sampled image accessors -A synopsis of the two sampled image accessor classes is provided below. Both -classes have member types with the same name, which are described in -<>. The constructors for the two -classes are described in <> and -<>. Both classes also have -member functions with the same name, which are described in -<>. +A synopsis of the two sampled image accessor classes is provided below. +Both classes have member types with the same name, which are described in +<>. +The constructors for the two classes are described in +<> and +<>. +Both classes also have member functions with the same name, which are described +in <>. The additional common special member functions and common member functions are listed in <> in <> and -<>, respectively. For valid implicit -conversions between sampled accessor types refer to +<>, respectively. +For valid implicit conversions between sampled accessor types refer to <>. Two [code]#sampled_image_accessor# objects of the same type must be equality -comparable in both the host code and in SYCL kernel functions. 
Two -[code]#host_sampled_image_accessor# objects of the same type must be equality -comparable in the host code. +comparable in both the host code and in SYCL kernel functions. +Two [code]#host_sampled_image_accessor# objects of the same type must be +equality comparable in the host code. [source,,linenums] ---- @@ -9088,9 +9055,10 @@ within a <>. ===== Read only sampled image accessors and implicit conversions All specializations of sampled image accessors are read-only regardless of -whether [code]#DataT# is [code]#const# qualified. There is an implicit -conversion between the [code]#const# qualified and non-[code]#const# qualified -specializations, provided that all other template parameters are the same. +whether [code]#DataT# is [code]#const# qualified. +There is an implicit conversion between the [code]#const# qualified and +non-[code]#const# qualified specializations, provided that all other template +parameters are the same. // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end accessors %%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -9099,23 +9067,25 @@ specializations, provided that all other template parameters are the same. === Address space classes In SYCL, there are five different address spaces: global, local, constant, -private and generic. In a SYCL generic implementation, types are not -affected by the address spaces. However, there are situations where users -need to explicitly carry address spaces in the type. For example: - - * For performance tuning and genericness. Even if the platform supports - the representation of the generic address space, this may come at some - performance sacrifice. In order to help the target compiler, it can be - useful to track specifically which address space a pointer is - addressing. - * When linking SYCL kernels with <>-specific functions. In this - case, it might be necessary to specify the address space for any pointer - parameters. +private and generic. +In a SYCL generic implementation, types are not affected by the address spaces. 
+However, there are situations where users need to explicitly carry address +spaces in the type. +For example: + + * For performance tuning and genericness. + Even if the platform supports the representation of the generic address + space, this may come at some performance sacrifice. + In order to help the target compiler, it can be useful to track specifically + which address space a pointer is addressing. + * When linking SYCL kernels with <>-specific functions. + In this case, it might be necessary to specify the address space for any + pointer parameters. Direct declaration of pointers with address spaces is discouraged as the -definition is implementation-defined. Users must rely on the -[code]#multi_ptr# class to handle address space boundaries and -interoperability. +definition is implementation-defined. +Users must rely on the [code]#multi_ptr# class to handle address space +boundaries and interoperability. [[sec:multiptr]] @@ -9124,47 +9094,52 @@ interoperability. The multi-pointer class is the common interface for the explicit pointer classes, defined in <>. -There are situations where a user may want to make their type address space dependent. -This allows performing generic programming that depends on the address space associated -with their data. An example might be wrapping a pointer inside a class, where -a user may need to template the class according to the address space of the -pointer the class is initialized with. In this case, the [code]#multi_ptr# -class enables users to do this in a portable and stable way. +There are situations where a user may want to make their type address space +dependent. +This allows performing generic programming that depends on the address space +associated with their data. +An example might be wrapping a pointer inside a class, where a user may need to +template the class according to the address space of the pointer the class is +initialized with. 
+In this case, the [code]#multi_ptr# class enables users to do this in a portable
+and stable way.

The [code]#multi_ptr# class exposes 3 flavors of the same interface.
-If the value of [code]#access::decorated# is [code]#access::decorated::no#,
-the interface exposes pointers and references type that are not decorated by an address space.
-If the value of [code]#access::decorated# is [code]#access::decorated::yes#,
-the interface exposes pointers and references type that are decorated by an address space.
-The decoration is implementation dependent and relies on device compiler extensions.
+If the value of [code]#access::decorated# is [code]#access::decorated::no#, the
+interface exposes pointer and reference types that are not decorated by an
+address space.
+If the value of [code]#access::decorated# is [code]#access::decorated::yes#, the
+interface exposes pointer and reference types that are decorated by an address
+space.
+The decoration is implementation dependent and relies on device compiler
+extensions.
The decorated type may be distinct from the non-decorated one.
-For interoperability with the <>, users should rely on types exposed
-by the decorated version.
-If the value of [code]#access::decorated# is [code]#access::decorated::legacy#, 
+For interoperability with the <>, users should rely on types exposed by
+the decorated version.
+If the value of [code]#access::decorated# is [code]#access::decorated::legacy#,
the 1.2.1 interface is exposed.
This interface is deprecated.

The template traits [code]#remove_decoration# and type alias
-[code]#remove_decoration_t# retrieve the non-decorated pointer or
-reference from a decorated one. Using this template trait with a
-non-decorated type is safe and returns the same type.
+[code]#remove_decoration_t# retrieve the non-decorated pointer or reference from
+a decorated one.
+Using this template trait with a non-decorated type is safe and returns the same
+type.
-It is possible to use the [code]#void# type for the [code]#multi_ptr# -class, but in that case some functionality is disabled. +It is possible to use the [code]#void# type for the [code]#multi_ptr# class, but +in that case some functionality is disabled. [code]#multi_ptr# does not provide the [code]#reference# or -[code]#const_reference# types, the access operators -([code]#operator*()#, [code]#+operator->()+#), the arithmetic -operators or [code]#prefetch# member function. -Conversions from [code]#multi_ptr# to [code]#multi_ptr# of the -same address space are allowed, and will occur implicitly. -Conversions from [code]#multi_ptr# to any other -[code]#multi_ptr# type of the same address space -are allowed, but must be explicit. +[code]#const_reference# types, the access operators ([code]#operator*()#, +[code]#+operator->()+#), the arithmetic operators or [code]#prefetch# member +function. +Conversions from [code]#multi_ptr# to [code]#multi_ptr# of the same +address space are allowed, and will occur implicitly. +Conversions from [code]#multi_ptr# to any other [code]#multi_ptr# type of +the same address space are allowed, but must be explicit. The same rules apply to [code]#multi_ptr#. -An overview of the interface provided for the [code]#multi_ptr# class -follows. +An overview of the interface provided for the [code]#multi_ptr# class follows. [source,,linenums] ---- @@ -9828,8 +9803,8 @@ bool operator>=(std::nullptr_t, const multi_ptr& rhs) |==== -The following is the overview of the legacy interface from 1.2.1 provided -for the [code]#multi_ptr# class. +The following is the overview of the legacy interface from 1.2.1 provided for +the [code]#multi_ptr# class. [source,,linenums] ---- @@ -9843,8 +9818,8 @@ include::{header_dir}/multipointerlegacy.h[lines=4..-1] SYCL provides aliases to the [code]#multi_ptr# class template (see <>) for each specialization of [code]#access::address_space#. 
-A synopsis of the SYCL [code]#multi_ptr# class template -aliases is provided below. +A synopsis of the SYCL [code]#multi_ptr# class template aliases is provided +below. // Interface of the explicit pointer classes [source,,linenums] @@ -9852,9 +9827,8 @@ aliases is provided below. include::{header_dir}/pointer.h[lines=4..-1] ---- -Note that using [code]#global_ptr#, [code]#local_ptr#, -[code]#constant_ptr# or [code]#private_ptr# -without specifying the decoration is deprecated. +Note that using [code]#global_ptr#, [code]#local_ptr#, [code]#constant_ptr# or +[code]#private_ptr# without specifying the decoration is deprecated. The default argument is provided for compatibility with 1.2.1. @@ -9862,8 +9836,8 @@ The default argument is provided for compatibility with 1.2.1. === Image samplers The SYCL [code]#image_sampler# struct contains a configuration for sampling a -[code]#sampled_image#. The members of this struct are defined by the following -tables. +[code]#sampled_image#. +The members of this struct are defined by the following tables. // Interface of the sampler class [source,,linenums] @@ -9974,33 +9948,33 @@ unnormalized [[sec:usm]] == Unified shared memory (USM) -This section describes properties and routines for pointer-based -memory management interfaces in SYCL. These routines augment, rather -than replace, the buffer-based interfaces in SYCL. +This section describes properties and routines for pointer-based memory +management interfaces in SYCL. +These routines augment, rather than replace, the buffer-based interfaces in +SYCL. -Unified Shared Memory (<>) provides a pointer-based alternative to -the buffer programming model. USM enables: +Unified Shared Memory (<>) provides a pointer-based alternative to the +buffer programming model. +USM enables: - * Easier integration into existing code bases by representing allocations - as pointers rather than buffers, with full support for pointer - arithmetic into allocations. 
+ * Easier integration into existing code bases by representing allocations as + pointers rather than buffers, with full support for pointer arithmetic into + allocations. * Fine-grain control over ownership and accessibility of allocations, to optimally choose between performance and programmer convenience. * A simpler programming model, by automatically migrating some allocations between SYCL devices and the host. -To show the differences with the example from <>, the -following source code example shows how shared memory can be used -between host and device: +To show the differences with the example from <>, the following +source code example shows how shared memory can be used between host and device: [source,,linenums] ---- include::{code_dir}/usm_shared.cpp[lines=4..-1] ---- -By comparison, the following source code example uses less capable -device memory, which requires an explicit copy between the device and the -host: +By comparison, the following source code example uses less capable device +memory, which requires an explicit copy between the device and the host: [source,,linenums] ---- include::{code_dir}/usm_device.cpp[lines=4..-1] @@ -10009,22 +9983,22 @@ include::{code_dir}/usm_device.cpp[lines=4..-1] === Unified addressing -Unified Addressing guarantees that all devices will use a unified address -space. Pointer values in the unified address space will always refer to the -same location in memory. The unified address space encompasses the host and -one or more devices. Note that this does not require addresses in the -unified address space to be accessible on all devices, just that pointer -values will be consistent. +Unified Addressing guarantees that all devices will use a unified address space. +Pointer values in the unified address space will always refer to the same +location in memory. +The unified address space encompasses the host and one or more devices. 
+Note that this does not require addresses in the unified address space to be +accessible on all devices, just that pointer values will be consistent. === Kinds of unified shared memory -<> is a capability that, when available, provides the ability -to create allocations that are visible to both host and device(s). -USM builds upon Unified Addressing to define a shared address space -where pointer values in this space always refer to the same location -in memory. USM defines three types of memory allocations -described in <>. +<> is a capability that, when available, provides the ability to create +allocations that are visible to both host and device(s). +USM builds upon Unified Addressing to define a shared address space where +pointer values in this space always refer to the same location in memory. +USM defines three types of memory allocations described in +<>. [[table.USM.allocation]] .Type of USM allocations @@ -10040,8 +10014,8 @@ described in <>. device |==== -The following [code]#enum# is used to refer to the different types of allocations -inside of a SYCL program: +The following [code]#enum# is used to refer to the different types of +allocations inside of a SYCL program: [source,,linenums] ---- @@ -10060,10 +10034,10 @@ enum class alloc : /* unspecified */ { ---- USM is an optional feature which may not be supported by all devices, and -devices that support USM may not support all types of USM allocation. A SYCL -application can use the [code]#device::has()# function to determine the -level of USM support for a device. See <> in -<> for more details. +devices that support USM may not support all types of USM allocation. +A SYCL application can use the [code]#device::has()# function to determine the +level of USM support for a device. +See <> in <> for more details. The characteristics of USM allocations are summarized in <>. 
@@ -10086,126 +10060,141 @@ The characteristics of USM allocations are summarized in |==== Each USM allocation has an associated SYCL <>, and any access to that -memory must use the same context. Specifically, any <> -that dereferences a pointer to a USM allocation must be submitted to a -<> that was constructed with the same context that was used to allocate -that memory. The explicit memory operation <> that take USM -pointers have a similar restriction. (See <> for -details.) Violations of these requirements result in undefined behavior. +memory must use the same context. +Specifically, any <> that dereferences a pointer to a USM +allocation must be submitted to a <> that was constructed with the same +context that was used to allocate that memory. +The explicit memory operation <> that take USM pointers have +a similar restriction. +(See <> for details.) Violations of these requirements +result in undefined behavior. [NOTE] ==== There are no similar restrictions for dereferencing a USM pointer in a -<>. This is legal regardless of which <> the host task was -submitted to so long as the USM pointer is accessible on the host. +<>. +This is legal regardless of which <> the host task was submitted to so +long as the USM pointer is accessible on the host. ==== Each type of USM allocation has different rules for where that memory is -accessible. Attempting to dereference a USM pointer on the host or on a device -in violation of these rules results in undefined behavior. Passing a USM -pointer to one of the explicit memory functions where the pointer is not -accessible to the device generally results in undefined behavior. See -<> for the exact rules. +accessible. +Attempting to dereference a USM pointer on the host or on a device in violation +of these rules results in undefined behavior. +Passing a USM pointer to one of the explicit memory functions where the pointer +is not accessible to the device generally results in undefined behavior. 
+See <> for the exact rules. Device allocations are used for explicitly managing device memory. -Programmers directly allocate device memory and explicitly copy data -between host memory and a device allocation. Device allocations are obtained -through SYCL device USM allocation routines instead of system allocation -routines like [code]#std::malloc# or {cpp} [code]#new#. Device -allocations are not accessible on the host, but the pointer values remain -consistent on account of Unified Addressing. The size of device allocations -will be limited by the amount of memory in a device. Support for device -allocations on a specific device can be queried through +Programmers directly allocate device memory and explicitly copy data between +host memory and a device allocation. +Device allocations are obtained through SYCL device USM allocation routines +instead of system allocation routines like [code]#std::malloc# or {cpp} +[code]#new#. +Device allocations are not accessible on the host, but the pointer values remain +consistent on account of Unified Addressing. +The size of device allocations will be limited by the amount of memory in a +device. +Support for device allocations on a specific device can be queried through [code]#aspect::usm_device_allocations#. Device allocations must be explicitly copied between the host and a device. The member functions to copy and initialize data are found in -<> and <>, and these -functions may be used on device allocations if a device supports +<> and <>, and these functions +may be used on device allocations if a device supports [code]#aspect::usm_device_allocations#. -Host allocations allow devices to directly read and write host memory -inside of a kernel. This can be useful for several reasons, such as when the -overhead of moving a small amount of data is not worth paying over the cost of a -remote access or when the size of a data set exceeds the size of a device's memory. 
-Host allocations must also be obtained using SYCL routines instead -of system allocation routines. While a device may remotely read and -write a host allocation, the allocation does not migrate to the device - -it remains in host memory. Users should take care to properly synchronize -access to host allocations between host execution and kernels. The total -size of host allocations will be limited by the amount of pinnable-memory -on the host on most systems. Support for host allocations on a specific -device can be queried through [code]#aspect::usm_host_allocations#. -Support for atomic modification of host allocations -on a specific device can be queried through -[code]#aspect::usm_atomic_host_allocations#. - -Shared allocations implicitly share data between the host -and devices. Data may move to where it is being used without the programmer -explicitly informing the runtime. It is up to the runtime and backends -to make sure that a shared allocation is available where it is used. -Shared allocations must also be obtained using SYCL allocation routines -instead of the system allocator. The maximum size of a shared allocation -on a specific device, and the total size of all shared allocations in a -context, are implementation-defined. -Support for shared allocations on a -specific device can be queried through [code]#aspect::usm_shared_allocations#. - -Not all devices may support concurrent access of a shared allocation -with the host. If a device does not support this, -host execution and device code must take turns accessing the allocation, so -the host must not access a shared allocation while a kernel is executing. -Host access to a shared allocation which is also accessed -by an executing kernel on a device that does not support -concurrent access results in undefined behavior. If a device does -support concurrent access, both the host and and the device may atomically -modify the same data inside an allocation. 
Allocations, or pieces of allocations, -are now free to migrate to different devices in the same context -that also support this capability. Additionally, many devices that support -concurrent access may support a working set of shared allocations -larger than device memory. +Host allocations allow devices to directly read and write host memory inside of +a kernel. +This can be useful for several reasons, such as when the overhead of moving a +small amount of data is not worth paying over the cost of a remote access or +when the size of a data set exceeds the size of a device's memory. +Host allocations must also be obtained using SYCL routines instead of system +allocation routines. +While a device may remotely read and write a host allocation, the allocation +does not migrate to the device - +it remains in host memory. +Users should take care to properly synchronize access to host allocations +between host execution and kernels. +The total size of host allocations will be limited by the amount of +pinnable-memory on the host on most systems. +Support for host allocations on a specific device can be queried through +[code]#aspect::usm_host_allocations#. +Support for atomic modification of host allocations on a specific device can be +queried through [code]#aspect::usm_atomic_host_allocations#. + +Shared allocations implicitly share data between the host and devices. +Data may move to where it is being used without the programmer explicitly +informing the runtime. +It is up to the runtime and backends to make sure that a shared allocation is +available where it is used. +Shared allocations must also be obtained using SYCL allocation routines instead +of the system allocator. +The maximum size of a shared allocation on a specific device, and the total size +of all shared allocations in a context, are implementation-defined. +Support for shared allocations on a specific device can be queried through +[code]#aspect::usm_shared_allocations#. 
+ +Not all devices may support concurrent access of a shared allocation with the +host. +If a device does not support this, host execution and device code must take +turns accessing the allocation, so the host must not access a shared allocation +while a kernel is executing. +Host access to a shared allocation which is also accessed by an executing kernel +on a device that does not support concurrent access results in undefined +behavior. +If a device does support concurrent access, both the host and the device may +atomically modify the same data inside an allocation. +Allocations, or pieces of allocations, are now free to migrate to different +devices in the same context that also support this capability. +Additionally, many devices that support concurrent access may support a working +set of shared allocations larger than device memory. Users may query whether a device supports concurrent access with atomic modification of shared allocations through the aspect [code]#aspect::usm_atomic_shared_allocations#. See <> in <> for more details. -Performance hints for shared allocations may be specified by the user -by enqueueing [code]#prefetch# operations on a device. These operations -inform the SYCL runtime that the specified shared allocation is -likely to be accessed on the device in the future, and that it is free -to migrate the allocation to the device. +Performance hints for shared allocations may be specified by the user by +enqueueing [code]#prefetch# operations on a device. +These operations inform the SYCL runtime that the specified shared allocation is +likely to be accessed on the device in the future, and that it is free to +migrate the allocation to the device. More about [code]#prefetch# is found in <> and -<>. If a device supports concurrent access to -shared allocations, then [code]#prefetch# operations may be overlapped -with kernel execution. +<>.
+If a device supports concurrent access to shared allocations, then +[code]#prefetch# operations may be overlapped with kernel execution. Additionally, users may use the [code]#mem_advise# member function to annotate -shared allocations with [code]#advice#. Valid [code]#advice# is defined by the -device and its associated backend. See <> and -<> for more information. - -In the most capable systems, users do not need to use SYCL USM allocation functions -to create shared allocations. The system allocator ([code]#malloc#/[code]#new#) may -instead be used. Likewise, [code]#std::free# and -[code]#delete# are used instead of [code]#sycl::free#. Note that -host and device allocations are unaffected by this -change and must still be allocated using their respective USM functions in -order to guarantee their behavior. Users may query the device to determine -if system allocations are supported for use on the device, through -[code]#aspect::usm_system_allocations#. +shared allocations with [code]#advice#. +Valid [code]#advice# is defined by the device and its associated backend. +See <> and <> for more +information. + +In the most capable systems, users do not need to use SYCL USM allocation +functions to create shared allocations. +The system allocator ([code]#malloc#/[code]#new#) may instead be used. +Likewise, [code]#std::free# and [code]#delete# are used instead of +[code]#sycl::free#. +Note that host and device allocations are unaffected by this change and must +still be allocated using their respective USM functions in order to guarantee +their behavior. +Users may query the device to determine if system allocations are supported for +use on the device, through [code]#aspect::usm_system_allocations#. === USM allocations -USM provides several allocation functions. These functions accept a -[code]#property_list# parameter, which is provided for future extensibility. +USM provides several allocation functions. 
+These functions accept a [code]#property_list# parameter, which is provided for +future extensibility. The <> does not yet define any USM allocation properties. -Some of the allocation functions take an explicit alignment parameter. Like -[code]#std::aligned_alloc#, these functions return [code]#nullptr# if the -alignment is not supported by the implementation. Some of the allocation -functions are templated on the allocated type [code]#T# and some are not. The -following table specifies the alignment guarantees for each category. +Some of the allocation functions take an explicit alignment parameter. +Like [code]#std::aligned_alloc#, these functions return [code]#nullptr# if the +alignment is not supported by the implementation. +Some of the allocation functions are templated on the allocated type [code]#T# +and some are not. +The following table specifies the alignment guarantees for each category. [[table.usm.alignment]] .Alignment guarantees of USM allocation functions @@ -10236,27 +10225,26 @@ a@ Pointer is suitably aligned for an object of type [code]#T# or it is aligned ==== {cpp} allocator interface SYCL defines an allocator class named [code]#usm_allocator# that satisfies the -{cpp} named requirement [code]#Allocator#. The [code]#AllocKind# template -parameter can be either [code]#usm::alloc::host# or [code]#usm::alloc::shared#, -causing the allocator to make either host USM allocations or shared USM -allocations. +{cpp} named requirement [code]#Allocator#. +The [code]#AllocKind# template parameter can be either [code]#usm::alloc::host# +or [code]#usm::alloc::shared#, causing the allocator to make either host USM +allocations or shared USM allocations. [NOTE] ==== There is no specialization for [code]#usm::alloc::device# because an -[code]#Allocator# is required to allocate memory that is accessible on the -host. +[code]#Allocator# is required to allocate memory that is accessible on the host. 
==== -The [code]#usm_allocator# class has a template argument [code]#Alignment#, -which specifies the minimum alignment for memory that it allocates. This -alignment is used even if the allocator is rebound to a different type. Memory -allocated by this allocator is suitably aligned for objects of its underlying -[code]#value_type# or at the alignment specified by [code]#Alignment#, -whichever is greater. +The [code]#usm_allocator# class has a template argument [code]#Alignment#, which +specifies the minimum alignment for memory that it allocates. +This alignment is used even if the allocator is rebound to a different type. +Memory allocated by this allocator is suitably aligned for objects of its +underlying [code]#value_type# or at the alignment specified by +[code]#Alignment#, whichever is greater. -A synopsis of the [code]#usm_allocator# class is provided below. The -constructors are listed in <>. +A synopsis of the [code]#usm_allocator# class is provided below. +The constructors are listed in <>. [source,,linenums] ---- @@ -10350,17 +10338,20 @@ a@ Simplified constructor form where [code]#syclQueue# provides the ==== Device allocation functions -The functions in <> allocate device USM. On success, -these functions return a pointer to the newly allocated memory, which must -eventually be deallocated with [code]#sycl::free# in order to avoid a memory -leak. If there are not enough resources to allocate the requested memory, -these functions return [code]#nullptr#. +The functions in <> allocate device USM. +On success, these functions return a pointer to the newly allocated memory, +which must eventually be deallocated with [code]#sycl::free# in order to avoid a +memory leak. +If there are not enough resources to allocate the requested memory, these +functions return [code]#nullptr#. When the allocation size is zero bytes ([code]#numBytes# or [code]#count# is zero), these functions behave in a manner consistent with {cpp} -[code]#std::malloc#. 
The value returned is unspecified in this case, and the -returned pointer may not be used to access storage. If this pointer is not -null, it must be passed to [code]#sycl::free# to avoid a memory leak. +[code]#std::malloc#. +The value returned is unspecified in this case, and the returned pointer may not +be used to access storage. +If this pointer is not null, it must be passed to [code]#sycl::free# to avoid a +memory leak. [[table.usm.device.allocs]] .Device USM Allocation Functions @@ -10482,17 +10473,20 @@ a@ Simplified form where [code]#syclQueue# provides the [code]#device# and ==== Host allocation functions -The functions in <> allocate host USM. On success, -these functions return a pointer to the newly allocated memory, which must -eventually be deallocated with [code]#sycl::free# in order to avoid a memory -leak. If there are not enough resources to allocate the requested memory, -these functions return [code]#nullptr#. +The functions in <> allocate host USM. +On success, these functions return a pointer to the newly allocated memory, +which must eventually be deallocated with [code]#sycl::free# in order to avoid a +memory leak. +If there are not enough resources to allocate the requested memory, these +functions return [code]#nullptr#. When the allocation size is zero bytes ([code]#numBytes# or [code]#count# is zero), these functions behave in a manner consistent with {cpp} -[code]#std::malloc#. The value returned is unspecified in this case, and the -returned pointer may not be used to access storage. If this pointer is not -null, it must be passed to [code]#sycl::free# to avoid a memory leak. +[code]#std::malloc#. +The value returned is unspecified in this case, and the returned pointer may not +be used to access storage. +If this pointer is not null, it must be passed to [code]#sycl::free# to avoid a +memory leak. 
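The zero-byte rule above means a returned pointer must never be used to access storage, yet must still be released whenever it is non-null. The following is a minimal sketch of one way to honor that rule with RAII. It assumes [code]#std::malloc# and [code]#std::free# as stand-ins for a USM allocation function and [code]#sycl::free#. The names [code]#malloc_deleter#, [code]#malloc_owner# and [code]#allocate_bytes# are hypothetical and are not part of SYCL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Stand-in deleter: releases memory obtained from std::malloc.
// (Assumption: std::free plays the role of sycl::free in this sketch.)
struct malloc_deleter {
  void operator()(void* p) const noexcept { std::free(p); }
};

// Owning handle; std::unique_ptr invokes the deleter only for a non-null pointer.
using malloc_owner = std::unique_ptr<void, malloc_deleter>;

// Hypothetical helper: allocates numBytes bytes and takes ownership.
// For numBytes == 0 the stored pointer may be null or non-null. Either way it
// must not be used to access storage, but a non-null value is still freed.
malloc_owner allocate_bytes(std::size_t numBytes) {
  return malloc_owner(std::malloc(numBytes));
}
```

Because [code]#std::unique_ptr# skips its deleter for a null pointer, a zero-byte request needs no special casing. A SYCL variant would presumably keep the allocation's context in the deleter and pass it to [code]#sycl::free#; that variant is not shown here.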
[[table.usm.host.allocs]] .Host USM Allocation Functions @@ -10589,17 +10583,20 @@ a@ Simplified form where [code]#syclQueue# provides the [code]#context#. ==== Shared allocation functions -The functions in <> allocate shared USM. On success, -these functions return a pointer to the newly allocated memory, which must -eventually be deallocated with [code]#sycl::free# in order to avoid a memory -leak. If there are not enough resources to allocate the requested memory, -these functions return [code]#nullptr#. +The functions in <> allocate shared USM. +On success, these functions return a pointer to the newly allocated memory, +which must eventually be deallocated with [code]#sycl::free# in order to avoid a +memory leak. +If there are not enough resources to allocate the requested memory, these +functions return [code]#nullptr#. When the allocation size is zero bytes ([code]#numBytes# or [code]#count# is zero), these functions behave in a manner consistent with {cpp} -[code]#std::malloc#. The value returned is unspecified in this case, and the -returned pointer may not be used to access storage. If this pointer is not -null, it must be passed to [code]#sycl::free# to avoid a memory leak. +[code]#std::malloc#. +The value returned is unspecified in this case, and the returned pointer may not +be used to access storage. +If this pointer is not null, it must be passed to [code]#sycl::free# to avoid a +memory leak. [[table.usm.shared.allocs]] .Shared USM Allocation Functions @@ -10722,26 +10719,29 @@ a@ Simplified form where [code]#syclQueue# provides the [code]#device# and ==== Parameterized allocation functions The functions in <> take a [code]#kind# parameter that -specifies the type of USM to allocate. When [code]#kind# is -[code]#usm::alloc::device#, then the allocation device must have -[code]#aspect::usm_device_allocations#. When [code]#kind# is -[code]#usm::alloc::host#, at least one device in the allocation context must -have [code]#aspect::usm_host_allocations#. 
When [code]#kind# is -[code]#usm::alloc::shared#, the allocation device must have -[code]#aspect::usm_shared_allocations#. If these requirements are -violated, the allocation function throws a synchronous [code]#exception# with -the [code]#errc::feature_not_supported# error code. +specifies the type of USM to allocate. +When [code]#kind# is [code]#usm::alloc::device#, then the allocation device must +have [code]#aspect::usm_device_allocations#. +When [code]#kind# is [code]#usm::alloc::host#, at least one device in the +allocation context must have [code]#aspect::usm_host_allocations#. +When [code]#kind# is [code]#usm::alloc::shared#, the allocation device must have +[code]#aspect::usm_shared_allocations#. +If these requirements are violated, the allocation function throws a synchronous +[code]#exception# with the [code]#errc::feature_not_supported# error code. On success, these functions return a pointer to the newly allocated memory, -which must eventually be deallocated with [code]#sycl::free# in order to avoid -a memory leak. If there are not enough resources to allocate the requested -memory, these functions return [code]#nullptr#. +which must eventually be deallocated with [code]#sycl::free# in order to avoid a +memory leak. +If there are not enough resources to allocate the requested memory, these +functions return [code]#nullptr#. When the allocation size is zero bytes ([code]#numBytes# or [code]#count# is zero), these functions behave in a manner consistent with {cpp} -[code]#std::malloc#. The value returned is unspecified in this case, and the -returned pointer may not be used to access storage. If this pointer is not -null, it must be passed to [code]#sycl::free# to avoid a memory leak. +[code]#std::malloc#. +The value returned is unspecified in this case, and the returned pointer may not +be used to access storage. +If this pointer is not null, it must be passed to [code]#sycl::free# to avoid a +memory leak. 
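The requirements on [code]#kind# amount to a small decision table. The sketch below restates that table in plain {cpp} so it can run without a SYCL implementation. [code]#usm_kind#, [code]#device_aspect#, [code]#not_supported_error# and [code]#check_usm_kind# are hypothetical stand-ins for [code]#usm::alloc#, [code]#aspect#, and an [code]#exception# carrying [code]#errc::feature_not_supported#. They are not SYCL API.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <stdexcept>
#include <vector>

// Hypothetical mirrors of usm::alloc and the three aspects named in the text.
enum class usm_kind { host, device, shared };
enum class device_aspect {
  usm_device_allocations,
  usm_host_allocations,
  usm_shared_allocations
};
using aspect_set = std::set<device_aspect>;

// Hypothetical stand-in for an exception with errc::feature_not_supported.
struct not_supported_error : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Applies the rule from the text: the device and shared kinds depend on the
// allocation device; the host kind needs at least one capable device in the context.
void check_usm_kind(usm_kind kind, const aspect_set& allocDevice,
                    const std::vector<aspect_set>& contextDevices) {
  auto has = [](const aspect_set& s, device_aspect a) { return s.count(a) != 0; };
  bool supported = false;
  switch (kind) {
    case usm_kind::device:
      supported = has(allocDevice, device_aspect::usm_device_allocations);
      break;
    case usm_kind::shared:
      supported = has(allocDevice, device_aspect::usm_shared_allocations);
      break;
    case usm_kind::host:
      supported = std::any_of(
          contextDevices.begin(), contextDevices.end(), [&](const aspect_set& s) {
            return has(s, device_aspect::usm_host_allocations);
          });
      break;
  }
  if (!supported) {
    throw not_supported_error("USM allocation kind not supported");
  }
}
```

For example, a context whose only device reports device and host USM, but not shared USM, accepts device and host requests and rejects a shared request.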
[[table.usm.param.allocs]] .Parameterized USM Allocation Functions @@ -10885,11 +10885,11 @@ a@ Alternate form where [code]#syclQueue# provides the [code]#context#. === Unified shared memory pointer queries -Since USM pointers look like raw {cpp} pointers, users cannot deduce what kind of -USM allocation a given pointer may be from examining its type. However, two -functions are defined that let users query the type of a USM allocation and, if -applicable, the [code]#device# on which it was allocated. These query functions -are only supported on the host. +Since USM pointers look like raw {cpp} pointers, users cannot deduce what kind +of USM allocation a given pointer may be from examining its type. +However, two functions are defined that let users query the type of a USM +allocation and, if applicable, the [code]#device# on which it was allocated. +These query functions are only supported on the host. [[table.usm.ptr.query]] .USM Pointer Query Functions @@ -10931,15 +10931,13 @@ USM allocation from [code]#syclContext#. [[ranges-identifiers]] === Ranges and index space identifiers -The data parallelism of the SYCL kernel execution model requires -instantiation of a parallel execution over a -range of iteration space coordinates. To achieve this, SYCL exposes types -to define the range of execution and to identify a given execution -instance's point in the iteration space. +The data parallelism of the SYCL kernel execution model requires instantiation +of a parallel execution over a range of iteration space coordinates. +To achieve this, SYCL exposes types to define the range of execution and to +identify a given execution instance's point in the iteration space. -The following types are defined: [code]#range#, -[code]#nd_range#, [code]#id#, [code]#item#, [code]#h_item#, -[code]#nd_item# and [code]#group#. +The following types are defined: [code]#range#, [code]#nd_range#, [code]#id#, +[code]#item#, [code]#h_item#, [code]#nd_item# and [code]#group#. 
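SYCL linearizes multi-dimensional index values so that the right-most element varies fastest. The sketch below computes that row-major linear index, using [code]#std::array# values as stand-ins for [code]#id# and [code]#range#. [code]#linearize# is a hypothetical helper, not a SYCL member function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: row-major linearization in which the right-most index
// varies fastest. std::array values stand in for sycl::id<N> and sycl::range<N>.
template <std::size_t N>
std::size_t linearize(const std::array<std::size_t, N>& index,
                      const std::array<std::size_t, N>& extent) {
  std::size_t linear = 0;
  for (std::size_t d = 0; d < N; ++d) {
    assert(index[d] < extent[d]);            // index must lie inside the range
    linear = linear * extent[d] + index[d];  // Horner form of the row-major sum
  }
  return linear;
}
```

For a 3D index this evaluates to [code]#index[2] + index[1] * extent[2] + index[0] * extent[1] * extent[2]#. With extent {2, 3, 4}, index {1, 2, 3} maps to (1 * 3 + 2) * 4 + 3 = 23, the last of the 24 points. The left-most extent only bounds the index and never scales it.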
When constructing multi-dimensional ids or ranges from integers, the elements are written such that the right-most element varies fastest in a linearization @@ -11020,23 +11018,20 @@ group [[range-class]] ==== [code]#range# class -[code]#range# -is a 1D, 2D or 3D vector that defines -the iteration domain of either a single work-group in a parallel -dispatch, or the overall Dimensions of the dispatch. It can be -constructed from integers. - -The SYCL [code]#range# class template provides the common by-value -semantics (see <>). - -A synopsis of the SYCL [code]#range# class is provided below. The -constructors, member functions and non-member functions of the SYCL -[code]#range# class are listed in -<>, <> and -<> respectively. The additional common -special member functions and common member functions are listed in -<> in -<> and +[code]#range# is a 1D, 2D or 3D vector that defines the +iteration domain of either a single work-group in a parallel dispatch, or the +overall dimensions of the dispatch. +It can be constructed from integers. + +The SYCL [code]#range# class template provides the common by-value semantics +(see <>). + +A synopsis of the SYCL [code]#range# class is provided below. +The constructors, member functions and non-member functions of the SYCL +[code]#range# class are listed in <>, +<> and <> respectively. +The additional common special member functions and common member functions are +listed in <> in <> and <> respectively. [source,,linenums] @@ -11270,23 +11265,21 @@ Then return the initial copy of the [code]#range#. include::{header_dir}/ndRange.h[lines=4..-1] ---- -[code]#nd_range# -defines the iteration domain of both -the work-groups and the overall dispatch. To define this the -[code]#nd_range# comprises two ranges: the whole range over which -the kernel is to be executed, and the range of each work -group. +[code]#nd_range# defines the iteration domain of both the +work-groups and the overall dispatch.
+To define this, the [code]#nd_range# comprises two ranges: the whole range over +which the kernel is to be executed, and the range of each work group. -The SYCL [code]#nd_range# class template provides the common by-value -semantics (see <>). +The SYCL [code]#nd_range# class template provides the common by-value semantics +(see <>). -A synopsis of the SYCL [code]#nd_range# class is provided below. The -constructors and member functions of the SYCL [code]#nd_range# class -are listed in <> and -<> respectively. The additional common special -member functions and common member functions are listed in -<> in <> -and <> respectively. +A synopsis of the SYCL [code]#nd_range# class is provided below. +The constructors and member functions of the SYCL [code]#nd_range# class are +listed in <> and <> +respectively. +The additional common special member functions and common member functions are +listed in <> in <> and +<> respectively. [[table.constructors.ndrange]] @@ -11355,22 +11348,21 @@ id get_offset() const [[id-class]] ==== [code]#id# class -[code]#id# is a vector of Dimensions that is used to -represent an <> into a global or local -[code]#range#. It can be used as an index in an accessor of the -same rank. The subscript operator ([code]#operator[](n)#) returns the component -[code]#n# as a [code]#size_t#. +[code]#id# is a vector of dimensions that is used to represent +an <> into a global or local [code]#range#. +It can be used as an index in an accessor of the same rank. +The subscript operator ([code]#operator[](n)#) returns the component [code]#n# +as a [code]#size_t#. -The SYCL [code]#id# class template provides the common by-value semantics -(see <>). +The SYCL [code]#id# class template provides the common by-value semantics (see +<>). -A synopsis of the SYCL [code]#id# class is provided below. The -constructors, member functions and non-member functions of the SYCL -[code]#id# class are listed in <>, -<> and <> respectively. The
The -additional common special member functions and common member functions are -listed in <> in -<> and +A synopsis of the SYCL [code]#id# class is provided below. +The constructors, member functions and non-member functions of the SYCL +[code]#id# class are listed in <>, <> +and <> respectively. +The additional common special member functions and common member functions are +listed in <> in <> and <> respectively. [source,,linenums] @@ -11602,24 +11594,25 @@ Then return the initial copy of the [code]#id#. [[subsec:item.class]] ==== [code]#item# class -<> identifies an instance of the function object -executing at each point in a [code]#range#. It is passed to a -[code]#parallel_for# call or returned by member functions of [code]#h_item#. -It encapsulates enough information to identify the work-item's range -of possible values and its ID in that range. It can optionally carry the offset of the -range if provided to the [code]#parallel_for#; note this is deprecated in SYCL 2020. -Instances of the [code]#item# class are -not user-constructible and are passed by the runtime to each instance -of the function object. - -The SYCL [code]#item# class template provides the common by-value semantics -(see <>). - -A synopsis of the SYCL [code]#item# class is provided below. The member -functions of the SYCL [code]#item# class are listed in -<>. The additional common special member functions -and common member functions are listed in <> in -<> and +<> identifies an instance of the function object executing at each point +in a [code]#range#. +It is passed to a [code]#parallel_for# call or returned by member functions of +[code]#h_item#. +It encapsulates enough information to identify the work-item's range of possible +values and its ID in that range. +It can optionally carry the offset of the range if provided to the +[code]#parallel_for#; note this is deprecated in SYCL 2020. 
+Instances of the [code]#item# class are not user-constructible and are passed by +the runtime to each instance of the function object. + +The SYCL [code]#item# class template provides the common by-value semantics (see +<>). + +A synopsis of the SYCL [code]#item# class is provided below. +The member functions of the SYCL [code]#item# class are listed in +<>. +The additional common special member functions and common member functions are +listed in <> in <> and <> respectively. // Interface for class: item @@ -11724,21 +11717,23 @@ size_t get_linear_id() const [code]#nd_item# identifies an instance of the function object executing at each point in an [code]#nd_range# passed to a -[code]#parallel_for# call. It encapsulates enough -information to identify the <>'s local and global <>, the -<> and also provides access to the [code]#group# and -[code]#sub_group# classes. Instances of the [code]#nd_item# class are not user-constructible and are passed by the runtime to -each instance of the function object. - -The SYCL [code]#nd_item# class template provides the common by-value -semantics (see <>). - -A synopsis of the SYCL [code]#nd_item# class is provided below. The -member functions of the SYCL [code]#nd_item# class are listed in -<>. The additional common special member -functions and common member functions are listed in -<> in <> -and <> respectively. +[code]#parallel_for# call. +It encapsulates enough information to identify the <>'s local and +global <>, the <> and also provides access to the +[code]#group# and [code]#sub_group# classes. +Instances of the [code]#nd_item# class are not +user-constructible and are passed by the runtime to each instance of the +function object. + +The SYCL [code]#nd_item# class template provides the common by-value semantics +(see <>). + +A synopsis of the SYCL [code]#nd_item# class is provided below. +The member functions of the SYCL [code]#nd_item# class are listed in +<>. 
+The additional common special member functions and common member functions are +listed in <> in <> and +<> respectively. % interface for nd_item class [source,,linenums] @@ -12037,28 +12032,27 @@ template void wait_for(EventTN... events) const ==== [code]#h_item# class [code]#h_item# identifies an instance of a -[code]#group::parallel_for_work_item# function object executing at each -point in a local [code]#range# passed to a -[code]#parallel_for_work_item# call or to the corresponding -[code]#parallel_for_work_group# call if no [code]#range# is passed -to the [code]#parallel_for_work_item# call. It encapsulates enough -information to identify the <>'s local and global <> -according to the information given to [code]#parallel_for_work_group# -(physical ids) as well as the <>'s logical local <> -in the logical local range. All returned <> objects are -offset-less. Instances of the [code]#h_item# class are -not user-constructible and are passed by the runtime to each instance of the -function object. - -The SYCL [code]#h_item# class template provides the common by-value -semantics (see <>). +[code]#group::parallel_for_work_item# function object executing at each point in +a local [code]#range# passed to a [code]#parallel_for_work_item# +call or to the corresponding [code]#parallel_for_work_group# call if no +[code]#range# is passed to the [code]#parallel_for_work_item# call. +It encapsulates enough information to identify the <>'s local and +global <> according to the information given to +[code]#parallel_for_work_group# (physical ids) as well as the <>'s +logical local <> in the logical local range. +All returned <> objects are offset-less. +Instances of the [code]#h_item# class are not user-constructible +and are passed by the runtime to each instance of the function object. + +The SYCL [code]#h_item# class template provides the common by-value semantics +(see <>). -A synopsis of the SYCL [code]#h_item# class is provided below. 
The -member functions of the SYCL [code]#h_item# class are listed in -<>. The additional common special member -functions and common member functions are listed in -<> in <> -and <> respectively. +A synopsis of the SYCL [code]#h_item# class is provided below. +The member functions of the SYCL [code]#h_item# class are listed in +<>. +The additional common special member functions and common member functions are +listed in <> in <> and +<> respectively. [source,,linenums] ---- @@ -12230,26 +12224,26 @@ size_t get_physical_local_id(int dimension) const [[group-class]] ==== [code]#group# class -The [code]#group# encapsulates all functionality -required to represent a particular <> within a -parallel execution. It is not user-constructible. +The [code]#group# encapsulates all functionality required to +represent a particular <> within a parallel execution. +It is not user-constructible. -The local range stored in the group class is provided either by -the programmer, when it is passed as an optional parameter to -[code]#parallel_for_work_group#, or by the runtime system when it -selects the optimal work-group size. This allows the developer to -always know how many work-items are in each executing work-group, even through -the abstracted iteration range of the [code]#parallel_for_work_item# loops. +The local range stored in the group class is provided either by the programmer, +when it is passed as an optional parameter to [code]#parallel_for_work_group#, +or by the runtime system when it selects the optimal work-group size. +This allows the developer to always know how many work-items are in each +executing work-group, even through the abstracted iteration range of the +[code]#parallel_for_work_item# loops. -The SYCL [code]#group# class template provides the common by-value -semantics (see <>). +The SYCL [code]#group# class template provides the common by-value semantics +(see <>). -A synopsis of the SYCL [code]#group# class is provided below. 
The -member functions of the SYCL [code]#group# class are listed in -<>. The additional common special member -functions and common member functions are listed in -<> in <> -and <> respectively. +A synopsis of the SYCL [code]#group# class is provided below. +The member functions of the SYCL [code]#group# class are listed in +<>. +The additional common special member functions and common member functions are +listed in <> in <> and +<> respectively. // Interface for class: group [source,,linenums] @@ -12561,19 +12555,19 @@ template void wait_for(EventTN... events) const [[sub-group-class]] ==== [code]#sub_group# class -The [code]#sub_group# class encapsulates all functionality -required to represent a particular <> within a -parallel execution. It is not user-constructible. +The [code]#sub_group# class encapsulates all functionality required to represent +a particular <> within a parallel execution. +It is not user-constructible. -The SYCL [code]#sub_group# class provides the common by-value -semantics (see <>). +The SYCL [code]#sub_group# class provides the common by-value semantics (see +<>). -A synopsis of the SYCL [code]#sub_group# class is provided below. The -member functions of the SYCL [code]#sub_group# class are listed in -<>. The additional common special member -functions and common member functions are listed in -<> in <> -and <> respectively. +A synopsis of the SYCL [code]#sub_group# class is provided below. +The member functions of the SYCL [code]#sub_group# class are listed in +<>. +The additional common special member functions and common member functions are +listed in <> in <> and +<> respectively. // Interface for class: subgroup [source,,linenums] @@ -12687,12 +12681,12 @@ local id of 0. All functionality related to <> is captured by the [code]#reducer# class and the [code]#reduction# function. 
-The example below demonstrates how to write a <> -kernel that performs two reductions simultaneously on the same input values, -computing both the sum of all values in a buffer and the maximum value in the -buffer. For each reduction variable passed to [code]#parallel_for#, a -reference to a [code]#reducer# object is passed as a parameter to the kernel -function in the same order. +The example below demonstrates how to write a <> kernel that performs +two reductions simultaneously on the same input values, computing both the sum +of all values in a buffer and the maximum value in the buffer. +For each reduction variable passed to [code]#parallel_for#, a reference to a +[code]#reducer# object is passed as a parameter to the kernel function in the +same order. [source,,linenums] ---- @@ -12700,25 +12694,28 @@ include::{code_dir}/reduction.cpp[lines=4..-1] ---- Reductions are supported for all trivially copyable types (as defined by the -{cpp} core language). If the reduction operator is non-associative or -non-commutative, the behavior of a reduction may be non-deterministic. If -multiple reductions reference the same reduction variable, or a reduction +{cpp} core language). +If the reduction operator is non-associative or non-commutative, the behavior of +a reduction may be non-deterministic. +If multiple reductions reference the same reduction variable, or a reduction variable is accessed directly during the lifetime of a reduction (e.g. via an [code]#accessor# or USM pointer), the behavior is undefined. Some of the overloads for the [code]#reduction# function take an identity value -and some do not. An implementation is required to compute a correct reduction -even when the application does not specify an identity value. However, the -implementation may be more efficient when the identity value is either provided -by the application or is known by the implementation. For reductions using -standard binary operators and fundamental types (e.g. 
[code]#plus# and -arithmetic types), an implementation can determine the correct identity value -automatically in order to avoid performance penalties. +and some do not. +An implementation is required to compute a correct reduction even when the +application does not specify an identity value. +However, the implementation may be more efficient when the identity value is +either provided by the application or is known by the implementation. +For reductions using standard binary operators and fundamental types (e.g. +[code]#plus# and arithmetic types), an implementation can determine the correct +identity value automatically in order to avoid performance penalties. If an implementation can identify an identity value for a given combination of accumulator type and function object type, the value is defined as a member of -the [code]#known_identity# trait class. Whether this member value exists can -be tested using the [code]#has_known_identity# trait class. +the [code]#known_identity# trait class. +Whether this member value exists can be tested using the +[code]#has_known_identity# trait class. [source,,linenums] ---- @@ -12906,54 +12903,54 @@ a@ |==== The reduction interface is limited to reduction variables whose size can be -determined at compile-time. As such, [code]#buffer# and USM pointer arguments -are interpreted by the reduction interface as describing a single variable. +determined at compile-time. +As such, [code]#buffer# and USM pointer arguments are interpreted by the +reduction interface as describing a single variable. A reduction operation associated with a [code]#span# represents an array -reduction. An array reduction of size _N_ is functionally equivalent to -specifying _N_ independent scalar reductions. The combination operations -performed by an array reduction are limited to the extent of a USM allocation -described by a [code]#span#, and access to elements outside of these regions -results in undefined behavior. +reduction. 
+An array reduction of size _N_ is functionally equivalent to specifying _N_ +independent scalar reductions. +The combination operations performed by an array reduction are limited to the +extent of a USM allocation described by a [code]#span#, and access to elements +outside of these regions results in undefined behavior. [NOTE] ==== Since a [code]#span# is one-dimensional, there is currently no way to describe -an array reduction with more than one dimension. This is expected to change in -a future version of the SYCL specification, but depends on the introduction of -a multi-dimensional [code]#span#. +an array reduction with more than one dimension. +This is expected to change in a future version of the SYCL specification, but +depends on the introduction of a multi-dimensional [code]#span#. ==== [[reduction-interface]] ==== [code]#reduction# interface -The [code]#reduction# interface is used to attach <> semantics -to a variable, by specifying: the reduction variable, the -reduction operator and an optional identity value associated with the operator. +The [code]#reduction# interface is used to attach <> semantics to a +variable, by specifying: the reduction variable, the reduction operator and an +optional identity value associated with the operator. The overloads of the interface are described in <>. -The return value of the [code]#reduction# interface is an -implementation-defined object of unspecified type, which is interpreted by -[code]#parallel_for# to construct an appropriate [code]#reducer# -type as detailed in <>. +The return value of the [code]#reduction# interface is an implementation-defined +object of unspecified type, which is interpreted by [code]#parallel_for# to +construct an appropriate [code]#reducer# type as detailed in <>. -An implementation may use an unspecified number of temporary variables inside -of any [code]#reducer# objects it creates. 
If an identity value is supplied to -a reduction, an implementation will use that value to initialize any such -temporary variables. +An implementation may use an unspecified number of temporary variables inside of +any [code]#reducer# objects it creates. +If an identity value is supplied to a reduction, an implementation will use that +value to initialize any such temporary variables. [NOTE] ==== Since the number of temporary variables is unspecified, supplying an identity -value different to the identity value associated with the reduction operator -may lead to unexpected results. +value different to the identity value associated with the reduction operator may +lead to unexpected results. ==== -The initial value of the reduction variable is included -in the reduction operation, unless the [code]#property::reduction::initialize_to_identity# +The initial value of the reduction variable is included in the reduction +operation, unless the [code]#property::reduction::initialize_to_identity# property was specified when the [code]#reduction# interface was invoked. -The reduction variable -is updated so as to contain the result of the reduction when the kernel finishes -execution. +The reduction variable is updated so as to contain the result of the reduction +when the kernel finishes execution. [source,,linenums] ---- @@ -13103,34 +13100,37 @@ property::reduction::initialize_to_identity::initialize_to_identity() The [code]#reducer# class defines the interface between a work-item and a reduction variable during the execution of a SYCL kernel, restricting access to -the underlying reduction variable. The intermediate values of a reduction -variable cannot be inspected during kernel execution, and the variable cannot -be updated using anything other than the reduction's specified combination -operation. The combination order of different reducers is unspecified, as are -when and how the value of each reducer is combined with the original reduction -variable. 
+the underlying reduction variable. +The intermediate values of a reduction variable cannot be inspected during +kernel execution, and the variable cannot be updated using anything other than +the reduction's specified combination operation. +The combination order of different reducers is unspecified, as are when and how +the value of each reducer is combined with the original reduction variable. To enable compile-time specialization of reduction algorithms, the -implementation of the [code]#reducer# class is unspecified, -except for the functions and operators defined in <> -and <>. As such, developers should not specify the -template arguments of a [code]#reducer# directly, and should instead employ -generic programming techniques that allow kernel functions to accept a -reference to a variable of any [code]#reducer# type. Kernels written as -lambdas should employ [code]#auto&# or [code]#+auto&...+#, and kernels written as -function objects should employ template parameters or template parameter packs. +implementation of the [code]#reducer# class is unspecified, except for the +functions and operators defined in <> and +<>. +As such, developers should not specify the template arguments of a +[code]#reducer# directly, and should instead employ generic programming +techniques that allow kernel functions to accept a reference to a variable of +any [code]#reducer# type. +Kernels written as lambdas should employ [code]#auto&# or [code]#+auto&...+#, +and kernels written as function objects should employ template parameters or +template parameter packs. An implementation must guarantee that it is safe for multiple work-items in a -kernel to call the combine function of a [code]#reducer# concurrently. An -implementation is free to re-use reducer variables (e.g. across work-groups +kernel to call the combine function of a [code]#reducer# concurrently. +An implementation is free to re-use reducer variables (e.g. 
across work-groups scheduled to the same compute unit) if it can guarantee that it is safe to do so. The type aliases and constant static members of the [code]#reducer# class are listed in <> and its member functions are listed in -<>. Additional shorthand operators may be made -available for certain combinations of reduction variable type and combination -operation, as described in <>. +<>. +Additional shorthand operators may be made available for certain combinations of +reduction variable type and combination operation, as described in +<>. [source,,linenums] ---- @@ -13277,70 +13277,73 @@ reducer& operator++(reducer& accum) [[sec:command.group.scope]] === Command group scope -A <>, as defined in <>, -may execute a single <> such as invoking a kernel, copying memory, -or executing a host task. It is legal for a <> to -statically contain more than one call to a <> function, but any -single execution of the <> may execute no more -than one <>. If an application fails to do this, the function that -submits the <> (i.e., [code]#queue::submit#) -must throw a synchronous [code]#exception# with the [code]#errc::invalid# error -code. The statements that call <> together with -the statements that define the requirements for a kernel form the -<>. The command group -function object takes as a parameter an instance of the <> class which -encapsulates all the member functions executed in the command group scope. -The member functions and objects defined in this scope will define the requirements for the -kernel execution or explicit memory operation, and will be used by the <> -to evaluate if the operation is ready for execution. +A <>, as defined in <>, may execute a +single <> such as invoking a kernel, copying memory, or executing a +host task. +It is legal for a <> to statically contain more than one +call to a <> function, but any single execution of the +<> may execute no more than one <>. 
+If an application fails to do this, the function that submits the +<> (i.e., [code]#queue::submit#) must throw a +synchronous [code]#exception# with the [code]#errc::invalid# error code. +The statements that call <> together with the statements that +define the requirements for a kernel form the <>. +The command group function object takes as a parameter an instance of the +<> class which encapsulates all the member functions executed in the +command group scope. +The member functions and objects defined in this scope will define the +requirements for the kernel execution or explicit memory operation, and will be +used by the <> to evaluate if the operation is ready for +execution. Host code within a <> (typically setting up requirements) is executed once, before the command group submit call returns. -This abstraction of the kernel -execution unifies the data with its processing, and consequently allows more -abstraction and flexibility in the parallel programming models that can be -implemented on top of SYCL. - -The <> and the [code]#handler# class -serve as an interface for the encapsulation of <>. -A <> is defined as a function object. All the device data accesses are -defined inside this group and any transfers are managed by the <>. The -rules for the data transfers regarding device and -host data accesses are better described in <>, -where buffers (<>) and accessor (<>) classes -are described. The overall memory model of the SYCL application is described in +This abstraction of the kernel execution unifies the data with its processing, +and consequently allows more abstraction and flexibility in the parallel +programming models that can be implemented on top of SYCL. + +The <> and the [code]#handler# class serve as an +interface for the encapsulation of <>. +A <> is defined as a function object. +All the device data accesses are defined inside this group and any transfers are +managed by the <>. 
+The rules for the data transfers regarding device and host data accesses are +better described in <>, where buffers +(<>) and accessor (<>) classes are described. +The overall memory model of the SYCL application is described in <>. -It is possible for a <> to fail to enqueue to a queue, -or for it to fail to execute correctly. A user can therefore supply a secondary -queue when submitting a command group to the primary queue. If the <> -fails to enqueue or execute a command group on a primary queue, it can attempt -to run the command group on the secondary queue. The circumstances in which it -is, or is not, possible for a <> to fall-back from primary to -secondary queue are unspecified in the specification. Even if a command group -is run on the secondary queue, the requirement that host code within the command group -is executed exactly once remains, regardless of whether the fallback queue is used for -execution. - -The command group [code]#handler# class provides the interface -for all of the member functions that are able to be executed inside the command group -scope, and it is also provided as a scoped object to all of the data access -requests. The <> class provides the interface -in which every command in the command group scope will be submitted to a queue. +It is possible for a <> to fail to enqueue to a +queue, or for it to fail to execute correctly. +A user can therefore supply a secondary queue when submitting a command group to +the primary queue. +If the <> fails to enqueue or execute a command group on a primary +queue, it can attempt to run the command group on the secondary queue. +The circumstances in which it is, or is not, possible for a <> to +fall-back from primary to secondary queue are unspecified in the specification. +Even if a command group is run on the secondary queue, the requirement that host +code within the command group is executed exactly once remains, regardless of +whether the fallback queue is used for execution. 
+ +The command group [code]#handler# class provides the interface for all of the +member functions that are able to be executed inside the command group scope, +and it is also provided as a scoped object to all of the data access requests. +The <> class provides the interface in which every command in the +command group scope will be submitted to a queue. [[sec:handlerClass]] === Command group [code]#handler# class -A <> object can only be constructed by the SYCL -runtime. All of the accessors defined in <> take as a -parameter an instance of the <>, and all the -kernel invocation functions are member functions of this class. +A <> object can only be constructed by the SYCL runtime. +All of the accessors defined in <> take as a parameter an +instance of the <>, and all the kernel invocation functions are member +functions of this class. The constructors of the SYCL [code]#handler# class are described in <>. -It is disallowed for an instance of the SYCL [code]#handler# class to -be moved or copied. +It is disallowed for an instance of the SYCL [code]#handler# class to be moved +or copied. // Interface for class: handler [source,,linenums] @@ -13367,23 +13370,24 @@ handler(___unspecified___) [[sub.section.requirement]] ==== SYCL functions for adding requirements -When an accessor is created from a <>, a *requirement* is -implicitly added to the <> for the accessor's data. However, -this does not happen when creating a [keyword]#placeholder# accessor. In order -to create a *requirement* for a [keyword]#placeholder# accessor, code +When an accessor is created from a <>, a *requirement* is implicitly +added to the <> for the accessor's data. +However, this does not happen when creating a [keyword]#placeholder# accessor. +In order to create a *requirement* for a [keyword]#placeholder# accessor, code must call the [code]#handler::require()# member function. 
Note that the default constructed [code]#accessor# is not a placeholder, so it may be passed to a <> without calling -[code]#handler::require()#. However, this accessor also has no underlying -memory object, so such an accessor does not create any *requirement* for the -command group, and attempting to access data elements from it produces -undefined behavior. +[code]#handler::require()#. +However, this accessor also has no underlying memory object, so such an accessor +does not create any *requirement* for the command group, and attempting to +access data elements from it produces undefined behavior. SYCL events may also be used to create requirements for a <>. -Such requirements state that the actions represented by the events must -complete before the <> may execute. Such requirements -are added when code calls the [code]#handler::depends_on()# member function. +Such requirements state that the actions represented by the events must complete +before the <> may execute. +Such requirements are added when code calls the [code]#handler::depends_on()# +member function. [[table.members.handler.requirements]] .Member functions of the [code]#handler# class @@ -13433,15 +13437,15 @@ by each event in [code]#depEvents# must complete before executing this [keyword]#data-parallel# <>, <> in <>, or [keyword]#hierarchical parallelism#. -Each function takes an optional kernel name template parameter. The user -may optionally provide a <>, otherwise an implementation-defined name -will be generated for the kernel. +Each function takes an optional kernel name template parameter. +The user may optionally provide a <>, otherwise an +implementation-defined name will be generated for the kernel. All the functions for invoking kernels are member functions of the command group -[code]#handler# class (<>), which -is used to encapsulate all the member functions provided in a command group scope. -<> lists all the members of the -[code]#handler# class related to the kernel invocation. 
+[code]#handler# class (<>), which is used to encapsulate all +the member functions provided in a command group scope. +<> lists all the members of the [code]#handler# +class related to the kernel invocation. [[table.members.handler.kernel]] @@ -13684,17 +13688,17 @@ associated with the secondary queue (if specified). ===== [code]#single_task# invoke -SYCL provides a simple interface to enqueue a kernel that will be -sequentially executed on a device. Only one instance of the -kernel will be executed. This interface is useful as a primitive for more -complicated parallel algorithms, as it can easily create a chain of -sequential tasks on a SYCL device with each of them managing its -own data transfers. +SYCL provides a simple interface to enqueue a kernel that will be sequentially +executed on a device. +Only one instance of the kernel will be executed. +This interface is useful as a primitive for more complicated parallel +algorithms, as it can easily create a chain of sequential tasks on a SYCL device +with each of them managing its own data transfers. This function can only be called inside a command group using the [code]#handler# object created by the runtime. -Any accessors that are used in a kernel should be defined inside the -same command group. +Any accessors that are used in a kernel should be defined inside the same +command group. Local accessors are disallowed for single task invocations. @@ -13703,13 +13707,12 @@ Local accessors are disallowed for single task invocations. include::{code_dir}/singletask.cpp[lines=4..-1] ---- -For single tasks, the kernel member function takes no parameters, as there -is no need for <> in a unary index space. +For single tasks, the kernel member function takes no parameters, as there is no +need for <> in a unary index space. -A [code]#kernel_handler# can optionally be passed as a parameter -to the <> that is invoked by -[code]#single_task# for the purpose explained -in <>. 
+A [code]#kernel_handler# can optionally be passed as a parameter to the +<> that is invoked by [code]#single_task# for the purpose +explained in <>. [source,,linenums] ---- @@ -13719,92 +13722,90 @@ include::{code_dir}/singleTaskWithKernelHandler.cpp[lines=4..-1] ===== [code]#parallel_for# invoke -The [code]#parallel_for# member function of the SYCL -[code]#handler# class provides an interface to define and invoke a SYCL -kernel function in a command group, to execute in parallel execution over a -3 dimensional index space. There are three overloads of the -[code]#parallel_for# member function which provide variations of this -interface, each with a different level of complexity and providing a -different set of features. +The [code]#parallel_for# member function of the SYCL [code]#handler# class +provides an interface to define and invoke a SYCL kernel function in a command +group, to execute in parallel execution over a 3 dimensional index space. +There are three overloads of the [code]#parallel_for# member function which +provide variations of this interface, each with a different level of complexity +and providing a different set of features. For the simplest case, users need only provide the global range (the total -number of work-items in the index space) via a SYCL [code]#range# -parameter. In this case the function object that represents the SYCL kernel -function must take one of: -1) a single SYCL [code]#item# parameter, 2) a single generic parameter -([code]#template# parameter or [code]#auto#) that will be treated as -an [code]#item# parameter, 3) any other type -implicitly converted from SYCL [code]#item#, representing the currently -executing work-item within the range specified by the [code]#range# -parameter. +number of work-items in the index space) via a SYCL [code]#range# parameter. 
+In this case the function object that represents the SYCL kernel function must
+take one of: 1) a single SYCL [code]#item# parameter, 2) a single generic
+parameter ([code]#template# parameter or [code]#auto#) that will be treated as
+an [code]#item# parameter, 3) any other type implicitly converted from SYCL
+[code]#item#, representing the currently executing work-item within the range
+specified by the [code]#range# parameter.

 [NOTE]
 ====
 Case 3) above allows the kernel function to take an argument of type [code]#id#
-because [code]#item# is implicitly convertible to [code]#id#. It also allows
-a 1-D kernel function to take an integral argument (e.g. [code]#int# or
-[code]#size_t#) because a 1-D [code]#item# is implicitly convertible to these
-types. Finally, it allows the kernel function to take a user-defined argument
-type that can be constructed from [code]#item#, enabling users to layer their
-own abstractions on top of SYCL.
+because [code]#item# is implicitly convertible to [code]#id#.
+It also allows a 1-D kernel function to take an integral argument (e.g.
+[code]#int# or [code]#size_t#) because a 1-D [code]#item# is implicitly
+convertible to these types.
+Finally, it allows the kernel function to take a user-defined argument type that
+can be constructed from [code]#item#, enabling users to layer their own
+abstractions on top of SYCL.
 ====

-The execution of the kernel function is the same whether the parameter to
-the SYCL kernel function is a SYCL [code]#id# or a SYCL
-[code]#item#. What differs is the functionality that is available to
-the SYCL kernel function via the respective interfaces.
+The execution of the kernel function is the same whether the parameter to the
+SYCL kernel function is a SYCL [code]#id# or a SYCL [code]#item#.
+What differs is the functionality that is available to the SYCL kernel function
+via the respective interfaces.

-Below is an example of invoking a SYCL kernel function with
-[code]#parallel_for# using a lambda function, and passing a SYCL
-[code]#id# parameter. In this case, only the global id is available.
-This variant of [code]#parallel_for# is designed for when it is not
-necessary to query the global range of the index space being executed across.
+Below is an example of invoking a SYCL kernel function with [code]#parallel_for#
+using a lambda function, and passing a SYCL [code]#id# parameter.
+In this case, only the global id is available.
+This variant of [code]#parallel_for# is designed for when it is not necessary to
+query the global range of the index space being executed across.

 [source,,linenums]
 ----
 include::{code_dir}/basicparallelfor.cpp[lines=4..-1]
 ----

-Below is an example of invoking a SYCL kernel function with
-[code]#parallel_for# using a lambda function and passing a SYCL
-[code]#item# parameter. In this case, both the global id and global
-range are queryable. This variant of [code]#parallel_for# is designed
-for when it is necessary to query the global range of the index space
-being executed across.
+Below is an example of invoking a SYCL kernel function with [code]#parallel_for#
+using a lambda function and passing a SYCL [code]#item# parameter.
+In this case, both the global id and global range are queryable.
+This variant of [code]#parallel_for# is designed for when it is necessary to
+query the global range of the index space being executed across.

 [source,,linenums]
 ----
 include::{code_dir}/basicParallelForItem.cpp[lines=4..-1]
 ----

-Below is an example of invoking a SYCL kernel function with
-[code]#parallel_for# using a lambda function and passing
-[code]#auto# parameter, treated as [code]#item#. In this case, both
-the global id and global range are queryable. The same effect can be
-achieved using class with templatized [code]#operator()#. This variant
-of [code]#parallel_for# is designed for when it is necessary to query
-the global range within which the global id will vary.
+Below is an example of invoking a SYCL kernel function with [code]#parallel_for#
+using a lambda function and passing [code]#auto# parameter, treated as
+[code]#item#.
+In this case, both the global id and global range are queryable.
+The same effect can be achieved using class with templatized [code]#operator()#.
+This variant of [code]#parallel_for# is designed for when it is necessary to
+query the global range within which the global id will vary.

 [source,,linenums]
 ----
 include::{code_dir}/basicParallelForGeneric.cpp[lines=4..-1]
 ----

-Below is an example of invoking a SYCL kernel function with
-[code]#parallel_for# using a lambda function and passing an integral type
-parameter. This example is only valid when calling [code]#parallel_for# with
-[code]#range<1>#. In this case only the global id is available. This variant of
-[code]#parallel_for# is designed for when it is not necessary to query
-the global range of the index space being executed across.
+Below is an example of invoking a SYCL kernel function with [code]#parallel_for#
+using a lambda function and passing an integral type parameter.
+This example is only valid when calling [code]#parallel_for# with
+[code]#range<1>#.
+In this case only the global id is available.
+This variant of [code]#parallel_for# is designed for when it is not necessary to
+query the global range of the index space being executed across.

 [source,,linenums]
 ----
 include::{code_dir}/basicParallelForIntegral.cpp[lines=4..-1]
 ----

-The [code]#parallel_for# overload without an offset can be called with
-either a number or a [code]#braced-init-list# with 1-3 elements. In that
-case the following calls are equivalent:
+The [code]#parallel_for# overload without an offset can be called with either a
+number or a [code]#braced-init-list# with 1-3 elements.
+In that case the following calls are equivalent:

   * [code]#parallel_for(N, some_kernel)# has same effect as
     [code]#parallel_for(range<1>(N), some_kernel)#
@@ -13812,11 +13813,11 @@ case the following calls are equivalent:
     [code]#parallel_for(range<1>(N), some_kernel)#
   * [code]#parallel_for({N1, N2}, some_kernel)# has same effect as
     [code]#parallel_for(range<2>(N1, N2), some_kernel)#
-  * [code]#parallel_for({N1, N2, N3}, some_kernel)# has same effect
-    as [code]#parallel_for(range<3>(N1, N2, N3), some_kernel)#
+  * [code]#parallel_for({N1, N2, N3}, some_kernel)# has same effect as
+    [code]#parallel_for(range<3>(N1, N2, N3), some_kernel)#

-Below is an example of invoking [code]#parallel_for# with a number
-instead of an explicit [code]#range# object.
+Below is an example of invoking [code]#parallel_for# with a number instead of an
+explicit [code]#range# object.

 [source,,linenums]
 ----
@@ -13824,38 +13825,40 @@ include::{code_dir}/basicParallelForNumber.cpp[lines=4..-1]
 ----

 For SYCL kernel functions invoked via the above described overload of the
-[code]#parallel_for# member function, it is disallowed to use local
-accessors or to use a <>.
+[code]#parallel_for# member function, it is disallowed to use local accessors or
+to use a <>.

 The following two examples show how a kernel function object can be launched
-over a 3D grid, with 3 elements in each dimension. In the first case
-work-item ids range from 0 to 2 inclusive, and in the second case
-work-item ids run from 1 to 3.
+over a 3D grid, with 3 elements in each dimension.
+In the first case work-item ids range from 0 to 2 inclusive, and in the second
+case work-item ids run from 1 to 3.

 [source,,linenums]
 ----
 include::{code_dir}/parallelfor.cpp[lines=4..-1]
 ----

-The last case of a [code]#parallel_for# invocation enables low-level functionality
-of work-items and work-groups. This becomes valuable when an execution
-requires groups of work-items to coordinate with one another. These are
-exposed in SYCL through [code]#+parallel_for (nd_range,...)+# and the
-[code]#nd_item# class. In this case, the developer needs to define the
-[code]#nd_range# that the kernel will execute on in order to have fine
-grained control of the enqueuing of the kernel. This variation of
-parallel_for expects an [code]#nd_range#, specifying both local and
-global ranges, defining the global number of work-items and the number in
-each cooperating work-group. The function object that represents the SYCL
-kernel function must take one of:
+The last case of a [code]#parallel_for# invocation enables low-level
+functionality of work-items and work-groups.
+This becomes valuable when an execution requires groups of work-items to
+coordinate with one another.
+These are exposed in SYCL through [code]#+parallel_for (nd_range,...)+# and the
+[code]#nd_item# class.
+In this case, the developer needs to define the [code]#nd_range# that the kernel
+will execute on in order to have fine grained control of the enqueuing of the
+kernel.
+This variation of parallel_for expects an [code]#nd_range#, specifying both
+local and global ranges, defining the global number of work-items and the number
+in each cooperating work-group.
+The function object that represents the SYCL kernel function must take one of:
 1) a single SYCL [code]#nd_item# parameter, 2) a single generic parameter
-([code]#template# parameter or [code]#auto#) that will be treated as
-an [code]#nd_item# parameter, 3) any other type converted
-from SYCL [code]#nd_item#, representing the currently executing work-item
-within the range specified by the [code]#nd_range# parameter. The
-[code]#nd_item# parameter makes all information about the work-item and
-its position in the range available, and provides access to functions
-enabling the use of a <>.
+([code]#template# parameter or [code]#auto#) that will be treated as an
+[code]#nd_item# parameter, 3) any other type converted from SYCL
+[code]#nd_item#, representing the currently executing work-item within the range
+specified by the [code]#nd_range# parameter.
+The [code]#nd_item# parameter makes all information about the work-item and its
+position in the range available, and provides access to functions enabling the
+use of a <>.

 [NOTE]
 ====
@@ -13863,33 +13866,37 @@ Case 3) above includes user-defined types that can be constructed from
 [code]#nd_item#, enabling users to layer their own abstractions on top of SYCL.
 ====

-The following example shows how sixty-four work-items may be launched
-in a three-dimensional grid with four in each dimension, and divided
-into eight work-groups. Each group of work-items uses a
-<> for coordination.
+The following example shows how sixty-four work-items may be launched in a
+three-dimensional grid with four in each dimension, and divided into eight
+work-groups.
+Each group of work-items uses a <> for coordination.

 [source,,linenums]
 ----
 include::{code_dir}/parallelforbarrier.cpp[lines=4..-1]
 ----

-In all of these cases the underlying <> will be created
-and the kernel defined as a function object will be created and enqueued
-as part of the command group scope.
+In all of these cases the underlying <> will be created and the kernel
+defined as a function object will be created and enqueued as part of the command
+group scope.

 Some forms of [code]#parallel_for# accept an offset parameter of type
-[code]#id#, where the number of dimensions of the [code]#id# is the same
-as the number of dimensions of the [code]#range# that determines the iteration space.
-These forms of [code]#parallel_for# execute the same number of iterations as the form
-with no offset. The difference is that the [code]#id# or [code]#item# parameter passed
-to the kernel function has the value of [code]#offset# implicitly added.
+[code]#id#, where the number of dimensions of the [code]#id# is the
+same as the number of dimensions of the [code]#range# that determines the
+iteration space.
+These forms of [code]#parallel_for# execute the same number of iterations as the
+form with no offset.
+The difference is that the [code]#id# or [code]#item# parameter passed to the
+kernel function has the value of [code]#offset# implicitly added.
 This offset parameter is deprecated in SYCL 2020.

 An offset can also be passed to the forms of [code]#parallel_for# that accept an
-[code]#nd_range# via the third parameter to the [code]#nd_range# constructor. These
-forms of [code]#parallel_for# also execute the same number of iterations as if no offset
-was specified. The difference is that the [code]#nd_item# parameter passed to the kernel
-function has the value of the offset implicitly added to the constituent <>.
+[code]#nd_range# via the third parameter to the [code]#nd_range# constructor.
+These forms of [code]#parallel_for# also execute the same number of iterations
+as if no offset was specified.
+The difference is that the [code]#nd_item# parameter passed to the kernel
+function has the value of the offset implicitly added to the constituent
+<>.
 This offset parameter is deprecated in SYCL 2020.


@@ -13906,37 +13913,38 @@ include::{code_dir}/parallelForWithKernelHandler.cpp[lines=4..-1]
 ===== Parallel for hierarchical invoke

 The hierarchical parallel kernel execution interface provides the same
-functionality as is available from the <> interface, but
-exposed differently. To execute the same sixty-four work-items in
-eight work-groups that we saw in a previous example, we execute an
-outer [code]#parallel_for_work_group# call to create the
-groups. The member function
-[code]#handler::parallel_for_work_group# is parameterized by the
-number of work-groups, such that the size of each group is chosen by
-the runtime, or by the number of work-groups and number of work-items
-for users who need more control.
-
-The body of the outer [code]#parallel_for_work_group# call
-consists of a lambda function or function object. The body of this
-function object contains code that is executed only once for the
-entire work-group. If the code has no side-effects and the compiler
-heuristic suggests that it is more efficient to do so, this code will be
-executed for each work-item.
+functionality as is available from the <> interface, but exposed
+differently.
+To execute the same sixty-four work-items in eight work-groups that we saw in a
+previous example, we execute an outer [code]#parallel_for_work_group# call to
+create the groups.
+The member function [code]#handler::parallel_for_work_group# is parameterized by
+the number of work-groups, such that the size of each group is chosen by the
+runtime, or by the number of work-groups and number of work-items for users who
+need more control.
+
+The body of the outer [code]#parallel_for_work_group# call consists of a lambda
+function or function object.
+The body of this function object contains code that is executed only once for
+the entire work-group.
+If the code has no side-effects and the compiler heuristic suggests that it is
+more efficient to do so, this code will be executed for each work-item.

 Within this region any variable declared will have the semantics of
 <>, shared between all <> in the
-<>. If the
-device compiler can prove that an array of such variables is accessed only by
-a single work-item throughout the lifetime of the work-group, for
+<>.
+If the device compiler can prove that an array of such variables is accessed
+only by a single work-item throughout the lifetime of the work-group, for
 example if access is derived from the id of the work-item with no
-transformation, then it can allocate the data in private memory or
-registers instead.
+transformation, then it can allocate the data in private memory or registers
+instead.

-To guarantee use of private per-work-item memory, the
-[code]#private_memory# class can be used to wrap the data.
-This class simply constructs private data for a given group across the
-entire group. The id of the current work-item is passed to any access
-to grab the correct data.
+To guarantee use of private per-work-item memory, the [code]#private_memory#
+class can be used to wrap the data.
+This class simply constructs private data for a given group across the entire
+group.
+The id of the current work-item is passed to any access to grab the correct
+data.

 The [code]#private_memory# class has the following interface:

@@ -13977,22 +13985,21 @@ T& operator()(const h_item& id)
 |====


-<> is allocated per underlying <>, not per
-iteration of the [code]#parallel_for_work_item# loop. The number
-of instances of a private memory object is only under direct control
-if a work-group size is passed to the
-[code]#parallel_for_work_group# call. If the underlying
-work-group size is chosen by the runtime, the number of private memory
-instances is opaque to the program. Explicit private memory
-declarations should therefore be used with care and with a full
-understanding of which instances of a
-[code]#parallel_for_work_item# loop will share the same
-underlying variable.
+<> is allocated per underlying <>, not
+per iteration of the [code]#parallel_for_work_item# loop.
+The number of instances of a private memory object is only under direct control
+if a work-group size is passed to the [code]#parallel_for_work_group# call.
+If the underlying work-group size is chosen by the runtime, the number of
+private memory instances is opaque to the program.
+Explicit private memory declarations should therefore be used with care and with
+a full understanding of which instances of a [code]#parallel_for_work_item# loop
+will share the same underlying variable.

 Also within the lambda body can be a sequence of calls to
-[code]#parallel_for_work_item#. No work-item can begin executing a
-[code]#parallel_for_work_item# until all work-items in the group have
-completed executing the previous [code]#parallel_for_work_item#.
+[code]#parallel_for_work_item#.
+No work-item can begin executing a [code]#parallel_for_work_item# until all
+work-items in the group have completed executing the previous
+[code]#parallel_for_work_item#.
 As a result the pair of [code]#parallel_for_work_item# calls in the code below
 is equivalent to the parallel execution with a <> in the
 earlier example.
@@ -14002,28 +14009,29 @@ earlier example.
 include::{code_dir}/parallelforworkgroup.cpp[lines=4..-1]
 ----

-It is valid to use more flexible dimensions of the work-item loops. In
-the following example we issue 8 work-groups but let the runtime
-choose their size, by not passing a work-group size to the
-[code]#parallel_for_work_group# call. The
-[code]#parallel_for_work_item# loops may also vary in size, with
-their execution ranges unrelated to the dimensions of the work-group,
-and the compiler generating an appropriate iteration space to fill the
-gap. In this case, the [code]#h_item# provides access to local ids and
-ranges that reflect both kernel and [code]#parallel_for_work_item# invocation ranges.
+It is valid to use more flexible dimensions of the work-item loops.
+In the following example we issue 8 work-groups but let the runtime choose their
+size, by not passing a work-group size to the [code]#parallel_for_work_group#
+call.
+The [code]#parallel_for_work_item# loops may also vary in size, with their
+execution ranges unrelated to the dimensions of the work-group, and the compiler
+generating an appropriate iteration space to fill the gap.
+In this case, the [code]#h_item# provides access to local ids and ranges that
+reflect both kernel and [code]#parallel_for_work_item# invocation ranges.

 [source,,linenums]
 ----
 include::{code_dir}/parallelforworkgroup2.cpp[lines=4..-1]
 ----

-This interface offers a more intuitive way for tiling parallel
-programming paradigms. In summary, the hierarchical model allows a
-developer to distinguish the execution at work-group level and at
-work-item level using the [code]#parallel_for_work_group# and the nested
-[code]#parallel_for_work_item# functions. It also provides this visibility
-to the compiler without the need for difficult loop fission such that
-host execution may be more efficient.
+This interface offers a more intuitive way for tiling parallel programming
+paradigms.
+In summary, the hierarchical model allows a developer to distinguish the
+execution at work-group level and at work-item level using the
+[code]#parallel_for_work_group# and the nested [code]#parallel_for_work_item#
+functions.
+It also provides this visibility to the compiler without the need for difficult
+loop fission such that host execution may be more efficient.

 A [code]#kernel_handler# can optionally be passed as a parameter to the
 <> that is invoked by any variant of
@@ -14042,52 +14050,53 @@ include::{code_dir}/parallelForWorkGroupWithKernelHandler.cpp[lines=4..-1]
 ==== SYCL functions for explicit memory operations

 In addition to <>, <> objects can also be used to
-perform manual operations on host and device memory by using the
-[keyword]#copy# API of the <>.
+perform manual operations on host and device memory by using the [keyword]#copy#
+API of the <>.
 Manual copy operations can be seen as specialized kernels executing on the
-device, except that typically this operations will be implemented using a
-host API that exists as part of a backend (e.g, OpenCL enqueue copy operations).
-
-These explicit copy operations have a source and a destination. When an
-accessor is the _source_ of the operation, the destination can be a host
-pointer or another accessor. The _source_ accessor must have either
-[code]#access_mode::read# or [code]#access_mode::read_write# access mode. When
-an accessor is the _destination_ of the explicit copy operation, the source can
-be a host pointer or another accessor. The _destination_ accessor must have
-either [code]#access_mode::write#, [code]#access_mode::read_write#,
-[code]#access_mode::discard_write# or [code]#access_mode::discard_read_write#
-access mode.
+device, except that typically this operations will be implemented using a host
+API that exists as part of a backend (e.g, OpenCL enqueue copy operations).
+
+These explicit copy operations have a source and a destination.
+When an accessor is the _source_ of the operation, the destination can be a host
+pointer or another accessor.
+The _source_ accessor must have either [code]#access_mode::read# or
+[code]#access_mode::read_write# access mode.
+When an accessor is the _destination_ of the explicit copy operation, the source
+can be a host pointer or another accessor.
+The _destination_ accessor must have either [code]#access_mode::write#,
+[code]#access_mode::read_write#, [code]#access_mode::discard_write# or
+[code]#access_mode::discard_read_write# access mode.

 When an accessor is used as a parameter to one of these explicit copy
 operations, the target must be either [code]#target::device# or
 [code]#target::constant_buffer#.

-When accessors are both the source and the destination,
-the operation is executed on objects controlled by the SYCL runtime.
-The SYCL runtime is allowed to not perform an explicit in-copy operation
-if a different path to update the data is available according to
-the SYCL application memory model.
+When accessors are both the source and the destination, the operation is
+executed on objects controlled by the SYCL runtime.
+The SYCL runtime is allowed to not perform an explicit in-copy operation if a
+different path to update the data is available according to the SYCL application
+memory model.

 The most recent copy of the memory object may reside on any context controlled
-by the SYCL runtime, or on the host in a pointer controlled by the
-SYCL runtime. The SYCL runtime will ensure that data is copied to the destination
-once the <> has completed execution.
+by the SYCL runtime, or on the host in a pointer controlled by the SYCL runtime.
+The SYCL runtime will ensure that data is copied to the destination once the
+<> has completed execution.

 Whenever a host pointer is used as either the source or the destination of these
-explicit memory operations, it is the responsibility
-of the user for that pointer to have at least as much memory allocated as
-the accessor is giving access to, e.g: if an accessor accesses a range
-of 10 elements of [code]#int# type, the host pointer must at least have
-[code]#10 * sizeof(int)# bytes of memory allocated.
+explicit memory operations, it is the responsibility of the user for that
+pointer to have at least as much memory allocated as the accessor is giving
+access to, e.g: if an accessor accesses a range of 10 elements of [code]#int#
+type, the host pointer must at least have [code]#10 * sizeof(int)# bytes of
+memory allocated.

 A special case is the [code]#update_host# member function.
-This member function only requires an accessor, and instructs the runtime to update
-the internal copy of the data in the host, if any. This is particularly
-useful when used in conjunction with the [code]#buffer# constructor overloads
-which accept mutex objects.
+This member function only requires an accessor, and instructs the runtime to
+update the internal copy of the data in the host, if any.
+This is particularly useful when used in conjunction with the [code]#buffer#
+constructor overloads which accept mutex objects.

-<> describes the interface for the
-explicit copy operations.
+<> describes the interface for the explicit copy
+operations.


 [[table.members.handler.copy]]
@@ -14265,10 +14274,9 @@ the default behavior. For more detail on USM, please see <>.
 |====


-The listing below illustrates how to use explicit copy
-operations in SYCL. The example copies half of the contents of
-a [code]#std::vector# into the device, leaving the rest of the
-contents of the buffer on the device unchanged.
+The listing below illustrates how to use explicit copy operations in SYCL.
+The example copies half of the contents of a [code]#std::vector# into the
+device, leaving the rest of the contents of the buffer on the device unchanged.

 [source,,linenums]
 ----
@@ -14289,18 +14297,19 @@ include::{header_dir}/handler/useKernelBundle.h[lines=4..-1]

 _Effects:_ The <> associated with the [code]#handler# will use
 <> of the [code]#kernel_bundle# [code]#execBundle#
-in any of its <>. If the
-[code]#kernel_bundle# contains multiple <> that are
-compatible with the <> to which the kernel is submitted, then the
-<> chosen is implementation-defined.
-
-If the <> attempts to invoke a kernel that is not contained by
-a compatible device image in [code]#execBundle#, the
-<> throws a synchronous [code]#exception# with the
-[code]#errc::kernel_not_supported# error code. If the <> has a
-secondary queue, then the [code]#execBundle# must contain a kernel that is
-compatible with both the primary queue's device and the secondary queue's
-device, otherwise the <> throws this exception.
+in any of its <>.
+If the [code]#kernel_bundle# contains multiple <>
+that are compatible with the <> to which the kernel is submitted, then
+the <> chosen is implementation-defined.
+
+If the <> attempts to invoke a kernel that is not contained by a
+compatible device image in [code]#execBundle#, the <>
+throws a synchronous [code]#exception# with the
+[code]#errc::kernel_not_supported# error code.
+If the <> has a secondary queue, then the [code]#execBundle# must
+contain a kernel that is compatible with both the primary queue's device and the
+secondary queue's device, otherwise the <> throws
+this exception.

 Since the handler method for setting specialization constants is incompatible
 with the kernel bundle method, applications should not call this function if
@@ -14324,38 +14333,43 @@ _Throws:_

 Device code can make use of <>
 which represent constants whose values can be set dynamically during execution
-of the <>. The values of these constants are fixed when a
-<> is invoked, and they do not change during the
-execution of the kernel. However, the application is able to set a new value
-for a specialization constant each time a kernel is invoked, so the values can
-be tuned differently for each invocation.
+of the <>.
+The values of these constants are fixed when a <> is
+invoked, and they do not change during the execution of the kernel.
+However, the application is able to set a new value for a specialization
+constant each time a kernel is invoked, so the values can be tuned differently
+for each invocation.

 There are two methods for an application to use specialization constants, one
 method requires creating a [code]#kernel_bundle# object and the other does not.
-The syntax for both methods is mostly the same. Both methods declare
-specialization constants in the same way, and kernels read their values in the
-same way. The main difference is whether their values are set via
+The syntax for both methods is mostly the same.
+Both methods declare specialization constants in the same way, and kernels read
+their values in the same way.
+The main difference is whether their values are set via
 [code]#handler::set_specialization_constant()# or via
-[code]#kernel_bundle::set_specialization_constant()#. These two methods are
-incompatible with one another, so they may not both be used by the same
-<>.
+[code]#kernel_bundle::set_specialization_constant()#.
+These two methods are incompatible with one another, so they may not both be
+used by the same <>.

 [NOTE]
 ====
 Implementations that support online compilation of kernel bundles will likely
 implement both methods of specialization constants using kernel bundles.
 Therefore, applications should expect that there is some overhead associated
-with invoking a kernel with new values for its specialization constants. A
-typical implementation records the values of specialization constants set via
+with invoking a kernel with new values for its specialization constants.
+A typical implementation records the values of specialization constants set via
 [code]#handler::set_specialization_constant()# and remembers these values until
-a kernel is invoked (e.g. via [code]#parallel_for()#). At this point, the
-implementation determines the bundle that contains the invoked kernel. If
-that bundle has already been compiled for the handler's device and compiled
+a kernel is invoked (e.g. via [code]#parallel_for()#).
+At this point, the implementation determines the bundle that contains the
+invoked kernel.
+If that bundle has already been compiled for the handler's device and compiled
 with the correct values for the specialization constants, the kernel is
-scheduled for invocation. Otherwise, the implementation compiles the
-bundle before scheduling the kernel for invocation. Therefore, applications
-that frequently change the values of specialization constants may see an
-overhead associated with recompilation of the kernel's bundle.
+scheduled for invocation.
+Otherwise, the implementation compiles the bundle before scheduling the kernel
+for invocation.
+Therefore, applications that frequently change the values of specialization
+constants may see an overhead associated with recompilation of the kernel's
+bundle.
 ====


@@ -14382,11 +14396,11 @@ class with the following restrictions:

 [NOTE]
 ====
-The expectation is that some implementations may conceptually insert code at
-the end of a translation unit which references each `specialization_id`
-variable that is declared in that translation unit. The restrictions listed
-above make this possible by ensuring that these variables are accessible at the
-end of the translation unit.
+The expectation is that some implementations may conceptually insert code at the
+end of a translation unit which references each `specialization_id` variable
+that is declared in that translation unit.
+The restrictions listed above make this possible by ensuring that these
+variables are accessible at the end of the translation unit.
 ====

 The following example illustrates some of these restrictions:
@@ -14438,15 +14452,16 @@ specialization_id& operator=(specialization_id&& rhs) = delete; // (4)
 If the application uses specialization constants without creating a
 [code]#kernel_bundle# object, it can set and get their values from
 <> by calling member functions of the [code]#handler#
-class. These member functions have a template parameter [code]#SpecName# whose
-value must be a reference to a variable of type [code]#specialization_id#,
-which defines the type and default value of the specialization constant.
+class.
 When not using a kernel bundle, the value of a specialization constant that is
 used in a kernel invoked from a <> is affected by calls to set
-its value from that same <>, but it is not affected by calls
-from other <> even if those calls are from
-another invocation of the same <>.
+its value from that same <>, but it is not affected by calls from
+other <> even if those calls are from another
+invocation of the same <>.

 [source]
 ----
@@ -14456,18 +14471,18 @@ void set_specialization_constant(
 ----

 _Effects:_ Sets the value of the specialization constant whose address is
-[code]#SpecName# for this handler's <>. If the specialization
-constant's value was previously set in this same <>, the value
-is overwritten.
+[code]#SpecName# for this handler's <>.
+If the specialization constant's value was previously set in this same
+<>, the value is overwritten.

-This function may be called even if the specialization constant
-[code]#SpecName# isn't used by the kernel that is invoked by this handler's
-<>. Doing so has no effect on the invoked kernel.
+This function may be called even if the specialization constant [code]#SpecName#
+isn't used by the kernel that is invoked by this handler's <>.
+Doing so has no effect on the invoked kernel.

 _Throws:_

- * An [code]#exception# with the [code]#errc::invalid# error code if
- a kernel bundle has been bound to the [code]#handler# via
+ * An [code]#exception# with the [code]#errc::invalid# error code if a kernel
+ bundle has been bound to the [code]#handler# via
 [code]#use_kernel_bundle()#.

 [source]
@@ -14478,14 +14493,15 @@ get_specialization_constant();
 ----

 _Returns:_ The value of the specialization constant whose address is
-[code]#SpecName# for this handler's <>. If the value was
-previously set in this handler's <>, that value is returned.
+[code]#SpecName# for this handler's <>.
+If the value was previously set in this handler's <>, that value
+is returned.
 Otherwise, the specialization constant's default value is returned.

 _Throws:_

- * An [code]#exception# with the [code]#errc::invalid# error code if
- a kernel bundle has been bound to the [code]#handler# via
+ * An [code]#exception# with the [code]#errc::invalid# error code if a kernel
+ bundle has been bound to the [code]#handler# via
 [code]#use_kernel_bundle()#.


@@ -14494,9 +14510,10 @@ _Throws:_

 In order to read the value of a specialization constant from device code, the
 <> must be declared to take an object of type
-[code]#kernel_handler# as its last parameter. The <> constructs
-this object, which has a member function for reading the specialization
-constant's value. A synopsis of this class is shown below.
+[code]#kernel_handler# as its last parameter.
+The <> constructs this object, which has a member function for
+reading the specialization constant's value.
+A synopsis of this class is shown below.

 [source,,linenums]
 ----
@@ -14514,13 +14531,15 @@ get_specialization_constant();
 ----

 _Returns:_ The value of the <> whose address is
-[code]#SpecName#. For a kernel invoked from a <> that was not
-bound to a kernel bundle, the value is the same as what would have been
-returned if [code]#handler::get_specialization_constant()# was called
-immediately before invoking the kernel. For a kernel invoked from a
-<> that was bound to a kernel bundle, the value is the same as
-what would be returned if [code]#kernel_bundle::get_specialization_constant()#
-was called on the bound bundle.
+[code]#SpecName#.
+For a kernel invoked from a <> that was not bound to a kernel
+bundle, the value is the same as what would have been returned if
+[code]#handler::get_specialization_constant()# was called immediately before
+invoking the kernel.
+For a kernel invoked from a <> that was bound to a kernel bundle,
+the value is the same as what would be returned if
+[code]#kernel_bundle::get_specialization_constant()# was called on the bound
+bundle.
 ==== Example usage
@@ -14543,17 +14562,18 @@ include::{code_dir}/usingSpecConstants.cpp[lines=4..-1]
 === Overview

 A <> is a native {cpp} callable which is scheduled by the
-<>. A <> is submitted to a <> via a
-<> by a <>.
+<>.
+A <> is submitted to a <> via a <> by a
+<>.

-When a <> is submitted to a <> it is scheduled
-based on its data dependencies with other <> including
-<> and asynchronous copies, resolving any
-requisites created by <> attached to the <> as
-defined in <>.
+When a <> is submitted to a <> it is scheduled based
+on its data dependencies with other <> including
+<> and asynchronous
+copies, resolving any requisites created by <> attached to
+the <> as defined in <>.

-Since a <> is invoked directly by the <> rather
-than being compiled as a <>, it does not have the same
+Since a <> is invoked directly by the <> rather than
+being compiled as a <>, it does not have the same
 restrictions as a <>, and can therefore contain any arbitrary
 {cpp} code.

@@ -14565,8 +14585,8 @@ A <> can be enqueued on any <> and the callable will be invoked
 directly by the SYCL runtime, regardless of which <> the <> is
 associated with.

-A <> is enqueued on a <> via the [code]#host_task#
-member function of the [code]#handler# class.
+A <> is enqueued on a <> via the [code]#host_task# member
+function of the [code]#handler# class.
 The <> returned by the submission of the associated <>
 enters the completed state (corresponding to a status of
 [code]#info::event_command_status::complete#) once the invocation of the
@@ -14576,10 +14596,10 @@ turned into an <> that can be handled as described in
 <>.
 A <> can optionally be used to interoperate with the
-<> associated with the <> executing the
-<>, the <> that the <> is associated with, the
-<> that the <> is associated with and the <>
-that have been captured in the callable, via an optional
+<> associated with the <>
+executing the <>, the <> that the <> is associated
+with, the <> that the <> is associated with and the
+<> that have been captured in the callable, via an optional
 [code]#interop_handle# parameter.

 This allows <> to be used for two purposes: either as a
@@ -14589,15 +14609,16 @@ within the scheduling of the <>.

 For the former use case, construct a buffer accessor with
 [code]#target::host_task# or an image accessor with
-[code]#image_target::host_task#. This makes the buffer or image available
-on the host during execution of the <>.
+[code]#image_target::host_task#.
+This makes the buffer or image available on the host during execution of the
+<>.

-For the latter case, construct a buffer accessor with
-[code]#target::device# or [code]#target::constant_buffer#, or construct
-an image accessor with [code]#image_target::device#. This makes the buffer or
-image available on the device that is associated with the queue used to submit
-the <>, so that it can be accessed via interoperability member
-functions provided by the [code]#interop_handle# class.
+For the latter case, construct a buffer accessor with [code]#target::device# or
+[code]#target::constant_buffer#, or construct an image accessor with
+[code]#image_target::device#.
+This makes the buffer or image available on the device that is associated with
+the queue used to submit the <>, so that it can be accessed via
+interoperability member functions provided by the [code]#interop_handle# class.

 Local <> cannot be used within a <>.
@@ -14612,16 +14633,17 @@ include::{header_dir}/hostTask/hostTaskSynopsis.h[lines=4..-1]
 [[subsec:interfaces.hosttasks.interophandle]]
 === Class [code]#interop_handle#

-The [code]#interop_handle# class is an abstraction over the <>
-which is being used to invoke the <> and its associated
-<> and <>. It also represents the state of the
-<> dependency model at the point the <> is invoked.
+The [code]#interop_handle# class is an abstraction over the <> which is
+being used to invoke the <> and its associated <> and
+<>.
+It also represents the state of the <> dependency model at the
+point the <> is invoked.

 The [code]#interop_handle# class provides access to the
-<> associated with the <>, <>,
-<> and any <> or <> that are captured in
-the callable being invoked in order to allow a <> to be used
-for interoperability purposes.
+<> associated with the <>, <>, <>
+and any <> or <> that are captured in the callable
+being invoked in order to allow a <> to be used for interoperability
+purposes.

 An [code]#interop_handle# cannot be constructed by user-code, only by the
 <>.
@@ -14663,102 +14685,100 @@ include::{header_dir}/hostTask/classInteropHandle/getbackend.h[lines=4..-1]
 include::{header_dir}/hostTask/classInteropHandle/getnativeX.h[lines=4..-1]
 ----

- . _Constraints:_ Available only if the optional interoperability
- function [code]#get_native# taking a [code]#buffer# is
- available and if [code]#accTarget# is
- [code]#target::device#.
+ . _Constraints:_ Available only if the optional interoperability function
+ [code]#get_native# taking a [code]#buffer# is available and if
+ [code]#accTarget# is [code]#target::device#.
 +
 --
 _Returns:_ The <> associated with the underlying
-<> of <> [code]#bufferAcc#. The <>
-returned must be in a state where it represents the memory in its current state
-within the <> dependency model and is capable of being used in a
-way appropriate for the associated <>. It is undefined behavior to use
-the <> outside of the scope of the <>.
+<> of <> [code]#bufferAcc#.
+The <> returned must be in a state where it represents
+the memory in its current state within the <> dependency model and
+is capable of being used in a way appropriate for the associated <>.
+It is undefined behavior to use the <> outside of the
+scope of the <>.

 _Throws:_ An [code]#exception# with the [code]#errc::invalid# error code if the
-<> [code]#bufferAcc# was not registered with the
-<> which contained the <>. Must throw an [code]#exception# with the
-[code]#errc::backend_mismatch# error code if [code]#Backend != get_backend()#.
+<> [code]#bufferAcc# was not registered with the <>
+which contained the <>.
+Must throw an [code]#exception# with the [code]#errc::backend_mismatch# error
+code if [code]#Backend != get_backend()#.
 --

- . _Constraints:_ Available only if the optional interoperability
- function [code]#get_native# taking an [code]#unsampled_image#
- is available.
+ . _Constraints:_ Available only if the optional interoperability function
+ [code]#get_native# taking an [code]#unsampled_image# is available.
 +
 --
 _Returns:_ The <> associated with with the underlying
-[code]#unsampled_image# of <> [code]#imageAcc#. The
-<> returned must be in a state where it represents the
-memory in its current state within the <> dependency model and is
-capable of being used in a way appropriate for the associated <>. It
-is undefined behavior to use the <> outside of the scope
-of the <>.
+[code]#unsampled_image# of <> [code]#imageAcc#.
+The <> returned must be in a state where it represents
+the memory in its current state within the <> dependency model and
+is capable of being used in a way appropriate for the associated <>.
+It is undefined behavior to use the <> outside of the
+scope of the <>.

 _Throws:_ An [code]#exception# with the [code]#errc::invalid# error code if the
-<> [code]#imageAcc# was not registered with the
-<> which contained the <>.
+<> [code]#imageAcc# was not registered with the <>
+which contained the <>.
 --

- . _Constraints:_ Available only if the optional interoperability
- function [code]#get_native# taking an [code]#sampled_image#
- is available.
+ . _Constraints:_ Available only if the optional interoperability function
+ [code]#get_native# taking an [code]#sampled_image# is available.
 +
 --
 _Returns:_ The <> associated with with the underlying
-[code]#sampled_image# of <> [code]#imageAcc#. The
-<> returned must be in a state where it represents the
-memory in its current state within the <> dependency model and is
-capable of being used in a way appropriate for the associated <>. It
-is undefined behavior to use the <> outside of the scope
-of the <>.
+[code]#sampled_image# of <> [code]#imageAcc#.
+The <> returned must be in a state where it represents
+the memory in its current state within the <> dependency model and
+is capable of being used in a way appropriate for the associated <>.
+It is undefined behavior to use the <> outside of the
+scope of the <>.

 _Throws:_ An [code]#exception# with the [code]#errc::invalid# error code if the
-<> [code]#imageAcc# was not registered with the
-<> which contained the <>. Must throw an [code]#exception# with the
-[code]#errc::backend_mismatch# error code if [code]#Backend != get_backend()#.
+<> [code]#imageAcc# was not registered with the <>
+which contained the <>.
+Must throw an [code]#exception# with the [code]#errc::backend_mismatch# error
+code if [code]#Backend != get_backend()#.
 --

- . _Constraints:_ Available only if the optional interoperability
- function [code]#get_native# taking a [code]#queue# is
- available.
+ . _Constraints:_ Available only if the optional interoperability function
+ [code]#get_native# taking a [code]#queue# is available.
 +
 --
 _Returns:_ The <> associated with the <> that the
-<> was submitted to. If the <> was submitted with a
-secondary <> and the fall-back was triggered, the <> that is
-associated with the [code]#interop_handle# must be the fall-back <>. The
-<> returned must be in a state where it is capable of
-being used in a way appropriate for the associated <>. It is undefined
-behavior to use the <> outside of the scope of the
-<>.
+<> was submitted to.
+If the <> was submitted with a secondary <> and the
+fall-back was triggered, the <> that is associated with the
+[code]#interop_handle# must be the fall-back <>.
+The <> returned must be in a state where it is capable of
+being used in a way appropriate for the associated <>.
+It is undefined behavior to use the <> outside of the
+scope of the <>.

 _Throws:_ Must throw an [code]#exception# with the
 [code]#errc::backend_mismatch# error code if [code]#Backend != get_backend()#.
 --

- . _Constraints:_ Available only if the optional interoperability
- function [code]#get_native# taking a [code]#device# is
- available.
+ . _Constraints:_ Available only if the optional interoperability function
+ [code]#get_native# taking a [code]#device# is available.
 +
 --
 _Returns:_ The <> associated with the <> that is
-associated with the <> that the <> was submitted to. The
-<> returned must be in a state where it is capable of
-being used in a way appropriate for the associated <>. It is
-undefined behavior to use the <> outside of the scope of
-the <>.
+associated with the <> that the <> was submitted to.
+The <> returned must be in a state where it is capable of
+being used in a way appropriate for the associated <>.
+It is undefined behavior to use the <> outside of the
+scope of the <>.

 _Throws:_ Must throw an [code]#exception# with the
 [code]#errc::backend_mismatch# error code if [code]#Backend != get_backend()#.
 --

- . _Constraints:_ Available only if the optional interoperability
- function [code]#get_native# taking a [code]#context# is
- available.
+ . _Constraints:_ Available only if the optional interoperability function
+ [code]#get_native# taking a [code]#context# is available.
 +
 --
-_Returns:_ The <> associated with the <> that
-is associated with the <> that the <> was submitted to. The
-<> returned must be in a state where it is capable of
-being used in a way appropriate for the associated <>. It is
-undefined behavior to use the <> outside of the scope of
-the <>.
+_Returns:_ The <> associated with the <> that is
+associated with the <> that the <> was submitted to.
+The <> returned must be in a state where it is capable of
+being used in a way appropriate for the associated <>.
+It is undefined behavior to use the <> outside of the
+scope of the <>.

 _Throws:_ Must throw an [code]#exception# with the
 [code]#errc::backend_mismatch# error code if [code]#Backend != get_backend()#.
@@ -14768,8 +14788,8 @@ _Throws:_ Must throw an [code]#exception# with the
 [[subsec:interfaces.hosttask.handler]]
 === Additions to the [code]#handler# class

-This section describes member functions in the <> class that are
-used with host tasks.
+This section describes member functions in the <> class that are used
+with host tasks.

 [source,,linenums]
 ----
@@ -14777,123 +14797,128 @@ include::{header_dir}/hostTask/classHandler/hostTask.h[lines=4..-1]
 ----

 . _Effects:_ Enqueues an implementation-defined command to the
- <> to invoke [code]#hostTaskCallable# exactly once. The
- scheduling of the invocation of [code]#hostTaskCallable# in relation to
- other <> enqueued to the <> must be in accordance
- with the dependency model described in <>.
+ <> to invoke [code]#hostTaskCallable# exactly once.
+ The scheduling of the invocation of [code]#hostTaskCallable# in relation to
+ other <> enqueued to the <> must be in
+ accordance with the dependency model described in
+ <>.
 Initializes an [code]#interop_handle# object and passes it to
- [code]#hostTaskCallable# when it is invoked if
- [code]#std::is_invocable_v# evaluates to
- [code]#true#, otherwise invokes [code]#hostTaskCallable# as a
- nullary function.
+ [code]#hostTaskCallable# when it is invoked if [code]#std::is_invocable_v# evaluates to [code]#true#, otherwise invokes
+ [code]#hostTaskCallable# as a nullary function.


 [[sec:interfaces.bundles]]
 == Kernel bundles

-Kernel bundles provide several features to a <>. For
-implementations that support an online compiler, they provide fine grained
-control over the online compilation of device code. For example, an
-application can use a kernel bundle to compile its <> at a
-specific time during the application's execution (such as during its
-initialization), rather than relying on the implementation's default behavior
-(which may not compile kernels until they are submitted).
+Kernel bundles provide several features to a <>.
+For implementations that support an online compiler, they provide fine grained
+control over the online compilation of device code.
+For example, an application can use a kernel bundle to compile its
+<> at a specific time during the application's execution (such
+as during its initialization), rather than relying on the implementation's
+default behavior (which may not compile kernels until they are submitted).

 Kernel bundles also provide a way for the application to set the values of
-specialization constants in many kernels before any of them are submitted to
-a device, which could potentially be more efficient in some cases.
+specialization constants in many kernels before any of them are submitted to a
+device, which could potentially be more efficient in some cases.

 Kernel bundles provide a way for the application to introspect its kernels.
-For example, an application can use a bundle to query a kernel's work-group
-size when it is run on a specific device.
+For example, an application can use a bundle to query a kernel's work-group size
+when it is run on a specific device.

 Finally, kernel bundles provide an extension point to interoperate with backend
-and device specific features. Some examples of this include invocation of
-device specific built-in kernels, online compilation of kernel code with vendor
-specific options, or interoperation with kernels created with backend APIs.
+and device specific features.
+Some examples of this include invocation of device specific built-in kernels,
+online compilation of kernel code with vendor specific options, or
+interoperation with kernels created with backend APIs.


 === Overview

 A kernel bundle is a high-level abstraction which represents a set of
-<> that are associated with a <> and can be executed
-on a number of <>, where each device is associated with that
-same context. Depending on how a bundle is obtained, it could represent all of
-the <> in the <>,
-or a certain subset of them.
-
-A kernel bundle is composed of one or more <>,
-where each device image is an indivisible unit of compilation and/or linking.
+<> that are associated with a <> and can be executed on
+a number of <>, where each device is associated with that same
+context.
+Depending on how a bundle is obtained, it could represent all of the
+<> in the <>, or a
+certain subset of them.
+
+A kernel bundle is composed of one or more <>, where
+each device image is an indivisible unit of compilation and/or linking.
 When the <> compiles or links one of the kernels represented by the
 device image, it must also compile or link any other kernels the device
-image represents. Once a device image is compiled and linked, any of the other
-kernels which that device image represents may be invoked without further
-compilation or linking.
-
-Each <> a bundle represents must reside in at least one
-of the bundle's device images. However, it is not necessary for each device
-image to contain all of the kernel functions that the bundle represents. The
-granularity in which kernel functions are grouped into device images is an
+image represents.
+Once a device image is compiled and linked, any of the other kernels which that
+device image represents may be invoked without further compilation or linking.
+
+Each <> a bundle represents must reside in at least one of
+the bundle's device images.
+However, it is not necessary for each device image to contain all of the kernel
+functions that the bundle represents.
+The granularity in which kernel functions are grouped into device images is an
 implementation detail.

 [NOTE]
 ====
 To illustrate the intent of device images, a hypothetical implementation could
 represent an application's kernel functions in both the SPIR-V format and also
-in a native device code format. The implementation's ahead-of-time compiler
-in this example produces device images with native code for certain devices and
-also produces SPIR-V device images for use with other devices. Note that in
-such an implementation, a particular kernel function could be represented in
-more than one device image.
-
-An implementation could choose to have all kernel functions from all
-translation units grouped together in a single device image, to have each
-kernel function represented in its own device image, or to group kernel
-functions in some other way.
+in a native device code format.
+The implementation's ahead-of-time compiler in this example produces device
+images with native code for certain devices and also produces SPIR-V device
+images for use with other devices.
+Note that in such an implementation, a particular kernel function could be
+represented in more than one device image.
+
+An implementation could choose to have all kernel functions from all translation
+units grouped together in a single device image, to have each kernel function
+represented in its own device image, or to group kernel functions in some other
+way.
 ====

 Each device associated with a kernel bundle must have at least one compatible
 device image, meaning that the implementation can either invoke the image's
-kernel functions directly on the device or that the implementation can
-translate the device image into a format that allows it to invoke the kernel
-functions.
+kernel functions directly on the device or that the implementation can translate
+the device image into a format that allows it to invoke the kernel functions.

 An outcome of this definition is that each kernel function in a bundle must be
-invocable on at least one of the devices associated with the bundle. However,
-it is not necessary for every kernel function in the bundle to be invocable on
-every associated device.
+invocable on at least one of the devices associated with the bundle.
+However, it is not necessary for every kernel function in the bundle to be
+invocable on every associated device.

 [NOTE]
 ====
 One common reason why a kernel function might not be invocable on every device
-associated with a bundle is if the kernel uses optional device features. It's
-possible that these features are available to only some devices in the bundle.
+associated with a bundle is if the kernel uses optional device features.
+It's possible that these features are available to only some devices in the
+bundle.

 The use of optional device features could affect how the implementation groups
 kernels into device images, depending on how these features are represented.
 For example, consider an implementation where the optional feature is
 represented in SPIR-V but translation of that SPIR-V into native code will fail
-if the target device does not support the feature. In such an implementation,
-kernels that use optional features should not be grouped into the same device
-image as kernels that do not use these features. Since a device image is an
-indivisible unit of compilation, doing so would cause a compilation failure if
-a kernel K1 is invoked on a device D1 if K1 happened to reside in the same
-device image as another kernel K2 that used a feature which is not supported on
-device D1.
+if the target device does not support the feature.
+In such an implementation, kernels that use optional features should not be
+grouped into the same device image as kernels that do not use these features.
+Since a device image is an indivisible unit of compilation, doing so would cause
+a compilation failure if a kernel K1 is invoked on a device D1 if K1 happened to
+reside in the same device image as another kernel K2 that used a feature which
+is not supported on device D1.

 See <> for more about optional device features.
 ====

 A <> can obtain a kernel bundle by calling one of the
-overloads of the [code]#get_kernel_bundle()# free function. Certain backends
-may provide additional mechanisms for obtaining bundles with other
-representations. If this is supported, the backend specification document will
-describe the details.
+overloads of the [code]#get_kernel_bundle()# free function.
+Certain backends may provide additional mechanisms for obtaining bundles with
+other representations.
+If this is supported, the backend specification document will describe the
+details.

 Once a kernel bundle has been obtained there are a number of free functions for
-performing compilation, linking and joining. Once a bundle is compiled and
-linked, the application can invoke kernels from the bundle by calling
-[code]#handler::use_kernel_bundle()# as described in
+performing compilation, linking and joining.
+Once a bundle is compiled and linked, the application can invoke kernels from
+the bundle by calling [code]#handler::use_kernel_bundle()# as described in
 <>.
@@ -14909,51 +14934,53 @@ include::{header_dir}/bundle/freeFunctions.h[lines=4..-1]
 === Fixed-function built-in kernels

 SYCL allows a <> to expose fixed functionality as non-programmable
-built-in kernels. The availability and behavior of these built-in kernels are
-backend specific and are not required to follow the SYCL execution and memory
-models. However, the basic interface is common to all backends.
+built-in kernels.
+The availability and behavior of these built-in kernels are backend specific and
+are not required to follow the SYCL execution and memory models.
+However, the basic interface is common to all backends.


 [[sec:interfaces.bundles.bundlestate]]
 === Bundle states

-A <> can be in one of three different
-<> which are represented by an enum class called
-[code]#bundle_state#. <> describes the semantics of
-these three states.
+A <> can be in one of three different <> which are represented by an enum class called [code]#bundle_state#.
+<> describes the semantics of these three states.

-The states form a progression. A bundle in [code]#bundle_state::input# can
-be translated into [code]#bundle_state::object# by online compilation of the
-bundle. A bundle in [code]#bundle_state::object# can be translated into
+The states form a progression.
+A bundle in [code]#bundle_state::input# can be translated into
+[code]#bundle_state::object# by online compilation of the bundle.
+A bundle in [code]#bundle_state::object# can be translated into
 [code]#bundle_state::executable# by online linking.

 [NOTE]
 ====
 Each implementation is free to define the "online compilation" and "online
-linking" operations as it sees fit, so long as this progression of bundle
-states is preserved and so long as the bundles in each state behave as
-specified.
+linking" operations as it sees fit, so long as this progression of bundle states
+is preserved and so long as the bundles in each state behave as specified.
 ====

 There is no requirement that an implementation must expose kernels in
-[code]#bundle_state::input# or [code]#bundle_state::object#. In fact, an
-implementation could expose some kernels in these states but not others. For
-example, this behavior could be controlled by implementation specific options
-to the ahead-of-time compiler. Kernels that are not exposed in these states
-cannot be online compiled or online linked by the application.
+[code]#bundle_state::input# or [code]#bundle_state::object#.
+In fact, an implementation could expose some kernels in these states but not
+others.
+For example, this behavior could be controlled by implementation specific
+options to the ahead-of-time compiler.
+Kernels that are not exposed in these states cannot be online compiled or online
+linked by the application.

 All kernels defined in the <>, however, must be exposed in
 [code]#bundle_state::executable# because this is the only state that allows a
-kernel to be invoked on a device. Device built-in kernels are also exposed
-in [code]#bundle_state::executable#.
+kernel to be invoked on a device.
+Device built-in kernels are also exposed in [code]#bundle_state::executable#.

 If an application exposes a bundle in [code]#bundle_state::input# for a device
 D, then the implementation must also provide an online compiler for device D.
 Therefore, an application need not explicitly test for
 [code]#aspect::online_compiler# if it successfully obtains a bundle in
-[code]#bundle_state::input# for that device. Likewise, an implementation must
-provide an online linker for device D if it exposes a bundle in
-[code]#bundle_state::object# for device D.
+[code]#bundle_state::input# for that device.
+Likewise, an implementation must provide an online linker for device D if it
+exposes a bundle in [code]#bundle_state::object# for device D.
 [[table.bundles.states]]
 .Enumeration of possible bundle states
@@ -14995,17 +15022,18 @@ bundle_state::executable
 === Kernel identifiers

 Some of the functions related to kernel bundles take an input parameter of type
-[code]#kernel_id# which identifies a kernel. A synopsis of the
-[code]#kernel_id# class is shown below along with a description of its member
-functions. Additionally, this class provides the common special member
-functions and common member functions that are listed in
-<> in <> and
+[code]#kernel_id# which identifies a kernel.
+A synopsis of the [code]#kernel_id# class is shown below along with a
+description of its member functions.
+Additionally, this class provides the common special member functions and common
+member functions that are listed in <> in
+<> and
 <>, respectively.

 As with all SYCL objects that have the common reference semantics, kernel
-identifiers are equality comparable. Two [code]#kernel_id# objects compare
-equal if and only if they refer to the same application kernel or to the same
-device built-in kernel.
+identifiers are equality comparable.
+Two [code]#kernel_id# objects compare equal if and only if they refer to the
+same application kernel or to the same device built-in kernel.

 There is no public default constructor for this class.

@@ -15019,18 +15047,20 @@ include::{header_dir}/bundle/kernelIdClass.h[lines=4..-1]
 const char* get_name() const noexcept;
 ----

-_Returns:_ An implementation-defined null-terminated string containing the
-name of the kernel. There is no guarantee that this name is unique amongst
-all the kernels, nor is there a guarantee that the name is stable from one
-run of the application to another. The lifetime of the memory containing the
-name is unspecified.
+_Returns:_ An implementation-defined null-terminated string containing the name
+of the kernel.
+There is no guarantee that this name is unique amongst all the kernels, nor is +there a guarantee that the name is stable from one run of the application to +another. +The lifetime of the memory containing the name is unspecified. [NOTE] ==== In practice, the lifetime of the memory containing the name will typically -extend until the application terminates, unless the kernel associated with -the name comes from a dynamic library. In this case, the lifetime of the -memory may end if the dynamic library is unloaded. +extend until the application terminates, unless the kernel associated with the +name comes from a dynamic library. +In this case, the lifetime of the memory may end if the dynamic library is +unloaded. ==== @@ -15051,9 +15081,10 @@ _Preconditions:_ The template parameter [code]#KernelName# must be the Since lambda functions have no standard type name, kernels defined as lambda functions must specify a [code]#KernelName# in their <> in order to obtain their identifier via this -function. Applications which call [code]#get_kernel_id()# for a -[code]#KernelName# that is not defined are ill formed, and the implementation -must issue a diagnostic in this case. +function. +Applications which call [code]#get_kernel_id()# for a [code]#KernelName# that is +not defined are ill formed, and the implementation must issue a diagnostic in +this case. _Returns:_ The identifier of the kernel associated with [code]#KernelName#. @@ -15063,18 +15094,19 @@ std::vector get_kernel_ids(); ---- _Returns:_ A vector with the identifiers for all kernels defined in the -<>. This does not include identifiers for any device -built-in kernels. +<>. +This does not include identifiers for any device built-in kernels. === Obtaining a kernel bundle A <> can obtain a kernel bundle by calling one of the -overloads of the free function [code]#get_kernel_bundle()#. 
The implementation -may return a bundle that consists of device images that were created by the -ahead-of-time compiler, or it may call the online compiler or linker to create -the bundle's device images in the requested state. A bundle may also contain -device images that represent a device's built-in kernels. +overloads of the free function [code]#get_kernel_bundle()#. +The implementation may return a bundle that consists of device images that were +created by the ahead-of-time compiler, or it may call the online compiler or +linker to create the bundle's device images in the requested state. +A bundle may also contain device images that represent a device's built-in +kernels. When [code]#get_kernel_bundle()# is used to obtain a kernel bundle in [code]#bundle_state::object# or [code]#bundle_state::executable#, any @@ -15089,7 +15121,8 @@ kernel_bundle get_kernel_bundle(const context& ctxt, _Returns:_ A kernel bundle in state [code]#State# which contains all of the <> in the application which are compatible with at least one of -the devices in [code]#devs#. This does not include any device built-in kernels. +the devices in [code]#devs#. +This does not include any device built-in kernels. The bundle's set of associated devices is [code]#devs# (with any duplicate devices removed). @@ -15100,8 +15133,8 @@ the application's kernels. _Throws:_ - * An [code]#exception# with the [code]#errc::invalid# error code if any of - the devices in [code]#devs# is not one of devices contained by the context + * An [code]#exception# with the [code]#errc::invalid# error code if any of the + devices in [code]#devs# is not one of devices contained by the context [code]#ctxt# or is not a <> of some device in [code]#ctxt#. 
* An [code]#exception# with the [code]#errc::invalid# error code if the @@ -15128,26 +15161,27 @@ kernel_bundle get_kernel_bundle(const context& ctxt, _Returns:_ A kernel bundle in state [code]#State# which contains all of the device images that are compatible with at least one of the devices in [code]#devs#, further filtered to contain only those device images that contain -at least one of the kernels with the given identifiers. These identifiers may -represent kernels that are defined in the application, device built-in kernels, -or a mixture of the two. Since the device images may group many kernels -together, the returned bundle may contain additional kernels beyond those that -are requested in [code]#kernelIds#. The bundle's set of associated devices is -[code]#devs# (with duplicate devices removed). +at least one of the kernels with the given identifiers. +These identifiers may represent kernels that are defined in the application, +device built-in kernels, or a mixture of the two. +Since the device images may group many kernels together, the returned bundle may +contain additional kernels beyond those that are requested in [code]#kernelIds#. +The bundle's set of associated devices is [code]#devs# (with duplicate devices +removed). Since the implementation may not represent all kernels in [code]#bundle_state::input# or [code]#bundle_state::object#, calling this function with one of those states may return a bundle that is missing some of -the kernels in [code]#kernelIds#. The application can test for this via -[code]#kernel_bundle::has_kernel()#. +the kernels in [code]#kernelIds#. +The application can test for this via [code]#kernel_bundle::has_kernel()#. _Throws:_ - * An [code]#exception# with the [code]#errc::invalid# error code if any of - the kernels identified by [code]#kernelIds# are incompatible with all - devices in [code]#devs#. 
- * An [code]#exception# with the [code]#errc::invalid# error code if any of - the devices in [code]#devs# is not one of devices contained by the context + * An [code]#exception# with the [code]#errc::invalid# error code if any of the + kernels identified by [code]#kernelIds# are incompatible with all devices in + [code]#devs#. + * An [code]#exception# with the [code]#errc::invalid# error code if any of the + devices in [code]#devs# is not one of devices contained by the context [code]#ctxt# or is not a <> of some device in [code]#ctxt#. * An [code]#exception# with the [code]#errc::invalid# error code if the @@ -15172,25 +15206,27 @@ kernel_bundle get_kernel_bundle(const context& ctxt, ---- _Preconditions:_ The [code]#selector# must be a unary predicate whose return -value is convertible to [code]#bool# and whose parameter is -[code]#const device_image&#. +value is convertible to [code]#bool# and whose parameter is [code]#const +device_image&#. _Effects:_ The predicate function [code]#selector# is called once for every -device image in the application of state [code]#State# which is compatible -with at least one of the devices in [code]#devs#. The function's return value -determines whether a device image is included in the new kernel bundle. The -[code]#selector# is called only for device images that contain kernels defined -in the application, not for device images that contain device built-in kernels. +device image in the application of state [code]#State# which is compatible with +at least one of the devices in [code]#devs#. +The function's return value determines whether a device image is included in the +new kernel bundle. +The [code]#selector# is called only for device images that contain kernels +defined in the application, not for device images that contain device built-in +kernels. _Returns:_ A kernel bundle in state [code]#State# which contains all of the -device images for which the [code]#selector# returns [code]#true#. 
The -bundle's set of associated devices is [code]#devs# (with duplicate devices +device images for which the [code]#selector# returns [code]#true#. +The bundle's set of associated devices is [code]#devs# (with duplicate devices removed). _Throws:_ - * An [code]#exception# with the [code]#errc::invalid# error code if any of - the devices in [code]#devs# is not one of devices contained by the context + * An [code]#exception# with the [code]#errc::invalid# error code if any of the + devices in [code]#devs# is not one of devices contained by the context [code]#ctxt# or is not a <> of some device in [code]#ctxt#. * An [code]#exception# with the [code]#errc::invalid# error code if the @@ -15199,8 +15235,8 @@ _Throws:_ [code]#State# is [code]#bundle_state::input# and any device in [code]#devs# does not have [code]#aspect::online_compiler#. * An [code]#exception# with the [code]#errc::invalid# error code if - [code]#State# is [code]#bundle_state::object# and any device in - [code]#devs# does not have [code]#aspect::online_linker#. + [code]#State# is [code]#bundle_state::object# and any device in [code]#devs# + does not have [code]#aspect::online_linker#. [NOTE] ==== @@ -15209,10 +15245,11 @@ that allow the application to choose device images based on backend specific criteria. This function does not call the online compiler or linker to translate device -images into state [code]#State#. If the application wants to select specific -device images and also compile or link them into the desired state, it can do -this by calling [code]#compile()# or [code]#link()# and then optionally joining -several bundles together with [code]#join()#. +images into state [code]#State#. +If the application wants to select specific device images and also compile or +link them into the desired state, it can do this by calling [code]#compile()# or +[code]#link()# and then optionally joining several bundles together with +[code]#join()#. 
==== [source] @@ -15229,10 +15266,10 @@ kernel_bundle get_kernel_bundle(const context& ctxt, Selector selector); ---- . Equivalent to [code]#get_kernel_bundle(ctxt, ctxt.get_devices())#. - . Equivalent to - [code]#get_kernel_bundle(ctxt, ctxt.get_devices(), kernelIds)#. - . Equivalent to - [code]#get_kernel_bundle(ctxt, ctxt.get_devices(), selector)#. + . Equivalent to [code]#get_kernel_bundle(ctxt, ctxt.get_devices(), + kernelIds)#. + . Equivalent to [code]#get_kernel_bundle(ctxt, ctxt.get_devices(), + selector)#. [source] ---- @@ -15248,21 +15285,22 @@ _Preconditions:_ The template parameter [code]#KernelName# must be the <> of a kernel that is defined in the <>. Since lambda functions have no standard type name, kernels defined as lambda functions must specify a [code]#KernelName# in their -<> in order to use these functions. Applications -which call these functions for a [code]#KernelName# that is not defined are ill -formed, and the implementation must issue a diagnostic in this case. +<> in order to use these functions. +Applications which call these functions for a [code]#KernelName# that is not +defined are ill formed, and the implementation must issue a diagnostic in this +case. - . Equivalent to - [code]#get_kernel_bundle(ctxt, ctxt.get_devices(), {get_kernel_id()})#. - . Equivalent to - [code]#get_kernel_bundle(ctxt, devs, {get_kernel_id()})#. + . Equivalent to [code]#get_kernel_bundle(ctxt, ctxt.get_devices(), + {get_kernel_id()})#. + . Equivalent to [code]#get_kernel_bundle(ctxt, devs, + {get_kernel_id()})#. === Querying if a kernel bundle exists Most overloads of [code]#get_kernel_bundle()# have a matching overload of the -free function [code]#has_kernel_bundle()# which checks to see if a kernel -bundle with the requested characteristics exists. +free function [code]#has_kernel_bundle()# which checks to see if a kernel bundle +with the requested characteristics exists. 
[source] ---- @@ -15272,18 +15310,18 @@ bool has_kernel_bundle(const context& ctxt, const std::vector& devs); _Returns:_ [code]#true# only if all of the following are true: - * The application defines at least one <> that is compatible with - at least one of the devices in [code]#devs#, and that kernel can be - represented in a device image of state [code]#State#. - * If [code]#State# is [code]#bundle_state::input#, all devices in - [code]#devs# have [code]#aspect::online_compiler#. + * The application defines at least one <> that is compatible with at + least one of the devices in [code]#devs#, and that kernel can be represented + in a device image of state [code]#State#. + * If [code]#State# is [code]#bundle_state::input#, all devices in [code]#devs# + have [code]#aspect::online_compiler#. * If [code]#State# is [code]#bundle_state::object#, all devices in [code]#devs# have [code]#aspect::online_linker#. _Throws:_ - * An [code]#exception# with the [code]#errc::invalid# error code if any of - the devices in [code]#devs# is not one of devices contained by the context + * An [code]#exception# with the [code]#errc::invalid# error code if any of the + devices in [code]#devs# is not one of devices contained by the context [code]#ctxt# or is not a <> of some device in [code]#ctxt#. * An [code]#exception# with the [code]#errc::invalid# error code if the @@ -15300,17 +15338,17 @@ _Returns:_ [code]#true# only if all of the following are true: * Each of the kernels in [code]#kernelIds# can be represented in a device image of state [code]#State#. - * Each of the kernels in [code]#kernelIds# is compatible with at least one - of the devices in [code]#devs#. - * If [code]#State# is [code]#bundle_state::input#, all devices in - [code]#devs# have [code]#aspect::online_compiler#. + * Each of the kernels in [code]#kernelIds# is compatible with at least one of + the devices in [code]#devs#. 
+ * If [code]#State# is [code]#bundle_state::input#, all devices in [code]#devs# + have [code]#aspect::online_compiler#. * If [code]#State# is [code]#bundle_state::object#, all devices in [code]#devs# have [code]#aspect::online_linker#. _Throws:_ - * An [code]#exception# with the [code]#errc::invalid# error code if any of - the devices in [code]#devs# is not one of devices contained by the context + * An [code]#exception# with the [code]#errc::invalid# error code if any of the + devices in [code]#devs# is not one of devices contained by the context [code]#ctxt# or is not a <> of some device in [code]#ctxt#. * An [code]#exception# with the [code]#errc::invalid# error code if the @@ -15327,8 +15365,8 @@ bool has_kernel_bundle(const context& ctxt, ---- . Equivalent to [code]#has_kernel_bundle(ctxt, ctxt.get_devices())#. - . Equivalent to - [code]#has_kernel_bundle(ctxt, ctxt.get_devices(), kernelIds)#. + . Equivalent to [code]#has_kernel_bundle(ctxt, ctxt.get_devices(), + kernelIds)#. [source] ---- @@ -15343,21 +15381,22 @@ _Preconditions:_ The template parameter [code]#KernelName# must be the <> of a kernel that is defined in the <>. Since lambda functions have no standard type name, kernels defined as lambda functions must specify a [code]#KernelName# in their -<> in order to use these functions. Applications -which call these functions for a [code]#KernelName# that is not defined are ill -formed, and the implementation must issue a diagnostic in this case. +<> in order to use these functions. +Applications which call these functions for a [code]#KernelName# that is not +defined are ill formed, and the implementation must issue a diagnostic in this +case. - . Equivalent to - [code]#has_kernel_bundle(ctxt, {get_kernel_id()})#. - . Equivalent to - [code]#has_kernel_bundle(ctxt, devs, {get_kernel_id()})#. + . Equivalent to [code]#has_kernel_bundle(ctxt, + {get_kernel_id()})#. + . Equivalent to [code]#has_kernel_bundle(ctxt, devs, + {get_kernel_id()})#. 
=== Querying if a kernel is compatible with a device The following free functions allow an application to test whether a particular -kernel is compatible with a device. A kernel that is defined in the -application is compatible with a device unless: +kernel is compatible with a device. +A kernel that is defined in the application is compatible with a device unless: * It uses optional features which are not supported on the device, as described in <>; or @@ -15365,10 +15404,10 @@ application is compatible with a device unless: lists an aspect that is not supported by the device, as described in <>; or * The translation unit containing the kernel was compiled in a compilation - environment that does not support the device. Each implementation defines - the specific criteria for which devices are supported in its compilation - environment. For example, this might be dependent on options passed to the - compiler. + environment that does not support the device. + Each implementation defines the specific criteria for which devices are + supported in its compilation environment. + For example, this might be dependent on options passed to the compiler. A device built-in kernel is only compatible with the device for which it is built-in. @@ -15390,22 +15429,23 @@ _Preconditions:_ The template parameter [code]#KernelName# must be the <> of a kernel that is defined in the <>. Since lambda functions have no standard type name, kernels defined as lambda functions must specify a [code]#KernelName# in their -<> in order to use this function. Applications -which call this function for a [code]#KernelName# that is not defined are ill -formed, and the implementation must issue a diagnostic in this case. +<> in order to use this function. +Applications which call this function for a [code]#KernelName# that is not +defined are ill formed, and the implementation must issue a diagnostic in this +case. -Equivalent to -[code]#is_compatible({get_kernel_id()}, dev)#. 
+Equivalent to [code]#is_compatible({get_kernel_id()}, dev)#. === Joining kernel bundles Two or more kernel bundles of the same state may be joined together into a -single composite bundle. Joining bundles together is not the same as online -compiling or linking because it produces a new bundle in the same state as its -inputs. Rather, joining creates the union of all the devices images from the -input bundles, eliminates duplicate copies of the same device image, and -creates a new bundle from the result. +single composite bundle. +Joining bundles together is not the same as online compiling or linking because +it produces a new bundle in the same state as its inputs. +Rather, joining creates the union of all the devices images from the input +bundles, eliminates duplicate copies of the same device image, and creates a new +bundle from the result. [source] ---- @@ -15414,15 +15454,15 @@ kernel_bundle join(const std::vector>& bundles); ---- _Returns:_ A new kernel bundle that contains a copy of all the device images in -the input [code]#bundles# with duplicates removed. The new bundle has the same -associated context and the same set of associated devices as those in -[code]#bundles#. +the input [code]#bundles# with duplicates removed. +The new bundle has the same associated context and the same set of associated +devices as those in [code]#bundles#. _Throws:_ * An [code]#exception# with the [code]#errc::invalid# error code if the - bundles in [code]#bundles# do not all have the same associated context - or do not all have the same set of associated devices. + bundles in [code]#bundles# do not all have the same associated context or do + not all have the same set of associated devices. [[sec:bundles.compile-link]] @@ -15440,8 +15480,8 @@ or linker by querying a device for [code]#aspect::online_compiler# or [code]#aspect::online_linker#. 
All of the functions in this section accept a [code]#property_list# parameter, -which can affect the semantics of the compilation or linking operation. The -<> does not currently define any such properties, but vendors may +which can affect the semantics of the compilation or linking operation. +The <> does not currently define any such properties, but vendors may specify these properties as an extension. [source] @@ -15451,13 +15491,13 @@ compile(const kernel_bundle& inputBundle, const std::vector& devs, const property_list& propList = {}); ---- -_Effects:_ The device images from [code]#inputBundle# are translated into one -or more new device images of state [code]#bundle_state::object#, and a new -kernel bundle is created to contain these new device images. The new bundle -represents all of the <> in [code]#inputBundles# that are -compatible with at least one of the devices in [code]#devs#. Any remaining -kernels (those that are not compatible with any of the devices [code]#devs#) -are not compiled and not represented in the new kernel bundle. +_Effects:_ The device images from [code]#inputBundle# are translated into one or +more new device images of state [code]#bundle_state::object#, and a new kernel +bundle is created to contain these new device images. +The new bundle represents all of the <> in [code]#inputBundles# +that are compatible with at least one of the devices in [code]#devs#. +Any remaining kernels (those that are not compatible with any of the devices +[code]#devs#) are not compiled and not represented in the new kernel bundle. The new bundle has the same associated context as [code]#inputBundle#, and the new bundle's set of associated devices is [code]#devs# (with duplicate devices @@ -15467,8 +15507,8 @@ _Returns:_ The new kernel bundle. 
_Throws:_ - * An [code]#exception# with the [code]#errc::invalid# error code if any of - the devices in [code]#devs# are not in the set of associated devices for + * An [code]#exception# with the [code]#errc::invalid# error code if any of the + devices in [code]#devs# are not in the set of associated devices for [code]#inputBundle# (as defined by [code]#kernel_bundle::get_devices()#) or if the [code]#devs# vector is empty. * An [code]#exception# with the [code]#errc::build# error code if the online @@ -15481,15 +15521,15 @@ link(const std::vector>& objectBundles, const std::vector& devs, const property_list& propList = {}); ---- -_Effects:_ Duplicate device images from [code]#objectBundles# are eliminated -as though they were joined via [code]#join()#, then the remaining device images -are translated into one or more new device images of state +_Effects:_ Duplicate device images from [code]#objectBundles# are eliminated as +though they were joined via [code]#join()#, then the remaining device images are +translated into one or more new device images of state [code]#bundle_state::executable#, and a new kernel bundle is created to contain -these new device images. The new bundle represents all of the -<> in [code]#objectBundles# that are compatible with at least -one of the devices in [code]#devs#. Any remaining kernels (those that are not -compatible with any of the devices in [code]#devs#) are not linked and not -represented in the new bundle. +these new device images. +The new bundle represents all of the <> in [code]#objectBundles# +that are compatible with at least one of the devices in [code]#devs#. +Any remaining kernels (those that are not compatible with any of the devices in +[code]#devs#) are not linked and not represented in the new bundle. 
The new bundle has the same associated context as those in [code]#objectBundles#, and the new bundle's set of associated devices is @@ -15502,9 +15542,9 @@ _Throws:_ * An [code]#exception# with the [code]#errc::invalid# error code if the bundles in [code]#objectBundles# do not all have the same associated context. - * An [code]#exception# with the [code]#errc::invalid# error code if any of - the devices in [code]#devs# are not in the set of associated devices for - any of the bundles in [code]#objectBundles# (as defined by + * An [code]#exception# with the [code]#errc::invalid# error code if any of the + devices in [code]#devs# are not in the set of associated devices for any of + the bundles in [code]#objectBundles# (as defined by [code]#kernel_bundle::get_devices()#) or if the [code]#devs# vector is empty. * An [code]#exception# with the [code]#errc::build# error code if the online @@ -15519,14 +15559,15 @@ build(const kernel_bundle& inputBundle, _Effects:_ This function performs both an online compile and link operation, translating a kernel bundle of state [code]#bundle_state::input# into a bundle -of state [code]#bundle_state::executable#. The device images from -[code]#inputBundle# are translated into one or more new device images of state -[code]#bundle_state::executable#, and a new bundle is created to contain these -new device images. The new bundle represents all of the <> in -[code]#inputBundle# that are compatible with at least one of the devices in -[code]#devs#. Any remaining kernels (those that are not compatible with any of -the devices [code]#devs#) are not compiled or linked and are not represented in -the new bundle. +of state [code]#bundle_state::executable#. +The device images from [code]#inputBundle# are translated into one or more new +device images of state [code]#bundle_state::executable#, and a new bundle is +created to contain these new device images. 
+The new bundle represents all of the <> in [code]#inputBundle# +that are compatible with at least one of the devices in [code]#devs#. +Any remaining kernels (those that are not compatible with any of the devices +[code]#devs#) are not compiled or linked and are not represented in the new +bundle. The new bundle has the same associated context as [code]#inputBundle#, and the new bundle's set of associated devices is [code]#devs# (with duplicate devices @@ -15536,8 +15577,8 @@ _Returns:_ The new kernel bundle. _Throws:_ - * An [code]#exception# with the [code]#errc::invalid# error code if any of - the devices in [code]#devs# are not in the set of associated devices for + * An [code]#exception# with the [code]#errc::invalid# error code if any of the + devices in [code]#devs# are not in the set of associated devices for [code]#inputBundle# (as defined by [code]#kernel_bundle::get_devices()#) or if the [code]#devs# vector is empty. * An [code]#exception# with the [code]#errc::build# error code if the online @@ -15566,8 +15607,8 @@ build(const kernel_bundle& inputBundle, const property_list& propList = {}); ---- - . Equivalent to - [code]#compile(inputBundle, inputBundle.get_devices(), propList)#. + . Equivalent to [code]#compile(inputBundle, inputBundle.get_devices(), + propList)#. . Equivalent to [code]#link({objectBundle}, devs, propList)#. @@ -15575,25 +15616,26 @@ build(const kernel_bundle& inputBundle, [code]#devs# is the intersection of associated devices in common for all bundles in [code]#objectBundles#. - . Equivalent to - [code]#link({objectBundle}, objectBundle.get_devices(), propList)#. + . Equivalent to [code]#link({objectBundle}, objectBundle.get_devices(), + propList)#. - . Equivalent to - [code]#build(inputBundle, inputBundle.get_devices(), propList)#. + . Equivalent to [code]#build(inputBundle, inputBundle.get_devices(), + propList)#. === The [code]#kernel_bundle# class -A synopsis of the [code]#kernel_bundle# class is shown below. 
Additionally, -this class provides the common special member functions and common member -functions that are listed in <> in +A synopsis of the [code]#kernel_bundle# class is shown below. +Additionally, this class provides the common special member functions and common +member functions that are listed in <> in <> and <>, respectively. As with all SYCL objects that have the common reference semantics, kernel -bundles are equality comparable. Two bundles of the same <> are -considered to be equal if they are associated with the same context, have the -same set of associated devices, and contain the same set of device images. +bundles are equality comparable. +Two bundles of the same <> are considered to be equal if they are +associated with the same context, have the same set of associated devices, and +contain the same set of device images. There is no public default constructor for this class. @@ -15660,9 +15702,10 @@ _Preconditions:_ The template parameter [code]#KernelName# must be the <> of a kernel that is defined in the <>. Since lambda functions have no standard type name, kernels defined as lambda functions must specify a [code]#KernelName# in their -<> in order to use these functions. Applications -which call these functions for a [code]#KernelName# that is not defined are ill -formed, and the implementation must issue a diagnostic in this case. +<> in order to use these functions. +Applications which call these functions for a [code]#KernelName# that is not +defined are ill formed, and the implementation must issue a diagnostic in this +case. . _Returns:_ [code]#true# only if the kernel bundle contains the kernel identified by [code]#KernelName#. @@ -15675,8 +15718,8 @@ formed, and the implementation must issue a diagnostic in this case. std::vector get_kernel_ids() const; ---- -_Returns:_ A vector of the identifiers for all kernels that are contained in -the kernel bundle. 
+_Returns:_ A vector of the identifiers for all kernels that are contained in the +kernel bundle. [source] ---- @@ -15691,8 +15734,8 @@ _Returns:_ A [code]#kernel# object representing the kernel identified by _Throws:_ - * An [code]#exception# with the [code]#errc::invalid# error code if the - kernel bundle does not contain the kernel identified by [code]#kernelId#. + * An [code]#exception# with the [code]#errc::invalid# error code if the kernel + bundle does not contain the kernel identified by [code]#kernelId#. [source] ---- @@ -15700,11 +15743,12 @@ template kernel get_kernel() const; ---- _Preconditions:_ This member function is only available if the kernel bundle's -state is [code]#bundle_state::executable#. The template parameter -[code]#KernelName# must be the <> of a kernel that is defined -in the <>. Since lambda functions have no standard type -name, kernels defined as lambda functions must specify a [code]#KernelName# in -their <> in order to use this function. +state is [code]#bundle_state::executable#. +The template parameter [code]#KernelName# must be the <> of a +kernel that is defined in the <>. +Since lambda functions have no standard type name, kernels defined as lambda +functions must specify a [code]#KernelName# in their +<> in order to use this function. Applications which call this function for a [code]#KernelName# that is not defined are ill formed, and the implementation must issue a diagnostic in this case. @@ -15714,28 +15758,30 @@ _Returns:_ A [code]#kernel# object representing the kernel identified by _Throws:_ - * An [code]#exception# with the [code]#errc::invalid# error code if the - kernel bundle does not contain the kernel identified by [code]#KernelName#. + * An [code]#exception# with the [code]#errc::invalid# error code if the kernel + bundle does not contain the kernel identified by [code]#KernelName#. 
==== Specialization constant support The following member functions allow an application to manipulate -<> that are used in the -device images of a <>. Applications can set the value of -specialization constants in a kernel bundle whose state is -[code]#bundle_state::input# and then online compile that bundle into -[code]#bundle_state::object# or [code]#bundle_state::executable#. The value of -the specialization constants then become fixed in the compiled bundle and -cannot be changed. Specialization constants that have not had their values set -by the time the bundle is compiled take their default values. +<> that are used in the device +images of a <>. +Applications can set the value of specialization constants in a kernel bundle +whose state is [code]#bundle_state::input# and then online compile that bundle +into [code]#bundle_state::object# or [code]#bundle_state::executable#. +The value of the specialization constants then become fixed in the compiled +bundle and cannot be changed. +Specialization constants that have not had their values set by the time the +bundle is compiled take their default values. [NOTE] ==== It is expected that many implementations will use an intermediate language -representation for a bundle in state [code]#bundle_state::input# such as -SPIR-V, and the intermediate language will have native support for -specialization constants. However, implementations that do not have such -native support must still support specialization constants in some other way. +representation for a bundle in state [code]#bundle_state::input# such as SPIR-V, +and the intermediate language will have native support for specialization +constants. +However, implementations that do not have such native support must still support +specialization constants in some other way. 
==== [source] @@ -15752,9 +15798,9 @@ bool native_specialization_constant() const noexcept; ---- _Returns:_ [code]#true# only if the kernel bundle contains at least one device -image which uses a specialization constant and all specialization constants -used in all of the bundle's device images are -<>. +image which uses a specialization constant and all specialization constants used +in all of the bundle's device images are <>. [source] ---- @@ -15775,12 +15821,14 @@ _Preconditions:_ This member function is only available if the kernel bundle's state is [code]#bundle_state::input#. _Effects:_ Sets the value of the <> whose address is -[code]#SpecName# for this bundle. If the specialization constant's value was -previously set in this bundle, the value is overwritten. +[code]#SpecName# for this bundle. +If the specialization constant's value was previously set in this bundle, the +value is overwritten. -The new value applies to all device images in the bundle. It is allowed to set -the value of a specialization constant even if no device image in the bundle -uses it; doing so has no effect on the execution of kernels from that bundle. +The new value applies to all device images in the bundle. +It is allowed to set the value of a specialization constant even if no device +image in the bundle uses it; doing so has no effect on the execution of kernels +from that bundle. [source] ---- @@ -15790,14 +15838,17 @@ get_specialization_constant() const; ---- _Returns:_ The value of the <> whose address is -[code]#SpecName# for this kernel bundle. The value returned is as follows: +[code]#SpecName# for this kernel bundle. +The value returned is as follows: * If the value of this specialization constant was previously set in this - bundle, that value is returned. Otherwise, + bundle, that value is returned. 
+ Otherwise, -* If this bundle is the result of compiling, linking or joining another - bundle and this specialization constant was set in that other bundle prior - to compiling, linking or joining; then that value is returned. Otherwise, +* If this bundle is the result of compiling, linking or joining another bundle + and this specialization constant was set in that other bundle prior to + compiling, linking or joining; then that value is returned. + Otherwise, * The specialization constant's default value is returned. @@ -15812,9 +15863,9 @@ using device_image_iterator = __unspecified__; ---- An iterator type that satisfies the {cpp} requirements of -[code]#LegacyForwardIterator#. The iterator's referenced type is -[code]#const device_image#, where [code]#State# is the same state as the -containing [code]#kernel_bundle#. +[code]#LegacyForwardIterator#. +The iterator's referenced type is [code]#const device_image#, where +[code]#State# is the same state as the containing [code]#kernel_bundle#. [source] ---- @@ -15822,17 +15873,17 @@ device_image_iterator begin() const; // (1) device_image_iterator end() const; // (2) ---- - . _Returns:_ An iterator to the first <> contained by the - kernel bundle. - . _Returns:_ An iterator to one past the last <> contained by + . _Returns:_ An iterator to the first <> contained by the kernel + bundle. + . _Returns:_ An iterator to one past the last <> contained by the kernel bundle. === The [code]#kernel# class -A synopsis of the [code]#kernel# class is shown below. Additionally, -this class provides the common special member functions and common member -functions that are listed in <> in +A synopsis of the [code]#kernel# class is shown below. +Additionally, this class provides the common special member functions and common +member functions that are listed in <> in <> and <>, respectively. @@ -15897,8 +15948,8 @@ on the device [code]#dev#. 
_Throws:_ - * An [code]#exception# with the [code]#errc::invalid# error code if the - kernel is not compatible with device [code]#dev# (as defined by + * An [code]#exception# with the [code]#errc::invalid# error code if the kernel + is not compatible with device [code]#dev# (as defined by [code]#is_compatible()#). [source] @@ -15909,8 +15960,8 @@ template typename Param::return_type get_backend_info() const; _Preconditions:_ The [code]#Param# must be one of a descriptor defined by a <> specification. -_Returns:_ Backend specific information about the kernel that is not specific -to the device on which it is invoked. +_Returns:_ Backend specific information about the kernel that is not specific to +the device on which it is invoked. _Throws:_ @@ -15920,8 +15971,8 @@ _Throws:_ ==== Kernel information descriptors -A <> can be queried for information using the [code]#get_info()# -member function, specifying one of the info parameters in [code]#info::kernel#. +A <> can be queried for information using the [code]#get_info()# member +function, specifying one of the info parameters in [code]#info::kernel#. All info parameters in [code]#info::kernel# are specified in <> and the synopsis for [code]#info::kernel# is described in <>. @@ -15961,10 +16012,10 @@ info::kernel::attributes A <> can also be queried for device specific information using the [code]#get_info()# member function, specifying one of the info parameters in -[code]#info::kernel_device_specific#. All info parameters in -[code]#info::kernel_device_specific# are specified in -<>. The synopsis for -[code]#info::kernel_device_specific# is described in +[code]#info::kernel_device_specific#. +All info parameters in [code]#info::kernel_device_specific# are specified in +<>. +The synopsis for [code]#info::kernel_device_specific# is described in <>. 
[[table.kernel.devicespecificinfo]] @@ -16074,9 +16125,9 @@ info::kernel_device_specific::compile_sub_group_size === The [code]#device_image# class -A synopsis of the [code]#device_image# class is shown below. Additionally, -this class provides the common special member functions and common member -functions that are listed in <> in +A synopsis of the [code]#device_image# class is shown below. +Additionally, this class provides the common special member functions and common +member functions that are listed in <> in <> and <>, respectively. @@ -16104,17 +16155,18 @@ bool has_kernel(const kernel_id& kernelId, === Example usage This section provides some examples showing typical use cases for kernel -bundles. These examples are intended to clarify the definition of the kernel -bundle interfaces, but the content of this section is non-normative. +bundles. +These examples are intended to clarify the definition of the kernel bundle +interfaces, but the content of this section is non-normative. ==== Controlling the timing of online compilation In some cases an application may want to pre-compile its kernels before -submitting them to a device. This gives the application control over when the -overhead of online compilation happens, rather than relying on the default -behavior (which may cause the online compilation to happen at the point when -the kernel is submitted to a device). The following example shows how this can -be achieved. +submitting them to a device. +This gives the application control over when the overhead of online compilation +happens, rather than relying on the default behavior (which may cause the online +compilation to happen at the point when the kernel is submitted to a device). +The following example shows how this can be achieved. [source,,linenums] ---- @@ -16153,45 +16205,49 @@ include::{code_dir}/bundle-builtin-kernel.cpp[lines=4..-1] == Defining kernels -In SYCL, functions that are executed on a SYCL device are referred to -as <>. 
A <> containing such a -<> is enqueued on a device queue in order to -be executed on that particular device. +In SYCL, functions that are executed on a SYCL device are referred to as +<>. +A <> containing such a <> is enqueued on a device +queue in order to be executed on that particular device. The return type of the <> is [code]#void#, and all memory accesses between host and device are through <> or through <>. -There are two ways of defining kernels: as named function objects or as -lambda functions. A backend may also provide interoperability interfaces for -defining kernels. +There are two ways of defining kernels: as named function objects or as lambda +functions. +A backend may also provide interoperability interfaces for defining kernels. [[sec:interfaces.kernels.as.function-objects]] === Defining kernels as named function objects -A kernel can be defined as a named function object type. These function objects -provide the same functionality as any {cpp} function object, with the -restriction that they need to follow SYCL rules to be <>. -The kernel function can be templated via templating the kernel -function object type. For details on restrictions for kernel naming, -please refer to <>. +A kernel can be defined as a named function object type. +These function objects provide the same functionality as any {cpp} function +object, with the restriction that they need to follow SYCL rules to be +<>. +The kernel function can be templated via templating the kernel function object +type. +For details on restrictions for kernel naming, please refer to +<>. The [code]#operator()# member function must be const-qualified, and it may take different parameters depending on the data accesses defined for the specific -kernel. If the [code]#operator()# function writes to any of the member variables, -the behavior is undefined. - -The following example defines a <>, -_RandomFiller_, which initializes a buffer with a random number. 
The -random number is generated during the construction of the function object -while processing the command group. The [code]#operator()# member -function of the function object receives an [code]#item# object. This -member function will be called for each work-item of the execution range. The value -of the random number will be assigned to each element of the buffer. In this -case, the accessor and the scalar random number are members of the function -object and therefore will be arguments to the device kernel. Usual -restrictions of passing arguments to kernels apply. +kernel. +If the [code]#operator()# function writes to any of the member variables, the +behavior is undefined. + +The following example defines a <>, _RandomFiller_, which +initializes a buffer with a random number. +The random number is generated during the construction of the function object +while processing the command group. +The [code]#operator()# member function of the function object receives an +[code]#item# object. +This member function will be called for each work-item of the execution range. +The value of the random number will be assigned to each element of the buffer. +In this case, the accessor and the scalar random number are members of the +function object and therefore will be arguments to the device kernel. +Usual restrictions of passing arguments to kernels apply. [source,,linenums] ---- @@ -16202,17 +16258,19 @@ include::{code_dir}/myfunctor.cpp[lines=4..-1] [[sec:interfaces.kernels.as.lambdas]] === Defining kernels as lambda functions -In {cpp}, function objects can be defined using lambda functions. Kernels may be -defined as lambda functions in SYCL. The name of a lambda function -in SYCL may optionally be specified by passing it as a template parameter to the invoking -member function, and in that case, the lambda name is a [keyword]#{cpp} typename# which must -be forward declarable at namespace scope. 
If the lambda -function relies on template arguments, then if specified, -the name of the lambda function must contain those template arguments which must -also be forward declarable at namespace scope. The -class used for the name of a lambda function is only used for naming purposes -and is not required to be defined. For details on restrictions for kernel -naming, please refer to <>. +In {cpp}, function objects can be defined using lambda functions. +Kernels may be defined as lambda functions in SYCL. +The name of a lambda function in SYCL may optionally be specified by passing it +as a template parameter to the invoking member function, and in that case, the +lambda name is a [keyword]#{cpp} typename# which must be forward declarable at +namespace scope. +If the lambda function relies on template arguments, then if specified, the name +of the lambda function must contain those template arguments which must also be +forward declarable at namespace scope. +The class used for the name of a lambda function is only used for naming +purposes and is not required to be defined. +For details on restrictions for kernel naming, please refer to +<>. The kernel function for the lambda function is the lambda function itself. The kernel lambda must use copy for all of its captures (i.e. [code]#[=]#), and @@ -16223,9 +16281,9 @@ the lambda must not use the [code]#mutable# specifier. include::{code_dir}/mykernel.cpp[lines=4..-1] ---- -Explicit lambda naming is shown in the following code example, -including an illegal case that uses a class within the kernel -name which is not forward declarable ([code]#std::complex#). +Explicit lambda naming is shown in the following code example, including an +illegal case that uses a class within the kernel name which is not forward +declarable ([code]#std::complex#). [source,,linenums] ---- @@ -16249,17 +16307,16 @@ that a type [code]#T# is <>. * [code]#is_device_copyable# must meet the Cpp17UnaryTrait requirements. 
* If [code]#is_device_copyable# is specialized such that - [code]#is_device_copyable_v == true# on a [code]#T# that does not - satisfy all the requirements of a device copyable type, the results are - unspecified. + [code]#is_device_copyable_v == true# on a [code]#T# that does not satisfy + all the requirements of a device copyable type, the results are unspecified. If the application defines a type [code]#UDT# that satisfies the requirements of a <> type (as defined in <>) but the type is not implicitly device copyable as defined in that section, then the application must provide a specialization of [code]#is_device_copyable# that derives from [code]#std:true_type# in order to use that type in a context that -requires a device copyable type. Such a specialization can be declared like -this: +requires a device copyable type. +Such a specialization can be declared like this: .... template<> @@ -16267,20 +16324,21 @@ struct sycl::is_device_copyable : std::true_type {}; .... It is legal to provide this specialization even if the implementation does not -define [code]#SYCL_DEVICE_COPYABLE# to [code]#1#, but the type cannot be used -as a device copyable type in that case and the specialization is ignored. +define [code]#SYCL_DEVICE_COPYABLE# to [code]#1#, but the type cannot be used as +a device copyable type in that case and the specialization is ignored. [[sec:kernel.parameter.passing]] === Rules for parameter passing to kernels A SYCL application passes parameters to a kernel in different ways depending on -whether the kernel is a named function object or a lambda function. If the -kernel is a named function object, the [code]#operator()# member function (or -other member functions that it calls) may reference member variables inside the -same named function object. Any such member variables become parameters to the -kernel. If the kernel is a lambda function, any variables captured by the -lambda become parameters to the kernel. 
+whether the kernel is a named function object or a lambda function. +If the kernel is a named function object, the [code]#operator()# member function +(or other member functions that it calls) may reference member variables inside +the same named function object. +Any such member variables become parameters to the kernel. +If the kernel is a lambda function, any variables captured by the lambda become +parameters to the kernel. Regardless of how the parameter is passed, the following rules define the allowable types for a kernel parameter: @@ -16307,9 +16365,8 @@ allowable types for a kernel parameter: a legal parameter type. * A class type [code]#S# with a non-static member variable of type [code]#T# is - a legal parameter type if [code]#T# is a legal parameter type and if - [code]#S# would otherwise be a legal parameter type aside from this member - variable. + a legal parameter type if [code]#T# is a legal parameter type and if [code]#S# + would otherwise be a legal parameter type aside from this member variable. * A class type [code]#S# with a non-virtual base class of type [code]#T# is a legal parameter type if [code]#T# is a legal parameter type and if [code]#S# @@ -16318,20 +16375,21 @@ allowable types for a kernel parameter: [NOTE] ==== Pointer types are trivially copyable, so they may be passed as kernel -parameters. However, only the pointer value itself is passed to the kernel. -Dereferencing the pointer on the kernel results in undefined behavior unless -the pointer points to an address within a <> memory region that is -accessible on the device. +parameters. +However, only the pointer value itself is passed to the kernel. +Dereferencing the pointer on the kernel results in undefined behavior unless the +pointer points to an address within a <> memory region that is accessible +on the device. -Reference types are not trivially copyable, so they may not be passed as -kernel parameters. 
+Reference types are not trivially copyable, so they may not be passed as kernel +parameters. ==== [NOTE] ==== The [code]#reducer# class is a special type of kernel parameter which is passed -to a kernel in a different way. <> describes how this parameter -type is used. +to a kernel in a different way. +<> describes how this parameter type is used. ==== // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end expressingParallelism %%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -16342,17 +16400,18 @@ type is used. === Error handling rules -Error handling in a SYCL application (host code) uses {cpp} exceptions. If an error -occurs, it will be thrown by the API function call and may be caught by the user -through standard {cpp} exception handling mechanisms. - -SYCL applications are asynchronous in the sense that host and device code executions -are decoupled from one another except at specific points. For example, device code -executions often begin when dependencies in the SYCL task graph are satisfied, which -occurs asynchronously from host code execution. As a result of this the errors -that occur on a device cannot be thrown directly from a host API call, because the call -enqueueing a device action has typically already returned by the time that the error -occurs. Such errors are not detected until the error-causing task executes or tries to +Error handling in a SYCL application (host code) uses {cpp} exceptions. +If an error occurs, it will be thrown by the API function call and may be caught +by the user through standard {cpp} exception handling mechanisms. + +SYCL applications are asynchronous in the sense that host and device code +executions are decoupled from one another except at specific points. +For example, device code executions often begin when dependencies in the SYCL +task graph are satisfied, which occurs asynchronously from host code execution. 
+As a result of this the errors that occur on a device cannot be thrown directly +from a host API call, because the call enqueueing a device action has typically +already returned by the time that the error occurs. +Such errors are not detected until the error-causing task executes or tries to execute, and we refer to these as <>. @@ -16362,41 +16421,44 @@ execute, and we refer to these as <>. The queue and context classes can optionally take an asynchronous handler object <> on construction, which is a callable such as a function class or lambda, with an [code]#exception_list# as a parameter. -Invocation of an <> may be triggered by the queue member functions -[code]#queue::wait_and_throw()# or [code]#queue::throw_asynchronous()#, by -the event member function [code]#event::wait_and_throw()#, or -automatically on destruction of a queue or context that contains unconsumed -asynchronous errors. When invoked, an <> is called and receives an -[code]#exception_list# argument containing a list of exception objects representing -any unconsumed <> associated with the queue or context. +Invocation of an <> may be triggered by the queue member +functions [code]#queue::wait_and_throw()# or +[code]#queue::throw_asynchronous()#, by the event member function +[code]#event::wait_and_throw()#, or automatically on destruction of a queue or +context that contains unconsumed asynchronous errors. +When invoked, an <> is called and receives an +[code]#exception_list# argument containing a list of exception objects +representing any unconsumed <> associated with +the queue or context. When an <> instance has been passed to an <>, then that instance of the error has been consumed for handling and is not reported on any subsequent invocations of the <>. -The <> may be a named function object type, a lambda -function or a [code]#std::function#. The [code]#exception_list# -object passed to the <> is constructed by the <>. 
+The <> may be a named function object type, a lambda function or +a [code]#std::function#. +The [code]#exception_list# object passed to the <> is constructed +by the <>. [[subsubsec:exception.nohandler]] ==== Behavior without an async handler If an asynchronous error occurs in a queue or context that has no user-supplied -asynchronous error handler object <>, then an implementation-defined -default <> is called to handle the error in the same situations that -a user-supplied <> would be, as defined in -<>. The default <> must in some way -report all errors passed to it, when possible, and must then invoke -[code]#std::terminate# or equivalent. +asynchronous error handler object <>, then an +implementation-defined default <> is called to handle the error +in the same situations that a user-supplied <> would be, as +defined in <>. +The default <> must in some way report all errors passed to it, +when possible, and must then invoke [code]#std::terminate# or equivalent. ==== Priorities of async handlers If the SYCL runtime can associate an <> with a specific queue, then: - * If the queue was constructed with an <>, that handler - is invoked to handle the error. + * If the queue was constructed with an <>, that handler is + invoked to handle the error. * Otherwise if the context enclosed by the queue was constructed with an <>, that handler is invoked to handle the error. * Otherwise when no handler was passed to either queue or context on @@ -16420,31 +16482,33 @@ then: ==== Asynchronous errors with a secondary queue If an <> occurs when running or enqueuing a command group which has -a secondary queue specified, then the command group may be enqueued -to the secondary queue instead of the primary queue. The error handling in this -case is also configured using the <> provided for both -queues. 
If there is no <> given on any of the queues, -then the asynchronous error handling proceeds through the contexts -associated with the queues, and if they were also constructed without -<>s, then the default handler will be used. -If the primary queue fails and there is an <> given at -this queue's construction, which populates the [code]#exception_list# -parameter, then any errors will be added and can be thrown whenever the user -chooses to handle those exceptions. Since there were errors on the primary -queue and a secondary queue was given, then the execution of the kernel is -re-scheduled to the secondary queue and any error reporting for the kernel -execution on that queue is done through that queue, in the same way as -described above. The secondary queue may fail as well, and the errors will be -thrown if there is an <> and either -[code]#wait_and_throw()# or [code]#throw()# are called on that queue. If no -<> was specified, then the one associated with the queue's context -will be used and if the context was also constructed without an <>, -then the default handler will be used. +a secondary queue specified, then the command group may be enqueued to the +secondary queue instead of the primary queue. +The error handling in this case is also configured using the <> +provided for both queues. +If there is no <> given on any of the queues, then the +asynchronous error handling proceeds through the contexts associated with the +queues, and if they were also constructed without <>s, then the +default handler will be used. +If the primary queue fails and there is an <> given at this +queue's construction, which populates the [code]#exception_list# parameter, then +any errors will be added and can be thrown whenever the user chooses to handle +those exceptions. 
+Since there were errors on the primary queue and a secondary queue was given, +then the execution of the kernel is re-scheduled to the secondary queue and any +error reporting for the kernel execution on that queue is done through that +queue, in the same way as described above. +The secondary queue may fail as well, and the errors will be thrown if there is +an <> and either [code]#wait_and_throw()# or [code]#throw_asynchronous()# is +called on that queue. +If no <> was specified, then the one associated with the queue's +context will be used and if the context was also constructed without an +<>, then the default handler will be used. The <> event returned by that function will be relevant to the queue where the kernel has been enqueued. -Below is an example of catching a SYCL [code]#exception# and printing out -the error message. +Below is an example of catching a SYCL [code]#exception# and printing out the +error message. [source,,linenums] ---- @@ -16468,18 +16532,18 @@ include::{code_dir}/handlingErrorCode.cpp[lines=4..-1] include::{header_dir}/exception.h[lines=4..-1] ---- -The SYCL [code]#exception_list# -class is also available in order to provide a list of synchronous and -asynchronous exceptions. +The SYCL [code]#exception_list# class is also available in order to provide a +list of synchronous and asynchronous exceptions. Errors can occur both in the SYCL library and SYCL host side, or may come -directly from a <>. The member functions on these exceptions provide the -corresponding information. -<> can provide additional exception class objects as long as they derive -from [code]#sycl::exception# object, or any of its derived classes. +directly from a <>. +The member functions on these exceptions provide the corresponding information. +<> can provide additional exception class objects as +long as they derive from [code]#sycl::exception# object, or any of its derived +classes.
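The async-handler rules in this part of the patch can be illustrated with a small model. The rules are: the queue's handler takes priority, then the context's handler, then a default. Errors passed to a handler are consumed and not reported again.

This is a sketch under stated assumptions, not the SYCL API. The types [code]#queue_model#, [code]#context_model#, [code]#exception_list_model# and [code]#async_handler_model#, and the helper [code]#recording_handler#, are all hypothetical stand-ins for [code]#sycl::queue#, [code]#sycl::context#, [code]#sycl::exception_list# and an async handler. Where a real implementation's default handler would report the errors and call [code]#std::terminate#, the model's [code]#throw_asynchronous()# simply returns [code]#false#.

```cpp
// Hypothetical plain C++17 model, NOT the SYCL API: it only mimics the
// handler-selection order and the "consumed once" rule for async errors.
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Stand-in for sycl::exception_list: the captured asynchronous errors.
using exception_list_model = std::vector<std::exception_ptr>;
// Stand-in for an async handler: a callable taking the error list.
using async_handler_model = std::function<void(exception_list_model)>;

struct context_model {
  async_handler_model handler;  // empty function == no handler supplied
};

struct queue_model {
  async_handler_model handler;   // handler passed at queue construction
  context_model ctx;             // context enclosed by the queue
  exception_list_model pending;  // unconsumed asynchronous errors

  // Priority: the queue's handler, else the context's handler, else none
  // (a real implementation would fall back to its default handler).
  const async_handler_model* select_handler() const {
    if (handler) return &handler;
    if (ctx.handler) return &ctx.handler;
    return nullptr;
  }

  // Model of throw_asynchronous(): pass pending errors to the selected
  // handler once; afterwards they are consumed and not reported again.
  // Returns false where the default handler would call std::terminate.
  bool throw_asynchronous() {
    if (pending.empty()) return true;
    const async_handler_model* h = select_handler();
    if (h == nullptr) return false;
    exception_list_model errors;
    errors.swap(pending);
    (*h)(std::move(errors));
    return true;
  }
};

// Hypothetical helper: a handler that records "<who>:<what()>" per error.
inline async_handler_model recording_handler(std::vector<std::string>& seen,
                                              std::string who) {
  return [&seen, who](exception_list_model errors) {
    for (const std::exception_ptr& e : errors) {
      try {
        std::rethrow_exception(e);
      } catch (const std::exception& ex) {
        seen.push_back(who + ":" + ex.what());
      }
    }
  };
}
```

In the model, a queue built with its own handler never reaches the context's handler. A second [code]#throw_asynchronous()# call reports nothing, because the first call already consumed the errors.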
-A specialization of [code]#std::is_error_code_enum# must be defined -for [code]#sycl::errc# that inherits from [code]#std::true_type#. +A specialization of [code]#std::is_error_code_enum# must be defined for +[code]#sycl::errc# that inherits from [code]#std::true_type#. [[table.members.exception]] @@ -16802,22 +16866,23 @@ std::error_code make_error_code(errc e) noexcept; == Data types SYCL as a {cpp} programming model supports the {cpp} core language data types, -and it also provides the ability for all SYCL applications to be executed on SYCL -compatible devices. The scalar and vector data types that -are supported by the SYCL system are defined below. More details about the SYCL -device compiler support for fundamental and backend interoperability types are found -in <>. +and it also provides the ability for all SYCL applications to be executed on +SYCL compatible devices. +The scalar and vector data types that are supported by the SYCL system are +defined below. +More details about the SYCL device compiler support for fundamental and backend +interoperability types are found in <>. === Scalar data types The fundamental {cpp} data types which are supported in SYCL are described in -<>. Note these types are fundamental and therefore -do not exist within the [code]#sycl# namespace. +<>. +Note these types are fundamental and therefore do not exist within the +[code]#sycl# namespace. -Additional scalar data types which are supported by SYCL within the -[code]#sycl# namespace are described in -<>. +Additional scalar data types which are supported by SYCL within the [code]#sycl# +namespace are described in <>. [[table.types.additional]] @@ -16853,40 +16918,40 @@ half [[sec:vector.type]] === Vector types -SYCL provides a cross-platform class template that works -efficiently on SYCL devices as well as in host {cpp} code. This type -allows sharing of vectors between the host and its SYCL devices. 
The -vector supports member functions that allow construction of a new vector from a -swizzled set of component elements. - -[code]#vec# -is a vector type -that compiles down to a <> built-in vector types on SYCL devices, -where possible, and provides compatible support on the host or when it is -not possible. The [code]#vec# class is templated on its number of -elements and its element type. The number of elements parameter, -_NumElements_, can be one of: 1, 2, 3, 4, 8 or 16. Any other value shall -produce a compilation failure. The element type parameter, _DataT_, must -be one of the basic scalar types supported in device code. +SYCL provides a cross-platform class template that works efficiently on SYCL +devices as well as in host {cpp} code. +This type allows sharing of vectors between the host and its SYCL devices. +The vector supports member functions that allow construction of a new vector +from a swizzled set of component elements. + +[code]#vec# is a vector type that compiles +down to a <> built-in vector type on SYCL devices, where possible, and +provides compatible support on the host or when it is not possible. +The [code]#vec# class is templated on its number of elements and its element +type. +The number of elements parameter, _NumElements_, can be one of: 1, 2, 3, 4, 8 or 16. +Any other value shall produce a compilation failure. +The element type parameter, _DataT_, must be one of the basic scalar types +supported in device code. The SYCL [code]#vec# class template provides interoperability with the -underlying vector type defined by [code]#vector_t# which is -available only when compiled for the device. The SYCL [code]#vec# class can -be constructed from an instance of [code]#vector_t# and can implicitly -convert to an instance of [code]#vector_t# in order to support -interoperability with native <> functions from a SYCL kernel function.
- -An instance of the SYCL [code]#vec# class template can also be -implicitly converted to an instance of the data type when the number of -elements is [code]#1# in order to allow single element vectors and -scalars to be convertible with each other. +underlying vector type defined by [code]#vector_t# which is available only when +compiled for the device. +The SYCL [code]#vec# class can be constructed from an instance of +[code]#vector_t# and can implicitly convert to an instance of [code]#vector_t# +in order to support interoperability with native <> functions from a +SYCL kernel function. + +An instance of the SYCL [code]#vec# class template can also be implicitly +converted to an instance of the data type when the number of elements is +[code]#1# in order to allow single element vectors and scalars to be convertible +with each other. ==== Vec interface The constructors, member functions and non-member functions of the SYCL -[code]#vec# class template are listed in -<>, <> and -<> respectively. +[code]#vec# class template are listed in <>, +<> and <> respectively. // Interface for class: vec [source,,linenums] @@ -17529,95 +17594,92 @@ The SYCL programming API provides all permutations of the type alias: [code]#+using = vec<, >+# -where [code]## is [code]#2#, [code]#3#, [code]#4#, -[code]#8# and [code]#16#, and pairings of [code]## and -[code]## for integral types are [code]#char# and -[code]#int8_t#, [code]#uchar# and [code]#uint8_t#, -[code]#short# and [code]#int16_t#, [code]#ushort# and -[code]#uint16_t#, [code]#int# and [code]#int32_t#, -[code]#uint# and [code]#uint32_t#, [code]#long# and -[code]#int64_t#, [code]#ulong# and [code]#uint64_t#, and for -floating point types are both [code]#half#, [code]#float# and -[code]#double#. 
+where [code]## is [code]#2#, [code]#3#, [code]#4#, [code]#8# and +[code]#16#, and pairings of [code]## and [code]## for +integral types are [code]#char# and [code]#int8_t#, [code]#uchar# and +[code]#uint8_t#, [code]#short# and [code]#int16_t#, [code]#ushort# and +[code]#uint16_t#, [code]#int# and [code]#int32_t#, [code]#uint# and +[code]#uint32_t#, [code]#long# and [code]#int64_t#, [code]#ulong# and +[code]#uint64_t#, and for floating point types are both [code]#half#, +[code]#float# and [code]#double#. -For example [code]#uint4# is the alias to [code]#vec# -and [code]#float16# is the alias to [code]#vec#. +For example [code]#uint4# is the alias to [code]#vec# and +[code]#float16# is the alias to [code]#vec#. ==== Swizzles -Swizzle operations can be performed in two ways. Firstly by calling the -[code]#swizzle# member function template, which takes a variadic number -of integer template arguments between [code]#0# and -[code]#NumElements-1#, specifying swizzle indexes. Secondly by calling -one of the simple swizzle member functions defined in -<> as [code]#XYZW_SWIZZLE# and -[code]#RGBA_SWIZZLE#. Note that the simple swizzle functions are only -available for up to 4 element vectors and are only available when the macro -[code]#SYCL_SIMPLE_SWIZZLES# is defined before including -[code]##. +Swizzle operations can be performed in two ways. +Firstly by calling the [code]#swizzle# member function template, which takes a +variadic number of integer template arguments between [code]#0# and +[code]#NumElements-1#, specifying swizzle indexes. +Secondly by calling one of the simple swizzle member functions defined in +<> as [code]#XYZW_SWIZZLE# and [code]#RGBA_SWIZZLE#. +Note that the simple swizzle functions are only available for up to 4 element +vectors and are only available when the macro [code]#SYCL_SIMPLE_SWIZZLES# is +defined before including [code]##. 
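This is a minimal sketch of the read semantics of [code]#swizzle#. It is written in standard {cpp}, with [code]#std::array# standing in for [code]#vec#. The helper [code]#swizzle_copy# is hypothetical and is not part of SYCL.

The real member function returns a deferred [code]#+__swizzled_vec__+# proxy. This sketch instead copies the selected elements immediately. It illustrates three points:

* each index must lie in [code]#0# to [code]#NumElements-1#;
* the result may have a different number of elements from the source;
* indexes may repeat when the result is only read.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper, not the SYCL API: returns a new array whose element i
// is v[Indexes[i]], i.e. the values a read of v.swizzle<Indexes...>() yields.
// The real SYCL swizzle returns a deferred proxy; this sketch copies eagerly.
template <int... Indexes, typename T, std::size_t N>
constexpr std::array<T, sizeof...(Indexes)> swizzle_copy(const std::array<T, N>& v) {
  // Every swizzle index must lie in [0, N-1]; a violation fails to compile.
  static_assert(((Indexes >= 0 && Indexes < static_cast<int>(N)) && ...),
                "swizzle index out of range");
  // Repeated indexes are fine here because the source is only read.
  return {v[static_cast<std::size_t>(Indexes)]...};
}
```

Two examples of how it behaves:

* [code]#swizzle_copy<3, 2, 1, 0>(a)# reverses a four-element array.
* [code]#swizzle_copy<0, 0>(a)# yields a two-element array that holds two copies of [code]#a[0]#.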
In both cases the return type is always an instance of [code]#+__swizzled_vec__+#, an implementation-defined temporary class representing the result of the swizzle operation on the original [code]#vec# instance. Since the swizzle operation may result in a different number of elements, the -[code]#+__swizzled_vec__+# instance may represent a different number of -elements than the original [code]#vec#. +[code]#+__swizzled_vec__+# instance may represent a different number of elements +than the original [code]#vec#. Both kinds of swizzle member functions must not perform the swizzle operation themselves, instead the swizzle operation must be performed by the returned -instance of [code]#+__swizzled_vec__+# when used within an expression, -meaning if the returned [code]#+__swizzled_vec__+# is never used in an -expression no swizzle operation is performed. +instance of [code]#+__swizzled_vec__+# when used within an expression, meaning +if the returned [code]#+__swizzled_vec__+# is never used in an expression no +swizzle operation is performed. -Both the [code]#swizzle# member function template and the simple -swizzle member functions allow swizzle indexes to be repeated. +Both the [code]#swizzle# member function template and the simple swizzle member +functions allow swizzle indexes to be repeated. -A series of static constexpr values are provided within the -[code]#elem# struct to allow specifying named swizzle indexes when -calling the [code]#swizzle# member function template. +A series of static constexpr values are provided within the [code]#elem# struct +to allow specifying named swizzle indexes when calling the [code]#swizzle# +member function template. [[swizzled-vec-class]] ==== Swizzled [code]#vec# class -The [code]#+__swizzled_vec__+# class must define an unspecified temporary -which provides the entire interface of the SYCL [code]#vec# class template, -including swizzled member functions, with the additions and alterations -described below. 
+The [code]#+__swizzled_vec__+# class must define an unspecified temporary which +provides the entire interface of the SYCL [code]#vec# class template, including +swizzled member functions, with the additions and alterations described below. The member functions of the [code]#+__swizzled_vec__+# class behave as though they operate on a [code]#vec# that is the result of the swizzle operation. - * The [code]#+__swizzled_vec__+# class template must be readable as an - r-value reference on the RHS of an expression. In this case the swizzle - operation is performed on the RHS of the expression and then the result - is applied to the LHS of the expression. - * The [code]#+__swizzled_vec__+# class template must be assignable as - an l-value reference on the LHS of an expression. In this case the RHS - of the expression is applied to the original SYCL [code]#vec# which - the [code]#+__swizzled_vec__+# represents via the swizzle operation. - Note that a [code]#+__swizzled_vec__+# that is used in an l-value - expression may not contain any repeated element indexes. + * The [code]#+__swizzled_vec__+# class template must be readable as an r-value + reference on the RHS of an expression. + In this case the swizzle operation is performed on the RHS of the expression + and then the result is applied to the LHS of the expression. + * The [code]#+__swizzled_vec__+# class template must be assignable as an + l-value reference on the LHS of an expression. + In this case the RHS of the expression is applied to the original SYCL + [code]#vec# which the [code]#+__swizzled_vec__+# represents via the swizzle + operation. + Note that a [code]#+__swizzled_vec__+# that is used in an l-value expression + may not contain any repeated element indexes. + For example: [code]#f4.xxxx() = fx.wzyx()# would not be valid. 
- * The [code]#+__swizzled_vec__+# class template must be convertible to - an instance of SYCL [code]#vec# with the type [code]#DataT# - and number of elements specified by the swizzle member function, if - [code]#NumElements > 1#, and must be convertible to an instance of - type [code]#DataT#, if [code]#NumElements == 1#. + * The [code]#+__swizzled_vec__+# class template must be convertible to an + instance of SYCL [code]#vec# with the type [code]#DataT# and number of + elements specified by the swizzle member function, if [code]#NumElements > + 1#, and must be convertible to an instance of type [code]#DataT#, if + [code]#NumElements == 1#. * The [code]#+__swizzled_vec__+# class template must be non-copyable, - non-moveable, non-user constructible and may not be bound to a l-value - or escape the expression it was constructed in. For example - [code]#auto x = f4.x()# would not be valid. + non-moveable, non-user constructible and may not be bound to a l-value or + escape the expression it was constructed in. + For example [code]#auto x = f4.x()# would not be valid. * The [code]#+__swizzled_vec__+# class template should return - [code]#+__swizzled_vec__&+# for each operator inherited from the - [code]#vec# class template interface which would return - [code]#vec&#. + [code]#+__swizzled_vec__&+# for each operator inherited from the [code]#vec# + class template interface which would return [code]#vec&#. ==== Rounding modes -The various rounding modes that can be used in the [code]#as# member -function template are described in <>. +The various rounding modes that can be used in the [code]#as# member function +template are described in <>. 
[[table.vec.roundingmodes]] @@ -17667,9 +17729,9 @@ rtn [[memory-layout-and-alignment]] ==== Memory layout and alignment -The elements of an instance of the SYCL [code]#vec# class template are -stored in memory sequentially and contiguously and are aligned to the size -of the element type in bytes multiplied by the number of elements: +The elements of an instance of the SYCL [code]#vec# class template are stored in +memory sequentially and contiguously and are aligned to the size of the element +type in bytes multiplied by the number of elements: [[vec-memory-alignment]] [latexmath] @@ -17677,9 +17739,9 @@ of the element type in bytes multiplied by the number of elements: \texttt{sizeof}(\texttt{DataT}) \cdot \texttt{NumElements} ++++ -The exception to this is when the number of element is three in which case -the SYCL [code]#vec# is aligned to the size of the element type in -bytes multiplied by four: +The exception to this is when the number of element is three in which case the +SYCL [code]#vec# is aligned to the size of the element type in bytes multiplied +by four: [[vec3-memory-alignment]] [latexmath] @@ -17687,23 +17749,23 @@ bytes multiplied by four: \texttt{sizeof}(\texttt{DataT}) \cdot 4 ++++ -This is true for both host and device code in order to allow for instances -of the [code]#vec# class template to be passed to SYCL kernel -functions. +This is true for both host and device code in order to allow for instances of +the [code]#vec# class template to be passed to SYCL kernel functions. In no case, however, is the alignment guaranteed to be greater than 64 bytes. [NOTE] ==== -The alignment guarantee is limited to 64 bytes because some host compilers -(e.g. on Microsoft Windows) limit the maximum alignment of function parameters -to this value. +The alignment guarantee is limited to 64 bytes because some host compilers (e.g. +on Microsoft Windows) limit the maximum alignment of function parameters to this +value. 
==== ==== Performance note -The usage of the subscript [code]#operator[]# may not be efficient on some devices. +The usage of the subscript [code]#operator[]# may not be efficient on some +devices. // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end vec_class %%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -17715,29 +17777,30 @@ The usage of the subscript [code]#operator[]# may not be efficient on some devic === Math array types SYCL provides an [code]#marray# class -template to represent a contiguous fixed-size container. This type allows -sharing of containers between the host and its SYCL devices. +template to represent a contiguous fixed-size container. +This type allows sharing of containers between the host and its SYCL devices. The [code]#marray# class is templated on its element type and number of -elements. The number of elements parameter, [code]#NumElements#, is a positive -value of the [code]#std::size_t# type. The element type parameter, [code]#DataT#, -must be a _numeric type_ as it is defined by {cpp} standard. +elements. +The number of elements parameter, [code]#NumElements#, is a positive value of +the [code]#std::size_t# type. +The element type parameter, [code]#DataT#, must be a _numeric type_ as it is +defined by {cpp} standard. -An instance of the [code]#marray# class template can also be -implicitly converted to an instance of the data type when the number of -elements is [code]#1# in order to allow single element arrays and -scalars to be convertible with each other. +An instance of the [code]#marray# class template can also be implicitly +converted to an instance of the data type when the number of elements is +[code]#1# in order to allow single element arrays and scalars to be convertible +with each other. -Logical and comparison operators for [code]#marray# class template -return [code]#marray#. +Logical and comparison operators for [code]#marray# class template return +[code]#marray#. 
==== Math array interface The constructors, member functions and non-member functions of the SYCL -[code]#marray# class template are listed in -<>, <> and -<> respectively. +[code]#marray# class template are listed in <>, +<> and <> respectively. // Interface for class: vec [source,,linenums] @@ -18201,26 +18264,24 @@ The SYCL programming API provides all permutations of the type alias: [code]#+using m = marray<, >+# -where [code]## is [code]#2#, [code]#3#, [code]#4#, -[code]#8# and [code]#16#, and pairings of [code]## and -[code]## for integral types are [code]#char# and -[code]#int8_t#, [code]#uchar# and [code]#uint8_t#, -[code]#short# and [code]#int16_t#, [code]#ushort# and -[code]#uint16_t#, [code]#int# and [code]#int32_t#, -[code]#uint# and [code]#uint32_t#, [code]#long# and -[code]#int64_t#, [code]#ulong# and [code]#uint64_t#, for -floating point types are both [code]#half#, [code]#float# and -[code]#double#, and for boolean type [code]#bool#. +where [code]## is [code]#2#, [code]#3#, [code]#4#, [code]#8# and +[code]#16#, and pairings of [code]## and [code]## for +integral types are [code]#char# and [code]#int8_t#, [code]#uchar# and +[code]#uint8_t#, [code]#short# and [code]#int16_t#, [code]#ushort# and +[code]#uint16_t#, [code]#int# and [code]#int32_t#, [code]#uint# and +[code]#uint32_t#, [code]#long# and [code]#int64_t#, [code]#ulong# and +[code]#uint64_t#, for floating point types are both [code]#half#, [code]#float# +and [code]#double#, and for boolean type [code]#bool#. -For example [code]#muint4# is the alias to [code]#marray# -and [code]#mfloat16# is the alias to [code]#marray#. +For example [code]#muint4# is the alias to [code]#marray# and +[code]#mfloat16# is the alias to [code]#marray#. [[memory-layout-and-alignment.marray]] ==== Memory layout and alignment -The elements of an instance of the [code]#marray# class template as if -stored in [code]#std::array#. 
+The elements of an instance of the [code]#marray# class template as if stored in +[code]#std::array#. // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end marray_class %%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -18229,21 +18290,21 @@ stored in [code]#std::array#. The available features are: - * Accessor classes: Accessor classes specify acquisition and release of - buffer and image data structures to provide points at which a SYCL runtime - must guarantee memory consistency. - * Atomic operations: SYCL devices support a restricted subset of {cpp} - atomics and SYCL uses the library syntax from the next {cpp} specification - to make this available. + * Accessor classes: Accessor classes specify acquisition and release of buffer + and image data structures to provide points at which a SYCL runtime must + guarantee memory consistency. + * Atomic operations: SYCL devices support a restricted subset of {cpp} atomics + and SYCL uses the library syntax from the next {cpp} specification to make + this available. * Fences: Fence primitives are made available to order loads and stores. - They are exposed through the [code]#atomic_fence# function. Fences - can have acquire semantics, release semantics or both. + They are exposed through the [code]#atomic_fence# function. + Fences can have acquire semantics, release semantics or both. * Barriers: Barrier primitives are made available as a coordination mechanism - for work-items within individual <>. They are exposed through - the [code]#group_barrier# function. + for work-items within individual <>. + They are exposed through the [code]#group_barrier# function. 
* Hierarchical parallel dispatch: In the hierarchical parallelism model of - describing computations, work-items within a work-group may coordinate - via multiple instances of the [code]#parallel_for_work_item# function call, + describing computations, work-items within a work-group may coordinate via + multiple instances of the [code]#parallel_for_work_item# function call, rather than through the use of explicit <> operations. * Device event: they are used inside SYCL kernel functions to wait for asynchronous operations within a SYCL kernel function to complete. @@ -18252,10 +18313,10 @@ The available features are: [[sec:barriers-fences]] === Barriers and fences -A <> or <> provides memory ordering semantics -over both the local address space and global address space. A -<> provides control over the re-ordering of memory load and -store operations, subject to the associated memory [code]#order# and memory +A <> or <> provides memory ordering semantics over +both the local address space and global address space. +A <> provides control over the re-ordering of memory load and store +operations, subject to the associated memory [code]#order# and memory [code]#scope#, when paired with synchronization through an atomic object. 
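The fence pairing described above has a host-side {cpp} counterpart, [code]#std::atomic_thread_fence#, which the following sketch uses. It is standard {cpp}, not SYCL device code, and the name [code]#fence_handoff# is hypothetical.

A release fence sequenced before a relaxed store synchronizes with an acquire fence sequenced after a relaxed load that reads that store. As a result, the plain write to [code]#payload# is visible to the consumer.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Host-side analogue in standard C++, not SYCL code: a release fence before a
// relaxed store, paired with an acquire fence after a relaxed load that reads
// the stored value, makes the earlier plain write to `payload` visible.
int fence_handoff() {
  int payload = 0;                 // ordinary, non-atomic data
  std::atomic<bool> ready{false};  // the atomic object the fences pair through
  int observed = -1;

  std::thread producer([&] {
    payload = 42;
    std::atomic_thread_fence(std::memory_order_release);  // release fence
    ready.store(true, std::memory_order_relaxed);
  });

  std::thread consumer([&] {
    // Busy-wait on a host thread until the flag store is observed.
    while (!ready.load(std::memory_order_relaxed)) {
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // acquire fence
    observed = payload;
    assert(observed == 42);  // guaranteed by the fence-fence synchronization
  });

  producer.join();
  consumer.join();
  return observed;
}
```

The busy-wait loop is acceptable only because both sides run on host threads. In device code, such a loop may prevent overall progress, because work-items are only guaranteed weakly parallel forward progress.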
[source,,linenums] @@ -18263,24 +18324,24 @@ store operations, subject to the associated memory [code]#order# and memory include::{header_dir}/synchronization.h[lines=4..-1] ---- -The effects of a call to [code]#atomic_fence# depend on the value of -the [code]#order# parameter: +The effects of a call to [code]#atomic_fence# depend on the value of the +[code]#order# parameter: * [code]#memory_order::relaxed:# No effect * [code]#memory_order::acquire:# Acquire fence * [code]#memory_order::release:# Release fence - * [code]#memory_order::acq_rel:# Both an acquire fence and a release + * [code]#memory_order::acq_rel:# Both an acquire fence and a release fence + * [code]#memory_order::seq_cst:# A sequentially consistent acquire and release fence - * [code]#memory_order::seq_cst:# A sequentially consistent acquire - and release fence A <> acts as both an acquire fence and a release fence: all work-items in the group execute a release fence prior to signaling arrival at the barrier, and all work-items in the group execute an acquire fence -afterwards. A <> provides implicit atomic synchronization -as if through an internal atomic object, such that the acquire and release fences -associated with the barrier synchronize with each other, without an explicit -atomic operation being required on an atomic object to synchronize the fences. +afterwards. +A <> provides implicit atomic synchronization as if through an +internal atomic object, such that the acquire and release fences associated with +the barrier synchronize with each other, without an explicit atomic operation +being required on an atomic object to synchronize the fences. [[device-event-class]] @@ -18290,13 +18351,13 @@ The SYCL [code]#device_event# class encapsulates a single SYCL device event which is available only within SYCL kernel functions and can be used to wait for asynchronous operations within a SYCL kernel function to complete. 
-All member functions of the [code]#device_event# class must not throw a -SYCL exception. +All member functions of the [code]#device_event# class must not throw a SYCL +exception. -A synopsis of the SYCL [code]#device_event# class is provided below. The -constructors and member functions of the SYCL [code]#device_event# class -are listed in <> and -<> respectively. +A synopsis of the SYCL [code]#device_event# class is provided below. +The constructors and member functions of the SYCL [code]#device_event# class are +listed in <> and <> +respectively. // Interface of device event class [source,,linenums] @@ -18339,44 +18400,47 @@ device_event(___unspecified___) The [code]#sycl::atomic_ref# class provides the ability to perform atomic operations in device code with a syntax similar to the {cpp} standard -[code]#std::atomic_ref#. The [code]#sycl::atomic_ref# class must not be used -in host code. +[code]#std::atomic_ref#. +The [code]#sycl::atomic_ref# class must not be used in host code. Unlike [code]#std::atomic_ref#, [code]#sycl::atomic_ref# does not provide a -default memory ordering for its operations. Instead, the application must -specify a default ordering via the [code]#DefaultOrder# template parameter. -This ordering is used as a default for most of the atomic operations, but -most member functions also provide an optional parameter that allows the -application to override this default. The set of supported orderings is -specific to a device, but every device is guaranteed to support at least -[code]#memory_order::relaxed#. If the default order is set to -[code]#memory_order::relaxed#, all memory order arguments default to -[code]#memory_order::relaxed#. If the default order is set to -[code]#memory_order::acq_rel#, memory order arguments default to -[code]#memory_order::acquire# for load operations, +default memory ordering for its operations. +Instead, the application must specify a default ordering via the +[code]#DefaultOrder# template parameter. 
+This ordering is used as a default for most of the atomic operations, but most +member functions also provide an optional parameter that allows the application +to override this default. +The set of supported orderings is specific to a device, but every device is +guaranteed to support at least [code]#memory_order::relaxed#. +If the default order is set to [code]#memory_order::relaxed#, all memory order +arguments default to [code]#memory_order::relaxed#. +If the default order is set to [code]#memory_order::acq_rel#, memory order +arguments default to [code]#memory_order::acquire# for load operations, [code]#memory_order::release# for store operations and -[code]#memory_order::acq_rel# for read-modify-write operations. If the -default order is set to [code]#memory_order::seq_cst#, all memory order +[code]#memory_order::acq_rel# for read-modify-write operations. +If the default order is set to [code]#memory_order::seq_cst#, all memory order arguments default to [code]#memory_order::seq_cst#. The [code]#sycl::atomic_ref# class has a template parameter [code]#DefaultScope#, which allows the application to define a default memory -scope for the atomic operations. Most member functions also provide an -optional parameter that allows the application to override this default. +scope for the atomic operations. +Most member functions also provide an optional parameter that allows the +application to override this default. The [code]#sycl::atomic_ref# class also has a template parameter [code]#AddressSpace#, which allows the application to make an assertion about -the address space of the object of type [code]#T# that it references. The -default value for this parameter is +the address space of the object of type [code]#T# that it references. +The default value for this parameter is [code]#access::address_space::generic_space#, which indicates that the object -could be in either the global or local address spaces. 
If the application -knows the address space, it can set this template parameter to either -[code]#access::address_space::global_space# or +could be in either the global or local address spaces. +If the application knows the address space, it can set this template parameter +to either [code]#access::address_space::global_space# or [code]#access::address_space::local_space# as an assertion to the -implementation. Specifying the address space via this template parameter may -allow the implementation to perform certain optimizations. Specifying an -address space that does not match the object's actual address space results in -undefined behavior. +implementation. +Specifying the address space via this template parameter may allow the +implementation to perform certain optimizations. +Specifying an address space that does not match the object's actual address +space results in undefined behavior. The template parameter [code]#T# must be one of the following types: @@ -18410,25 +18474,26 @@ include::{header_dir}/atomicref.h[lines=4..-1] The constructors and member functions for instances of the SYCL [code]#atomic_ref# class using any compatible type are listed in -<> -and <> respectively. Additional member -functions for integral, floating-point and pointer types are listed in -<>, -<> -and <> respectively. - -The static member [code]#required_alignment# describes the minimum -required alignment in bytes of an object that can be referenced by an +<> and <> +respectively. +Additional member functions for integral, floating-point and pointer types are +listed in <>, +<> and <> +respectively. + +The static member [code]#required_alignment# describes the minimum required +alignment in bytes of an object that can be referenced by an [code]#atomic_ref#, which must be at least [code]#alignof(T)#. -The static member [code]#is_always_lock_free# is true if all atomic -operations for type [code]#T# are always lock-free. 
A SYCL -implementation is not guaranteed to support atomic operations that are not -lock-free. +The static member [code]#is_always_lock_free# is true if all atomic operations +for type [code]#T# are always lock-free. +A SYCL implementation is not guaranteed to support atomic operations that are +not lock-free. The static members [code]#default_read_order#, [code]#default_write_order# and -[code]#default_read_modify_write_order# reflect the default memory order values for -each type of atomic operation, consistent with the [code]#DefaultOrder# template. +[code]#default_read_modify_write_order# reflect the default memory order values +for each type of atomic operation, consistent with the [code]#DefaultOrder# +template. The atomic operations and member functions behave as described in the {cpp} specification, barring the restrictions discussed above. @@ -18437,9 +18502,10 @@ specification, barring the restrictions discussed above. ==== Care must be taken when using atomics for work-item coordination, because work-items are not required to provide stronger than weakly parallel forward -progress guarantees. Operations that block a work-item, such as continuously -checking the value of an atomic variable until some condition holds, or using -atomic operations that are not lock-free, may prevent overall progress. +progress guarantees. +Operations that block a work-item, such as continuously checking the value of an +atomic variable until some condition holds, or using atomic operations that are +not lock-free, may prevent overall progress. ==== [[table.atomic-refs.constructors]] @@ -18903,13 +18969,13 @@ T* operator--() const // Deprecated atomics from SYCL 1.2.1 The atomic types and operations on atomic types provided by SYCL 1.2.1 are -deprecated in SYCL 2020, and will be removed in a future version of SYCL. The -types and operations are made available in the [code]#cl::sycl::# -namespace for backwards compatibility. 
+deprecated in SYCL 2020, and will be removed in a future version of SYCL. +The types and operations are made available in the [code]#cl::sycl::# namespace +for backwards compatibility. -The constructors and member functions for the [code]#cl::sycl::atomic# -class are listed in <> -and <> respectively. +The constructors and member functions for the [code]#cl::sycl::atomic# class are +listed in <> and <> +respectively. [source,,linenums] ---- @@ -19277,24 +19343,25 @@ Equivalent to calling [code]#object.fetch_max(operand, memoryOrder)#. When a kernel runs on a device that has either [code]#aspect::usm_atomic_host_allocations# or -[code]#aspect::usm_atomic_shared_allocations#, the device code and the host -code can concurrently access the same memory. This has a ramification on the -atomic operations because it is possible for device code and host code to -perform atomic operations on the same object _M_ in this shared memory. It -also has a ramification on the fence operations because the {cpp} core language -defines the semantics of these fence operations in relation to atomic -operations on some shared object _M_. The following paragraphs specify the -guarantees that the SYCL implementation provides when the application performs -atomic or fence operations in device code using the memory scope -[code]#memory_scope::system#. - -Atomic operations in device code using [code]#sycl::atomic_ref# on an object -_M_ are guaranteed to be atomic with respect to atomic operations in host code -using [code]#std::atomic_ref# on that same object _M_. +[code]#aspect::usm_atomic_shared_allocations#, the device code and the host code +can concurrently access the same memory. +This has a ramification on the atomic operations because it is possible for +device code and host code to perform atomic operations on the same object _M_ in +this shared memory. 
+It also has a ramification on the fence operations because the {cpp} core +language defines the semantics of these fence operations in relation to atomic +operations on some shared object _M_. +The following paragraphs specify the guarantees that the SYCL implementation +provides when the application performs atomic or fence operations in device code +using the memory scope [code]#memory_scope::system#. + +Atomic operations in device code using [code]#sycl::atomic_ref# on an object _M_ +are guaranteed to be atomic with respect to atomic operations in host code using +[code]#std::atomic_ref# on that same object _M_. Fence operations in device code using [code]#sycl::atomic_fence# synchronize -with fence operations in host code using [code]#std::atomic_thread_fence# if -the fence operations shared the same atomic object _M_ and follow the rules for +with fence operations in host code using [code]#std::atomic_thread_fence# if the +fence operations shared the same atomic object _M_ and follow the rules for fence synchronization defined in the {cpp} core language. Fence operations in device code using [code]#sycl::atomic_fence# synchronize @@ -19302,69 +19369,66 @@ with atomic operations in host code using [code]#std::atomic_ref# if the operations share the same atomic object _M_ and follow the rules for fence synchronization defined in the {cpp} core language. -Atomic operations in device code using [code]#sycl::atomic_ref# synchronize -with fence operations in host code using [code]#std::atomic_thread_fence# if -the operations share the same atomic object _M_ and follow the rules for fence +Atomic operations in device code using [code]#sycl::atomic_ref# synchronize with +fence operations in host code using [code]#std::atomic_thread_fence# if the +operations share the same atomic object _M_ and follow the rules for fence synchronization defined in the {cpp} core language. 
[[subsec:stream]] == Stream class -The SYCL [code]#stream# class is a buffered output stream that allows -outputting the values of built-in, vector and SYCL types to the console. The -implementation of how values are streamed to the console is left as an +The SYCL [code]#stream# class is a buffered output stream that allows outputting +the values of built-in, vector and SYCL types to the console. +The implementation of how values are streamed to the console is left as an implementation detail. -The way in which values are output by an instance of the SYCL -[code]#stream# class can also be altered using a range of manipulators. +The way in which values are output by an instance of the SYCL [code]#stream# +class can also be altered using a range of manipulators. -There are two limits that are relevant for the [code]#stream# class. The -[code]#totalBufferSize# limit specifies the maximum size of the overall +There are two limits that are relevant for the [code]#stream# class. +The [code]#totalBufferSize# limit specifies the maximum size of the overall character stream that can be output during a kernel invocation, and the -[code]#workItemBufferSize# limit specifies the maximum size of the -character stream that can be output within a work-item before a flush must be -performed. Both of these limits are specified in bytes. The -[code]#totalBufferSize# limit must be sufficient to contain the characters +[code]#workItemBufferSize# limit specifies the maximum size of the character +stream that can be output within a work-item before a flush must be performed. +Both of these limits are specified in bytes. +The [code]#totalBufferSize# limit must be sufficient to contain the characters output by all stream statements during execution of a kernel invocation (the -aggregate of outputs from all work-items), and the -[code]#workItemBufferSize# limit must be sufficient to contain the -characters output within a work-item between stream flush operations. 
- -If the [code]#totalBufferSize# or [code]#workItemBufferSize# -limits are exceeded, it is implementation-defined whether the streamed -characters exceeding the limit are output, or silently ignored/discarded, -and if output it is implementation-defined whether those extra characters -exceeding the [code]#workItemBufferSize# limit count toward the -[code]#totalBufferSize# limit. Regardless of this implementation -defined behavior of output exceeding the limits, no undefined or erroneous -behavior is permitted of an implementation when the limits are exceeded. +aggregate of outputs from all work-items), and the [code]#workItemBufferSize# +limit must be sufficient to contain the characters output within a work-item +between stream flush operations. + +If the [code]#totalBufferSize# or [code]#workItemBufferSize# limits are +exceeded, it is implementation-defined whether the streamed characters exceeding +the limit are output, or silently ignored/discarded, and if output it is +implementation-defined whether those extra characters exceeding the +[code]#workItemBufferSize# limit count toward the [code]#totalBufferSize# limit. +Regardless of this implementation defined behavior of output exceeding the +limits, no undefined or erroneous behavior is permitted of an implementation +when the limits are exceeded. Unused characters within [code]#workItemBufferSize# (any portion of the -[code]#workItemBufferSize# capacity that has not been used at the time -of a stream flush) do not count toward the [code]#totalBufferSize# -limit, in that only characters flushed count toward the -[code]#totalBufferSize# limit. +[code]#workItemBufferSize# capacity that has not been used at the time of a +stream flush) do not count toward the [code]#totalBufferSize# limit, in that +only characters flushed count toward the [code]#totalBufferSize# limit. -The SYCL [code]#stream# class provides the common reference semantics -(see <>). 
+The SYCL [code]#stream# class provides the common reference semantics (see +<>). === Stream class interface -The constructors and member functions of the SYCL [code]#stream# class -are listed in <>, -<>, and <> respectively. The -additional common special member functions and common member functions are +The constructors and member functions of the SYCL [code]#stream# class are +listed in <>, <>, and +<> respectively. +The additional common special member functions and common member functions are listed in <> and <>, respectively. The operand types that are supported by the SYCL [code]#stream# class -[code]#operator<<()# operator are listed in -<>. +[code]#operator<<()# operator are listed in <>. The manipulators that are supported by the SYCL [code]#stream# class -[code]#operator<<()# operator are listed in -<>. +[code]#operator<<()# operator are listed in <>. // Interface of the device class [source,,linenums] @@ -19622,29 +19686,31 @@ An instance of the SYCL [code]#stream# class is required to output everything that is streamed to it via the [code]#operator<<()# operator before a flush operation (that doesn't exceed the [code]#workItemBufferSize# or [code]#totalBufferSize# limits) within a SYCL kernel function by the time that -the event associated with a command group submission enters the completed -state. The point at which the flush operation is performed is -implementation-defined. +the event associated with a command group submission enters the completed state. +The point at which the flush operation is performed is implementation-defined. -The SYCL [code]#stream# class is required to output the content of each stream, between flushes (up to -[code]#workItemBufferSize)#, without mixing with content from the same stream in other work-items. -There are no other output order guarantees between work-items or between streams. 
The stream flush -operation therefore delimits the unit of output that is guaranteed to be displayed without mixing with -other work-items, with respect to a single stream. +The SYCL [code]#stream# class is required to output the content of each stream, +between flushes (up to [code]#workItemBufferSize#), without mixing with content +from the same stream in other work-items. +There are no other output order guarantees between work-items or between +streams. +The stream flush operation therefore delimits the unit of output that is +guaranteed to be displayed without mixing with other work-items, with respect to +a single stream. === Implicit flush -There is guaranteed to be an implicit flush of each stream used by a -kernel, at the end of kernel execution, from the perspective of each -work-item. There is also an implicit flush when the endl stream -manipulator is executed. No other implicit flushes are permitted in -an implementation. +There is guaranteed to be an implicit flush of each stream used by a kernel, at +the end of kernel execution, from the perspective of each work-item. +There is also an implicit flush when the endl stream manipulator is executed. +No other implicit flushes are permitted in an implementation. === Performance note -The usage of the [code]#stream# class is designed for debugging purposes and is therefore not recommended for performance critical applications. +The usage of the [code]#stream# class is designed for debugging purposes and is +therefore not recommended for performance-critical applications. // \input{builtin_functions} @@ -19655,31 +19721,33 @@ The usage of the [code]#stream# class is designed for debugging purposes and is == SYCL built-in functions for SYCL host and device // Intentional OpenCL reference -SYCL kernels may execute on any SYCL device, which requires the functions -used in the kernels to be compiled and linked for both device and host.
In -the SYCL programming model, the built-ins are available for the entire SYCL -application within the [code]#sycl# namespace, although their semantics -may be different. This section follows the OpenCL 1.2 specification document -<> - except that for SYCL, all functions are located -within the [code]#sycl# namespace - and describes the behavior of these -functions for SYCL host and device. The expected precision and any other -semantic requirements are defined in the backend specification. - -The SYCL built-in functions are available throughout the SYCL application, -and depending on where they execute, they are either implemented using their -host implementation or the device implementation. The SYCL system guarantees -that all of the built-in functions fulfill the same requirements for both -host and device. +SYCL kernels may execute on any SYCL device, which requires the functions used +in the kernels to be compiled and linked for both device and host. +In the SYCL programming model, the built-ins are available for the entire SYCL +application within the [code]#sycl# namespace, although their semantics may be +different. +This section follows the OpenCL 1.2 specification document <> - except that for SYCL, all functions are located within the [code]#sycl# +namespace - and describes the behavior of these functions for SYCL host and +device. +The expected precision and any other semantic requirements are defined in the +backend specification. + +The SYCL built-in functions are available throughout the SYCL application, and +depending on where they execute, they are either implemented using their host +implementation or the device implementation. +The SYCL system guarantees that all of the built-in functions fulfill the same +requirements for both host and device. [[sec:function-objects]] === Function objects -SYCL provides a number of function objects in the [code]#sycl# namespace -on host and device. 
All function objects obey {cpp} conversion and promotion -rules. Each function object is additionally specialized for [code]#void# -as a _transparent_ function object that deduces its parameter types -and return type. +SYCL provides a number of function objects in the [code]#sycl# namespace on host +and device. +All function objects obey {cpp} conversion and promotion rules. +Each function object is additionally specialized for [code]#void# as a +_transparent_ function object that deduces its parameter types and return type. [source,,linenums] ---- @@ -19820,30 +19888,31 @@ T operator()(const T& x, const T& y) const SYCL provides a number of functions that expose functionality tied to groups of work-items (such as <> and collective operations). These group functions act as _synchronization points_ and must be encountered in -converged <> by all work-items in the group. If one work-item in -a group calls a group function, then all work-items in that group must call -exactly the same function under the same set of conditions --- calling the same -function under different conditions (e.g. in different iterations of a loop, or -different branches of a conditional statement) results in undefined behavior. +converged <> by all work-items in the group. +If one work-item in a group calls a group function, then all work-items in that +group must call exactly the same function under the same set of conditions --- +calling the same function under different conditions (e.g. in different +iterations of a loop, or different branches of a conditional statement) results +in undefined behavior. Additionally, restrictions may be placed on the arguments passed to each function in order to ensure that all work-items in the group agree on the -operation that is being performed. Any such restrictions on the arguments -passed to a function are defined within the descriptions of those functions. +operation that is being performed. 
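The [code]#void# specialization described above follows the same pattern as the transparent function objects of the {cpp} standard library, for example [code]#std::plus<void>#. Below is a minimal sketch of that pattern. The name [code]#my_plus# is hypothetical and used only for illustration; it is not a SYCL type.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a SYCL function object such as sycl::plus.
template <typename T = void>
struct my_plus {
  // Typed form: both operands are converted to T before the addition.
  T operator()(const T& x, const T& y) const { return x + y; }
};

// Transparent specialization: parameter and return types are deduced,
// so ordinary C++ conversion and promotion rules decide the result type.
template <>
struct my_plus<void> {
  template <typename T, typename U>
  auto operator()(T&& x, U&& y) const
      -> decltype(std::forward<T>(x) + std::forward<U>(y)) {
    return std::forward<T>(x) + std::forward<U>(y);
  }
};
```

In the typed form, a [code]#char# argument is converted to [code]#int# before the addition. The transparent form instead deduces [code]#long# for an [code]#int# plus [code]#long# and [code]#double# for an [code]#int# plus [code]#double#.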
+Any such restrictions on the arguments passed to a function are defined within +the descriptions of those functions. Violating these restrictions results in undefined behavior. All group functions are supported for the fundamental scalar types supported by -SYCL (see <>) and instances of the SYCL -[code]#vec# and [code]#marray# classes. +SYCL (see <>) and instances of the SYCL [code]#vec# and +[code]#marray# classes. -Using a group function inside of a kernel may introduce additional -limits on the resources available to user code inside the same kernel. The -behavior of these limits is implementation-defined, but must be reflected by -calls to kernel querying functions (such as -[code]#kernel::get_info#) as described in <>. +Using a group function inside of a kernel may introduce additional limits on the +resources available to user code inside the same kernel. +The behavior of these limits is implementation-defined, but must be reflected by +calls to kernel querying functions (such as [code]#kernel::get_info#) as +described in <>. It is undefined behavior for any group function to be invoked within a -[code]#parallel_for_work_group# or [code]#parallel_for_work_item# -context. +[code]#parallel_for_work_group# or [code]#parallel_for_work_item# context. ==== Group type trait @@ -19853,21 +19922,21 @@ include::{header_dir}/algorithms/is_group.h[lines=4..-1] ---- The [code]#is_group# type trait is used to determine which types of groups are -supported by group functions, and to control when group functions participate -in overload resolution. +supported by group functions, and to control when group functions participate in +overload resolution. [code]#is_group# inherits from [code]#std::true_type# if [code]#T# is the type of a standard SYCL group ([code]#group# or [code]#sub_group#) and it -inherits from [code]#std::false_type# otherwise. 
A SYCL implementation may -introduce additional specializations of [code]#is_group# for -implementation-defined group types, if the interface of those types supports all -member functions and static members common to the [code]#group# and -[code]#sub_group# classes. +inherits from [code]#std::false_type# otherwise. +A SYCL implementation may introduce additional specializations of +[code]#is_group# for implementation-defined group types, if the interface of +those types supports all member functions and static members common to the +[code]#group# and [code]#sub_group# classes. ==== [code]#group_broadcast# -The [code]#group_broadcast# function communicates a value held by one -work-item to all other work-items in the group. +The [code]#group_broadcast# function communicates a value held by one work-item +to all other work-items in the group. [source,,linenums] ---- @@ -19879,8 +19948,8 @@ include::{header_dir}/groups/broadcast.h[lines=4..-1] trivially copyable type. + -- -_Returns:_ The value of [code]#x# from the work-item with the smallest linear -id within group [code]#g#. +_Returns:_ The value of [code]#x# from the work-item with the smallest linear id +within group [code]#g#. -- . _Constraints:_ Available only if @@ -19902,9 +19971,8 @@ id within group [code]#g#. -- _Preconditions:_ [code]#local_id# must be the same for all work-items in the group, and its dimensionality must match the dimensionality of the group. -The value of [code]#local_id# in each dimension must be greater than or equal -to 0 and less than the value of [code]#get_local_range()# in the same -dimension. +The value of [code]#local_id# in each dimension must be greater than or equal to +0 and less than the value of [code]#get_local_range()# in the same dimension. _Returns:_ The value of [code]#x# from the work-item with the specified id within group [code]#g#. 
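The two [code]#group_broadcast# results described above can be modeled on the host. The model treats a one-dimensional group as a vector indexed by linear id, where element _i_ is the value of [code]#x# held by work-item _i_. The helpers [code]#broadcast_first# and [code]#broadcast_from# are hypothetical and only show which value each work-item receives. The real functions take a group object and must be called by every work-item in the group.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model: values[i] is the x held by the work-item with
// linear id i; the returned vector holds what each work-item receives.
template <typename T>
std::vector<T> broadcast_first(const std::vector<T>& values) {
  assert(!values.empty());
  // Every work-item receives x from the work-item with the smallest linear id.
  return std::vector<T>(values.size(), values.front());
}

template <typename T>
std::vector<T> broadcast_from(const std::vector<T>& values,
                              std::size_t local_id) {
  // Precondition from the text: local_id is the same for all work-items and
  // lies within the group's local range.
  assert(local_id < values.size());
  return std::vector<T>(values.size(), values[local_id]);
}
```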
@@ -19925,49 +19993,50 @@ include::{header_dir}/groups/barrier.h[lines=4..-1] + -- _Effects:_ The current work-item will wait at the barrier until all work-items -in group [code]#g# have reached the barrier. In addition, the barrier performs -<> operations ensuring that memory accesses issued before the -barrier are not re-ordered with those issued after the barrier: all work-items -in group [code]#g# execute a release fence prior to arriving at the -barrier, all work-items in group [code]#g# execute an acquire fence afterwards, -and there is an implicit synchronization of these fences as if provided by an -explicit atomic operation on an atomic object. - -By default, the scope of these fences is set to the narrowest -scope including all work-items in group [code]#g# (as reported by -[code]#Group::fence_scope#). This scope may be optionally overridden -with a wider scope, specified by the [code]#fence_scope# argument. +in group [code]#g# have reached the barrier. +In addition, the barrier performs <> operations ensuring that memory +accesses issued before the barrier are not re-ordered with those issued after +the barrier: all work-items in group [code]#g# execute a release fence prior to +arriving at the barrier, all work-items in group [code]#g# execute an acquire +fence afterwards, and there is an implicit synchronization of these fences as if +provided by an explicit atomic operation on an atomic object. + +By default, the scope of these fences is set to the narrowest scope including +all work-items in group [code]#g# (as reported by [code]#Group::fence_scope#). +This scope may be optionally overridden with a wider scope, specified by the +[code]#fence_scope# argument. -- [[sec:algorithms]] === Group algorithms library -SYCL provides an algorithms library based on the functions described -in Section 28 of the {cpp17} specification. 
The first argument to each function -is a <>, and data ranges can be described using pointers, iterators or -instances of the [code]#multi_ptr# class. The functions defined in this -section are free functions available in the [code]#sycl# namespace. +SYCL provides an algorithms library based on the functions described in Section +28 of the {cpp17} specification. +The first argument to each function is a <>, and data ranges can be +described using pointers, iterators or instances of the [code]#multi_ptr# class. +The functions defined in this section are free functions available in the +[code]#sycl# namespace. -Any restrictions from the standard algorithms library apply. Some of the -functions in the SYCL algorithms library introduce additional restrictions -in order to maximize portability across different devices and to minimize -the chances of encountering unexpected behavior. +Any restrictions from the standard algorithms library apply. +Some of the functions in the SYCL algorithms library introduce additional +restrictions in order to maximize portability across different devices and to +minimize the chances of encountering unexpected behavior. All algorithms are supported for the fundamental scalar types supported by SYCL -(see <>) and instances of the SYCL -[code]#vec# and [code]#marray# classes. +(see <>) and instances of the SYCL [code]#vec# and +[code]#marray# classes. The <> argument to a SYCL algorithm denotes that it should be performed -collaboratively by the work-items in the specified group. All algorithms -act as group functions (as defined in <>), inheriting all -restrictions of group functions. Unless the description of a function says -otherwise, how the elements of a range are processed by the work-items in a -group is undefined. +collaboratively by the work-items in the specified group. +All algorithms act as group functions (as defined in <>), +inheriting all restrictions of group functions. 
+Unless the description of a function says otherwise, how the elements of a range +are processed by the work-items in a group is undefined. SYCL provides separate functions for algorithms which use the work-items in a -group to execute an operation over a range of iterators and algorithms which -are applied to data held directly by the work-items in a group. An example -of the usage of these functions is given below: +group to execute an operation over a range of iterators and algorithms which are +applied to data held directly by the work-items in a group. +An example of the usage of these functions is given below: [[listing.group.algorithms]] .Using the group algorithms library to perform a work-group reduce @@ -20114,13 +20183,11 @@ shifting values a fixed number of work-items to the left or right. include::{header_dir}/algorithms/shift.h[lines=4..-1] ---- - . _Constraints:_ Available only if - [code]#std::is_same_v, sub_group># is true and - [code]#T# is a trivially copyable type. + . _Constraints:_ Available only if [code]#std::is_same_v, + sub_group># is true and [code]#T# is a trivially copyable type. + -- -_Preconditions:_ [code]#delta# must be the same for all work-items in the -group. +_Preconditions:_ [code]#delta# must be the same for all work-items in the group. _Returns:_ the value of [code]#x# from the work-item whose group local id ([code]#id#) is [code]#delta# larger than that of the calling work-item. @@ -20128,18 +20195,16 @@ _Returns:_ the value of [code]#x# from the work-item whose group local id size, but the value returned in this case is unspecified. -- - . _Constraints:_ Available only if - [code]#std::is_same_v, sub_group># is true and - [code]#T# is a trivially copyable type. + . _Constraints:_ Available only if [code]#std::is_same_v, + sub_group># is true and [code]#T# is a trivially copyable type. + -- -_Preconditions:_ [code]#delta# must be the same for all work-items in the -group. 
+_Preconditions:_ [code]#delta# must be the same for all work-items in the group. _Returns:_ the value of [code]#x# from the work-item whose group local id ([code]#id#) is [code]#delta# smaller than that of the calling work-item. -[code]#id - delta# may be less than 0, but the value returned in this case -is unspecified. +[code]#id - delta# may be less than 0, but the value returned in this case is +unspecified. -- ==== [code]#permute# @@ -20156,18 +20221,17 @@ id and some fixed mask. include::{header_dir}/algorithms/permute.h[lines=4..-1] ---- - . _Constraints:_ Available only if - [code]#std::is_same_v, sub_group># is true and - [code]#T# is a trivially copyable type. + . _Constraints:_ Available only if [code]#std::is_same_v, + sub_group># is true and [code]#T# is a trivially copyable type. + -- -_Preconditions:_ [code]#mask# must be the same for all work-items in the -group. +_Preconditions:_ [code]#mask# must be the same for all work-items in the group. -_Returns:_ the value of [code]#x# from the work-item whose group local id -is equal to the bitwise exclusive OR of the calling work-item's group local id -and [code]#mask#. The result of the exclusive OR may be greater than or equal to -the group's linear size, but the value returned in this case is unspecified. +_Returns:_ the value of [code]#x# from the work-item whose group local id is +equal to the bitwise exclusive OR of the calling work-item's group local id and +[code]#mask#. +The result of the exclusive OR may be greater than or equal to the group's +linear size, but the value returned in this case is unspecified. -- ==== [code]#select# @@ -20183,20 +20247,20 @@ by any other work-item in group [code]#g#. include::{header_dir}/algorithms/select.h[lines=4..-1] ---- - . _Constraints:_ Available only if - [code]#std::is_same_v, sub_group># is true and - [code]#T# is a trivially copyable type. + . 
_Constraints:_ Available only if [code]#std::is_same_v, + sub_group># is true and [code]#T# is a trivially copyable type. + -- _Returns:_ the value of [code]#x# from the work-item with the group local id -specified by [code]#remote_local_id#. The value of [code]#remote_local_id# may -be outside of the group, but the value returned in this case is unspecified. +specified by [code]#remote_local_id#. +The value of [code]#remote_local_id# may be outside of the group, but the value +returned in this case is unspecified. -- ==== [code]#reduce# -The [code]#reduce# function from standard {cpp} combines the values in a range in -an unspecified order using a binary operator. +The [code]#reduce# function from standard {cpp} combines the values in a range +in an unspecified order using a binary operator. SYCL provides two similar algorithms that compute the same generalized sum as defined by standard {cpp}: @@ -20208,10 +20272,10 @@ defined by standard {cpp}: a group. The result of a call to these functions is non-deterministic if the binary -operator is not commutative and associative. Only the binary operators defined -in <> are supported by the [code]#reduce# functions in -SYCL 2020, but the standard {cpp} syntax is used for forward compatibility with -future SYCL versions. +operator is not commutative and associative. +Only the binary operators defined in <> are supported by +the [code]#reduce# functions in SYCL 2020, but the standard {cpp} syntax is used +for forward compatibility with future SYCL versions. [source,,linenums] ---- @@ -20228,8 +20292,8 @@ _Mandates:_ [code]#binary_op(*first, *first)# must return a value of type [code]#std::iterator_traits::value_type#. _Preconditions:_ [code]#first#, [code]#last# and the type of [code]#binary_op# -must be the same for all work-items in group [code]#g#. [code]#binary_op# must -be an instance of a SYCL function object. +must be the same for all work-items in group [code]#g#. 
+[code]#binary_op# must be an instance of a SYCL function object. _Returns:_ The result of combining the values resulting from dereferencing all iterators in the range [code]#[first, last)# using the operator @@ -20251,15 +20315,14 @@ _Preconditions:_ [code]#first#, [code]#last#, [code]#init# and the type of [code]#binary_op# must be an instance of a SYCL function object. _Returns:_ The result of combining the values resulting from dereferencing all -iterators in the range [code]#[first, last)# and the initial value -[code]#init# using the operator [code]#binary_op#, where the values are combined -according to the generalized sum defined in standard {cpp}. +iterators in the range [code]#[first, last)# and the initial value [code]#init# +using the operator [code]#binary_op#, where the values are combined according to +the generalized sum defined in standard {cpp}. -- . _Constraints:_ Available only if [code]#sycl::is_group_v># is true, [code]#T# is a - fundamental type and [code]#BinaryOperation# is a SYCL function object - type. + fundamental type and [code]#BinaryOperation# is a SYCL function object type. + -- _Mandates:_ [code]#binary_op(x, x)# must return a value of type [code]#T#. @@ -20267,9 +20330,9 @@ _Mandates:_ [code]#binary_op(x, x)# must return a value of type [code]#T#. _Preconditions:_ [code]#binary_op# must be an instance of a SYCL function object. -_Returns:_ The result of combining all the values of [code]#x# specified by -each work-item in group [code]#g# using the operator [code]#binary_op#, where -the values are combined according to the generalized sum defined in standard {cpp}. +_Returns:_ The result of combining all the values of [code]#x# specified by each +work-item in group [code]#g# using the operator [code]#binary_op#, where the +values are combined according to the generalized sum defined in standard {cpp}. -- . 
_Constraints:_ Available only if @@ -20283,8 +20346,8 @@ _Mandates:_ [code]#binary_op(init, x)# must return a value of type [code]#T#. _Preconditions:_ [code]#binary_op# must be an instance of a SYCL function object. -_Returns:_ The result of combining all the values of [code]#x# specified by -each work-item in group [code]#g# and the initial value [code]#init# using the +_Returns:_ The result of combining all the values of [code]#x# specified by each +work-item in group [code]#g# and the initial value [code]#init# using the operator [code]#binary_op#, where the values are combined according to the generalized sum defined in standard {cpp}. -- @@ -20292,12 +20355,12 @@ generalized sum defined in standard {cpp}. ==== [code]#exclusive_scan# and [code]#inclusive_scan# The [code]#exclusive_scan# and [code]#inclusive_scan# functions in standard -{cpp} compute a prefix sum using a binary operator. For a scan of elements -_[x~0~, {ldots}, x~n~]_, the _i_ th result in an exclusive scan is the -generalized noncommutative sum of all elements preceding _x~i~_ (excluding -_x~i~_ itself), whereas the _i_ th result in an inclusive scan is the -generalized noncommutative sum of all elements preceding _x~i~_ (including -_x~i~_ itself). +{cpp} compute a prefix sum using a binary operator. +For a scan of elements _[x~0~, {ldots}, x~n~]_, the _i_ th result in an +exclusive scan is the generalized noncommutative sum of all elements preceding +_x~i~_ (excluding _x~i~_ itself), whereas the _i_ th result in an inclusive scan +is the generalized noncommutative sum of all elements preceding _x~i~_ +(including _x~i~_ itself). SYCL provides two similar sets of algorithms that compute the same prefix sums using the generalized noncommutative sum as defined by standard {cpp}: @@ -20310,10 +20373,11 @@ intermediate partial prefix sums are written to memory as in standard {cpp}. 
perform a scan over values held directly by the work-items in a group, and the result returned to each work-item represents a partial prefix sum. -The result of a call to a scan is non-deterministic if the binary operator is not -associative. Only the binary operators defined in <> are -supported by the scan functions in SYCL 2020, but the standard {cpp} syntax is -used for forward compatibility with future SYCL versions. +The result of a call to a scan is non-deterministic if the binary operator is +not associative. +Only the binary operators defined in <> are supported by +the scan functions in SYCL 2020, but the standard {cpp} syntax is used for +forward compatibility with future SYCL versions. [source,,linenums] ---- @@ -20331,7 +20395,7 @@ _Mandates:_ [code]#binary_op(*first, *first)# must return a value of type _Preconditions:_ [code]#first#, [code]#last#, [code]#result# and the type of [code]#binary_op# must be the same for all work-items in group [code]#g#. -[code]#binary_op# must be an instance of a SYCL function object. +[code]#binary_op# must be an instance of a SYCL function object. [NOTE] ==== @@ -20340,27 +20404,27 @@ Note that [code]#first# may be equal to [code]#result#. _Effects:_ The value written to [code]#result# + _i_ is the exclusive scan of the values resulting from dereferencing the first _i_ values in the range -[code]#[first, last)# and the identity value of [code]#binary_op# (as -identified by [code]#sycl::known_identity#), using the operator -[code]#binary_op#. The scan is computed using a generalized noncommutative sum -as defined in standard {cpp}. +[code]#[first, last)# and the identity value of [code]#binary_op# (as identified +by [code]#sycl::known_identity#), using the operator [code]#binary_op#. +The scan is computed using a generalized noncommutative sum as defined in +standard {cpp}. _Returns:_ A pointer to the end of the output range. -- . 
_Constraints:_ Available only if [code]#sycl::is_group_v># is true, [code]#InPtr# and - [code]#OutPtr# are pointers to fundamental types, [code]#T# is a - fundamental type, and [code]#BinaryOperation# is a SYCL function object - type. + [code]#OutPtr# are pointers to fundamental types, [code]#T# is a fundamental + type, and [code]#BinaryOperation# is a SYCL function object type. + -- _Mandates:_ [code]#binary_op(init, *first)# must return a value of type [code]#T#. -_Preconditions:_ [code]#first#, [code]#last#, [code]#result#, [code]#init# and the -type of [code]#binary_op# must be the same for all work-items in group -[code]#g#. [code]#binary_op# must be an instance of a SYCL function object. +_Preconditions:_ [code]#first#, [code]#last#, [code]#result#, [code]#init# and +the type of [code]#binary_op# must be the same for all work-items in group +[code]#g#. +[code]#binary_op# must be an instance of a SYCL function object. [NOTE] ==== @@ -20369,9 +20433,10 @@ Note that [code]#first# may be equal to [code]#result#. _Effects:_ The value written to [code]#result# + _i_ is the exclusive scan of the values resulting from dereferencing the first _i_ values in the range -[code]#[first, last)# and an initial value specified by [code]#init#, using -the operator [code]#binary_op#. The scan is computed using a generalized -noncommutative sum as defined in standard {cpp}. +[code]#[first, last)# and an initial value specified by [code]#init#, using the +operator [code]#binary_op#. +The scan is computed using a generalized noncommutative sum as defined in +standard {cpp}. _Returns:_ A pointer to the end of the output range. -- @@ -20387,12 +20452,14 @@ _Mandates:_ [code]#binary_op(x, x)# must return a value of type [code]#T#. _Preconditions:_ [code]#binary_op# must be an instance of a SYCL function object. 
-_Returns:_ The value returned on work-item _i_ is the exclusive scan of -the first _i_ values in group [code]#g# and the identity value of -[code]#binary_op# (as identified by [code]#sycl::known_identity#), using the -operator [code]#binary_op#. The scan is computed using a generalized -noncommutative sum as defined in standard {cpp}. For multi-dimensional groups, -the order of work-items in group [code]#g# is determined by their linear id. +_Returns:_ The value returned on work-item _i_ is the exclusive scan of the +first _i_ values in group [code]#g# and the identity value of [code]#binary_op# +(as identified by [code]#sycl::known_identity#), using the operator +[code]#binary_op#. +The scan is computed using a generalized noncommutative sum as defined in +standard {cpp}. +For multi-dimensional groups, the order of work-items in group [code]#g# is +determined by their linear id. -- . _Constraints:_ Available only if @@ -20406,11 +20473,12 @@ _Mandates:_ [code]#binary_op(init, x)# must return a value of type [code]#T#. _Preconditions:_ [code]#binary_op# must be an instance of a SYCL function object. -_Returns:_ The value returned on work-item _i_ is the exclusive scan of -the first _i_ values in group [code]#g# and an initial value specified by -[code]#init#, using the operator [code]#binary_op#. The scan is computed using -a generalized noncommutative sum as defined in standard {cpp}. For -multi-dimensional groups, the order of work-items in group [code]#g# is +_Returns:_ The value returned on work-item _i_ is the exclusive scan of the +first _i_ values in group [code]#g# and an initial value specified by +[code]#init#, using the operator [code]#binary_op#. +The scan is computed using a generalized noncommutative sum as defined in +standard {cpp}. +For multi-dimensional groups, the order of work-items in group [code]#g# is determined by their linear id. 
-- @@ -20430,7 +20498,7 @@ _Mandates:_ [code]#binary_op(*first, *first)# must return a value of type _Preconditions:_ [code]#first#, [code]#last#, [code]#result# and the type of [code]#binary_op# must be the same for all work-items in group [code]#g#. -[code]#binary_op# must be an instance of a SYCL function object. +[code]#binary_op# must be an instance of a SYCL function object. [NOTE] ==== @@ -20439,24 +20507,26 @@ Note that [code]#first# may be equal to [code]#result#. _Effects:_ The value written to [code]#result# + _i_ is the inclusive scan of the values resulting from dereferencing the first _i_ values in the range -[code]#[first, last)#, using the operator [code]#binary_op#. The scan is -computed using a generalized noncommutative sum as defined in standard {cpp}. +[code]#[first, last)#, using the operator [code]#binary_op#. +The scan is computed using a generalized noncommutative sum as defined in +standard {cpp}. _Returns:_ A pointer to the end of the output range. -- . _Constraints:_ Available only if [code]#sycl::is_group_v># is true, [code]#InPtr# and - [code]#OutPtr# are pointers to fundamental types, [code]#BinaryOperation# - is a SYCL function object type, and [code]#T# is a fundamental type. + [code]#OutPtr# are pointers to fundamental types, [code]#BinaryOperation# is + a SYCL function object type, and [code]#T# is a fundamental type. + -- _Mandates:_ [code]#binary_op(init, *first)# must return a value of type [code]#T#. -_Preconditions:_ [code]#first#, [code]#last#, [code]#result#, [code]#init# and the -type of [code]#binary_op# must be the same for all work-items in group -[code]#g#. [code]#binary_op# must be an instance of a SYCL function object. +_Preconditions:_ [code]#first#, [code]#last#, [code]#result#, [code]#init# and +the type of [code]#binary_op# must be the same for all work-items in group +[code]#g#. +[code]#binary_op# must be an instance of a SYCL function object. 
[NOTE] ==== @@ -20465,9 +20535,10 @@ Note that [code]#first# may be equal to [code]#result#. _Effects:_ The value written to [code]#result# + _i_ is the inclusive scan of the values resulting from dereferencing the first _i_ values in the range -[code]#[first, last)# and an initial value specified by -[code]#init#, using the operator [code]#binary_op#. The scan is computed using -a generalized noncommutative sum as defined in standard {cpp}. +[code]#[first, last)# and an initial value specified by [code]#init#, using the +operator [code]#binary_op#. +The scan is computed using a generalized noncommutative sum as defined in +standard {cpp}. _Returns:_ A pointer to the end of the output range. -- @@ -20483,11 +20554,12 @@ _Mandates:_ [code]#binary_op(x, x)# must return a value of type [code]#T#. _Preconditions:_ [code]#binary_op# must be an instance of a SYCL function object. -_Returns:_ The value returned on work-item _i_ is the inclusive scan of -the first _i_ values in group [code]#g#, using the operator [code]#binary_op#. +_Returns:_ The value returned on work-item _i_ is the inclusive scan of the +first _i_ values in group [code]#g#, using the operator [code]#binary_op#. The scan is computed using a generalized noncommutative sum as defined in -standard {cpp}. For multi-dimensional groups, the order of work-items in group -[code]#g# is determined by their linear id. +standard {cpp}. +For multi-dimensional groups, the order of work-items in group [code]#g# is +determined by their linear id. -- . _Constraints:_ Available only if @@ -20501,30 +20573,31 @@ _Mandates:_ [code]#binary_op(init, x)# must return a value of type [code]#T#. _Preconditions:_ [code]#binary_op# must be an instance of a SYCL function object. -_Returns:_ The value returned on work-item _i_ is the inclusive scan of -the first _i_ values in group [code]#g# and an initial value specified by -[code]#init#, using the operator [code]#binary_op#. 
The scan is computed using -a generalized noncommutative sum as defined in standard {cpp}. For -multi-dimensional groups, the order of work-items in group [code]#g# is +_Returns:_ The value returned on work-item _i_ is the inclusive scan of the +first _i_ values in group [code]#g# and an initial value specified by +[code]#init#, using the operator [code]#binary_op#. +The scan is computed using a generalized noncommutative sum as defined in +standard {cpp}. +For multi-dimensional groups, the order of work-items in group [code]#g# is determined by their linear id. -- === Math functions -In SYCL the OpenCL math functions are available in the namespace -[code]#sycl# on host and device with the same precision -guarantees as defined in the OpenCL 1.2 specification document -<> for host and device. For a SYCL platform the -numerical requirements for host need to match the numerical -requirements of the OpenCL math built-in functions. +In SYCL the OpenCL math functions are available in the namespace [code]#sycl# on +host and device with the same precision guarantees as defined in the OpenCL 1.2 +specification document <> for host and device. +For a SYCL platform the numerical requirements for host need to match the +numerical requirements of the OpenCL math built-in functions. The built-in functions available for SYCL host and device, with the same precision requirements for both host and device, are described in <>. -The function descriptions in this section use the term _writeable address -space_ to represent the following address spaces: +The function descriptions in this section use the term _writeable address space_ +to represent the following address spaces: * [code]#access::address_space::global_space# * [code]#access::address_space::local_space# @@ -23201,9 +23274,9 @@ corresponding [code]#vec#. === Native precision math functions -In SYCL the implementation-defined precision math functions are -defined in the namespace [code]#sycl::native#. 
The functions -that are available within this namespace are specified in +In SYCL the implementation-defined precision math functions are defined in the +namespace [code]#sycl::native#. +The functions that are available within this namespace are specified in <>. The range of valid input values and the maximum error for these functions is @@ -23664,12 +23737,12 @@ corresponding [code]#vec#. === Half precision math functions -In SYCL the half precision math functions are defined in -the namespace [code]#sycl::half_precision#. The functions that are -available within this namespace are specified in -<>. These functions are -implemented with a minimum of 10-bits of accuracy i.e. the maximum error is -less than or equal to 8192 ulp. +In SYCL the half precision math functions are defined in the namespace +[code]#sycl::half_precision#. +The functions that are available within this namespace are specified in +<>. +These functions are implemented with a minimum of 10-bits of accuracy i.e. the +maximum error is less than or equal to 8192 ulp. [[table.half.math.functions]] .Half precision math functions @@ -24145,8 +24218,8 @@ corresponding [code]#vec#. <> describes the integer math functions that are available in the [code]#sycl# namespace in both host and device code. -The function descriptions in this section use the term _generic integer type_ -to represent the following types: +The function descriptions in this section use the term _generic integer type_ to +represent the following types: * [code]#char# * [code]#signed char# @@ -24993,10 +25066,11 @@ corresponding [code]#vec#. === Common functions -In SYCL the OpenCL [keyword]#common functions# are available in the -namespace [code]#sycl# on host and device as defined in the -OpenCL 1.2 specification document <>. They -are described here in <>. +In SYCL the OpenCL [keyword]#common functions# are available in the namespace +[code]#sycl# on host and device as defined in the OpenCL 1.2 specification +document <>. 
+They are described here in <>. The function descriptions in this section use the term _generic floating point type_ to represent the following types: @@ -25456,18 +25530,19 @@ corresponding [code]#vec#. [[sec:geometric-functions]] === Geometric functions -In SYCL the OpenCL [keyword]#geometric functions# are available in the -namespace [code]#sycl# on host and device as defined in the OpenCL 1.2 -specification document <>. On the host the -vector types use the [code]#vec# class and on an SYCL device use the -corresponding native <> vector types. All of the geometric functions -use round-to-nearest-even rounding mode. -<> contains the definitions of supported -geometric functions. +In SYCL the OpenCL [keyword]#geometric functions# are available in the namespace +[code]#sycl# on host and device as defined in the OpenCL 1.2 specification +document <>. +On the host the vector types use the [code]#vec# class and on an SYCL device use +the corresponding native <> vector types. +All of the geometric functions use round-to-nearest-even rounding mode. +<> contains the definitions of supported geometric +functions. -The function descriptions in this section use two terms that refer to a -specific list of types. The term _generic geometric type_ represents the -following types: +The function descriptions in this section use two terms that refer to a specific +list of types. 
+The term _generic geometric type_ represents the following types: * [code]#float# * [code]#double# @@ -25478,20 +25553,20 @@ following types: * [code]#vec#, where [code]#N# is 2, 3, or 4 * [code]#vec#, where [code]#N# is 2, 3, or 4 * [code]#vec#, where [code]#N# is 2, 3, or 4 -* [code]#+__swizzled_vec__+# that is convertible to [code]#vec#, - where [code]#N# is 2, 3, or 4 +* [code]#+__swizzled_vec__+# that is convertible to [code]#vec#, where + [code]#N# is 2, 3, or 4 * [code]#+__swizzled_vec__+# that is convertible to [code]#vec#, where [code]#N# is 2, 3, or 4 -* [code]#+__swizzled_vec__+# that is convertible to [code]#vec#, - where [code]#N# is 2, 3, or 4 +* [code]#+__swizzled_vec__+# that is convertible to [code]#vec#, where + [code]#N# is 2, 3, or 4 The term _float geometric type_ represents these types: * [code]#float# * [code]#marray#, where [code]#N# is 2, 3, or 4 * [code]#vec#, where [code]#N# is 2, 3, or 4 -* [code]#+__swizzled_vec__+# that is convertible to [code]#vec#, - where [code]#N# is 2, 3, or 4 +* [code]#+__swizzled_vec__+# that is convertible to [code]#vec#, where + [code]#N# is 2, 3, or 4 [[table.geometric.functions]] .Geometric functions @@ -25714,10 +25789,10 @@ else with the following exceptions: -- - . If the sum of squares is greater than [code]#FLT_MAX# then the - value of the floating-point values in the result vector are undefined. - . If the sum of squares is less than [code]#FLT_MIN# then the - implementation may return back [code]#p#. + . If the sum of squares is greater than [code]#FLT_MAX# then the value of the + floating-point values in the result vector are undefined. + . If the sum of squares is less than [code]#FLT_MIN# then the implementation + may return back [code]#p#. -- The return type is [code]#GeoFloat# unless [code]#GeoFloat# is the @@ -25729,19 +25804,19 @@ corresponding [code]#vec#. === Relational functions The functions in <> are defined in the [code]#sycl# -namespace and are available on both host and device. 
These functions perform -various relational comparisons on [code]#vec#, [code]#marray#, and scalar -types. +namespace and are available on both host and device. +These functions perform various relational comparisons on [code]#vec#, +[code]#marray#, and scalar types. The comparisons performed by [code]#isequal#, [code]#isgreater#, [code]#isgreaterequal#, [code]#isless#, [code]#islessequal#, and -[code]#islessgreater# are false when one or both operands are NaN. The -comparison performed by [code]#isnotequal# is true when one or both operands +[code]#islessgreater# are false when one or both operands are NaN. +The comparison performed by [code]#isnotequal# is true when one or both operands are NaN. -The function descriptions in this section use two terms that refer to a -specific list of types. The term _generic scalar type_ represents the -following types: +The function descriptions in this section use two terms that refer to a specific +list of types. +The term _generic scalar type_ represents the following types: * [code]#char# * [code]#signed char# diff --git a/adoc/chapters/references.adoc b/adoc/chapters/references.adoc index 750c3980..542d14f1 100644 --- a/adoc/chapters/references.adoc +++ b/adoc/chapters/references.adoc @@ -3,7 +3,8 @@ = References [[cpp17]] International Organization for Standardization (ISO). -"`Programming Languages — {cpp}`". ISO/IEC 14882:2017, 2017. +"`Programming Languages — {cpp}`". +ISO/IEC 14882:2017, 2017. [[dr2325]] International Organization for Standardization (ISO). @@ -27,6 +28,5 @@ https://www.khronos.org/registry/OpenCL/specs/opencl-2.0.pdf . [[cpp20]] International Organization for Standardization (ISO). -" Programming Languages — {cpp}, Langages de programmation -— C++ ", International Standard ISO/IEC 14882:2020(E), -Sixth edition 2020-12, 2020. +" Programming Languages — {cpp}, Langages de programmation — C++ ", +International Standard ISO/IEC 14882:2020(E), Sixth edition 2020-12, 2020. 
diff --git a/adoc/chapters/what_changed.adoc b/adoc/chapters/what_changed.adoc index a1954808..798fded4 100644 --- a/adoc/chapters/what_changed.adoc +++ b/adoc/chapters/what_changed.adoc @@ -7,66 +7,70 @@ [[sec:what-changed-between]] == What has changed from SYCL 1.2.1 to SYCL 2020 -The SYCL runtime moved from namespace [code]#cl::sycl# provided -by [code]#{hash}include # to namespace [code]#sycl# -provided by [code]#{hash}include # as explained in -<>. The old header file is still -available for compatibility with SYCL 1.2.1 applications. +The SYCL runtime moved from namespace [code]#cl::sycl# provided by +[code]#{hash}include # to namespace [code]#sycl# provided by +[code]#{hash}include # as explained in +<>. +The old header file is still available for compatibility with SYCL 1.2.1 +applications. The SYCL specification is now based on the core language of {cpp17}, as -described in <>. Features of -{cpp17} are now enabled within the specification, such as deduction guides -for class template argument deduction. +described in <>. +Features of {cpp17} are now enabled within the specification, such as deduction +guides for class template argument deduction. Naming of lambda functions passed to kernel invocations is now optional. Changes to buffers, images and accessors: - * The [code]#image# class has been removed. There are now new classes - [code]#unsampled_image# and [code]#sampled_image# which represent sampled - and unsampled images. The [code]#sampler# class has been removed and - replaced with the new [code]#image_sampler# structure. + * The [code]#image# class has been removed. + There are now new classes [code]#unsampled_image# and [code]#sampled_image# + which represent sampled and unsampled images. + The [code]#sampler# class has been removed and replaced with the new + [code]#image_sampler# structure. * Support for image arrays has been removed. * The type name [code]#access::target# has been deprecated and replaced with the type [code]#target#. 
- * The type name [code]#access::mode# has been deprecated and replaced with - the type [code]#access_mode#. + * The type name [code]#access::mode# has been deprecated and replaced with the + type [code]#access_mode#. - * The name of the [code]#accessor# target [code]#target::global_buffer# - has been deprecated and replaced with [code]#target::device#. + * The name of the [code]#accessor# target [code]#target::global_buffer# has + been deprecated and replaced with [code]#target::device#. - * Support for the [code]#accessor# target [code]#target::host_buffer# has - been deprecated. There is now a new accessor class [code]#host_accessor# - which provides equivalent functionality. + * Support for the [code]#accessor# target [code]#target::host_buffer# has been + deprecated. + There is now a new accessor class [code]#host_accessor# which provides + equivalent functionality. - * The [code]#buffer# member functions which return an [code]#accessor# of - type [code]#target::host_buffer# have been deprecated. A new member - function [code]#get_host_access()# has been added which returns a - [code]#host_accessor#. + * The [code]#buffer# member functions which return an [code]#accessor# of type + [code]#target::host_buffer# have been deprecated. + A new member function [code]#get_host_access()# has been added which returns + a [code]#host_accessor#. * The [code]#buffer# class has a new variadic overload of the [code]#get_access()# member function which allows construction of an [code]#accessor# with various parameters. * Support for the [code]#accessor# target [code]#target::local# has been - deprecated. There is now a new accessor class [code]#local_accessor# which - provides equivalent functionality. + deprecated. + There is now a new accessor class [code]#local_accessor# which provides + equivalent functionality. * Support for the [code]#accessor# targets [code]#target::image# and - [code]#target::host_image# have been removed. 
There are now new accessor - classes for sampled and unsampled images: [code]#sampled_image_accessor#, - [code]#host_sampled_image_accessor#, [code]#unsampled_image_accessor# and - [code]#host_unsampled_image_accessor#. + [code]#target::host_image# have been removed. + There are now new accessor classes for sampled and unsampled images: + [code]#sampled_image_accessor#, [code]#host_sampled_image_accessor#, + [code]#unsampled_image_accessor# and [code]#host_unsampled_image_accessor#. * A new [code]#accessor# target [code]#target::host_task# has been added, which allows access to a [code]#buffer# from a <>. * Support for the [code]#accessor# modes [code]#access_mode::discard_write# - and [code]#access_mode::discard_read_write# has been deprecated. Accessors - can now be constructed with a property list, and the new property + and [code]#access_mode::discard_read_write# has been deprecated. + Accessors can now be constructed with a property list, and the new property [code]#property::no_init# provides equivalent functionality. * Support for the [code]#accessor# mode [code]#access_mode::atomic# and the @@ -75,18 +79,18 @@ Changes to buffers, images and accessors: * Support for the [code]#accessor# template parameter [code]#isPlaceholder# has been deprecated, and the value of this parameter no longer has any - bearing on whether the accessor is a placeholder. The enumerated type - [code]#access::placeholder# is also deprecated. A placeholder - accessor can now be constructed by calling the appropriate constructor, - without regard to the template parameter. + bearing on whether the accessor is a placeholder. + The enumerated type [code]#access::placeholder# is also deprecated. + A placeholder accessor can now be constructed by calling the appropriate + constructor, without regard to the template parameter. * The return type of [code]#accessor::is_placeholder()# is no longer [code]#constexpr#. 
* The member function [code]#handler::require()# may now be called on any [code]#accessor# with target [code]#target::device#, - [code]#target::constant_buffer# or [code]#target::host_task#, regardless - of whether it is a placeholder. + [code]#target::constant_buffer# or [code]#target::host_task#, regardless of + whether it is a placeholder. * New [code]#accessor# constructors have been added which take a type tag parameter, which allows the class template parameters to be inferred via @@ -99,44 +103,50 @@ Changes to buffers, images and accessors: * The [code]#accessor# template parameters [code]#Dimensions# and [code]#AccessMode# now have default values, so the only required template - parameter is [code]#DataT#. Moreover, the default access mode is either - [code]#access_mode::read_write# or [code]#access_mode::read#, - depending on the constness of the [code]#DataT# type. This makes it - possible to declare a read-only accessor by simply using a [code]#const# - qualified type. + parameter is [code]#DataT#. + Moreover, the default access mode is either [code]#access_mode::read_write# + or [code]#access_mode::read#, depending on the constness of the + [code]#DataT# type. + This makes it possible to declare a read-only accessor by simply using a + [code]#const# qualified type. * Implicit conversions have been added between the two forms of read-only [code]#accessor# (one form has [code]#const DataT# and [code]#access_mode::read# and the other has non-const [code]#DataT# and - [code]#access_mode::read#). There is also an implicit conversion from - a read-write [code]#accessor# to either of the read-only forms. + [code]#access_mode::read#). + There is also an implicit conversion from a read-write [code]#accessor# to + either of the read-only forms. - * Member functions of [code]#accessor# which return a reference to an - element have been changed to return a [code]#const# reference for - read-only accessors. 
The [code]#get_pointer()# member function has also - been changed to return a [code]#const# pointer for read-only accessors. + * Member functions of [code]#accessor# which return a reference to an element + have been changed to return a [code]#const# reference for read-only + accessors. + The [code]#get_pointer()# member function has also been changed to return a + [code]#const# pointer for read-only accessors. The [code]#value_type# and [code]#reference# member types of [code]#accessor# have been changed to be [code]#const# types for read-only accessors. * The [code]#accessor# class now meets the {cpp} requirement of - [code]#ReversibleContainer#. This includes (but is not limited to) - returning [code]#begin# and [code]#end# iterators, specifying a default - constructible accessor that can be passed to a kernel but not dereferenced, - and making them equality comparable. + [code]#ReversibleContainer#. + This includes (but is not limited to) returning [code]#begin# and + [code]#end# iterators, specifying a default constructible accessor that can + be passed to a kernel but not dereferenced, and making them equality + comparable. * Many of the [code]#accessor# member functions have been marked [code]#noexcept#. - * A <> is no longer allowed to read elements that are - outside of its range; attempting to do so produces undefined behavior. + * A <> is no longer allowed to read elements that are outside + of its range; attempting to do so produces undefined behavior. * The semantics of the subscript operator have been changed for a - <> which has an offset. Calling [code]#operator[](0)# now - returns a reference to the first element in the range, rather than a - reference to the first element in the underlying buffer. + <> which has an offset. + Calling [code]#operator[](0)# now returns a reference to the first element + in the range, rather than a reference to the first element in the underlying + buffer. 
- * The behavior of buffers and accessors with a zero-sized range has been clarified. + * The behavior of buffers and accessors with a zero-sized range has been + clarified. Constant memory no longer appears in the SYCL device memory model in SYCL 2020. @@ -144,7 +154,7 @@ The {cpp} attributes that decorate kernels are now better described, and their position has changed so that they are applied directly to the kernel function. (Previously, they were applied to a device function that the kernel calls, and the implementation needed to propagate the information up to the enclosing -kernel.) The old {cpp} attribute form is no longer included in the SYCL +kernel.) The old {cpp} attribute form is no longer included in the SYCL specification. Changes to the built-in functions specified in <>: @@ -153,189 +163,198 @@ Changes to the built-in functions specified in <>: these functions, and it now lists the exact synopsis for each function. * The return type of the integer [code]#abs# and [code]#abs_diff# functions - has changed. The return type is now the same as the input type, matching - the {cpp} [code]#std::abs# function. + has changed. + The return type is now the same as the input type, matching the {cpp} + [code]#std::abs# function. - * The geometric functions specified in <> now - support the [code]#half# data type. + * The geometric functions specified in <> now support + the [code]#half# data type. * The [code]#ctz# function was added to <>. * The specification of [code]#clz# was clarified for the case when the input is zero. 
-The classes [code]#vector_class#, [code]#string_class#, -[code]#function_class#, [code]#mutex_class#, -[code]#shared_ptr_class#, [code]#weak_ptr_class#, -[code]#hash_class# and [code]#exception_ptr_class# have been -removed from the API and the standard classes -[code]#std::vector#, [code]#std::string#, -[code]#std::function#, [code]#std::mutex#, -[code]#std::shared_ptr#, [code]#std::weak_ptr#, -[code]#std::hash# and [code]#std::exception_ptr# are used +The classes [code]#vector_class#, [code]#string_class#, [code]#function_class#, +[code]#mutex_class#, [code]#shared_ptr_class#, [code]#weak_ptr_class#, +[code]#hash_class# and [code]#exception_ptr_class# have been removed from the +API and the standard classes [code]#std::vector#, [code]#std::string#, +[code]#std::function#, [code]#std::mutex#, [code]#std::shared_ptr#, +[code]#std::weak_ptr#, [code]#std::hash# and [code]#std::exception_ptr# are used instead. -The specific [code]#sycl::buffer# API taking -[code]#std::unique_ptr# has been removed. The behavior is the -same as in SYCL 1.2.1 but with a simplified API. Since there is still -the API taking [code]#std::shared_ptr# and there is an implicit -conversion from a [code]#std::unique_ptr# prvalue to a -[code]#std::shared_ptr#, the API can still be used as before with -a [code]#std::unique_ptr# to give away memory ownership. - -Offsets to [code]#parallel_for#, [code]#nd_range#, [code]#nd_item# and [code]#item# classes have been deprecated. -As such, the parallel iteration spaces all begin at [code]#(0,0,0)# and developers are now required to handle any offset arithmetic themselves. -The behavior of [code]#nd_item.get_global_linear_id()# and [code]#nd_item.get_local_linear_id()# has been clarified accordingly. - -Unified Shared Memory (USM), in <>, has been added as a pointer-based strategy -for data management. It defines several types of allocations with various -accessibility rules for host and devices. USM is meant to complement -buffers, not replace them. 
- -The [code]#queue# class received a new [code]#property# -that requires in-order semantics for a queue where operations are -executed in the order in which they are submitted. - -The [code]#queue# class received several new member functions to -invoke kernels directly on a queue objects instead of inside a -command group handler in the [code]#submit# member function. - -The [code]#queue# constructor overloads that accept both a [code]#context# and -a [code]#device# parameter have been broadened to allow the device to be either -a device that is in the context or a <> of a device that is -in the context. +The specific [code]#sycl::buffer# API taking [code]#std::unique_ptr# has been +removed. +The behavior is the same as in SYCL 1.2.1 but with a simplified API. +Since there is still the API taking [code]#std::shared_ptr# and there is an +implicit conversion from a [code]#std::unique_ptr# prvalue to a +[code]#std::shared_ptr#, the API can still be used as before with a +[code]#std::unique_ptr# to give away memory ownership. + +Offsets to [code]#parallel_for#, [code]#nd_range#, [code]#nd_item# and +[code]#item# classes have been deprecated. +As such, the parallel iteration spaces all begin at [code]#(0,0,0)# and +developers are now required to handle any offset arithmetic themselves. +The behavior of [code]#nd_item.get_global_linear_id()# and +[code]#nd_item.get_local_linear_id()# has been clarified accordingly. + +Unified Shared Memory (USM), in <>, has been added as a pointer-based +strategy for data management. +It defines several types of allocations with various accessibility rules for +host and devices. +USM is meant to complement buffers, not replace them. + +The [code]#queue# class received a new [code]#property# that requires in-order +semantics for a queue where operations are executed in the order in which they +are submitted. 
+ +The [code]#queue# class received several new member functions to invoke kernels +directly on a queue objects instead of inside a command group handler in the +[code]#submit# member function. + +The [code]#queue# constructor overloads that accept both a [code]#context# and a +[code]#device# parameter have been broadened to allow the device to be either a +device that is in the context or a <> of a device that is in +the context. The [code]#program# class has been removed and replaced with a new class [code]#kernel_bundle#, which provides similar functionality in a type-safe and -thread-safe way. The [code]#kernel# class has changed, and some member -functions have been removed. +thread-safe way. +The [code]#kernel# class has changed, and some member functions have been +removed. Support has been added for <>, which allow a <> to use constant variables whose values -aren't known until the kernel is invoked. A <> can now -take an optional parameter of type [code]#kernel_handler#, which allows the -kernel to read the values of +aren't known until the kernel is invoked. +A <> can now take an optional parameter of type +[code]#kernel_handler#, which allows the kernel to read the values of <>. -The constructors for SYCL [code]#context# and [code]#queue# -are made [code]#explicit# to prevent ambiguities in the selected -constructor resulting from implicit type conversion. +The constructors for SYCL [code]#context# and [code]#queue# are made +[code]#explicit# to prevent ambiguities in the selected constructor resulting +from implicit type conversion. -The requirement for {cpp} standard layout for data shared between host -and devices has been relaxed. SYCL now requires data shared between -host and devices to be <> as defined <>. +The requirement for {cpp} standard layout for data shared between host and +devices has been relaxed. +SYCL now requires data shared between host and devices to be <> +as defined <>. 
-The concept of a <> of <> was generalized to include -<> and <>. A <> is represented -by the [code]#sycl::group# class as in SYCL 1.2.1, and a <> -is represented by the new [code]#sycl::sub_group# class. +The concept of a <> of <> was generalized to +include <> and <>. +A <> is represented by the [code]#sycl::group# class as in SYCL +1.2.1, and a <> is represented by the new [code]#sycl::sub_group# +class. -The [code]#host_task# member function for the [code]#queue# has been -introduced for en-queueing <> on a <> to schedule the +The [code]#host_task# member function for the [code]#queue# has been introduced +for en-queueing <> on a <> to schedule the <> to invoke native {cpp} functions, conforming to the SYCL memory -model. <> also support interoperability with the native -<> objects associated at that point in the DAG using -the optional [code]#interop_handle# class. - -A library of algorithms based on the {cpp17} algorithms library -was introduced in <>. These algorithms -provide a simple way for developers to apply common parallel algorithms -using the work-items of a group. - -The definition of the [code]#sycl::group# class was modified to -support the new group functions in <>. +model. +<> also support interoperability with the native +<> objects associated at that point in the DAG using the optional +[code]#interop_handle# class. + +A library of algorithms based on the {cpp17} algorithms library was introduced +in <>. +These algorithms provide a simple way for developers to apply common parallel +algorithms using the work-items of a group. + +The definition of the [code]#sycl::group# class was modified to support the new +group functions in <>. New member types and variables were added to enable generic programming, and member functions were updated to encapsulate all functionality tied to -<> in the [code]#sycl::group# class. See -<> for details. +<> in the [code]#sycl::group# class. +See <> for details. 
The [code]#barrier# and [code]#mem_fence# member functions of the -[code]#nd_item# class have been removed. The [code]#barrier# member -function has been replaced by the [code]#group_barrier()# function, which -can be used to block work-items in either <> or -<> until all work-items in the group arrive at the -barrier. The [code]#mem_fence# member function has been replaced by the +[code]#nd_item# class have been removed. +The [code]#barrier# member function has been replaced by the +[code]#group_barrier()# function, which can be used to block work-items in +either <> or <> until all +work-items in the group arrive at the barrier. +The [code]#mem_fence# member function has been replaced by the [code]#atomic_fence# function, which is more closely aligned with -[code]#std::atomic_thread_fence# and offers control over memory ordering -and scope. +[code]#std::atomic_thread_fence# and offers control over memory ordering and +scope. -Changes in the SYCL [code]#vec# class described in -<>: +Changes in the SYCL [code]#vec# class described in <>: * [code]#operator[]# was added; * unary [code]#pass:[operator+()]# and [code]#operator-()# were added; -The device selection now relies on a simpler API based on ranking -functions used as <> described in -<>. +The device selection now relies on a simpler API based on ranking functions used +as <> described in <>. -A new device selector utility has been added to <>, -the [code]#aspect_selector#, which returns a selector object -that only selects devices that have all the requested aspects. +A new device selector utility has been added to <>, the +[code]#aspect_selector#, which returns a selector object that only selects +devices that have all the requested aspects. -The device query [code]#info::fp_config::correctly_rounded_divide_sqrt# has -been deprecated. +The device query [code]#info::fp_config::correctly_rounded_divide_sqrt# has been +deprecated. 
A new reduction library consisting of the [code]#reduction# function and [code]#reducer# class was introduced to simplify the expression of variables -with <> semantics in SYCL kernels. See <>. +with <> semantics in SYCL kernels. +See <>. The [code]#atomic# class from SYCL 1.2.1 was deprecated in favor of a new [code]#atomic_ref# interface. The SYCL exception class hierarchy has been condensed into a single exception type: [code]#exception#. -[code]#exception# now derives from -[code]#std::exception#. The variety of errors are now provided via error -codes, which aligns with the {cpp} error code mechanism. +[code]#exception# now derives from [code]#std::exception#. +The variety of errors are now provided via error codes, which aligns with the +{cpp} error code mechanism. The new error code mechanism now also generalizes the previous -[code]#get_cl_code# interface to provide a generic interface way for -querying backend-specific error codes. - -Default asynchronous error handling behavior is now defined, so that asynchronous -errors will cause abnormal program termination even if a user-defined -asynchronous handler function is not defined. This prevents asynchronous errors -from being silently lost during early stages of application development. - -Kernel invocation functions, such as [code]#parallel_for#, now take -kernel functions by [code]#const# reference. Kernel functions must now have -a [code]#const#-qualified [code]#operator()#, and are allowed to be copied zero -or more times by an implementation. These clarifications allow implementations -to have flexibility for specific devices, and define what users should expect -with kernel functors. Specifically, kernel functors can not be marked as -[code]#mutable#, and sharing of data between work-items should not be -attempted through state stored within a kernel functor. - -A new concept called device <> has been added, which tells the set -of optional features a device supports. 
This new mechanism replaces the -[code]#has_extension()# function and some uses of [code]#get_info()#. - -There is a new <> which describes how extensions -to the SYCL language can be added by vendors and by the Khronos Group. - -A [code]#queue# constructor has been added that takes both a -[code]#device# and [code]#context#, to simplify interfacing -with libraries. - -The [code]#parallel_for# interface has been simplified in some forms -to accept a braced initializer list in place of a [code]#range#, and -to always take [code]#item# arguments. Kernel invocation functions have -also been modified to accept generic lambda expressions. Implicit conversions -from one-dimensional [code]#item# and one-dimensional [code]#id# to scalar types -have been defined. All of these modifications lead to simpler SYCL code in common -use cases. +[code]#get_cl_code# interface to provide a generic interface way for querying +backend-specific error codes. + +Default asynchronous error handling behavior is now defined, so that +asynchronous errors will cause abnormal program termination even if a +user-defined asynchronous handler function is not defined. +This prevents asynchronous errors from being silently lost during early stages +of application development. + +Kernel invocation functions, such as [code]#parallel_for#, now take kernel +functions by [code]#const# reference. +Kernel functions must now have a [code]#const#-qualified [code]#operator()#, and +are allowed to be copied zero or more times by an implementation. +These clarifications allow implementations to have flexibility for specific +devices, and define what users should expect with kernel functors. +Specifically, kernel functors can not be marked as [code]#mutable#, and sharing +of data between work-items should not be attempted through state stored within a +kernel functor. + +A new concept called device <> has been added, which tells the +set of optional features a device supports. 
+This new mechanism replaces the [code]#has_extension()# function and some uses +of [code]#get_info()#. + +There is a new <> which describes how extensions to the SYCL +language can be added by vendors and by the Khronos Group. + +A [code]#queue# constructor has been added that takes both a [code]#device# and +[code]#context#, to simplify interfacing with libraries. + +The [code]#parallel_for# interface has been simplified in some forms to accept a +braced initializer list in place of a [code]#range#, and to always take +[code]#item# arguments. +Kernel invocation functions have also been modified to accept generic lambda +expressions. +Implicit conversions from one-dimensional [code]#item# and one-dimensional +[code]#id# to scalar types have been defined. +All of these modifications lead to simpler SYCL code in common use cases. The behaviour of executing a kernel over a [code]#range# or [code]#nd_range# with index space of zero has been clarified. -Some device-specific queries have been renamed to more clearly be "`device-specific -kernel`" [code]#get_info# queries ([code]#info::kernel_device_specific#) -instead of "`work-group`" ([code]#get_workgroup_info#) and sub-group -([code]#get_sub_group_info#) queries. +Some device-specific queries have been renamed to more clearly be +"`device-specific kernel`" [code]#get_info# queries +([code]#info::kernel_device_specific#) instead of "`work-group`" +([code]#get_workgroup_info#) and sub-group ([code]#get_sub_group_info#) queries. -A new math array type [code]#marray# has been defined to begin disambiguation -of the multiple possible interpretations of how [code]#sycl::vec# should be +A new math array type [code]#marray# has been defined to begin disambiguation of +the multiple possible interpretations of how [code]#sycl::vec# should be interpreted and implemented. 
Changes in SYCL address spaces: @@ -344,40 +363,40 @@ Changes in SYCL address spaces: * the generic address space was introduced; * the constant address space was deprecated; * behavior of unannotated pointer/reference (raw pointer/reference) is now - dependent on the compilation mode. The compiler can either interpret - unannotated pointer/reference has addressing the generic address space - or to be deduced; - * some ambiguities in the address space deduction were clarified. Notably - that deduced type does not affect the user-provided type. + dependent on the compilation mode. + The compiler can either interpret unannotated pointer/reference has + addressing the generic address space or to be deduced; + * some ambiguities in the address space deduction were clarified. + Notably that deduced type does not affect the user-provided type. Changes in [code]#multi_ptr# interface: - * addition of [code]#access::address_space::generic_space# to represent - the generic address space; + * addition of [code]#access::address_space::generic_space# to represent the + generic address space; * deprecation of [code]#access::address_space::constant_space#; * an extra template parameter to allow to select a flavor of the - [code]#multi_ptr# interface. There are now 3 different interfaces: - ** interface exposing undecorated types. Returned pointer and reference - are not annotated by an address space; - ** interface exposing decorated types. Returned pointer and reference are - annotated by an address space; + [code]#multi_ptr# interface. + There are now 3 different interfaces: + ** interface exposing undecorated types. + Returned pointer and reference are not annotated by an address space; + ** interface exposing decorated types. + Returned pointer and reference are annotated by an address space; ** legacy 1.2.1 interface (deprecated). 
* deprecation of the 1.2.1 interface; * deprecation of [code]#constant_ptr#; - * [code]#global_ptr#, [code]#local_ptr# and - [code]#private_ptr# alias take the new extra parameter; - * addition of the [code]#address_space_cast# free function to cast - undecorated pointer to [code]#multi_pointer#; - * addition of construction/conversion operator for the generic address - space; + * [code]#global_ptr#, [code]#local_ptr# and [code]#private_ptr# alias take the + new extra parameter; + * addition of the [code]#address_space_cast# free function to cast undecorated + pointer to [code]#multi_pointer#; + * addition of construction/conversion operator for the generic address space; * removal of the constructor and assignment operator taking an unannotated pointer; - * implicit conversion to a pointer is now deprecated. [code]#get# should - be used instead; + * implicit conversion to a pointer is now deprecated. + [code]#get# should be used instead; * the return type of the member function [code]#get# now depends on the selected interface. - * addition of the member function [code]#get_raw# which returns the - underlying pointer as an unannotated pointer; + * addition of the member function [code]#get_raw# which returns the underlying + pointer as an unannotated pointer; * addition of the member function [code]#get_decorated# which returns the underlying pointer as an annotated pointer; * addition of the subscript operator providing random access. @@ -386,64 +405,63 @@ The [code]#cl::sycl::byte# has been deprecated and now the {cpp17} [code]#std::byte# should be used instead. A SYCL implementation is no longer required to provide a host device. -Instead, an implementation is only required to provide at least one -device. Implementations are still allowed to provide devices that are -implemented on the host, but it is no longer required. The specification -no longer defines any special semantics for a "host device" and APIs -specific to the host device have been removed. 
+Instead, an implementation is only required to provide at least one device. +Implementations are still allowed to provide devices that are implemented on the +host, but it is no longer required. +The specification no longer defines any special semantics for a "host device" +and APIs specific to the host device have been removed. The default constructors for the [code]#device# and [code]#platform# classes have been changed to construct a copy of the default device and a copy of the -platform containing the default device. Previously, they returned a copy of -the host device and a copy of the platform containing the host device. The -default constructor for the [code]#event# class has also been changed to +platform containing the default device. +Previously, they returned a copy of the host device and a copy of the platform +containing the host device. +The default constructor for the [code]#event# class has also been changed to construct an event that comes from a default-constructed [code]#queue#. Previously, it constructed an event that used the host backend. -Explicit copy functions of the handler class -have also been introduced to the queue class as shortcuts for the handler ones. -This is enabled by the improved placeholder accessors -to help reduce code verbosity in certain cases -because the shortcut functions implicitly create a command group -and call [code]#handler::require#. +Explicit copy functions of the handler class have also been introduced to the +queue class as shortcuts for the handler ones. +This is enabled by the improved placeholder accessors to help reduce code +verbosity in certain cases because the shortcut functions implicitly create a +command group and call [code]#handler::require#. Information query descriptors have been changed to structures under namespaces -named accordingly. [code]#param_traits# has been removed and the return type of -an information query is now contained in the descriptor. 
-The [code]#sycl::info::device::max_work_item_sizes# is now a -template that takes a dimension parameter corresponding to the number of -dimensions of the work-item size maxima. +named accordingly. +[code]#param_traits# has been removed and the return type of an information +query is now contained in the descriptor. +The [code]#sycl::info::device::max_work_item_sizes# is now a template that takes +a dimension parameter corresponding to the number of dimensions of the work-item +size maxima. Changes to retrieving size information: - * all [code]#get_size()# member functions have been deprecated - and replaced with [code]#byte_size()#, which is marked [code]#noexcept#; - * all [code]#get_count()# member functions have been deprecated - and replaced with [code]#size()#, which is marked [code]#noexcept#; - * in the [code]#vec# class the functions [code]#byte_size()# and [code]#size()# - are now static member functions; - * in the [code]#stream# class [code]#get_size()# has been deprecated - in favor of [code]#size()#, - whereas [code]#stream::byte_size()# is not available; - * accessors for sampled and unsampled images only define [code]#size()# - and not [code]#byte_size()#. + * all [code]#get_size()# member functions have been deprecated and replaced + with [code]#byte_size()#, which is marked [code]#noexcept#; + * all [code]#get_count()# member functions have been deprecated and replaced + with [code]#size()#, which is marked [code]#noexcept#; + * in the [code]#vec# class the functions [code]#byte_size()# and + [code]#size()# are now static member functions; + * in the [code]#stream# class [code]#get_size()# has been deprecated in favor + of [code]#size()#, whereas [code]#stream::byte_size()# is not available; + * accessors for sampled and unsampled images only define [code]#size()# and + not [code]#byte_size()#. The device descriptors [code]#info::device::max_constant_buffer_size# and [code]#info::device::max_constant_args# are deprecated in SYCL 2020. 
-The [code]#buffer_allocator# is now templated on the data type -and follows the C++ named requirement [code]#Allocator#. +The [code]#buffer_allocator# is now templated on the data type and follows the +C++ named requirement [code]#Allocator#. // Expose various workarounds showing how to typeset +, ++ and -- The -The SYCL [code]#id# and [code]#range# have now unary -pass:quotes[[code\]#+#] and [code]#-# operations, prefix -[code]#++# and [code]#--# operations, postfix -pass:quotes[[code\]#++#] and pass:quotes[[code\]#--#] operations which -were forgotten in SYCL 1.2.1. +The SYCL [code]#id# and [code]#range# have now unary pass:quotes[[code\]#+#] and +[code]#-# operations, prefix [code]#++# and [code]#--# +operations, postfix pass:quotes[[code\]#++#] and pass:quotes[[code\]#--#] +operations which were forgotten in SYCL 1.2.1. In SYCL 1.2.1, the [code]#handler::copy()# overload with two [code]#accessor# parameters did not clearly specify which accessor's size determines the amount -of memory that is copied. The spec now clarifies that the [code]#src# -accessor's size is used. +of memory that is copied. +The spec now clarifies that the [code]#src# accessor's size is used. // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end what_changed %%%%%%%%%%%%%%%%%%%%%%%%%%%%