Skip to content

Commit

Permalink
[SYCL][NFC] Remove more explicit "cl::" references (intel#6507)
Browse files Browse the repository at this point in the history
Previous patch was implemented by splitting off piece of a bigger patch by
completely eliminating "cl" namespace and then addressing the local failures.
Local testing didn't cover all possible platforms so these occurrences were left
untouched.

This is a wider application of grep/sed, but it's still not complete as some
instances of "cl" namespace references cannot be eliminated in an NFC
change (e.g. everything affecting/affected by mangling as in clang or some
tools). The reason I'm committing these changes separately is to ease the review
of the actual non-NFC PR later.
  • Loading branch information
aelovikov-intel authored Aug 3, 2022
1 parent 1a8bb53 commit 433ea5c
Show file tree
Hide file tree
Showing 45 changed files with 377 additions and 391 deletions.
4 changes: 2 additions & 2 deletions clang/include/clang/Basic/AttrDocs.td
Original file line number Diff line number Diff line change
Expand Up @@ -378,7 +378,7 @@ outlining job:

int foo(int x) { return ++x; }

using namespace cl::sycl;
using namespace sycl;
queue Q;
buffer<int, 1> a(range<1>{1024});
Q.submit([&](handler& cgh) {
Expand Down Expand Up @@ -3790,7 +3790,7 @@ cannot be optimized out due to reachability analysis or by any other
optimization.

This attribute allows to pass name and address of the function to a special
``cl::sycl::intel::get_device_func_ptr`` API call which extracts the device
``sycl::intel::get_device_func_ptr`` API call which extracts the device
function pointer for the specified function.

.. code-block:: c++
Expand Down
2 changes: 1 addition & 1 deletion llvm/lib/SYCLLowerIR/LowerWGScope.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -780,7 +780,7 @@ PreservedAnalyses SYCLLowerWGScopePass::run(Function &F,
I = I->getNextNode()) {
auto *AllocaI = dyn_cast<AllocaInst>(I);
// Allocas marked with "work_item_scope" are those originating from
// cl::sycl::private_memory<T> variables, which must be in private memory.
// sycl::private_memory<T> variables, which must be in private memory.
// No shadows/materialization is needed for them because they can be
// updated only within PFWIs
if (AllocaI && !AllocaI->getMetadata(WI_SCOPE_MD))
Expand Down
4 changes: 2 additions & 2 deletions sycl/doc/EnvironmentVariables.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ compiler and runtime.
| Environment variable | Values | Description |
| -------------------- | ------ | ----------- |
| `SYCL_BE` (deprecated) | `PI_OPENCL`, `PI_LEVEL_ZERO`, `PI_CUDA` | Force SYCL RT to consider only devices of the specified backend during the device selection. We are planning to deprecate `SYCL_BE` environment variable in the future. The specific grace period is not decided yet. Please use the new env var `SYCL_DEVICE_FILTER` instead. |
| `SYCL_DEVICE_TYPE` (deprecated) | CPU, GPU, ACC, HOST | Force SYCL to use the specified device type. If unset, default selection rules are applied. If set to any unlisted value, this control has no effect. If the requested device type is not found, a `cl::sycl::runtime_error` exception is thrown. If a non-default device selector is used, a device must satisfy both the selector and this control to be chosen. This control only has effect on devices created with a selector. We are planning to deprecate `SYCL_DEVICE_TYPE` environment variable in the future. The specific grace period is not decided yet. Please use the new env var `SYCL_DEVICE_FILTER` instead. |
| `SYCL_DEVICE_TYPE` (deprecated) | CPU, GPU, ACC, HOST | Force SYCL to use the specified device type. If unset, default selection rules are applied. If set to any unlisted value, this control has no effect. If the requested device type is not found, a `sycl::runtime_error` exception is thrown. If a non-default device selector is used, a device must satisfy both the selector and this control to be chosen. This control only has effect on devices created with a selector. We are planning to deprecate `SYCL_DEVICE_TYPE` environment variable in the future. The specific grace period is not decided yet. Please use the new env var `SYCL_DEVICE_FILTER` instead. |
| `SYCL_DEVICE_FILTER` | `backend:device_type:device_num` | See Section [`SYCL_DEVICE_FILTER`](#sycl_device_filter) below. |
| `SYCL_DEVICE_ALLOWLIST` | See [below](#sycl_device_allowlist) | Filter out devices that do not match the pattern specified. `BackendName` accepts `host`, `opencl`, `level_zero` or `cuda`. `DeviceType` accepts `host`, `cpu`, `gpu` or `acc`. `DeviceVendorId` accepts uint32_t in hex form (`0xXYZW`). `DriverVersion`, `PlatformVersion`, `DeviceName` and `PlatformName` accept regular expression. Special characters, such as parenthesis, must be escaped. DPC++ runtime will select only those devices which satisfy provided values above and regex. More than one device can be specified using the piping symbol "\|".|
| `SYCL_DISABLE_PARALLEL_FOR_RANGE_ROUNDING` | Any(\*) | Disables automatic rounding-up of `parallel_for` invocation ranges. |
Expand Down Expand Up @@ -107,7 +107,7 @@ variables in production code.</span>
| `SYCL_DEVICELIB_INHIBIT_NATIVE` | String of device library extensions (separated by a whitespace) | Do not rely on device native support for devicelib extensions listed in this option. |
| `SYCL_PROGRAM_COMPILE_OPTIONS` | String of valid OpenCL compile options | Override compile options for all programs. |
| `SYCL_PROGRAM_LINK_OPTIONS` | String of valid OpenCL link options | Override link options for all programs. |
| `SYCL_USE_KERNEL_SPV` | Path to the SPIR-V binary | Load device image from the specified file. If runtime is unable to read the file, `cl::sycl::runtime_error` exception is thrown.|
| `SYCL_USE_KERNEL_SPV` | Path to the SPIR-V binary | Load device image from the specified file. If runtime is unable to read the file, `sycl::runtime_error` exception is thrown.|
| `SYCL_DUMP_IMAGES` | Any(\*) | Dump device image binaries to file. Control has no effect if `SYCL_USE_KERNEL_SPV` is set. |
| `SYCL_HOST_UNIFIED_MEMORY` | Integer | Enforce host unified memory support or lack of it for the execution graph builder. If set to 0, it is enforced as not supported by all devices. If set to 1, it is enforced as supported by all devices. |
| `SYCL_CACHE_TRACE` | Any(\*) | If the variable is set, messages are sent to std::cerr when caching events or non-blocking failures happen (e.g. unable to access cache item file). |
Expand Down
2 changes: 1 addition & 1 deletion sycl/doc/FAQ.md
Original file line number Diff line number Diff line change
Expand Up @@ -128,7 +128,7 @@ C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt\crtdbg.h(607,26
> beyond those explicitly mentioned as usable in kernels in this spec.
Replace usage of STD built-ins with SYCL-defined math built-ins. Please, note
that you have to explicitly specify built-in namespace (i.e. `cl::sycl::fmin`).
that you have to explicitly specify built-in namespace (i.e. `sycl::fmin`).
The full list of SYCL math built-ins is provided in section 4.13.3 of the
specification.

Expand Down
46 changes: 23 additions & 23 deletions sycl/doc/GetStartedGuide.md
Original file line number Diff line number Diff line change
Expand Up @@ -557,29 +557,29 @@ Creating a file `simple-sycl-app.cpp` with the following C++/SYCL code:
int main() {
// Creating buffer of 4 ints to be used inside the kernel code
cl::sycl::buffer<cl::sycl::cl_int, 1> Buffer(4);
sycl::buffer<sycl::cl_int, 1> Buffer(4);
// Creating SYCL queue
cl::sycl::queue Queue;
sycl::queue Queue;
// Size of index space for kernel
cl::sycl::range<1> NumOfWorkItems{Buffer.size()};
sycl::range<1> NumOfWorkItems{Buffer.size()};
// Submitting command group(work) to queue
Queue.submit([&](cl::sycl::handler &cgh) {
Queue.submit([&](sycl::handler &cgh) {
// Getting write only access to the buffer on a device
auto Accessor = Buffer.get_access<cl::sycl::access::mode::write>(cgh);
auto Accessor = Buffer.get_access<sycl::access::mode::write>(cgh);
// Executing kernel
cgh.parallel_for<class FillBuffer>(
NumOfWorkItems, [=](cl::sycl::id<1> WIid) {
NumOfWorkItems, [=](sycl::id<1> WIid) {
// Fill buffer with indexes
Accessor[WIid] = (cl::sycl::cl_int)WIid.get(0);
Accessor[WIid] = (sycl::cl_int)WIid.get(0);
});
});
// Getting read only access to the buffer on the host.
// Implicit barrier waiting for queue to complete the work.
const auto HostAccessor = Buffer.get_access<cl::sycl::access::mode::read>();
const auto HostAccessor = Buffer.get_access<sycl::access::mode::read>();
// Check the results
bool MismatchFound = false;
Expand Down Expand Up @@ -704,36 +704,36 @@ SYCL_BE=PI_CUDA ./simple-sycl-app-cuda.exe
```
**NOTE**: DPC++/SYCL developers can specify SYCL device for execution using
device selectors (e.g. `cl::sycl::cpu_selector`, `cl::sycl::gpu_selector`,
device selectors (e.g. `sycl::cpu_selector`, `sycl::gpu_selector`,
[Intel FPGA selector(s)](extensions/supported/sycl_ext_intel_fpga_device_selector.md)) as
explained in following section [Code the program for a specific
GPU](#code-the-program-for-a-specific-gpu).
### Code the program for a specific GPU
To specify OpenCL device SYCL provides the abstract `cl::sycl::device_selector`
To specify OpenCL device SYCL provides the abstract `sycl::device_selector`
class which the can be used to define how the runtime should select the best
device.
The method `cl::sycl::device_selector::operator()` of the SYCL
`cl::sycl::device_selector` is an abstract member function which takes a
The method `sycl::device_selector::operator()` of the SYCL
`sycl::device_selector` is an abstract member function which takes a
reference to a SYCL device and returns an integer score. This abstract member
function can be implemented in a derived class to provide a logic for selecting
a SYCL device. SYCL runtime uses the device for with the highest score is
returned. Such object can be passed to `cl::sycl::queue` and `cl::sycl::device`
returned. Such object can be passed to `sycl::queue` and `sycl::device`
constructors.
The example below illustrates how to use `cl::sycl::device_selector` to create
The example below illustrates how to use `sycl::device_selector` to create
device and queue objects bound to Intel GPU device:
```c++
#include <sycl/sycl.hpp>
int main() {
class NEOGPUDeviceSelector : public cl::sycl::device_selector {
class NEOGPUDeviceSelector : public sycl::device_selector {
public:
int operator()(const cl::sycl::device &Device) const override {
using namespace cl::sycl::info;
int operator()(const sycl::device &Device) const override {
using namespace sycl::info;
const std::string DeviceName = Device.get_info<device::name>();
const std::string DeviceVendor = Device.get_info<device::vendor>();
Expand All @@ -744,9 +744,9 @@ int main() {
NEOGPUDeviceSelector Selector;
try {
cl::sycl::queue Queue(Selector);
cl::sycl::device Device(Selector);
} catch (cl::sycl::invalid_parameter_error &E) {
sycl::queue Queue(Selector);
sycl::device Device(Selector);
} catch (sycl::invalid_parameter_error &E) {
std::cout << E.what() << std::endl;
}
}
Expand All @@ -757,10 +757,10 @@ The device selector below selects an NVIDIA device only, and won't execute if
there is none.
```c++
class CUDASelector : public cl::sycl::device_selector {
class CUDASelector : public sycl::device_selector {
public:
int operator()(const cl::sycl::device &Device) const override {
using namespace cl::sycl::info;
int operator()(const sycl::device &Device) const override {
using namespace sycl::info;
const std::string DriverVersion = Device.get_info<device::driver_version>();
if (Device.is_gpu() && (DriverVersion.find("CUDA") != std::string::npos)) {
Expand Down
4 changes: 2 additions & 2 deletions sycl/doc/MultiTileCardWithLevelZero.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,8 +46,8 @@ The root-device in such cases can be partitioned to sub-devices, each correspond
``` C++
try {
vector<device> SubDevices = RootDevice.create_sub_devices<
cl::sycl::info::partition_property::partition_by_affinity_domain>(
cl::sycl::info::partition_affinity_domain::next_partitionable);
sycl::info::partition_property::partition_by_affinity_domain>(
sycl::info::partition_affinity_domain::next_partitionable);
}
```

Expand Down
8 changes: 4 additions & 4 deletions sycl/doc/design/CompilerAndRuntimeDesign.md
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,7 @@ work:
int foo(int x) { return ++x; }
int bar(int x) { throw std::exception{"CPU code only!"}; }
...
using namespace cl::sycl;
using namespace sycl;
queue Q;
buffer<int, 1> a{range<1>{1024}};
Q.submit([&](handler& cgh) {
Expand All @@ -103,17 +103,17 @@ Q.submit([&](handler& cgh) {
```
In this example, the compiler needs to compile the lambda expression passed
to the `cl::sycl::handler::parallel_for` method, as well as the function `foo`
to the `sycl::handler::parallel_for` method, as well as the function `foo`
called from the lambda expression for the device.
The compiler must also ignore the `bar` function when we compile the
"device" part of the single source code, as it's unused inside the device
portion of the source code (the contents of the lambda expression passed to the
`cl::sycl::handler::parallel_for` and any function called from this lambda
`sycl::handler::parallel_for` and any function called from this lambda
expression).
The current approach is to use the SYCL kernel attribute in the runtime to
mark code passed to `cl::sycl::handler::parallel_for` as "kernel functions".
mark code passed to `sycl::handler::parallel_for` as "kernel functions".
The runtime library can't mark foo as "device" code - this is a compiler
job: to traverse all symbols accessible from kernel functions and add them to
the "device part" of the code marking them with the new SYCL device attribute.
Expand Down
8 changes: 4 additions & 4 deletions sycl/doc/design/KernelParameterPassing.md
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,7 @@ int main()

myQueue.submit([&](handler &cgh) {
auto outAcc = outBuf.get_access<access::mode::write>(cgh);
cgh.parallel_for<class Worker>(num_items, [=](cl::sycl::id<1> index) {
cgh.parallel_for<class Worker>(num_items, [=](sycl::id<1> index) {
outAcc[index] = i + s.m;
});
});
Expand Down Expand Up @@ -192,7 +192,7 @@ are copied into the array within the local capture object.

myQueue.submit([&](handler &cgh) {
auto outAcc = outBuf.get_access<access::mode::write>(cgh);
cgh.parallel_for<class Worker>(num_items, [=](cl::sycl::id<1> index) {
cgh.parallel_for<class Worker>(num_items, [=](sycl::id<1> index) {
outAcc[index] = array[index.get(0)];
});
});
Expand Down Expand Up @@ -264,7 +264,7 @@ of each accessor array element in ascending index value.
in_buffer2.get_access<access::mode::read>(cgh)};
auto outAcc = out_buffer.get_access<access::mode::write>(cgh);

cgh.parallel_for<class Worker>(num_items, [=](cl::sycl::id<1> index) {
cgh.parallel_for<class Worker>(num_items, [=](sycl::id<1> index) {
outAcc[index] = inAcc[0][index] + inAcc[1][index];
});
});
Expand Down Expand Up @@ -356,7 +356,7 @@ in a manner similar to other instances of accessor arrays.
};
auto outAcc = out_buffer.get_access<access::mode::write>(cgh);

cgh.parallel_for<class Worker>(num_items, [=](cl::sycl::id<1> index) {
cgh.parallel_for<class Worker>(num_items, [=](sycl::id<1> index) {
outAcc[index] = s.m + s.inAcc[0][index] + s.inAcc[1][index];
});
});
Expand Down
16 changes: 8 additions & 8 deletions sycl/doc/design/LinkedAllocations.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,33 +9,33 @@ Instead, memory is allocated in each context whenever the SYCL memory object
is first accessed there:

```
cl::sycl::buffer<int, 1> buf{cl::sycl::range<1>(1)}; // No allocation here
sycl::buffer<int, 1> buf{sycl::range<1>(1)}; // No allocation here
cl::sycl::queue q;
q.submit([&](cl::sycl::handler &cgh){
sycl::queue q;
q.submit([&](sycl::handler &cgh){
// First access to buf in q's context: allocate memory
auto acc = buf.get_access<cl::sycl::access::mode::read_write>(cgh);
auto acc = buf.get_access<sycl::access::mode::read_write>(cgh);
...
});
// First access to buf on host (assuming q is not host): allocate memory
auto acc = buf.get_access<cl::sycl::access::mode::read_write>();
auto acc = buf.get_access<sycl::access::mode::read_write>();
```

In the DPCPP execution graph these allocations are represented by allocation
command nodes (`cl::sycl::detail::AllocaCommand`). A finished allocation
command nodes (`sycl::detail::AllocaCommand`). A finished allocation
command means that the associated memory object is ready for its first use in
that context, but for host allocation commands it might be the case that no
actual memory allocation takes place: either because it is possible to reuse the
data pointer provided by the user:

```
int val;
cl::sycl::buffer<int, 1> buf{&val, cl::sycl::range<1>(1)};
sycl::buffer<int, 1> buf{&val, sycl::range<1>(1)};
// An alloca command is created, but it does not allocate new memory: &val
// is reused instead.
auto acc = buf.get_access<cl::sycl::access::mode::read_write>();
auto acc = buf.get_access<sycl::access::mode::read_write>();
```

Or because a mapped host pointer obtained from a native device memory object
Expand Down
10 changes: 5 additions & 5 deletions sycl/doc/design/OptionalDeviceFeatures.md
Original file line number Diff line number Diff line change
Expand Up @@ -403,9 +403,9 @@ name. The format looks like this:

```
!intel_types_that_use_aspects = !{!0, !1, !2}
!0 = !{!"class.cl::sycl::detail::half_impl::half", i32 8}
!1 = !{!"class.cl::sycl::amx_type", i32 9}
!2 = !{!"class.cl::sycl::other_type", i32 8, i32 9}
!0 = !{!"class.sycl::detail::half_impl::half", i32 8}
!1 = !{!"class.sycl::amx_type", i32 9}
!2 = !{!"class.sycl::other_type", i32 8, i32 9}
```

The value of the `!intel_types_that_use_aspects` metadata is a list of unnamed
Expand All @@ -415,8 +415,8 @@ starts with a string giving the name of the type which is followed by a list of
`i32` constants where each constant is a value from `enum class aspect` telling
the numerical value of an aspect from the type's
`[[sycl_detail::uses_aspects()]]` attribute. In the example above, the type
`cl::sycl::detail::half_impl::half` uses an aspect whose numerical value is
`8` and the type `cl::sycl::other_type` uses two aspects `8` and `9`.
`sycl::detail::half_impl::half` uses an aspect whose numerical value is
`8` and the type `sycl::other_type` uses two aspects `8` and `9`.

**NOTE**: The reason we choose this representation is because LLVM IR does not
allow metadata to be attached directly to types. This representation works
Expand Down
2 changes: 1 addition & 1 deletion sycl/doc/design/SYCLInstrumentationUsingXPTI.md
Original file line number Diff line number Diff line change
Expand Up @@ -97,7 +97,7 @@ language constructs (2) The instrumentation that handles capturing the
relevant metadata.

1. In order to capture end-user source code information, we have implemented
`cl::sycl::detail::code_location` class that uses the builtin functions
`sycl::detail::code_location` class that uses the builtin functions
in the compiler. However, equivalent implementations are unavailable on
Windows and separate cross-platform implementation might be used in the
future. To mitigate this, the Windows implementation will always report
Expand Down
18 changes: 9 additions & 9 deletions sycl/doc/design/SpecializationConstants.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,10 +34,10 @@ struct A {
struct POD {
A a[2];
// FIXME: cl::sycl::vec class is not a POD type in our implementation by some
// FIXME: sycl::vec class is not a POD type in our implementation by some
// reason, but there are no limitations for vector types from spec constatns
// design point of view.
cl::sycl::vec<int, 2> b;
sycl::vec<int, 2> b;
};
class MyInt32Const;
Expand All @@ -55,16 +55,16 @@ class MyPODConst;
sycl::ONEAPI::experimental::spec_constant<int32_t, MyInt32Const> i32 =
p.set_spec_constant<MyInt32Const>(rt_val);
cl::sycl::ONEAPI::experimental::spec_constant<POD, MyPODConst> pod =
sycl::ONEAPI::experimental::spec_constant<POD, MyPODConst> pod =
p.set_spec_constant<MyPODConst>(gold);
p.build_with_kernel_type<MyKernel>();
sycl::buffer<int, 1> buf(vec.data(), vec.size());
sycl::buffer<POD, 1> buf(vec_pod.data(), vec_pod.size());
q.submit([&](cl::sycl::handler &cgh) {
auto acc = buf.get_access<cl::sycl::access::mode::write>(cgh);
auto acc_pod = buf.get_access<cl::sycl::access::mode::write>(cgh);
q.submit([&](sycl::handler &cgh) {
auto acc = buf.get_access<sycl::access::mode::write>(cgh);
auto acc_pod = buf.get_access<sycl::access::mode::write>(cgh);
cgh.single_task<MyKernel>(
p.get_kernel<MyKernel>(),
[=]() {
Expand Down Expand Up @@ -169,7 +169,7 @@ struct A {
sycl::ONEAPI::experimental::spec_constant<int32_t, MyInt32Const> i32 =
p.set_spec_constant<MyInt32Const>(rt_val);
cl::sycl::ONEAPI::experimental::spec_constant<A, MyPODConst> pod =
sycl::ONEAPI::experimental::spec_constant<A, MyPODConst> pod =
p.set_spec_constant<MyPODConst>(gold);
// ...
i32.get();
Expand Down Expand Up @@ -335,7 +335,7 @@ struct A {
sycl::ONEAPI::experimental::spec_constant<int32_t, MyInt32Const> i32 =
p.set_spec_constant<MyInt32Const>(rt_val);
cl::sycl::ONEAPI::experimental::spec_constant<A, MyPODConst> pod =
sycl::ONEAPI::experimental::spec_constant<A, MyPODConst> pod =
p.set_spec_constant<MyPODConst>(gold);
// ...
i32.get();
Expand Down Expand Up @@ -428,7 +428,7 @@ specialization constant:
```
// user code:
class MyIn32Constant;
cl::sycl::ONEAPI::experimental::spec_constant<int, MyInt32Const> i32(0);
sycl::ONEAPI::experimental::spec_constant<int, MyInt32Const> i32(0);
// integration header:
template <> struct sycl::detail::SpecConstantInfo<::MyInt32Const> {
static constexpr const char* getName() {
Expand Down
4 changes: 2 additions & 2 deletions sycl/doc/design/fpga_io_pipes_design.rst
Original file line number Diff line number Diff line change
Expand Up @@ -13,9 +13,9 @@ Requirements
static constexpr unsigned id = ID;
};
using ethernet_read_pipe =
cl::sycl::intel::kernel_readable_io_pipe<ethernet_pipe_id<0>, int, 0>;
sycl::intel::kernel_readable_io_pipe<ethernet_pipe_id<0>, int, 0>;
using ethernet_write_pipe =
cl::sycl::intel::kernel_writeable_io_pipe<ethernet_pipe_id<1>, int, 0>;
sycl::intel::kernel_writeable_io_pipe<ethernet_pipe_id<1>, int, 0>;
}
Thus, the user interacts only with vendor-defined pipe objects.
Expand Down
Loading

0 comments on commit 433ea5c

Please sign in to comment.