[SYCL][NFC] Remove more explicit "cl::" references (intel#6507)

Previous patch was implemented by splitting off piece of a bigger patch by completely eliminating "cl" namespace and then addressing the local failures. Local testing didn't cover all possible platforms so these occurrences were left untouched. This is a wider application of grep/sed, but it's still not complete as some instances of "cl" namespace references cannot be eliminated in an NFC change (e.g. everything affecting/affected by mangling as in clang or some tools). The reason I'm committing these changes separately is to ease the review of the actual non-NFC PR later.
XinWang10 · Aug 3, 2022 · 433ea5c · 433ea5c
1 parent 1a8bb53
commit 433ea5c
Show file tree

Hide file tree

Showing 45 changed files with 377 additions and 391 deletions.
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
@@ -378,7 +378,7 @@ outlining job:
 
   int foo(int x) { return ++x; }
 
-  using namespace cl::sycl;
+  using namespace sycl;
   queue Q;
   buffer<int, 1> a(range<1>{1024});
   Q.submit([&](handler& cgh) {
@@ -3790,7 +3790,7 @@ cannot be optimized out due to reachability analysis or by any other
 optimization.
 
 This attribute allows to pass name and address of the function to a special
-``cl::sycl::intel::get_device_func_ptr`` API call which extracts the device
+``sycl::intel::get_device_func_ptr`` API call which extracts the device
 function pointer for the specified function.
 
 .. code-block:: c++

diff --git a/llvm/lib/SYCLLowerIR/LowerWGScope.cpp b/llvm/lib/SYCLLowerIR/LowerWGScope.cpp
@@ -780,7 +780,7 @@ PreservedAnalyses SYCLLowerWGScopePass::run(Function &F,
          I = I->getNextNode()) {
       auto *AllocaI = dyn_cast<AllocaInst>(I);
       // Allocas marked with "work_item_scope" are those originating from
-      // cl::sycl::private_memory<T> variables, which must be in private memory.
+      // sycl::private_memory<T> variables, which must be in private memory.
       // No shadows/materialization is needed for them because they can be
       // updated only within PFWIs
       if (AllocaI && !AllocaI->getMetadata(WI_SCOPE_MD))

diff --git a/sycl/doc/EnvironmentVariables.md b/sycl/doc/EnvironmentVariables.md
@@ -8,7 +8,7 @@ compiler and runtime.
 | Environment variable | Values | Description |
 | -------------------- | ------ | ----------- |
 | `SYCL_BE` (deprecated) | `PI_OPENCL`, `PI_LEVEL_ZERO`, `PI_CUDA` | Force SYCL RT to consider only devices of the specified backend during the device selection. We are planning to deprecate `SYCL_BE` environment variable in the future. The specific grace period is not decided yet. Please use the new env var `SYCL_DEVICE_FILTER` instead. |
-| `SYCL_DEVICE_TYPE` (deprecated) | CPU, GPU, ACC, HOST | Force SYCL to use the specified device type. If unset, default selection rules are applied. If set to any unlisted value, this control has no effect. If the requested device type is not found, a `cl::sycl::runtime_error` exception is thrown. If a non-default device selector is used, a device must satisfy both the selector and this control to be chosen. This control only has effect on devices created with a selector. We are planning to deprecate `SYCL_DEVICE_TYPE` environment variable in the future. The specific grace period is not decided yet. Please use the new env var `SYCL_DEVICE_FILTER` instead. |
+| `SYCL_DEVICE_TYPE` (deprecated) | CPU, GPU, ACC, HOST | Force SYCL to use the specified device type. If unset, default selection rules are applied. If set to any unlisted value, this control has no effect. If the requested device type is not found, a `sycl::runtime_error` exception is thrown. If a non-default device selector is used, a device must satisfy both the selector and this control to be chosen. This control only has effect on devices created with a selector. We are planning to deprecate `SYCL_DEVICE_TYPE` environment variable in the future. The specific grace period is not decided yet. Please use the new env var `SYCL_DEVICE_FILTER` instead. |
 | `SYCL_DEVICE_FILTER` | `backend:device_type:device_num` | See Section [`SYCL_DEVICE_FILTER`](#sycl_device_filter)  below. |
 | `SYCL_DEVICE_ALLOWLIST` | See [below](#sycl_device_allowlist) | Filter out devices that do not match the pattern specified. `BackendName` accepts `host`, `opencl`, `level_zero` or `cuda`. `DeviceType` accepts `host`, `cpu`, `gpu` or `acc`. `DeviceVendorId` accepts uint32_t in hex form (`0xXYZW`). `DriverVersion`, `PlatformVersion`, `DeviceName` and `PlatformName` accept regular expression. Special characters, such as parenthesis, must be escaped. DPC++ runtime will select only those devices which satisfy provided values above and regex. More than one device can be specified using the piping symbol "\|".|
 | `SYCL_DISABLE_PARALLEL_FOR_RANGE_ROUNDING` | Any(\*) | Disables automatic rounding-up of `parallel_for` invocation ranges. |
@@ -107,7 +107,7 @@ variables in production code.</span>
 | `SYCL_DEVICELIB_INHIBIT_NATIVE` | String of device library extensions (separated by a whitespace) | Do not rely on device native support for devicelib extensions listed in this option. |
 | `SYCL_PROGRAM_COMPILE_OPTIONS` | String of valid OpenCL compile options | Override compile options for all programs. |
 | `SYCL_PROGRAM_LINK_OPTIONS` | String of valid OpenCL link options | Override link options for all programs. |
-| `SYCL_USE_KERNEL_SPV` | Path to the SPIR-V binary | Load device image from the specified file. If runtime is unable to read the file, `cl::sycl::runtime_error` exception is thrown.|
+| `SYCL_USE_KERNEL_SPV` | Path to the SPIR-V binary | Load device image from the specified file. If runtime is unable to read the file, `sycl::runtime_error` exception is thrown.|
 | `SYCL_DUMP_IMAGES` | Any(\*) | Dump device image binaries to file. Control has no effect if `SYCL_USE_KERNEL_SPV` is set. |
 | `SYCL_HOST_UNIFIED_MEMORY` | Integer | Enforce host unified memory support or lack of it for the execution graph builder. If set to 0, it is enforced as not supported by all devices. If set to 1, it is enforced as supported by all devices. |
 | `SYCL_CACHE_TRACE` | Any(\*) | If the variable is set, messages are sent to std::cerr when caching events or non-blocking failures happen (e.g. unable to access cache item file). |

diff --git a/sycl/doc/FAQ.md b/sycl/doc/FAQ.md
@@ -128,7 +128,7 @@ C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt\crtdbg.h(607,26
 >  beyond those explicitly mentioned as usable in kernels in this spec.
 
 Replace usage of STD built-ins with SYCL-defined math built-ins. Please, note
-that you have to explicitly specify built-in namespace (i.e. `cl::sycl::fmin`).
+that you have to explicitly specify built-in namespace (i.e. `sycl::fmin`).
 The full list of SYCL math built-ins is provided in section 4.13.3 of the
 specification.
 

diff --git a/sycl/doc/GetStartedGuide.md b/sycl/doc/GetStartedGuide.md
@@ -557,29 +557,29 @@ Creating a file `simple-sycl-app.cpp` with the following C++/SYCL code:
 
 int main() {
   // Creating buffer of 4 ints to be used inside the kernel code
-  cl::sycl::buffer<cl::sycl::cl_int, 1> Buffer(4);
+  sycl::buffer<sycl::cl_int, 1> Buffer(4);
 
   // Creating SYCL queue
-  cl::sycl::queue Queue;
+  sycl::queue Queue;
 
   // Size of index space for kernel
-  cl::sycl::range<1> NumOfWorkItems{Buffer.size()};
+  sycl::range<1> NumOfWorkItems{Buffer.size()};
 
   // Submitting command group(work) to queue
-  Queue.submit([&](cl::sycl::handler &cgh) {
+  Queue.submit([&](sycl::handler &cgh) {
     // Getting write only access to the buffer on a device
-    auto Accessor = Buffer.get_access<cl::sycl::access::mode::write>(cgh);
+    auto Accessor = Buffer.get_access<sycl::access::mode::write>(cgh);
     // Executing kernel
     cgh.parallel_for<class FillBuffer>(
-        NumOfWorkItems, [=](cl::sycl::id<1> WIid) {
+        NumOfWorkItems, [=](sycl::id<1> WIid) {
           // Fill buffer with indexes
-          Accessor[WIid] = (cl::sycl::cl_int)WIid.get(0);
+          Accessor[WIid] = (sycl::cl_int)WIid.get(0);
         });
   });
 
   // Getting read only access to the buffer on the host.
   // Implicit barrier waiting for queue to complete the work.
-  const auto HostAccessor = Buffer.get_access<cl::sycl::access::mode::read>();
+  const auto HostAccessor = Buffer.get_access<sycl::access::mode::read>();
 
   // Check the results
   bool MismatchFound = false;
@@ -704,36 +704,36 @@ SYCL_BE=PI_CUDA ./simple-sycl-app-cuda.exe
 ```
 
 **NOTE**: DPC++/SYCL developers can specify SYCL device for execution using
-device selectors (e.g. `cl::sycl::cpu_selector`, `cl::sycl::gpu_selector`,
+device selectors (e.g. `sycl::cpu_selector`, `sycl::gpu_selector`,
 [Intel FPGA selector(s)](extensions/supported/sycl_ext_intel_fpga_device_selector.md)) as
 explained in following section [Code the program for a specific
 GPU](#code-the-program-for-a-specific-gpu).
 
 ### Code the program for a specific GPU
 
-To specify OpenCL device SYCL provides the abstract `cl::sycl::device_selector`
+To specify OpenCL device SYCL provides the abstract `sycl::device_selector`
 class which the can be used to define how the runtime should select the best
 device.
 
-The method `cl::sycl::device_selector::operator()` of the SYCL
-`cl::sycl::device_selector` is an abstract member function which takes a
+The method `sycl::device_selector::operator()` of the SYCL
+`sycl::device_selector` is an abstract member function which takes a
 reference to a SYCL device and returns an integer score. This abstract member
 function can be implemented in a derived class to provide a logic for selecting
 a SYCL device. SYCL runtime uses the device for with the highest score is
-returned. Such object can be passed to `cl::sycl::queue` and `cl::sycl::device`
+returned. Such object can be passed to `sycl::queue` and `sycl::device`
 constructors.
 
-The example below illustrates how to use `cl::sycl::device_selector` to create
+The example below illustrates how to use `sycl::device_selector` to create
 device and queue objects bound to Intel GPU device:
 
 ```c++
 #include <sycl/sycl.hpp>
 
 int main() {
-  class NEOGPUDeviceSelector : public cl::sycl::device_selector {
+  class NEOGPUDeviceSelector : public sycl::device_selector {
   public:
-    int operator()(const cl::sycl::device &Device) const override {
-      using namespace cl::sycl::info;
+    int operator()(const sycl::device &Device) const override {
+      using namespace sycl::info;
 
       const std::string DeviceName = Device.get_info<device::name>();
       const std::string DeviceVendor = Device.get_info<device::vendor>();
@@ -744,9 +744,9 @@ int main() {
 
   NEOGPUDeviceSelector Selector;
   try {
-    cl::sycl::queue Queue(Selector);
-    cl::sycl::device Device(Selector);
-  } catch (cl::sycl::invalid_parameter_error &E) {
+    sycl::queue Queue(Selector);
+    sycl::device Device(Selector);
+  } catch (sycl::invalid_parameter_error &E) {
     std::cout << E.what() << std::endl;
   }
 }
@@ -757,10 +757,10 @@ The device selector below selects an NVIDIA device only, and won't execute if
 there is none.
 
 ```c++
-class CUDASelector : public cl::sycl::device_selector {
+class CUDASelector : public sycl::device_selector {
   public:
-    int operator()(const cl::sycl::device &Device) const override {
-      using namespace cl::sycl::info;
+    int operator()(const sycl::device &Device) const override {
+      using namespace sycl::info;
       const std::string DriverVersion = Device.get_info<device::driver_version>();
 
       if (Device.is_gpu() && (DriverVersion.find("CUDA") != std::string::npos)) {

diff --git a/sycl/doc/MultiTileCardWithLevelZero.md b/sycl/doc/MultiTileCardWithLevelZero.md
@@ -46,8 +46,8 @@ The root-device in such cases can be partitioned to sub-devices, each correspond
 ``` C++	
 try {
   vector<device> SubDevices = RootDevice.create_sub_devices<
-  cl::sycl::info::partition_property::partition_by_affinity_domain>(
-  cl::sycl::info::partition_affinity_domain::next_partitionable);
+  sycl::info::partition_property::partition_by_affinity_domain>(
+  sycl::info::partition_affinity_domain::next_partitionable);
 }
 ```
 

diff --git a/sycl/doc/design/CompilerAndRuntimeDesign.md b/sycl/doc/design/CompilerAndRuntimeDesign.md
@@ -90,7 +90,7 @@ work:
 int foo(int x) { return ++x; }
 int bar(int x) { throw std::exception{"CPU code only!"}; }
 ...
-using namespace cl::sycl;
+using namespace sycl;
 queue Q;
 buffer<int, 1> a{range<1>{1024}};
 Q.submit([&](handler& cgh) {
@@ -103,17 +103,17 @@ Q.submit([&](handler& cgh) {
 ```
 
 In this example, the compiler needs to compile the lambda expression passed
-to the `cl::sycl::handler::parallel_for` method, as well as the function `foo`
+to the `sycl::handler::parallel_for` method, as well as the function `foo`
 called from the lambda expression for the device.
 
 The compiler must also ignore the `bar` function when we compile the
 "device" part of the single source code, as it's unused inside the device
 portion of the source code (the contents of the lambda expression passed to the
-`cl::sycl::handler::parallel_for` and any function called from this lambda
+`sycl::handler::parallel_for` and any function called from this lambda
 expression).
 
 The current approach is to use the SYCL kernel attribute in the runtime to
-mark code passed to `cl::sycl::handler::parallel_for` as "kernel functions".
+mark code passed to `sycl::handler::parallel_for` as "kernel functions".
 The runtime library can't mark foo as "device" code - this is a compiler
 job: to traverse all symbols accessible from kernel functions and add them to
 the "device part" of the code marking them with the new SYCL device attribute.

diff --git a/sycl/doc/design/KernelParameterPassing.md b/sycl/doc/design/KernelParameterPassing.md
@@ -60,7 +60,7 @@ int main()
 
   myQueue.submit([&](handler &cgh) {
     auto outAcc = outBuf.get_access<access::mode::write>(cgh);
-    cgh.parallel_for<class Worker>(num_items, [=](cl::sycl::id<1> index) {
+    cgh.parallel_for<class Worker>(num_items, [=](sycl::id<1> index) {
       outAcc[index] = i + s.m;
     });
   });
@@ -192,7 +192,7 @@ are copied into the array within the local capture object.
 
   myQueue.submit([&](handler &cgh) {
     auto outAcc = outBuf.get_access<access::mode::write>(cgh);
-    cgh.parallel_for<class Worker>(num_items, [=](cl::sycl::id<1> index) {
+    cgh.parallel_for<class Worker>(num_items, [=](sycl::id<1> index) {
       outAcc[index] = array[index.get(0)];
     });
   });
@@ -264,7 +264,7 @@ of each accessor array element in ascending index value.
                          in_buffer2.get_access<access::mode::read>(cgh)};
     auto outAcc = out_buffer.get_access<access::mode::write>(cgh);
 
-    cgh.parallel_for<class Worker>(num_items, [=](cl::sycl::id<1> index) {
+    cgh.parallel_for<class Worker>(num_items, [=](sycl::id<1> index) {
       outAcc[index] = inAcc[0][index] + inAcc[1][index];
     });
   });
@@ -356,7 +356,7 @@ in a manner similar to other instances of accessor arrays.
    };
    auto outAcc = out_buffer.get_access<access::mode::write>(cgh);
 
-   cgh.parallel_for<class Worker>(num_items, [=](cl::sycl::id<1> index) {
+   cgh.parallel_for<class Worker>(num_items, [=](sycl::id<1> index) {
      outAcc[index] = s.m + s.inAcc[0][index] + s.inAcc[1][index];
    });
 });

diff --git a/sycl/doc/design/LinkedAllocations.md b/sycl/doc/design/LinkedAllocations.md
@@ -9,33 +9,33 @@ Instead, memory is allocated in each context whenever the SYCL memory object
 is first accessed there:
 
 ```
-  cl::sycl::buffer<int, 1> buf{cl::sycl::range<1>(1)}; // No allocation here
+  sycl::buffer<int, 1> buf{sycl::range<1>(1)}; // No allocation here
 
-  cl::sycl::queue q;
-  q.submit([&](cl::sycl::handler &cgh){
+  sycl::queue q;
+  q.submit([&](sycl::handler &cgh){
     // First access to buf in q's context: allocate memory
-    auto acc = buf.get_access<cl::sycl::access::mode::read_write>(cgh);
+    auto acc = buf.get_access<sycl::access::mode::read_write>(cgh);
 	...
   });
 
   // First access to buf on host (assuming q is not host): allocate memory
-  auto acc = buf.get_access<cl::sycl::access::mode::read_write>();
+  auto acc = buf.get_access<sycl::access::mode::read_write>();
 ```
 
 In the DPCPP execution graph these allocations are represented by allocation
-command nodes (`cl::sycl::detail::AllocaCommand`). A finished allocation
+command nodes (`sycl::detail::AllocaCommand`). A finished allocation
 command means that the associated memory object is ready for its first use in
 that context, but for host allocation commands it might be the case that no
 actual memory allocation takes place: either because it is possible to reuse the
 data pointer provided by the user:
 
 ```
   int val;
-  cl::sycl::buffer<int, 1> buf{&val, cl::sycl::range<1>(1)};
+  sycl::buffer<int, 1> buf{&val, sycl::range<1>(1)};
 
   // An alloca command is created, but it does not allocate new memory: &val
   // is reused instead.
-  auto acc = buf.get_access<cl::sycl::access::mode::read_write>();
+  auto acc = buf.get_access<sycl::access::mode::read_write>();
 ```
 
 Or because a mapped host pointer obtained from a native device memory object

diff --git a/sycl/doc/design/OptionalDeviceFeatures.md b/sycl/doc/design/OptionalDeviceFeatures.md
@@ -403,9 +403,9 @@ name.  The format looks like this:
 
 ```
 !intel_types_that_use_aspects = !{!0, !1, !2}
-!0 = !{!"class.cl::sycl::detail::half_impl::half", i32 8}
-!1 = !{!"class.cl::sycl::amx_type", i32 9}
-!2 = !{!"class.cl::sycl::other_type", i32 8, i32 9}
+!0 = !{!"class.sycl::detail::half_impl::half", i32 8}
+!1 = !{!"class.sycl::amx_type", i32 9}
+!2 = !{!"class.sycl::other_type", i32 8, i32 9}
 ```
 
 The value of the `!intel_types_that_use_aspects` metadata is a list of unnamed
@@ -415,8 +415,8 @@ starts with a string giving the name of the type which is followed by a list of
 `i32` constants where each constant is a value from `enum class aspect` telling
 the numerical value of an aspect from the type's
 `[[sycl_detail::uses_aspects()]]` attribute.  In the example above, the type
-`cl::sycl::detail::half_impl::half` uses an aspect whose numerical value is
-`8` and the type `cl::sycl::other_type` uses two aspects `8` and `9`.
+`sycl::detail::half_impl::half` uses an aspect whose numerical value is
+`8` and the type `sycl::other_type` uses two aspects `8` and `9`.
 
 **NOTE**: The reason we choose this representation is because LLVM IR does not
 allow metadata to be attached directly to types.  This representation works

diff --git a/sycl/doc/design/SYCLInstrumentationUsingXPTI.md b/sycl/doc/design/SYCLInstrumentationUsingXPTI.md
@@ -97,7 +97,7 @@ language constructs (2) The instrumentation that handles capturing the
 relevant metadata.
 
 1. In order to capture end-user source code information, we have implemented
-`cl::sycl::detail::code_location` class that uses the builtin functions
+`sycl::detail::code_location` class that uses the builtin functions
 in the compiler. However, equivalent implementations are unavailable on
 Windows and separate cross-platform implementation might be used in the
 future. To mitigate this, the Windows implementation will always report

diff --git a/sycl/doc/design/SpecializationConstants.md b/sycl/doc/design/SpecializationConstants.md
@@ -34,10 +34,10 @@ struct A {
 
 struct POD {
   A a[2];
-  // FIXME: cl::sycl::vec class is not a POD type in our implementation by some
+  // FIXME: sycl::vec class is not a POD type in our implementation by some
   // reason, but there are no limitations for vector types from spec constatns
   // design point of view.
-  cl::sycl::vec<int, 2> b;
+  sycl::vec<int, 2> b;
 };
 
 class MyInt32Const;
@@ -55,16 +55,16 @@ class MyPODConst;
   sycl::ONEAPI::experimental::spec_constant<int32_t, MyInt32Const> i32 =
       p.set_spec_constant<MyInt32Const>(rt_val);
 
-  cl::sycl::ONEAPI::experimental::spec_constant<POD, MyPODConst> pod =
+  sycl::ONEAPI::experimental::spec_constant<POD, MyPODConst> pod =
       p.set_spec_constant<MyPODConst>(gold);
 
   p.build_with_kernel_type<MyKernel>();
   sycl::buffer<int, 1> buf(vec.data(), vec.size());
   sycl::buffer<POD, 1> buf(vec_pod.data(), vec_pod.size());
 
-  q.submit([&](cl::sycl::handler &cgh) {
-    auto acc = buf.get_access<cl::sycl::access::mode::write>(cgh);
-    auto acc_pod = buf.get_access<cl::sycl::access::mode::write>(cgh);
+  q.submit([&](sycl::handler &cgh) {
+    auto acc = buf.get_access<sycl::access::mode::write>(cgh);
+    auto acc_pod = buf.get_access<sycl::access::mode::write>(cgh);
     cgh.single_task<MyKernel>(
         p.get_kernel<MyKernel>(),
         [=]() {
@@ -169,7 +169,7 @@ struct A {
 sycl::ONEAPI::experimental::spec_constant<int32_t, MyInt32Const> i32 =
     p.set_spec_constant<MyInt32Const>(rt_val);
 
-cl::sycl::ONEAPI::experimental::spec_constant<A, MyPODConst> pod =
+sycl::ONEAPI::experimental::spec_constant<A, MyPODConst> pod =
     p.set_spec_constant<MyPODConst>(gold);
 // ...
   i32.get();
@@ -335,7 +335,7 @@ struct A {
 sycl::ONEAPI::experimental::spec_constant<int32_t, MyInt32Const> i32 =
     p.set_spec_constant<MyInt32Const>(rt_val);
 
-cl::sycl::ONEAPI::experimental::spec_constant<A, MyPODConst> pod =
+sycl::ONEAPI::experimental::spec_constant<A, MyPODConst> pod =
     p.set_spec_constant<MyPODConst>(gold);
 // ...
   i32.get();
@@ -428,7 +428,7 @@ specialization constant:
 ```
 // user code:
 class MyIn32Constant;
-cl::sycl::ONEAPI::experimental::spec_constant<int, MyInt32Const> i32(0);
+sycl::ONEAPI::experimental::spec_constant<int, MyInt32Const> i32(0);
 // integration header:
 template <> struct sycl::detail::SpecConstantInfo<::MyInt32Const> {
   static constexpr const char* getName() {

diff --git a/sycl/doc/design/fpga_io_pipes_design.rst b/sycl/doc/design/fpga_io_pipes_design.rst
@@ -13,9 +13,9 @@ Requirements
       static constexpr unsigned id = ID;
     };
     using ethernet_read_pipe =
-        cl::sycl::intel::kernel_readable_io_pipe<ethernet_pipe_id<0>, int, 0>;
+        sycl::intel::kernel_readable_io_pipe<ethernet_pipe_id<0>, int, 0>;
     using ethernet_write_pipe =
-        cl::sycl::intel::kernel_writeable_io_pipe<ethernet_pipe_id<1>, int, 0>;
+        sycl::intel::kernel_writeable_io_pipe<ethernet_pipe_id<1>, int, 0>;
   }
 
   Thus, the user interacts only with vendor-defined pipe objects.