[SYCL-MLIR] Merge from intel/llvm sycl branch #9895

whitneywhtsang · 2023-06-15T05:08:10Z

Please only review 0d222a6 and files with conflicts:

        both modified:   devops/actions/cached_checkout/action.yml

Now same as sycl branch:

        both modified:   devops/actions/clang-format/action.yml

Undo a69e515, track in #9535:

        both modified:   sycl/include/sycl/multi_ptr.hpp

Please do not squash and merge this PR.

…ly. (intel#9787)

* The `threadsPerBlock` values computed by `guessLocalWorkSize` are not optimal. In particular, the `threadsPerBlock` values for `Y` and `Z` were far below the possible values.
* When the Y/Z values of the range are prime, performance is very poor, as shown in the associated [issue](intel#8018).
* This PR computes `threadsPerBlock` for X/Y/Z to reduce the corresponding `BlocksPerGrid` values.
* Output of the code in the associated issue without the changes in this PR:

        Device = NVIDIA GeForce GTX 1050 Ti
        N, elapsed(ms)
        1009,4.61658
        2003,45.6869
        3001,67.5192
        4001,88.1543
        5003,111.338
        6007,132.848
        7001,154.697
        8009,175.452
        9001,196.237
        10007,219.39
        1000,4.59423
        2000,4.61525
        3000,4.61935
        4000,4.62526
        5000,4.64623
        6000,4.78904
        7000,8.92251
        8000,8.97263
        9000,9.06992
        10000,9.03802

* Output with the PR's updates:

        Device = NVIDIA GeForce GTX 1050 Ti
        N, elapsed(ms)
        1009,4.58252
        2003,4.60139
        3001,3.47269
        4001,3.62314
        5003,4.15179
        6007,7.07976
        7001,7.49027
        8009,8.00097
        9001,9.08756
        10007,8.0005
        1000,4.56335
        2000,4.60376
        3000,4.76395
        4000,4.63283
        5000,4.64732
        6000,4.63936
        7000,8.97499
        8000,8.9941
        9000,9.01531
        10000,9.00935
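A minimal sketch of why a prime Y/Z extent can be so costly, under the assumption that the chosen block size must evenly divide the range in that dimension. The helper names and the 1024-thread limit are hypothetical and used only for illustration; this is not the plugin's `guessLocalWorkSize` code, and it does not reproduce the heuristic the PR introduces.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: largest divisor of `global` that does not exceed
// `limit`. Assumes the block size must divide the range evenly.
std::size_t largestDivisorAtMost(std::size_t global, std::size_t limit) {
  assert(global > 0 && limit > 0);
  std::size_t candidate = global < limit ? global : limit;
  // Terminates at 1 in the worst case, because every value is divisible by 1.
  while (global % candidate != 0)
    --candidate;
  return candidate;
}

// Hypothetical helper: number of blocks needed along one dimension when the
// block size is picked with largestDivisorAtMost.
std::size_t blocksFor(std::size_t global, std::size_t limit) {
  return global / largestDivisorAtMost(global, limit);
}
```

With an assumed limit of 1024, an extent of 10000 gets a block size of 1000 and needs 10 blocks, while the prime extent 2003 only admits a block size of 1 and needs 2003 blocks. The prime 1009 fits within the limit as a single block, which is consistent with it not showing the slowdown in the timings above.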

GitHub's allocation of default ubuntu-* runners isn't reliable, so keep moving tasks to self-hosted runners. We don't currently use the cuda runner, so assign those tasks to it for the time being. Later we should be able to extend those utility tasks to run on the generic `Linux` class of self-hosted runners.

…upport fp64 (intel#9552)" (intel#9826) After 2910add, the splitting does the right thing with `invoke_simd`, and we can use this test to lock down the functionality which previously didn't work. This reverts commit 8e19a94.

This moves the CUDA plugin implementation to Unified Runtime and changes the pi_cuda plugin to use pi2ur to implement PI. The changes to the implementation have been kept to a minimum and should be functionally the same. Documentation and comments have been moved verbatim, other than changing PI references to UR.

This PR is based on top of the Level Zero adapter (intel#8744), so it will only be ready when that is merged.

Co-authored-by: Petr Vesely <petr.vesely@codeplay.com>
Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>
Co-authored-by: Martin Morrison-Grant <martin.morrisongrant@codeplay.com>
Co-authored-by: Aaron Greig <aaron.greig@codeplay.com>

…owners (intel#9851) This commit changes the code owners of the sycl/docs/design section from @intel/dpcpp-specification-reviewers to intel/llvm-reviewers-runtime. With this, intel/llvm-reviewers-runtime is responsible for either reviewing design changes or assigning the appropriate teams to do a design review.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

…mask to ESIMD::simd_mask (intel#9830) Users of `invoke_simd` need to use `std::experimental::simd_mask` for masks as per the spec, but once they enter ESIMD code they will likely want to use the ESIMD classes. Provide an implicit conversion from `std::experimental::simd_mask` to `esimd::simd_mask`.

Without this change, you need a manual loop, because `std::experimental::simd_mask` can only be accessed element by element.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

Resolves the warnings as errors reported in [post merge](https://github.com/intel/llvm/actions/runs/5266121277/jobs/9519634360) as a result of merging intel#9512. Additionally move pre-processor guards to resolve unused global variables which would also fail in this build configuration (clang & SYCL_ENABLE_WERROR=ON).

) This patch implements the accuracy controls for floating-point math functions in DPC++. Using the -ffp-accuracy command line option, the user can request an accuracy level for all math functions or for specific ones. Calls to the fpbuiltin intrinsics llvm.fpbuiltin.* are then generated.

Syntax:

        Linux:   -ffp-accuracy=[default|value][:funclist]
        Windows: /Qfp-accuracy:[default|value][:funclist]

funclist is an optional comma-separated list of math library functions.

-ffp-accuracy=[default|value]

        default: Use the implementation defined accuracy for all math library functions. This is equivalent to not using this option.
        value:   Use the defined standard accuracy for what each accuracy value means for all math library functions.

-ffp-accuracy=[default|value][:funclist]

        default: Use the implementation defined accuracy for the math library functions in funclist. This is equivalent to not using this option.
        value:   Use the defined standard accuracy for what each accuracy value means for the math library functions in funclist.

value is one of the following values denoting the library function accuracy:

        high    This is equivalent to max-error = 1.0.
        medium  This is equivalent to max-error = 4.
        low     This is equivalent to accuracy-bits = 11 for single-precision functions and accuracy-bits = 26 for double-precision functions.
        sycl    Determined by the OpenCL specification for math function accuracy: https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#relative-error-as-ulps
        cuda    Determined by standard https://docs.nvidia.com/cuda/cuda-c-programming-guide/#mathematical-functions-appendix
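To make the `[default|value][:funclist]` shape concrete, here is a minimal sketch that splits the text after `-ffp-accuracy=` into an accuracy value and an optional function list. The `FpAccuracy` struct and `parseFpAccuracy` function are hypothetical names invented for this illustration; they are not the driver's implementation, and no validation of the accuracy value is attempted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: the accuracy value and the functions it applies
// to. An empty list stands for "all math library functions".
struct FpAccuracy {
  std::string value;
  std::vector<std::string> functions;
};

// Hypothetical parser for the text following "-ffp-accuracy=",
// e.g. "high" or "high:sin,cos".
FpAccuracy parseFpAccuracy(const std::string &arg) {
  FpAccuracy result;
  const std::size_t colon = arg.find(':');
  result.value = arg.substr(0, colon); // whole string when there is no ':'
  if (colon == std::string::npos)
    return result;

  // Split the part after ':' on commas, skipping empty entries.
  std::size_t begin = colon + 1;
  while (begin <= arg.size()) {
    std::size_t end = arg.find(',', begin);
    if (end == std::string::npos)
      end = arg.size();
    if (end > begin)
      result.functions.push_back(arg.substr(begin, end - begin));
    begin = end + 1;
  }
  return result;
}
```

For example, `parseFpAccuracy("high:sin,cos")` yields the value `high` with the functions `sin` and `cos`, while `parseFpAccuracy("medium")` yields `medium` with an empty function list.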

…ntel#9841) This reverts commit 4447a50.

Previous attempt: intel#8343

What changed: one extra patch is being added to the headers: intel@ca0595b. With this patch, clang won't generate llvm.memcpy for the trivial c'tor, so later on inst combine won't replace it with a cast to i64 followed by load + store, which SROA + mem2reg can't handle for target extension types.

It adds ConvertSYCLJointMatrixINTELType, which converts the SYCL joint_matrix type, represented as a pointer to a structure, to an LLVM extension type with the parameters that follow the SPIR-V JointMatrixINTEL type. The expected representation is:

        target("spirv.JointMatrixINTEL", %element_type, %rows%, %cols%, %scope%, %use%, (optional) %element_type_interpretation%)

A better approach is to introduce the joint matrix type to clang, but that is off the table for now, since we are lacking an OpenCL spec.

Co-authored-by: Joshua Cranmer <joshua.cranmer@intel.com>
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Co-authored-by: Alexey Bader <alexey.bader@intel.com>

These operators were changed from aliasing their `std` counterparts in intel#9298 but a const-qualification was not added (as required by [4.17.2. Function objects](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:function-objects)).
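A minimal sketch of why the const-qualification matters, using a hypothetical `PlusLike` type as a stand-in for a function object of this kind. It is not the SYCL header code; it only shows that a call through a `const` object or a `const` reference compiles solely because `operator()` is declared `const`.

```cpp
#include <utility>

// Hypothetical stand-in for a transparent function object.
struct PlusLike {
  // The trailing `const` is the point of the example: without it, the calls
  // below through const objects and references would not compile.
  template <typename T, typename U>
  constexpr auto operator()(T &&lhs, U &&rhs) const
      -> decltype(std::forward<T>(lhs) + std::forward<U>(rhs)) {
    return std::forward<T>(lhs) + std::forward<U>(rhs);
  }
};

// Hypothetical helper that takes the operation by const reference, as
// generic algorithms commonly do.
template <typename Op> int applyConst(const Op &op, int a, int b) {
  return op(a, b);
}
```

Generic code frequently holds function objects by `const` reference, which is why a missing qualifier tends to surface as a compile error far away from the functor definition.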

intel#9844 somehow caused problems with lint tasks when `origin/sycl` is newer than the PR's merge base with it. I don't understand how that wasn't a problem before, but let's try to fix it. While at it, start using sparse checkout to get `devops/actions/cached_checkout` instead of "wget".

The sycl_ext_usm_address_spaces extension adds the ext_intel_global_device_space address space together with additional multi_ptr constructors for creating a multi_ptr from an accessor. However, the current implementation fails to construct the multi_ptr from an accessor when the extended address space decorations are enabled (through __ENABLE_USM_ADDR_SPACE__), as it attempts to use the normal global address space decoration. This commit fixes these constructors by doing a legal cast of the underlying global-space pointer to an ext_intel_global_device_space decorated pointer.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

Manual testing before merging lint-related PRs didn't reveal issues, but the task seems to misbehave after the merge. Add some debug output to find the root cause. I hope to address the issues during the day; if not, I'm going to revert the changes in the evening.

Modify `is_compatible` to check whether a specific target is defined with `-fsycl-targets` and adjust the result accordingly. Previously, a kernel could be reported as compatible with a device based on aspects, yet fail to run on that device because it was compiled for a different target device.

Related spec change: KhronosGroup/SYCL-Docs#381

Resolves intel#7561

mmoadeli and others added 30 commits June 13, 2023 15:15

[SYCL][CUDA] Add missing device scope to atomic fence (intel#9824)

93eb9ff

This patch maps `Device` scope fence to the right NVVM built-in. It would previously incorrectly use the CTA (threadblock) variant.
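The scope-to-built-in mapping described here can be sketched as a small lookup. The `Scope` enum and `fenceBuiltinFor` function are hypothetical names for illustration. The built-in names are the Clang NVPTX memory barrier built-ins as I understand them, and which exact call the patch emits for each scope is an assumption rather than something this commit message states.

```cpp
#include <string_view>

// Hypothetical memory scopes, loosely following SYCL's memory_scope.
enum class Scope { WorkGroup, Device, System };

// Returns the name of the NVVM barrier built-in assumed to match each scope.
// Per the commit message, the bug was that a Device scope fence used the CTA
// (threadblock) variant.
constexpr std::string_view fenceBuiltinFor(Scope scope) {
  switch (scope) {
  case Scope::WorkGroup:
    return "__nvvm_membar_cta"; // visible within the threadblock
  case Scope::Device:
    return "__nvvm_membar_gl"; // visible across the whole device
  case Scope::System:
    return "__nvvm_membar_sys"; // visible across the system
  }
  return "__nvvm_membar_sys"; // conservative fallback: the widest fence
}
```

A fence that is too narrow is a correctness problem rather than a performance one: with the CTA variant, writes are only guaranteed to be ordered for threads in the same block, so other blocks on the device may observe them out of order.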

[CI] Improve devops/actions/cached_checkout (intel#9831)

a055665

* Add ability to skip the merge * Setup alternates on the filesystem level so that other jobs in the workflow could work with GIT without setting the environment variable.

[SYCL] Non-standard RT namespace removed(intel#7133) (intel#9837)

b3e0428

[SYCL][NFC] Remove llvm-no-spir-kernel tool (intel#9710)

36e6e06

The llvm-no-spir-kernel tool is no longer in use. Remove the creation, tests and driver infrastructure to use the tool. Also remove the reference from the docs.

[CI] Don't install lz4 (intel#9848)

53c8089

We decided to use "zstd" instead.

[CI] Enable HIP/CUDA/ESIMD plugins in nightly build (intel#9850)

6b74243

That is needed so that we could use the resulting image to test PRs that only touch SYCL End-to-End tests

[SYCL] Remove _CODELOC* macro from API (intel#9847)

f4525e9

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>

[SYCL] Disable the ZE_DEBUG tests on Windows (intel#9854)

c87e780

We should not run ZE_DEBUG tests on Windows. --------- Signed-off-by: Byoungro So <byoungro.so@intel.com>

[SYCL] Disable ZE_DEBUG test for interop (intel#9857)

447f598

This test uses an interop API to create a kernel. So, ZE_DEBUG should be disabled. Signed-off-by: Byoungro So <byoungro.so@intel.com>

[CI] Remove FPGA Emulator workaround (intel#9855)

c2f0858

I don't think we test it anywhere in our CI pipeline.

[CI] Remove ROCm LD_LIBRARY_PATH setup (intel#9856)

5f0dbfe

Doesn't seem to be needed.

[SYCL][DOC] Add sycl_ext_intel_grf_size extension (intel#9779)

83d0997

This extension is used to specify the register mode on an Intel GPU. Currently we only support specific register mode values on specific GPUs. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

[SYCL][ESIMD][NFC] Do not use deprecated APIs in ESIMD headers and tests (intel#9859)

3d866f2

Co-authored-by: Vyacheslav Klochkov <vyacheslav.n.klochkov@intel.com>

[CI] Another attempt to fix lint task (intel#9885)

c6a9eee

[SYCL][UR][CUDA] Update CODEOWNERS for Unified Runtime CUDA Adapter (intel#9883)

c9219ce

Mark newly ported UR CUDA plugin as owned by CUDA reviewer group

Co-authored-by: Alexey Bader <alexey.bader@intel.com>

Merge remote-tracking branch 'upstream/sycl' into sycl-mlir

2777e7d

[SYCL-MLIR] Update xfail list after rebase

0d222a6

Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>

whitneywhtsang requested a review from etiotto as a code owner June 15, 2023 05:08

whitneywhtsang added the disable-lint (Skip linter check step and proceed with build jobs) and sycl-mlir (Pull requests or issues for sycl-mlir branch) labels Jun 15, 2023

whitneywhtsang mentioned this pull request Jun 15, 2023

[SYCL-MLIR] Investigate e2e test failures after 712cb4e #9535

Open

whitneywhtsang self-assigned this Jun 15, 2023

whitneywhtsang requested review from Naghasan and victor-eds June 15, 2023 05:09

whitneywhtsang closed this Jun 15, 2023

whitneywhtsang reopened this Jun 15, 2023

victor-eds approved these changes Jun 15, 2023

whitneywhtsang merged commit 6e30b3a into intel:sycl-mlir Jun 15, 2023

whitneywhtsang deleted the merge branch June 15, 2023 14:04
