
[SYCL-MLIR] Merge from intel/llvm sycl branch #9895

Merged
merged 31 commits into from
Jun 15, 2023

@whitneywhtsang whitneywhtsang commented Jun 15, 2023

Please only review 0d222a6 and files with conflicts:

        both modified:   devops/actions/cached_checkout/action.yml

Now same as sycl branch:

        both modified:   devops/actions/clang-format/action.yml

Undo a69e515, track in #9535:

        both modified:   sycl/include/sycl/multi_ptr.hpp

Please do not squash and merge this PR.

mmoadeli and others added 30 commits June 13, 2023 15:15
…ly. (intel#9787)

* The `threadsPerBlock` values computed by `guessLocalWorkSize` are not
optimal. In particular, the `threadsPerBlock` values for `Y` and `Z`
were far below the possible values.
* When the Y/Z values of the range are prime, performance is very poor,
as shown in the associated
[issue](intel#8018).
* This PR computes `threadsPerBlock` for X/Y/Z so as to reduce the
corresponding `BlocksPerGrid` values.

* Below is the output of the code from the associated issue without the
changes in this PR:

Device = NVIDIA GeForce GTX 1050 Ti
N,   elapsed(ms)

- 1009,4.61658
- 2003,45.6869
- 3001,67.5192
- 4001,88.1543
- 5003,111.338
- 6007,132.848
- 7001,154.697
- 8009,175.452
- 9001,196.237
- 10007,219.39
- 1000,4.59423
- 2000,4.61525
- 3000,4.61935
- 4000,4.62526
- 5000,4.64623
- 6000,4.78904
- 7000,8.92251
- 8000,8.97263
- 9000,9.06992
- 10000,9.03802

 
* And below is the output with this PR's changes:

Device = NVIDIA GeForce GTX 1050 Ti
N,   elapsed(ms)

- 1009,4.58252
- 2003,4.60139
- 3001,3.47269
- 4001,3.62314
- 5003,4.15179
- 6007,7.07976
- 7001,7.49027
- 8009,8.00097
- 9001,9.08756
- 10007,8.0005
- 1000,4.56335
- 2000,4.60376
- 3000,4.76395
- 4000,4.63283
- 5000,4.64732
- 6000,4.63936
- 7000,8.97499
- 8000,8.9941
- 9000,9.01531
- 10000,9.00935
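
The cost of prime ranges reported above can be sketched as follows. This is a minimal illustration, not the PR's code: it assumes that the block size in a dimension must evenly divide the global range in that dimension, and the helper names and the limit of 64 threads are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not taken from the PR): the largest block size
// that is <= limit and evenly divides the global range n.
std::size_t largest_divisor_up_to(std::size_t n, std::size_t limit) {
  assert(n > 0 && limit > 0);
  std::size_t t = limit < n ? limit : n;
  while (n % t != 0) {
    --t;  // stops at t == 1 at the latest, since 1 divides every n
  }
  return t;
}

// Number of blocks needed in one dimension under the same assumption.
std::size_t blocks_per_grid(std::size_t n, std::size_t limit) {
  return n / largest_divisor_up_to(n, limit);
}
```

Under these assumptions, a prime range such as 1009 only admits a block size of 1 and therefore needs 1009 blocks in that dimension, while 1000 admits a block size of 50 and needs 20 blocks.
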
This patch maps the `Device` scope fence to the correct NVVM built-in.
Previously it incorrectly used the CTA (threadblock) variant.
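
A rough sketch of the kind of scope-to-fence mapping involved is shown below. The enum and function names are hypothetical, and the mapping is expressed with the PTX `membar` levels (`cta`, `gl`, `sys`) rather than the actual NVVM built-ins the patch touches, which are not named here.

```cpp
#include <string>

// Hypothetical scope enum; the real SYCL/NVVM types are not shown here.
enum class FenceScope { WorkGroup, Device, System };

// Assumed mapping from fence scope to a PTX membar level.
// The bug described above corresponds to Device falling into the
// "membar.cta" case, which only orders memory within one thread block.
std::string ptx_membar_for(FenceScope scope) {
  switch (scope) {
    case FenceScope::WorkGroup:
      return "membar.cta";  // thread block (CTA) level
    case FenceScope::Device:
      return "membar.gl";   // whole-device (global) level
    case FenceScope::System:
      return "membar.sys";  // system level
  }
  return "membar.sys";  // conservative fallback for unknown values
}
```
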
* Add the ability to skip the merge
* Set up alternates at the filesystem level so that other jobs in the
workflow can work with Git without setting the environment variable.
The llvm-no-spir-kernel tool is no longer in use. Remove the creation,
tests and driver infrastructure to use the tool. Also remove the
reference from the docs.
GitHub's allocation of default ubuntu-* runners isn't reliably stable,
so keep moving tasks to self-hosted runners. We don't currently use the
CUDA runner, so assign these tasks to it for the time being. Later we
should be able to extend these utility tasks to run on the generic
`Linux` class of self-hosted runners.
We decided to use "zstd" instead.
That is needed so that we could use the resulting image to test PRs that
only touch SYCL End-to-End tests
…upport fp64 (intel#9552)" (intel#9826)

After 2910add, the splitting does the
right thing with `invoke_simd`, and we can use this test to lock down
the functionality which previously didn't work.

This reverts commit 8e19a94.
Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>
We should not run ZE_DEBUG tests on Windows.

---------

Signed-off-by: Byoungro So <byoungro.so@intel.com>
This moves the CUDA plugin implementation to Unified Runtime and
changes the pi_cuda plugin to use pi2ur to implement PI. The changes to
the implementation have been kept to a minimum and should be
functionally the same. Documentation and comments have been moved
verbatim, other than changing PI references to UR.

This PR is based on top of the Level Zero adapter (intel#8744) so will only
be ready when that is merged.

---------

Co-authored-by: Petr Vesely <petr.vesely@codeplay.com>
Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>
Co-authored-by: Martin Morrison-Grant <martin.morrisongrant@codeplay.com>
Co-authored-by: Aaron Greig <aaron.greig@codeplay.com>
…owners (intel#9851)

This commit changes the code owners of the sycl/docs/design section from
@intel/dpcpp-specification-reviewers to intel/llvm-reviewers-runtime.
With this change, intel/llvm-reviewers-runtime is responsible for
either reviewing design changes or assigning the appropriate teams to
do a design review.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
…mask to ESIMD::simd_mask (intel#9830)

Users of `invoke_simd` need to use `std::experimental::simd_mask` for
masks as per the spec, but once they enter ESIMD code they will likely
want to use the ESIMD classes. Provide an implicit conversion from
`std::experimental::simd_mask` to `esimd::simd_mask`.

Without this change, you need to use a manual loop, as all you can do is
access `std::experimental::simd_mask` element-by-element.
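
The difference between the manual loop and an implicit conversion can be sketched with stand-in types. `StdMask`, `EsimdMask`, `Mask8`, and `count_active` below are hypothetical and only mimic the shape of the real classes; they are not the `std::experimental` or ESIMD APIs.

```cpp
#include <array>
#include <cstddef>

// Stand-in for std::experimental::simd_mask: element-wise access only.
template <std::size_t N>
struct StdMask {
  std::array<bool, N> bits{};
  bool operator[](std::size_t i) const { return bits[i]; }
};

// Stand-in for esimd::simd_mask with a converting constructor.
template <std::size_t N>
struct EsimdMask {
  std::array<unsigned short, N> lanes{};
  EsimdMask() = default;
  // Deliberately not `explicit`, so the conversion happens implicitly.
  // The element-by-element loop lives here, once, instead of in user code.
  EsimdMask(const StdMask<N>& m) {
    for (std::size_t i = 0; i < N; ++i) {
      lanes[i] = m[i] ? 1 : 0;
    }
  }
};

using Mask8 = EsimdMask<8>;

// Code written against the ESIMD-style mask type.
std::size_t count_active(const Mask8& m) {
  std::size_t n = 0;
  for (unsigned short lane : m.lanes) {
    n += lane != 0 ? 1 : 0;
  }
  return n;
}
```

With the converting constructor in place, a `StdMask<8>` value can be passed directly to `count_active`, with no loop at the call site.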

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
This test uses an interop API to create a kernel.
So, ZE_DEBUG should be disabled.

Signed-off-by: Byoungro So <byoungro.so@intel.com>
Resolves the warnings-as-errors failures reported in [post
merge](https://github.com/intel/llvm/actions/runs/5266121277/jobs/9519634360)
after merging intel#9512. Additionally, move preprocessor guards to
resolve unused global variables, which would also fail in this build
configuration (clang & SYCL_ENABLE_WERROR=ON).
)

This patch implements the accuracy controls for floating-point math
functions in DPC++. Using the -ffp-accuracy command line option, the
user can request an accuracy level for all math functions or for
specific ones. Calls to the fpbuiltin intrinsics llvm.fpbuiltin.* are
then generated.

Syntax:

Linux:   -ffp-accuracy=[default|value][:funclist]
Windows: /Qfp-accuracy:[default|value][:funclist]

funclist is an optional comma-separated list of math library functions.

-ffp-accuracy=[default|value]

default: Use the implementation-defined accuracy for all math library
         functions. This is equivalent to not using this option.
value:   Use the accuracy that the given value denotes for all math
         library functions.

-ffp-accuracy=[default|value][:funclist]

default: Use the implementation-defined accuracy for the math library
         functions in funclist. This is equivalent to not using this
         option.
value:   Use the accuracy that the given value denotes for the math
         library functions in funclist.

value is one of the following, denoting the library function accuracy:

high     Equivalent to max-error = 1.0.
medium   Equivalent to max-error = 4.
low      Equivalent to accuracy-bits = 11 for single-precision functions
         and accuracy-bits = 26 for double-precision functions.
sycl     Determined by the OpenCL specification for math function
         accuracy:
         https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#relative-error-as-ulps
cuda     Determined by the CUDA C Programming Guide's mathematical
         functions appendix:
         https://docs.nvidia.com/cuda/cuda-c-programming-guide/#mathematical-functions-appendix
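
To make the `value[:funclist]` shape of the option argument concrete, here is a small parsing sketch. It is not the driver's implementation; `FpAccuracy` and `parse_fp_accuracy` are hypothetical names, and the sketch does no validation of the accuracy value or the function names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: the accuracy value plus the optional funclist.
struct FpAccuracy {
  std::string value;               // e.g. "high", "medium", "default"
  std::vector<std::string> funcs;  // empty means "all math functions"
};

// Splits "value[:func1,func2,...]" into its two parts.
FpAccuracy parse_fp_accuracy(const std::string& arg) {
  FpAccuracy out;
  const std::size_t colon = arg.find(':');
  out.value = arg.substr(0, colon);  // whole string when there is no ':'
  if (colon == std::string::npos) {
    return out;
  }
  std::size_t start = colon + 1;
  while (start <= arg.size()) {
    std::size_t comma = arg.find(',', start);
    if (comma == std::string::npos) {
      comma = arg.size();
    }
    if (comma > start) {  // skip empty entries
      out.funcs.push_back(arg.substr(start, comma - start));
    }
    start = comma + 1;
  }
  return out;
}
```

For example, the argument of `-ffp-accuracy=high:sin,cos` splits into the value `high` and the function list `sin`, `cos`, while `-ffp-accuracy=medium` yields an empty list, meaning the value applies to all math library functions.
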
I don't think we test it anywhere in our CI pipeline.
…ntel#9841)

This reverts commit 4447a50. Previous
attempt: intel#8343

What changed: one extra patch is added to the headers:
intel@ca0595b
With this patch, clang no longer generates llvm.memcpy for the trivial
constructor. As a result, InstCombine no longer replaces it with a cast
to i64 followed by a load and store, which SROA and mem2reg cannot
handle for target extension types.

It adds:
ConvertSYCLJointMatrixINTELType - Convert SYCL joint_matrix type which
is represented as a pointer to a structure to LLVM extension type with
the parameters that follow SPIR-V JointMatrixINTEL type. The expected
representation is:
target("spirv.JointMatrixINTEL", %element_type, %rows%, %cols%, %scope%,
%use%, (optional) %element_type_interpretation%)

A better approach would be to introduce a joint matrix type to clang,
but that is off the table for now because we lack an OpenCL spec.

Co-authored-by: Joshua Cranmer <joshua.cranmer@intel.com>

---------

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Co-authored-by: Alexey Bader <alexey.bader@intel.com>
These operators were changed from aliasing their `std` counterparts in
intel#9298, but the const qualification was
not added (as required by [4.17.2. Function
objects](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:function-objects)).
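
Why the const qualification matters can be shown with a stand-in function object. `Plus` and `fold_with` are hypothetical and do not reproduce the SYCL headers; the point is only that generic code commonly receives function objects by const reference.

```cpp
#include <vector>

// Stand-in for a SYCL function object such as sycl::plus.
template <typename T>
struct Plus {
  // The trailing `const` lets the call operator be used on const objects.
  T operator()(const T& a, const T& b) const { return a + b; }
};

// Generic code that takes the operation by const reference. Without the
// `const` on Plus::operator(), the call `op(init, x)` would not compile.
template <typename T, typename Op>
T fold_with(const std::vector<T>& values, T init, const Op& op) {
  for (const T& x : values) {
    init = op(init, x);
  }
  return init;
}
```
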
This extension is used to specify the register mode on an Intel GPU.

Currently we only support specific register mode values on specific
GPUs.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
intel#9844 somehow caused problems with
lint tasks when `origin/sycl` is newer than PR's merge base with it. I
don't understand how that wasn't a problem before, but let's try to fix
it.

While at it, start using sparse checkout to get
`devops/actions/cached_checkout` instead of "wget".
The sycl_ext_usm_address_spaces extension adds the
ext_intel_global_device_space address space together with additional
multi_ptr constructors for creating a multi_ptr from an accessor.
However, the current implementation fails to construct the multi_ptr
from an accessor when the extended address space decorations are enabled
(through __ENABLE_USM_ADDR_SPACE__) as it attempts to use the normal
global address space decoration.
This commit fixes these constructors by doing a legal cast of the
underlying global-space pointer to a ext_intel_global_device_space
decorated pointer.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Manual testing before merging lint-related PRs didn't reveal issues, but
the lint tasks seem to misbehave after the merge. Add some debug output
to find the root cause. I hope to address the issues during the day; if
not, I'm going to revert the changes in the evening.
…sts (intel#9859)

Co-authored-by: Vyacheslav Klochkov <vyacheslav.n.klochkov@intel.com>
Modify `is_compatible` to check whether a specific target was requested
with `-fsycl-targets` and adjust the result accordingly. Previously, a
kernel could be reported as compatible with a device based on aspects
and still fail to run on that device because it was compiled for a
different target device.

Related spec change: KhronosGroup/SYCL-Docs#381

Resolves intel#7561
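
The intent of the `is_compatible` change can be sketched roughly as below. All types and names are hypothetical, and the handling of an empty target list is an assumption; this is not the runtime's implementation.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical descriptions of a device and a compiled kernel.
struct DeviceInfo {
  std::set<std::string> aspects;
  std::string target;  // e.g. "spir64" or "nvptx64-nvidia-cuda"
};

struct KernelInfo {
  std::set<std::string> required_aspects;
  std::vector<std::string> compiled_targets;  // from -fsycl-targets
};

bool is_compatible_sketch(const KernelInfo& k, const DeviceInfo& d) {
  // Aspect check: every required aspect must be offered by the device.
  // std::includes needs sorted ranges, which std::set guarantees.
  const bool aspects_ok =
      std::includes(d.aspects.begin(), d.aspects.end(),
                    k.required_aspects.begin(), k.required_aspects.end());
  if (!aspects_ok) {
    return false;
  }
  // Assumption: with no explicit targets, only aspects decide.
  if (k.compiled_targets.empty()) {
    return true;
  }
  // Target check: the kernel must have been compiled for this device.
  return std::find(k.compiled_targets.begin(), k.compiled_targets.end(),
                   d.target) != k.compiled_targets.end();
}
```

In this sketch, a kernel compiled only for `nvptx64-nvidia-cuda` is reported as incompatible with a `spir64` device even when every required aspect is present.
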
…ntel#9883)

Mark newly ported UR CUDA plugin as owned by CUDA reviewer group

---------

Co-authored-by: Alexey Bader <alexey.bader@intel.com>
Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>
@whitneywhtsang whitneywhtsang requested a review from etiotto as a code owner June 15, 2023 05:08
@whitneywhtsang whitneywhtsang added disable-lint Skip linter check step and proceed with build jobs sycl-mlir Pull requests or issues for sycl-mlir branch labels Jun 15, 2023
@whitneywhtsang whitneywhtsang self-assigned this Jun 15, 2023
@whitneywhtsang whitneywhtsang merged commit 6e30b3a into intel:sycl-mlir Jun 15, 2023
@whitneywhtsang whitneywhtsang deleted the merge branch June 15, 2023 14:04