[SYCL-MLIR] Merge from intel/llvm sycl branch #9895
Merged
…ly. (intel#9787)

* The `threadsPerBlock` values computed by `guessLocalWorkSize` are not optimal. In particular, the `threadsPerBlock` values for `Y` and `Z` were far below the possible values.
* When the Y/Z values of the range are prime, performance is very poor, as shown in the associated [issue](intel#8018).
* This PR computes `threadsPerBlock` for X/Y/Z to reduce the corresponding `BlocksPerGrid` values.
* Output of the code in the associated issue without the changes in this PR:

Device = NVIDIA GeForce GTX 1050 Ti
N, elapsed(ms)
- 1009,4.61658
- 2003,45.6869
- 3001,67.5192
- 4001,88.1543
- 5003,111.338
- 6007,132.848
- 7001,154.697
- 8009,175.452
- 9001,196.237
- 10007,219.39
- 1000,4.59423
- 2000,4.61525
- 3000,4.61935
- 4000,4.62526
- 5000,4.64623
- 6000,4.78904
- 7000,8.92251
- 8000,8.97263
- 9000,9.06992
- 10000,9.03802

* Output with the PR's updates:

Device = NVIDIA GeForce GTX 1050 Ti
N, elapsed(ms)
- 1009,4.58252
- 2003,4.60139
- 3001,3.47269
- 4001,3.62314
- 5003,4.15179
- 6007,7.07976
- 7001,7.49027
- 8009,8.00097
- 9001,9.08756
- 10007,8.0005
- 1000,4.56335
- 2000,4.60376
- 3000,4.76395
- 4000,4.63283
- 5000,4.64732
- 6000,4.63936
- 7000,8.97499
- 8000,8.9941
- 9000,9.01531
- 10000,9.00935
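To see why prime range values can hurt, here is a minimal sketch, not the plugin's actual code. It assumes the block size along a dimension must evenly divide the range and is capped by a per-dimension limit (the value 1024 used in the note below is an assumption). The helpers `largestDivisorAtMost` and `blocksPerGrid` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the CUDA plugin): the largest divisor of `n`
// that does not exceed `cap`. Under the divisibility assumption stated above,
// this is the largest usable block size along one dimension.
std::size_t largestDivisorAtMost(std::size_t n, std::size_t cap) {
  assert(n > 0 && cap > 0);
  for (std::size_t d = (n < cap ? n : cap); d > 1; --d)
    if (n % d == 0)
      return d;
  return 1; // A prime `n` larger than `cap` only has the divisor 1 available.
}

// Number of blocks needed along one dimension for a given block size.
std::size_t blocksPerGrid(std::size_t n, std::size_t threadsPerBlock) {
  assert(threadsPerBlock > 0);
  return (n + threadsPerBlock - 1) / threadsPerBlock;
}
```

Under these assumptions, a range of 2000 with a cap of 1024 allows a block size of 1000 and needs 2 blocks, while the prime range 2003 falls back to a block size of 1 and needs 2003 blocks.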
This patch maps `Device` scope fence to the right NVVM built-in. It would previously incorrectly use the CTA (threadblock) variant.
* Add the ability to skip the merge.
* Set up alternates at the filesystem level so that other jobs in the workflow can work with Git without setting the environment variable.
The llvm-no-spir-kernel tool is no longer in use. Remove the creation, tests and driver infrastructure to use the tool. Also remove the reference from the docs.
GitHub's allocation of default ubuntu-* runners isn't reliably stable, so keep moving tasks to self-hosted runners. We don't use the cuda runner currently, so assign those tasks to it for the time being. Later we should be able to extend those utility tasks to run on the generic `Linux` class of self-hosted runners.
We decided to use "zstd" instead.
That is needed so that we could use the resulting image to test PRs that only touch SYCL End-to-End tests
…upport fp64 (intel#9552)" (intel#9826) After 2910add, the splitting does the right thing with `invoke_simd`, and we can use this test to lock down the functionality which previously didn't work. This reverts commit 8e19a94.
Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>
We should not run ZE_DEBUG tests on Windows. --------- Signed-off-by: Byoungro So <byoungro.so@intel.com>
This moves the CUDA plugin implementation to Unified Runtime and changes the pi_cuda plugin to use pi2ur to implement PI. The changes to the implementation have been kept to a minimum, and it should be functionally the same. Documentation and comments have been moved verbatim, other than changing PI references to UR. This PR is based on top of the Level Zero adapter (intel#8744), so it will only be ready when that is merged. --------- Co-authored-by: Petr Vesely <petr.vesely@codeplay.com> Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com> Co-authored-by: Martin Morrison-Grant <martin.morrisongrant@codeplay.com> Co-authored-by: Aaron Greig <aaron.greig@codeplay.com>
…owners (intel#9851) This commit changes the code owners of the sycl/docs/design section from @intel/dpcpp-specification-reviewers to intel/llvm-reviewers-runtime. With this, intel/llvm-reviewers-runtime is responsible for either reviewing the design changes or assigning the appropriate teams to make a design review. Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
…mask to ESIMD::simd_mask (intel#9830) Users of `invoke_simd` need to use `std::experimental::simd_mask` for masks as per the spec, but once they enter ESIMD code they will likely want to use the ESIMD classes. Provide an implicit conversion from `std::experimental::simd_mask` to `esimd::simd_mask`. Without this change, you need to use a manual loop, as all you can do is access `std::experimental::simd_mask` element by element. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
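The shape of such a conversion can be sketched in plain C++ with stand-in types. `StdLikeMask` and `EsimdLikeMask` below are hypothetical placeholders, not the real `std::experimental::simd_mask` or `esimd::simd_mask`, and the `unsigned short` lane type is an assumption. The converting constructor wraps the element-by-element loop that callers would otherwise write by hand.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a mask that only offers element-wise access.
template <std::size_t N> struct StdLikeMask {
  std::array<bool, N> bits;
  bool operator[](std::size_t i) const { return bits[i]; }
};

// Hypothetical stand-in for the target mask class.
template <std::size_t N> struct EsimdLikeMask {
  std::array<unsigned short, N> lanes{};
  EsimdLikeMask() = default;
  // Non-explicit converting constructor: this is what makes the
  // conversion implicit and hides the manual per-element loop.
  EsimdLikeMask(const StdLikeMask<N> &m) {
    for (std::size_t i = 0; i < N; ++i)
      lanes[i] = m[i] ? 1 : 0;
  }
};

// A function written against the target mask type only.
std::size_t countSet(const EsimdLikeMask<4> &m) {
  std::size_t n = 0;
  for (unsigned short lane : m.lanes)
    n += (lane != 0);
  return n;
}
```

With the converting constructor in place, a `StdLikeMask<4>` can be passed straight to `countSet` without an explicit copy loop at the call site.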
This test uses an interop API to create a kernel. So, ZE_DEBUG should be disabled. Signed-off-by: Byoungro So <byoungro.so@intel.com>
Resolves the warnings as errors reported in [post merge](https://github.com/intel/llvm/actions/runs/5266121277/jobs/9519634360) as a result of merging intel#9512. Additionally move pre-processor guards to resolve unused global variables which would also fail in this build configuration (clang & SYCL_ENABLE_WERROR=ON).
)

This patch implements the accuracy controls for floating-point math functions in DPC++. Using the -ffp-accuracy command line option, the user can request an accuracy level for all math functions or for specific ones. Calls to the fpbuiltin intrinsics llvm.fpbuiltin.* are then generated.

Syntax:
Linux: -ffp-accuracy=[default|value][:funclist]
Windows: /Qfp-accuracy:[default|value][:funclist]

funclist is an optional comma-separated list of math library functions.

-ffp-accuracy=[default|value]
  default: Use the implementation-defined accuracy for all math library functions. This is equivalent to not using this option.
  value: Use the defined standard accuracy for what each accuracy value means for all math library functions.

-ffp-accuracy=[default|value][:funclist]
  default: Use the implementation-defined accuracy for the math library functions in funclist. This is equivalent to not using this option.
  value: Use the defined standard accuracy for what each accuracy value means for the math library functions in funclist.

value is one of the following values denoting the library function accuracy:
  high: This is equivalent to max-error = 1.0.
  medium: This is equivalent to max-error = 4.
  low: This is equivalent to accuracy-bits = 11 for single-precision functions and accuracy-bits = 26 for double-precision functions.
  sycl: Determined by the OpenCL specification for math function accuracy: https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#relative-error-as-ulps
  cuda: Determined by standard https://docs.nvidia.com/cuda/cuda-c-programming-guide/#mathematical-functions-appendix
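The `[default|value][:funclist]` shape of the option value can be illustrated with a small parser. This is only a sketch of the syntax described above, not the clang driver's implementation; `FpAccuracyArg` and `parseFpAccuracy` are hypothetical names, and no validation of the accuracy level or function names is attempted.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: the accuracy level plus the optional funclist.
struct FpAccuracyArg {
  std::string level;              // e.g. "default", "high", "medium", "low"
  std::vector<std::string> funcs; // empty means "all math library functions"
};

// Splits the text after "-ffp-accuracy=" into the level and the
// comma-separated function list that may follow a ':'.
FpAccuracyArg parseFpAccuracy(const std::string &arg) {
  FpAccuracyArg out;
  const std::size_t colon = arg.find(':');
  out.level = arg.substr(0, colon); // whole string when there is no ':'
  if (colon == std::string::npos)
    return out;
  std::istringstream list(arg.substr(colon + 1));
  std::string name;
  while (std::getline(list, name, ','))
    if (!name.empty())
      out.funcs.push_back(name);
  return out;
}
```

For example, `parseFpAccuracy("high:sin,cos")` yields the level `high` applied to `sin` and `cos`, while `parseFpAccuracy("medium")` yields `medium` with an empty list, meaning every math library function.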
I don't think we test it anywhere in our CI pipeline.
Doesn't seem to be needed.
…ntel#9841)

This reverts commit 4447a50.

Previous attempt: intel#8343

What changed: one extra patch is being added to the headers: intel@ca0595b. With this patch, clang won't generate llvm.memcpy for a trivial constructor, so InstCombine won't later replace it with a cast to i64 followed by a load + store, which SROA + mem2reg can't handle for target extension types.

It adds: ConvertSYCLJointMatrixINTELType - Convert the SYCL joint_matrix type, which is represented as a pointer to a structure, to an LLVM extension type with the parameters that follow the SPIR-V JointMatrixINTEL type. The expected representation is: target("spirv.JointMatrixINTEL", %element_type, %rows%, %cols%, %scope%, %use%, (optional) %element_type_interpretation%)

A better approach would be to introduce a joint matrix type to clang, but that is off the table for now, since we are lacking an OpenCL spec.

Co-authored-by: Joshua Cranmer <joshua.cranmer@intel.com> --------- Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com> Co-authored-by: Alexey Bader <alexey.bader@intel.com>
These operators were changed from aliasing their `std` counterparts in intel#9298 but a const-qualification was not added (as required by [4.17.2. Function objects](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:function-objects)).
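The effect of the missing const-qualification can be shown in plain C++, without SYCL headers. `my_plus` and `applyConst` below are hypothetical stand-ins, not the SYCL function objects themselves. Dropping the trailing `const` from `operator()` would make the call through the const reference fail to compile.

```cpp
#include <cassert>

// Hypothetical stand-in for a function object such as a `plus` operator.
template <typename T> struct my_plus {
  // The trailing `const` allows calls on const objects and const references.
  T operator()(const T &a, const T &b) const { return a + b; }
};

// Takes the function object by const reference, as generic code often does.
template <typename F> int applyConst(const F &f, int a, int b) {
  return f(a, b); // Requires a const-qualified operator().
}
```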
This extension is used to specify the register mode on an Intel GPU. Currently we only support specific register mode values on specific GPUs. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
intel#9844 somehow caused problems with lint tasks when `origin/sycl` is newer than the PR's merge base with it. I don't understand how that wasn't a problem before, but let's try to fix it. While at it, start using sparse checkout to get `devops/actions/cached_checkout` instead of "wget".
The sycl_ext_usm_address_spaces extension adds the ext_intel_global_device_space address space together with additional multi_ptr constructors for creating a multi_ptr from an accessor. However, the current implementation fails to construct the multi_ptr from an accessor when the extended address space decorations are enabled (through __ENABLE_USM_ADDR_SPACE__) as it attempts to use the normal global address space decoration. This commit fixes these constructors by doing a legal cast of the underlying global-space pointer to an ext_intel_global_device_space decorated pointer. Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Manual testing before merging lint-related PRs didn't reveal issues but it seems to misbehave after merge. Add some debug output to root cause. I hope to address the issues during the day, if not I'm going to revert the changes in the evening.
…sts (intel#9859) Co-authored-by: Vyacheslav Klochkov <vyacheslav.n.klochkov@intel.com>
Modify `is_compatible` to check whether a specific target is defined with `-fsycl-targets` and adjust the result accordingly. Previously, a kernel could be compatible with a device by aspects yet fail to run on that device because it was compiled for another target device. Related spec change: KhronosGroup/SYCL-Docs#381 Resolves intel#7561
…ntel#9883) Mark newly ported UR CUDA plugin as owned by CUDA reviewer group --------- Co-authored-by: Alexey Bader <alexey.bader@intel.com>
Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>
victor-eds approved these changes on Jun 15, 2023
Labels
disable-lint
Skip linter check step and proceed with build jobs
sycl-mlir
Pull requests or issues for sycl-mlir branch
Please only review 0d222a6 and files with conflicts:
Now same as sycl branch:
Undo a69e515, track in #9535:
Please do not squash and merge this PR.