Skip to content

[SYCL][CUDA] Change builtin selection for SYCL #9768

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Aug 17, 2023

Conversation

hdelan
Copy link
Contributor

@hdelan hdelan commented Jun 7, 2023

Libdevice for NVPTX was previously working due to the fact that LLVM intrinsics were not selected due to the CUDA toolchain having isMathErrnoDefault evaluate to true. Since LLVM intrinsics were not selected then the symbols could be found when linking with libdevice, which gives special backend specific definitions of CXX stdlib funcs.

Using errno to prevent intrinsic selection was a hack and it gets undone by using -ffast-math, meaning libdevice CXX funcs were not working with -ffast-math.

This change instead explicitly says not to use LLVM intrinsics if compiling SYCL for NVPTX backend. This means that -ffast-math behaviour should now be fixed for CXX stdlib funcs.

@jchlanda

@hdelan hdelan requested review from a team as code owners June 7, 2023 10:48
@hdelan hdelan temporarily deployed to aws June 7, 2023 11:09 — with GitHub Actions Inactive
@hdelan hdelan temporarily deployed to aws June 7, 2023 11:49 — with GitHub Actions Inactive
Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test

Copy link
Contributor

@mdtoguchi mdtoguchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for driver

@hdelan hdelan requested a review from a team as a code owner June 12, 2023 09:49
@hdelan hdelan requested a review from steffenlarsen June 12, 2023 09:49
@hdelan
Copy link
Contributor Author

hdelan commented Jun 12, 2023

Please add a test

Test added

@hdelan hdelan temporarily deployed to aws June 12, 2023 10:13 — with GitHub Actions Inactive
@hdelan hdelan temporarily deployed to aws June 12, 2023 10:53 — with GitHub Actions Inactive
if ((FD->hasAttr<ConstAttr>() ||
((ConstWithoutErrnoAndExceptions || ConstWithoutExceptions) &&
(!ConstWithoutErrnoAndExceptions || (!getLangOpts().MathErrno)))) &&
!(getLangOpts().isSYCL() && getTarget().getTriple().isNVPTX())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All FE changes should have an accompanying FE test. It makes it easier to track down issues in the future if one arises. Can you add a FE test for this change. It can just check these builtins are not present in IR with these options and present without.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Test added. Let me know if you think this is sufficient. Note that SPIR-V compilation does use llvm intrinsics (not libdevice) for cmath funcs when in -ffast-math mode (since errno not used). This isn't exactly a problem for SPIR-V since llvm.cos.f32 has a SPIR-V lowering for fast mode, namely approx_cos. It happens to work but I think the idea of using libdevice is for DPC++ to provide its own definitions for cmath and other CXX stdlib funcs. Let me know if you think SPIR-V compilation should also use libdevice func definitions while in fast math mode as well, instead of relying on openCL driver implementations for say approx_cos.

Copy link
Contributor

@elizabethandrews elizabethandrews Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if you think SPIR-V compilation should also use libdevice func definitions while in fast math mode as well, instead of relying on openCL driver implementations for say approx_cos.

I'm not familiar enough with this to have a non-naive response one way or the other. Maybe @bader or @steffenlarsen has an input?

@hdelan hdelan temporarily deployed to aws June 15, 2023 21:08 — with GitHub Actions Inactive
@hdelan hdelan temporarily deployed to aws June 15, 2023 21:56 — with GitHub Actions Inactive
@hdelan hdelan closed this Jun 16, 2023
@hdelan hdelan reopened this Jun 16, 2023
@hdelan hdelan temporarily deployed to aws June 16, 2023 10:02 — with GitHub Actions Inactive
@hdelan hdelan temporarily deployed to aws June 16, 2023 16:08 — with GitHub Actions Inactive
@hdelan hdelan temporarily deployed to aws June 21, 2023 10:43 — with GitHub Actions Inactive
@MrSidims
Copy link
Contributor

MrSidims commented Jul 20, 2023

Hi, sorry, missed the ping. There is no direct counterpart for the intrinsic in core SPIR-V. We can add translation of it to OpenCL's ldexp https://registry.khronos.org/SPIR-V/specs/1.0/OpenCL.ExtendedInstructionSet.100.mobile.html but unfortunately vector GPU compiler wouldn't be able to handle it as it appears to be not handling OpenCL.

So the first question would be: do you expect your patch to change compilation for ESIMD? If no - there is plenty of time for us to help you, since intrinsic to OpenCL builtin translation should be trivial (please create an internal feature request for that). If yes - probably we won't be able to help you for this release, from what I see now - it's quite hard to emulate such intrinsic replacing it with a sequence of SPIR-V instructions, so in this case it better to get rid of it before the translator.

@hdelan
Copy link
Contributor Author

hdelan commented Jul 20, 2023

This patch is not changing compilation for SPIR-V targets at all, but the test I have added is highlighting a problem in LLVM-SPIR-V.

If I were to fix this for SPIR-V compilation it would merely avoid using LLVM intrinsics at all in SYCL, which is the default behaviour unless -ffast-math is used.

If we correctly map this intrinsic to an openCL builtin in LLVM-SPIR-V, then my understanding is that this would not work for say a L0 driver which as you say doesn't handle openCL SPIR-V. So we are shifting the problem elsewhere.

@steffenlarsen the long term problem is that libdevice doesn't do anything for ffast-math, maybe I should just disable the test for L0 at the moment, and add a TODO to add ffast-math functionality to libdevice. Once we have ffast-math working in libdevice then we can stop using LLVM intrinsics at all for SYCL SPIR-V compilation. Let me know your thoughts

@steffenlarsen
Copy link
Contributor

Thank you for the clarification, @hdelan ! As long as it is not a regression, I am okay with disabling the test for targets we know don't support it, which I think in this case would be all SPIR-V targets, so maybe we should REQUIRE CUDA or HIP to be on the safe side here. Could you please open an issue here about this?

@hdelan hdelan temporarily deployed to aws July 21, 2023 11:26 — with GitHub Actions Inactive
@hdelan hdelan temporarily deployed to aws July 21, 2023 12:04 — with GitHub Actions Inactive
@hdelan hdelan closed this Jul 24, 2023
@hdelan hdelan reopened this Jul 24, 2023
@hdelan hdelan temporarily deployed to aws July 24, 2023 11:15 — with GitHub Actions Inactive
@hdelan hdelan temporarily deployed to aws July 24, 2023 11:54 — with GitHub Actions Inactive
@hdelan hdelan closed this Aug 2, 2023
@hdelan hdelan reopened this Aug 2, 2023
@hdelan hdelan temporarily deployed to aws August 2, 2023 12:38 — with GitHub Actions Inactive
@hdelan hdelan temporarily deployed to aws August 2, 2023 13:18 — with GitHub Actions Inactive
@hdelan
Copy link
Contributor Author

hdelan commented Aug 16, 2023

Failures are unrelated. Ping @intel/llvm-gatekeepers this can be merged

@stdale-intel
Copy link
Contributor

Failures are unrelated. Ping @intel/llvm-gatekeepers this can be merged

@hdelan , due to the failures that were happening (no gen12 linux tests run at all), we are not able to accept these test results as known issues. I have rekicked off your test run now that issue is fixed. If all comes back clean, gatekeepers will proceed with merge.

@dm-vodopyanov
Copy link
Contributor

SYCL :: DeviceLib/cmath_test.cpp test failed, which was modified in this PR. @hdelan can you please take a look?

@hdelan
Copy link
Contributor Author

hdelan commented Aug 17, 2023

@dm-vodopyanov that should hopefully fix it. Let's wait and see

@hdelan
Copy link
Contributor Author

hdelan commented Aug 17, 2023

@dm-vodopyanov failures are unrelated.

@dm-vodopyanov
Copy link
Contributor

Failed tests on HIP:

Failed Tests (2):
  SYCL :: AtomicRef/max_generic_local_native_fp.cpp
  SYCL :: GroupAlgorithm/leader.cpp

Logs:

$ "env" "ONEAPI_DEVICE_SELECTOR=ext_oneapi_hip:gpu" "/__w/llvm/llvm/build-e2e/AtomicRef/Output/max_generic_local_native_fp.cpp.tmp.out"
# command stderr:
Memory access fault by GPU node-1 (Agent handle: 0xd9[18](https://github.com/intel/llvm/actions/runs/5889588137/job/15974038334?pr=9768#step:22:19)c0) on address 0x7f93ec800000. Reason: Page not present or supervisor privilege.

error: command failed with exit status: -6

--

$ "env" "ONEAPI_DEVICE_SELECTOR=ext_oneapi_hip:gpu" "/__w/llvm/llvm/build-e2e/GroupAlgorithm/Output/leader.cpp.tmp.out"
# command stderr:
Memory access fault by GPU node-1 (Agent handle: 0x[23](https://github.com/intel/llvm/actions/runs/5889588137/job/15974038334?pr=9768#step:22:24)dea[70](https://github.com/intel/llvm/actions/runs/5889588137/job/15974038334?pr=9768#step:22:71)) on address 0x7fe8aa800000. Reason: Page not present or supervisor privilege.

error: command failed with exit status: -6

@dm-vodopyanov dm-vodopyanov changed the title [SYCL][CUDA] Change builtin selection for sycl [SYCL][CUDA] Change builtin selection for SYCL Aug 17, 2023
@dm-vodopyanov dm-vodopyanov merged commit 970a2df into intel:sycl Aug 17, 2023
againull pushed a commit that referenced this pull request Aug 23, 2023
Since #9768 `ffast-math` no longer
chooses llvm intrinsics. This highlighted that `std::round` was missing
in libdevice. This fixes that.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants