[SYCL][CUDA] Change builtin selection for SYCL #9768

hdelan · 2023-06-07T10:48:52Z

Libdevice for NVPTX previously worked only because LLVM intrinsics were not being selected, since the CUDA toolchain's `isMathErrnoDefault` evaluates to true. With no intrinsics selected, the calls remained references to ordinary symbols that could be resolved when linking with libdevice, which provides backend-specific definitions of C++ stdlib functions.

Using errno to prevent intrinsic selection was a hack, and `-ffast-math` undoes it (fast math turns math errno off), so libdevice C++ functions did not work with `-ffast-math`.

This change instead explicitly disables LLVM intrinsic selection when compiling SYCL for the NVPTX backend, so `-ffast-math` behaviour should now be fixed for C++ stdlib functions.

@jchlanda
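To make the mechanism in the description concrete, here is a minimal C++ sketch that models it as plain boolean logic, assuming an errno-setting builtin such as `cos`. All names (`DriverOptions`, `effectiveMathErrno`, `selectsIntrinsicBefore`, `selectsIntrinsicAfter`, `checkScenario`) are hypothetical; this is not clang's code and it ignores the other builtin attributes clang consults.

```cpp
#include <cassert>

// Hypothetical model of the option resolution described above; these names
// are illustrative and are not clang's API.
struct DriverOptions {
  bool toolchainMathErrnoDefault;  // stands in for isMathErrnoDefault()
  bool fastMath;                   // -ffast-math was passed
};

// -ffast-math implies -fno-math-errno, overriding the toolchain default.
bool effectiveMathErrno(const DriverOptions &o) {
  return o.fastMath ? false : o.toolchainMathErrnoDefault;
}

// Before this change (simplified): an errno-setting builtin such as cos was
// lowered to an LLVM intrinsic whenever errno did not need to be preserved.
bool selectsIntrinsicBefore(const DriverOptions &o) {
  return !effectiveMathErrno(o);
}

// After this change: SYCL device code for NVPTX keeps the plain call, so the
// symbol can be resolved against libdevice regardless of -ffast-math.
bool selectsIntrinsicAfter(const DriverOptions &o, bool isSYCL, bool isNVPTX) {
  return selectsIntrinsicBefore(o) && !(isSYCL && isNVPTX);
}

// Self-check of the scenario from the description (CUDA default: errno on).
void checkScenario() {
  const DriverOptions cuda{true, false};
  const DriverOptions cudaFast{true, true};
  assert(!selectsIntrinsicBefore(cuda));     // errno default blocked intrinsics
  assert(selectsIntrinsicBefore(cudaFast));  // ...until -ffast-math undid it
  assert(!selectsIntrinsicAfter(cudaFast, /*isSYCL=*/true, /*isNVPTX=*/true));
}
```

Under this model, `-ffast-math` is exactly what flips `selectsIntrinsicBefore` to true on the CUDA toolchain, which is why the errno-based approach was fragile; the explicit SYCL/NVPTX guard does not depend on any math option.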

elizabethandrews

Please add a test

mdtoguchi

OK for driver

hdelan · 2023-06-12T09:49:31Z

> Please add a test

Test added

elizabethandrews · 2023-06-12T18:50:53Z

clang/lib/CodeGen/CGBuiltin.cpp

+  if ((FD->hasAttr<ConstAttr>() ||
+       ((ConstWithoutErrnoAndExceptions || ConstWithoutExceptions) &&
+        (!ConstWithoutErrnoAndExceptions || (!getLangOpts().MathErrno)))) &&
+      !(getLangOpts().isSYCL() && getTarget().getTriple().isNVPTX())) {


All FE changes should have an accompanying FE test; it makes issues easier to track down if one arises in the future. Can you add an FE test for this change? It can simply check that these builtins are absent from the IR with these options and present without them.
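The condition in the `CGBuiltin.cpp` hunk above can be read as a standalone predicate. The sketch below restates it over a hypothetical `BuiltinQuery` struct (not a clang type) whose fields stand in for the calls in the hunk. The self-check enumerates the same on/off cases an IR-level FE test would look for, assuming a `cos`-like builtin that is const only when math errno is off; `cosLike` and `checkSelectionMatrix` are illustrative helpers.

```cpp
#include <cassert>

// Hypothetical stand-in for what the hunk queries; field names follow the
// hunk, but this struct is illustrative and not part of clang.
struct BuiltinQuery {
  bool hasConstAttr;                    // FD->hasAttr<ConstAttr>()
  bool constWithoutErrnoAndExceptions;  // const once errno/exceptions are off
  bool constWithoutExceptions;          // const once exceptions are off
  bool mathErrno;                       // getLangOpts().MathErrno
  bool isSYCL;                          // getLangOpts().isSYCL()
  bool isNVPTX;                         // getTarget().getTriple().isNVPTX()
};

// Restates the patched condition: true means the builtin would be emitted
// as an LLVM intrinsic instead of an ordinary call.
bool emitsLLVMIntrinsic(const BuiltinQuery &q) {
  const bool constEnough =
      q.hasConstAttr ||
      ((q.constWithoutErrnoAndExceptions || q.constWithoutExceptions) &&
       (!q.constWithoutErrnoAndExceptions || !q.mathErrno));
  // The new guard: never pick the intrinsic for SYCL device code on NVPTX.
  return constEnough && !(q.isSYCL && q.isNVPTX);
}

// Assumed cos-like builtin: const only when math errno is disabled.
BuiltinQuery cosLike(bool mathErrno, bool isSYCL, bool isNVPTX) {
  return BuiltinQuery{false, true, false, mathErrno, isSYCL, isNVPTX};
}

void checkSelectionMatrix() {
  // -ffast-math (errno off): SYCL on NVPTX keeps the call for libdevice...
  assert(!emitsLLVMIntrinsic(cosLike(false, true, true)));
  // ...while SYCL for SPIR-V and non-SYCL NVPTX still get the intrinsic.
  assert(emitsLLVMIntrinsic(cosLike(false, true, false)));
  assert(emitsLLVMIntrinsic(cosLike(false, false, true)));
  // With math errno on, the errno-dependent builtin is never an intrinsic.
  assert(!emitsLLVMIntrinsic(cosLike(true, false, true)));
}
```

Note that the guard applies even to functions carrying `ConstAttr`, since it is ANDed with the whole original condition.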

hdelan · 2023-06-15

Sure, test added. Let me know if you think it is sufficient. Note that SPIR-V compilation does use LLVM intrinsics (not libdevice) for cmath functions in `-ffast-math` mode, since errno is not used. This isn't really a problem for SPIR-V, because `llvm.cos.f32` has a SPIR-V lowering for fast mode, namely `approx_cos`. It happens to work, but I think the idea behind libdevice is for DPC++ to provide its own definitions of cmath and other C++ stdlib functions. Let me know if you think SPIR-V compilation should also use libdevice function definitions in fast math mode, instead of relying on OpenCL driver implementations of, say, `approx_cos`.

elizabethandrews · 2023-06-20

> Let me know if you think SPIR-V compilation should also use libdevice func definitions while in fast math mode as well, instead of relying on openCL driver implementations for say approx_cos.

I'm not familiar enough with this to give a non-naive answer one way or the other. Maybe @bader or @steffenlarsen have some input?

clang/test/CodeGenSYCL/sycl_libdevice_cmath.cpp

MrSidims · 2023-07-20T17:25:58Z

Hi, sorry, I missed the ping. There is no direct counterpart for the intrinsic in core SPIR-V. We could add a translation of it to OpenCL's ldexp (https://registry.khronos.org/SPIR-V/specs/1.0/OpenCL.ExtendedInstructionSet.100.mobile.html), but unfortunately the vector GPU compiler would not be able to handle it, as it does not appear to handle OpenCL.

So the first question is: do you expect your patch to change compilation for ESIMD? If not, there is plenty of time for us to help you, since translating the intrinsic to an OpenCL builtin should be trivial (please create an internal feature request for that). If yes, we probably won't be able to help you for this release: from what I see now, it is quite hard to emulate such an intrinsic by replacing it with a sequence of SPIR-V instructions, so in that case it is better to get rid of it before the translator.
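The point that the intrinsic is hard to emulate with a sequence of instructions can be illustrated on the host. Since the comment links OpenCL's ldexp, this minimal C++ sketch assumes an ldexp-style "scale by 2^n" operation and shows why the obvious replacement, multiplying by a materialized power of two, is not equivalent: the intermediate 2^n can overflow or underflow even when the final result fits. `naiveLdexp`, `referenceLdexp`, and `checkLdexpEdgeCases` are hypothetical helpers, and nothing here describes how the translator actually lowers the intrinsic.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical naive replacement for an ldexp-style intrinsic: materialize
// 2^n as a float and multiply. 2^n alone overflows float for n >= 128 and
// underflows to zero well below n = -149, even when x * 2^n would fit.
float naiveLdexp(float x, int n) {
  return x * std::exp2(static_cast<float>(n));
}

// Reference behaviour: std::ldexp adjusts the exponent directly.
float referenceLdexp(float x, int n) { return std::ldexp(x, n); }

void checkLdexpEdgeCases() {
  const float tiny = std::ldexp(1.0f, -30);  // 2^-30
  const float huge = std::ldexp(1.0f, 100);  // 2^100
  // 2^-30 * 2^150 = 2^120 fits in float, but 2^150 alone does not.
  assert(referenceLdexp(tiny, 150) == std::ldexp(1.0f, 120));
  assert(std::isinf(naiveLdexp(tiny, 150)));
  // 2^100 * 2^-200 = 2^-100 fits, but 2^-200 alone underflows to zero.
  assert(referenceLdexp(huge, -200) == std::ldexp(1.0f, -100));
  assert(naiveLdexp(huge, -200) == 0.0f);
}
```

A faithful emulation has to avoid materializing 2^n (for example by splitting the scale or manipulating exponent bits, with care for subnormals), which hints at why it is more than a one-instruction replacement.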

hdelan · 2023-07-20T17:49:31Z

This patch does not change compilation for SPIR-V targets at all, but the test I added highlights a problem in LLVM-SPIR-V.

If I were to fix this for SPIR-V compilation, the fix would simply avoid LLVM intrinsics entirely in SYCL, which is already the default behaviour unless `-ffast-math` is used.

If we map this intrinsic to an OpenCL builtin in LLVM-SPIR-V, my understanding is that it would still not work for, say, an L0 driver, which, as you say, doesn't handle OpenCL SPIR-V. So we would just be shifting the problem elsewhere.

@steffenlarsen the long-term problem is that libdevice doesn't do anything for `-ffast-math`. Maybe I should just disable the test for L0 for now and add a TODO to add `-ffast-math` functionality to libdevice. Once `-ffast-math` works in libdevice, we can stop using LLVM intrinsics entirely for SYCL SPIR-V compilation. Let me know your thoughts.

steffenlarsen · 2023-07-21T07:35:28Z

Thank you for the clarification, @hdelan! As long as it is not a regression, I am okay with disabling the test for targets we know don't support it. I think that would be all SPIR-V targets in this case, so to be on the safe side maybe the test should REQUIRE CUDA or HIP. Could you please open an issue here about this?

sycl/test-e2e/DeviceLib/cmath_test.cpp

hdelan · 2023-08-16T08:59:43Z

Failures are unrelated. Ping @intel/llvm-gatekeepers: this can be merged.

stdale-intel · 2023-08-17T04:20:17Z

> Failures are unrelated. Ping @intel/llvm-gatekeepers this can be merged

@hdelan, due to the failures that were happening (no Gen12 Linux tests ran at all), we cannot accept these test results as known issues. I have restarted your test run now that the issue is fixed. If everything comes back clean, the gatekeepers will proceed with the merge.

dm-vodopyanov · 2023-08-17T10:08:05Z

SYCL :: DeviceLib/cmath_test.cpp test failed, which was modified in this PR. @hdelan can you please take a look?

hdelan · 2023-08-17T10:11:59Z

@dm-vodopyanov that should hopefully fix it. Let's wait and see.

hdelan · 2023-08-17T13:15:03Z

@dm-vodopyanov failures are unrelated.

dm-vodopyanov · 2023-08-17T13:38:24Z

Failed tests on HIP:

Failed Tests (2):
  SYCL :: AtomicRef/max_generic_local_native_fp.cpp
  SYCL :: GroupAlgorithm/leader.cpp

Logs:

$ "env" "ONEAPI_DEVICE_SELECTOR=ext_oneapi_hip:gpu" "/__w/llvm/llvm/build-e2e/AtomicRef/Output/max_generic_local_native_fp.cpp.tmp.out"
# command stderr:
Memory access fault by GPU node-1 (Agent handle: 0xd918c0) on address 0x7f93ec800000. Reason: Page not present or supervisor privilege.

error: command failed with exit status: -6

--

$ "env" "ONEAPI_DEVICE_SELECTOR=ext_oneapi_hip:gpu" "/__w/llvm/llvm/build-e2e/GroupAlgorithm/Output/leader.cpp.tmp.out"
# command stderr:
Memory access fault by GPU node-1 (Agent handle: 0x23dea70) on address 0x7fe8aa800000. Reason: Page not present or supervisor privilege.

error: command failed with exit status: -6

Hugh Delaney added 2 commits June 7, 2023 11:37

Add initial support for builtin selection

95cd955

Make builtin selection not happen only for NVPTX backend for SYCL

8713b4e

hdelan requested review from a team as code owners June 7, 2023 10:48

elizabethandrews reviewed Jun 7, 2023

mdtoguchi approved these changes Jun 9, 2023

Add test for ffast math with CXX stdlib funcs

4b8f8c5

hdelan requested a review from a team as a code owner June 12, 2023 09:49

hdelan requested a review from steffenlarsen June 12, 2023 09:49

elizabethandrews reviewed Jun 12, 2023

Hugh Delaney added 2 commits June 15, 2023 10:25

Remove duplicated line.

66aef27

Add cmath test for libdevice

3fa8784

hdelan closed this Jun 16, 2023

hdelan reopened this Jun 16, 2023

Merge branch 'sycl' into change-builtin-selection-for-SYCL

d003b1a

npmiller requested a review from elizabethandrews June 19, 2023 13:29

elizabethandrews approved these changes Jun 20, 2023

elizabethandrews reviewed Jun 20, 2023

clang/test/CodeGenSYCL/sycl_libdevice_cmath.cpp (outdated)

Hugh Delaney added 2 commits June 21, 2023 10:47

Changing name to align with other sycl tests and removing driver invocation from test

797cbb5

Renaming test to align with other sycl tests

009baac

Disable test for SPIR-V

61d9276

hdelan mentioned this pull request Jul 21, 2023

Make libdevice work with -ffast-math #10517

Open

hdelan closed this Jul 24, 2023

hdelan reopened this Jul 24, 2023

hdelan closed this Aug 2, 2023

hdelan reopened this Aug 2, 2023

npmiller reviewed Aug 15, 2023

sycl/test-e2e/DeviceLib/cmath_test.cpp (outdated)

smanna12 approved these changes Aug 15, 2023

Hugh Delaney added 2 commits August 15, 2023 14:27

Only run fast cmath test for CUDA

76557c8

Fix typo

420a5f1

Compile fast math only on CUDA

31ebabb

dm-vodopyanov changed the title ~~[SYCL][CUDA] Change builtin selection for sycl~~ [SYCL][CUDA] Change builtin selection for SYCL Aug 17, 2023

dm-vodopyanov merged commit 970a2df into intel:sycl Aug 17, 2023

hdelan mentioned this pull request Aug 21, 2023

[SYCL][Devicelib] Add missing round in devicelib #10904

Merged

againull pushed a commit that referenced this pull request Aug 23, 2023

[SYCL][Devicelib] Add missing round in devicelib (#10904)

8f75531

Since #9768 `ffast-math` no longer chooses llvm intrinsics. This highlighted that `std::round` was missing in libdevice. This fixes that.
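Since `-ffast-math` no longer selects LLVM intrinsics for SYCL on NVPTX, a call to `std::round` has to resolve against libdevice, which therefore needs its own definition. The hedged sketch below (hypothetical names `roundHalfAwayFromZero` and `checkRoundCases`; not the code added in #10904) shows the semantics such a definition has to match: halfway cases round away from zero, unlike `std::rint`, which follows the current rounding mode (ties-to-even by default).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch of round semantics (not the libdevice source):
// round half away from zero, independent of the current rounding mode.
float roundHalfAwayFromZero(float x) {
  float t = std::trunc(x);        // drop the fraction, rounding toward zero
  if (std::fabs(x - t) >= 0.5f)   // x - t is exact for finite float inputs
    t += std::copysign(1.0f, x);  // step away from zero on ties and above
  return t;                       // NaN and +/-inf fall through unchanged
}

void checkRoundCases() {
  assert(roundHalfAwayFromZero(2.5f) == 3.0f);
  assert(roundHalfAwayFromZero(-2.5f) == -3.0f);
  assert(std::rint(2.5f) == 2.0f);  // default rounding mode: ties to even
  // The largest float below 0.5 must round to 0, not 1.
  assert(roundHalfAwayFromZero(0.49999997f) == 0.0f);
}
```

Truncating first and comparing the exact fractional part avoids a classic pitfall: in IEEE single precision, `std::floor(x + 0.5f)` returns 1.0f for 0.49999997f, because the addition itself rounds up to 1.0f.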
