Skip to content

[CMake] Disable STL-internal assertions in device code #7236

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 18, 2023

Conversation

ldrumm
Copy link
Contributor

@ldrumm ldrumm commented Oct 31, 2022

SYCL_DISABLE_STL_ASSERTIONS=ON

Unless we maintain our own STL for SYCL we can't know the visibility of the internal assertion mechanism of libstdc++ and friends.

This caused random-looking build failures in device code for which we have no reliable method of avoision when I adopted gcc12.2.

@ldrumm ldrumm requested a review from a team as a code owner October 31, 2022 13:29
@ldrumm ldrumm requested a review from sergey-semenov October 31, 2022 13:29
@ldrumm
Copy link
Contributor Author

ldrumm commented Oct 31, 2022

Before:


  1 FAIL: SYCL :: extensions/properties/properties_kernel_device_has.cpp (160 of 377)
  2 ******************** TEST 'SYCL :: extensions/properties/properties_kernel_device_has.cpp' FAILED ********************
  3 Script:
  4 --
  5 : 'RUN: at line 1';   ~/src/upstream/llvm-project/build/bin/clang --driver-mode=g++  -D_GLIBCXX_ASSERTIONS=1 -fsycl-device-only -S -Xclang -emit-llvm ~/src    /upstream/llvm-project/sycl/test/extensions/properties/properties_kernel_device_has.cpp -o - | FileCheck ~/src/upstream/llvm-project/sycl/test/extensions/p    roperties/properties_kernel_device_has.cpp --check-prefix CHECK-IR
  6 : 'RUN: at line 2';   ~/src/upstream/llvm-project/build/bin/clang --driver-mode=g++  -D_GLIBCXX_ASSERTIONS=1 -fsycl -Xclang -verify ~/src/upstream/llvm-pro    ject/sycl/test/extensions/properties/properties_kernel_device_has.cpp
  7 --
  8 Exit Code: 2
  9
 10 Command Output (stdout):
 11 --
 12 $ ":" "RUN: at line 1"
 13 note: command had no output on stdout or stderr
 14 $ "~/src/upstream/llvm-project/build/bin/clang" "--driver-mode=g++" "-D_GLIBCXX_ASSERTIONS=1" "-fsycl-device-only" "-S" "-Xclang" "-emit-llvm" "~/src/upstr    eam/llvm-project/sycl/test/extensions/properties/properties_kernel_device_has.cpp" "-o" "-"
 15 # command stderr:
 16 In file included from ~/src/upstream/llvm-project/sycl/test/extensions/properties/properties_kernel_device_has.cpp:5:
 17 In file included from ~/src/upstream/llvm-project/build/bin/../include/sycl/sycl.hpp:11:
 18 In file included from ~/src/upstream/llvm-project/build/bin/../include/sycl/accessor.hpp:12:
 19 In file included from ~/src/upstream/llvm-project/build/bin/../include/sycl/atomic.hpp:12:
 20 In file included from ~/src/upstream/llvm-project/build/bin/../include/sycl/access/access.hpp:11:
 21 In file included from ~/src/upstream/llvm-project/build/bin/../include/sycl/detail/common.hpp:168:
 22 In file included from ~/src/upstream/llvm-project/build/bin/../include/sycl/exception.hpp:18:
 23 In file included from ~/src/upstream/llvm-project/build/bin/../include/sycl/stl.hpp:15:
 24 In file included from ~/src/upstream/llvm-project/build/bin/../include/sycl/sycl_span.hpp:129:
 25 /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/array:208:2: error: SYCL kernel cannot call an undefined function without SYCL_EXTERNAL attribu    te
 26         __glibcxx_requires_subscript(__n);
 27         ^
 28 /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/debug/assertions.h:52:3: note: expanded from macro '__glibcxx_requires_subscript'
 29   __glibcxx_assert(_N < this->size())
 30   ^
 31 /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:588:8: note: expanded from macro '__glibcxx_assert'
 32   do { __glibcxx_assert_impl(cond); } while (false)
 33        ^
 34 /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:573:10: note: expanded from macro '__glibcxx_assert_impl'
[No Name] [+]
45 lines yanked


@ldrumm ldrumm force-pushed the SYCL_DISABLE_STL_ASSERTIONS=ON branch from 8f565ab to ffae919 Compare December 1, 2022 15:27
@steffenlarsen
Copy link
Contributor

Looks reasonable. @pvchupin | @cperkinsintel - If we merge this, should we enable the option for some CI configurations?

@pvchupin
Copy link
Contributor

pvchupin commented Dec 1, 2022

@ldrumm, I think #7483 could address this problem. We noticed similar issue on Ubuntu 22.
Can you check?
What GCC version do you have in the system?

@ldrumm
Copy link
Contributor Author

ldrumm commented Dec 2, 2022 via email

@aelovikov-intel
Copy link
Contributor

Would it disable it from both host and device compilation? Ideally, we'd like to do that for device only.

@pvchupin
Copy link
Contributor

pvchupin commented Dec 7, 2022

Looks like the root cause is in assert implementation that was implemented in libstdc++11 in a header as inline function with builtins:

  inline void
  __replacement_assert(const char* __file, int __line,
                       const char* __function, const char* __condition)
  {
    __builtin_printf("%s:%d: %s: Assertion '%s' failed.\n", __file, __line,
                     __function, __condition);
    __builtin_abort();
  }

and in libstdc++ 12 it is a library call:

  void
  __glibcxx_assert_fail(const char* __file, int __line,
                        const char* __function, const char* __condition)

which is basically:

  void
  __glibcxx_assert_fail(const char* file, int line,
                        const char* function, const char* condition) noexcept
  {
    if (file && function && condition)
      fprintf(stderr, "%s:%d: %s: Assertion '%s' failed.\n",
              file, line, function, condition);
    else if (function)
      fprintf(stderr, "%s: Undefined behavior detected.\n", function);
    abort();
  }

I agree that it makes sense to shut it down for device code only.
Unlike in #7483 where we just allowed __builtin_printf in device code, I don't see much alternatives here.
Interestingly we have some workarounds like https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/sycl_span.hpp#L146-L150

@ldrumm
Copy link
Contributor Author

ldrumm commented Dec 7, 2022

I've dug around a bit today, and see that this macro is only used by being appended to SYCL_CLANG_EXTRA_FLAGS which itself is only used in lit tests.

I can't think of a way to enable it only for device code without putting logic in the driver to strip it off (and maybe warn) which I don't feel right about - especially when its value is questionable.

This might be like saying "we've never had to use that fire extinguisher so let's get rid of it", but: has this ever caught an error in how the C++ standard libraries are used?

The more I look at it, the more I question whether it should exist at all. So I'm happy to disable it by default, or delete all references, but I'm not happy about specializing it when it's only used by lit tests

@pvchupin
Copy link
Contributor

pvchupin commented Dec 8, 2022

This might be like saying "we've never had to use that fire extinguisher so let's get rid of it", but: has this ever caught an error in how the C++ standard libraries are used?

:) Good question.
@romanovvlad, @steffenlarsen, @aelovikov-intel, @AlexeySachkov, @bader, @alexbatashev, do you remember cases where it really helped?

We need to figure out next steps. It will hit us on the next OS uplift anyway.

@bader bader changed the title [cmake] Device code can't support STL-internal assertions. Default-off [CMake] Disable STL-internal assertions in device code Dec 8, 2022
@bader
Copy link
Contributor

bader commented Dec 8, 2022

This might be like saying "we've never had to use that fire extinguisher so let's get rid of it", but: has this ever caught an error in how the C++ standard libraries are used?

:) Good question. @romanovvlad, @steffenlarsen, @aelovikov-intel, @AlexeySachkov, @bader, @alexbatashev, do you remember cases where it really helped?

We need to figure out next steps. It will hit us on the next OS uplift anyway.

What exactly are we talking about here? Testing approach or libstdc++ support?

@alexbatashev
Copy link
Contributor

@pvchupin yes: #2310
Took me many hours to find this one. Would have never happened with STL assertions on. Basically, that is the reason I have enabled assertions inside runtime by default.

@pvchupin
Copy link
Contributor

pvchupin commented Dec 8, 2022

What exactly are we talking about here? Testing approach or libstdc++ support?

Both.
Our current testing approach doesn't scale well for newer libstdc++. It doesn't look like there is any issue on default libstdc++ use, but _GLIBCXX_ASSERTIONS that we use in testing doesn't work in device compilation mode with the newer library (libstdc++ 12).

  • This PR just turns it off. It will reduce testing we are doing.
  • We can try to turn it off for device compilation only
  • We can probably turn it off if libstdc++ version is 12 or higher
  • Anything else?

@bader
Copy link
Contributor

bader commented Dec 9, 2022

What exactly are we talking about here? Testing approach or libstdc++ support?

Both. Our current testing approach doesn't scale well for newer libstdc++. It doesn't look like there is any issue on default libstdc++ use, but _GLIBCXX_ASSERTIONS that we use in testing doesn't work in device compilation mode with the newer library (libstdc++ 12).

  • This PR just turns it off. It will reduce testing we are doing.
  • We can try to turn it off for device compilation only
  • We can probably turn it off if libstdc++ version is 12 or higher
  • Anything else?

Why do we want to turn it off? I understand that it fails with libstdc++12, but we have a test that tells us about it. That's exactly what we added test for. I think the right course of actions is to fix compilation for libstdc++12.

@pvchupin
Copy link
Contributor

I think the right course of actions is to fix compilation for libstdc++12

How?
Should we do something similar to #3774 ?
Tagging @intel/dpcpp-cfe-reviewers

@premanandrao
Copy link
Contributor

I think the right course of actions is to fix compilation for libstdc++12

How? Should we do something similar to #3774 ? Tagging @intel/dpcpp-cfe-reviewers

Allow calls on device to __glibcxx-prefixed routines?

@AlexeySachkov
Copy link
Contributor

I think the right course of actions is to fix compilation for libstdc++12

How? Should we do something similar to #3774 ? Tagging @intel/dpcpp-cfe-reviewers

Allow calls on device to __glibcxx-prefixed routines?

So we will simply fail at JIT stage instead of compilation stage? For LIT tests that's probably not an issue, but do we really want to tweak our FE simply to support our unit-tests and moreover their host part?

I would suggest that we simply set this macro only for host compilation but not for device: this way RT folks will have instrumentations which helps during analysis/debugging/development but at the same time we won't be forced to deal with new built-ins or unsupported constructs coming from new versions of headers which we don't control.

@bader
Copy link
Contributor

bader commented Dec 13, 2022

I don't think there is a good protection from potential issues new versions of libstdc++ might bring. I thought our strategy is reactive i.e. enable new versions by implementing missing compiler built-ins for GPU. Tagging @jinge90.
The alternative is to provide our own standard library implementation.

@pvchupin
Copy link
Contributor

This issue is currently observed at https://github.com/intel/llvm/actions/runs/3753931926/jobs/6377676704 and blocking #7671

@ldrumm
Copy link
Contributor Author

ldrumm commented Dec 22, 2022

It seems we're split on this. I personally think changing the default and then having those who care otherwise pass the cmake switch during their own configure to be a reasonable compromise. As Pavel notes: this is causing blocking issues right now. I think we should merge the present patch, and if anyone has better ideas for solving this in a clean way in-general they can come back to it later

@ldrumm
Copy link
Contributor Author

ldrumm commented Jan 17, 2023

What's the current state of things here?

Both viewpoints are valid, but I'd like people to remain pragmatic and understand that if you need the old behaviour you can simply flip the switch, but for new users it's a big barrier to actually using SYCL on newer machines.

Would it satisfy the "no"s in the room if I ensure that the buildbots re-enable this flag for CI?

@aelovikov-intel
Copy link
Contributor

I think @bader and @sergey-semenov (plus maybe @steffenlarsen) have to make the decision.

@bader
Copy link
Contributor

bader commented Jan 18, 2023

I'm okay with merging this PR to unblock #7671, but we need to file an issue to enable assertions back.

@sergey-semenov
Copy link
Contributor

I'm okay with merging this PR to unblock #7671, but we need to file an issue to enable assertions back.

Agreed.

@bader
Copy link
Contributor

bader commented Jan 18, 2023

@ldrumm, are pre-commit failures related to the change?

Could you also open an issue to re-enable assertions, please?

@ldrumm
Copy link
Contributor Author

ldrumm commented Jan 18, 2023

@ldrumm, are pre-commit failures related to the change?

They mention assert, so they're probably relevant. I will investigate

Could you also open an issue to re-enable assertions, please?

#8044

@bader
Copy link
Contributor

bader commented Jan 18, 2023

@ldrumm, are pre-commit failures related to the change?

They mention assert, so they're probably relevant. I will investigate

I suggest you merge the tip of the sycl branch to re-run testing. The latest results are 1.5 months old, so we probably should get recent results before doing the analysis.

SYCL_DISABLE_STL_ASSERTIONS=ON

Unless we maintain our own STL for SYCL we can't know the visibility of
the internal assertion mechanism of libstdc++ and friends. This caused
random-looking build failures in device code for which we have no
reliable method of avoision when I adopted gcc12.2.
@ldrumm ldrumm force-pushed the SYCL_DISABLE_STL_ASSERTIONS=ON branch from ffae919 to 30444db Compare January 18, 2023 18:55
@ldrumm ldrumm temporarily deployed to aws January 18, 2023 19:21 — with GitHub Actions Inactive
@ldrumm ldrumm temporarily deployed to aws January 18, 2023 19:53 — with GitHub Actions Inactive
@bader
Copy link
Contributor

bader commented Jan 18, 2023

There is only one failure, which is known issue, so this PR is ready to go.

Failed Tests (1):
SYCL :: BFloat16/bfloat16_builtins.cpp

@bader bader merged commit ab69ac8 into intel:sycl Jan 18, 2023
@ldrumm ldrumm deleted the SYCL_DISABLE_STL_ASSERTIONS=ON branch January 19, 2023 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants