[SYCL] WA for RT hang when exit(1) unexpectedly called in backend & SYCL_PI_TRACE enabled #5630

KseniyaTikhomirova · 2022-02-22T14:44:11Z

WAs the case when underlying layer calls exit(1). The only intension of this commit is to eliminate RT hang in this case.

…xit() If exit() is called by module inside pi call wrapped by mutex for pi traces RT hangs on tear down. Introduces mutex lock skip in this case. Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>

…lock Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>

sycl/source/detail/global_handler.cpp

sycl/source/detail/plugin.hpp

smaslov-intel · 2022-02-22T15:20:39Z

sycl/source/detail/plugin.hpp

 private:
  std::shared_ptr<RT::PiPlugin> MPlugin;
  backend MBackend;
  void *MLibraryHandle; // the handle returned from dlopen
-  std::shared_ptr<std::mutex> TracingMutex;
+  std::shared_ptr<SpinLock> MTracingSLock;


How using SpinLock here is better than std::mutex?

In our case the thread terminates while owning lock - undefined behavior for std::mutex; or if we skip mutex lock/unlock as I did - undefined behavior for std::mutex.

https://en.cppreference.com/w/cpp/thread/mutex/~mutex

"Destroys the mutex.

The behavior is undefined if the mutex is owned by any thread or if any thread terminates while holding any ownership of the mutex."

And what is the behavior of SpinLock?

in common, lock/unlock just change stored value in std::atomic_flag. no limitations as with mutex is documented.

smaslov-intel · 2022-02-22T15:22:54Z

sycl/source/detail/plugin.hpp

+      // call sync primitive can be still locked.
+      auto Guard = MShutdownModeEnabled
+                       ? std::unique_lock<SpinLock>()
+                       : std::unique_lock<SpinLock>(*MTracingSLock);


So, you avoid locking if tracing called at shutdown, right? Can you explain what is the UB being addressed here? Also, is this really the only lock that is problematic in case of process termination?

sure, you understanding is correct. If anyone inside pi call calls exit(1) (e.g. assert failure) and traces are enabled we will have a hang because we have:

mutex.lock

pi call that calls exit() and never returns

unreachable mutex.unlock;
Exit() causes normal process termination and provides opportunity to release resources. In this case we do shutdown and call piTearDown that is also wrapped by the same mutex.
In this case right now RT just hangs.

For the problematic places:
the situation with exit in backends may affect two places that I know:

PI calls tracing - environment variable is documented; addressed by this PR

ze calls tracing if it is called in piTearDown (?) - ZE_SERIALIZE=1, not documented, fully internal, not addressed. https://github.com/intel/llvm/blob/sycl/sycl/plugins/level_zero/pi_level_zero.cpp#L51

double checked - ze calls tracing is not affected, piTearDown doesn't call any

If anyone inside pi call calls exit(1)

Which code does that? Maybe we should disallow exit() from plugins? It should only return error codes to SYCL RT or terminate() in case of fatal error (and no shutdown() in this case).

Maybe we should just turn the mutex around PI call to be "timed_mutex"? Also, holding the mutex around some PI calls (e.g. piProgramBuild) may be too expensive and freeze other threads for no good reasons. The "timed_mutex" will only hold it for, say 1 sec. Yes, some lengthy calls will have it's output mixed with other threads output. That would be addressed in long-term by using XPTI for tracing.

Why is there a need for mutex at all?
Is it only due multiline output for PI tracing?
Is it feasible to switch to single-line output like e.g. strace?
Any implications for such a change?

@alexbatashev , do you have anything to advise here?

The mutex is to make trace readable, i.e. such that output from multiple threads is not mixed. Remember, some prints are before PI call, some after, and some other are from within the PI call.

@s-kanaev I'm not sure if we need to change anything at all. The original hang was caused by a bug in the underlying runtime. This workaround is for that specific case, there's no guarantee that it won't fire in another case. XPTI tracing apparently is not affected by this issue, and it is almost ready to be merged: #5389, so, we should probably focus on improving that tool, rather than putting some questionable w/as all over runtime. If we still want to fix SYCL_PI_TRACE, @smaslov-intel has a good advice - timed mutex with long enough timeout. That'll be less of a hack than what's proposed by this PR.

https://en.cppreference.com/w/cpp/thread/timed_mutex/~timed_mutex

timed mutex has the same UB conditions as std::mutex. Not sure if it would be safe enough to use it.

The behavior is undefined if the mutex is owned by any thread or if any thread terminates while holding any ownership of the mutex.

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>

KseniyaTikhomirova added 2 commits February 21, 2022 18:45

Restore global handler destruction logic and replace mutex with spin …

116927f

…lock Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>

KseniyaTikhomirova requested a review from a team as a code owner February 22, 2022 14:44

KseniyaTikhomirova requested review from smaslov-intel, s-kanaev, alexbatashev and dm-vodopyanov February 22, 2022 14:44

s-kanaev reviewed Feb 22, 2022

View reviewed changes

sycl/source/detail/global_handler.cpp Outdated Show resolved Hide resolved

sycl/source/detail/plugin.hpp Show resolved Hide resolved

smaslov-intel reviewed Feb 22, 2022

View reviewed changes

Fix typo

893787c

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>

KseniyaTikhomirova closed this Feb 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SYCL] WA for RT hang when exit(1) unexpectedly called in backend & SYCL_PI_TRACE enabled #5630

[SYCL] WA for RT hang when exit(1) unexpectedly called in backend & SYCL_PI_TRACE enabled #5630

Uh oh!

KseniyaTikhomirova commented Feb 22, 2022

Uh oh!

Uh oh!

Uh oh!

smaslov-intel Feb 22, 2022

Uh oh!

KseniyaTikhomirova Feb 22, 2022

Uh oh!

smaslov-intel Feb 22, 2022

Uh oh!

KseniyaTikhomirova Feb 22, 2022

Uh oh!

smaslov-intel Feb 22, 2022

Uh oh!

KseniyaTikhomirova Feb 22, 2022

Uh oh!

KseniyaTikhomirova Feb 22, 2022 •

edited

Loading

Uh oh!

KseniyaTikhomirova Feb 22, 2022

Uh oh!

smaslov-intel Feb 22, 2022

Uh oh!

smaslov-intel Feb 22, 2022

Uh oh!

s-kanaev Feb 22, 2022

Uh oh!

smaslov-intel Feb 22, 2022

Uh oh!

alexbatashev Feb 23, 2022 •

edited

Loading

Uh oh!

KseniyaTikhomirova Feb 24, 2022

Uh oh!

Uh oh!

[SYCL] WA for RT hang when exit(1) unexpectedly called in backend & SYCL_PI_TRACE enabled #5630

[SYCL] WA for RT hang when exit(1) unexpectedly called in backend & SYCL_PI_TRACE enabled #5630

Uh oh!

Conversation

KseniyaTikhomirova commented Feb 22, 2022

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

KseniyaTikhomirova Feb 22, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alexbatashev Feb 23, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

KseniyaTikhomirova Feb 22, 2022 •

edited

Loading

alexbatashev Feb 23, 2022 •

edited

Loading