Skip to content

[SYCL] WA for RT hang when exit(1) unexpectedly called in backend & SYCL_PI_TRACE enabled #5630

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions sycl/source/detail/global_handler.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -143,6 +143,12 @@ void shutdown() {
// there's no need to load and unload plugins.
if (GlobalHandler::instance().MPlugins.Inst) {
for (plugin &Plugin : GlobalHandler::instance().getPlugins()) {
// shutdown() is called once and only when process is terminating.
// Till the time it is called all threads using RT must be closed so it
// should be safe to work with plugin without multi thread protection.
// Shutdown mode allows to skip some potentially unsafe code
// (lock/unlock).
Plugin.enableShutdownMode();
// PluginParameter is reserved for future use that can control
// some parameters in the plugin tear-down process.
// Currently, it is not used.
Expand Down
17 changes: 14 additions & 3 deletions sycl/source/detail/plugin.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
#include <CL/sycl/backend_types.hpp>
#include <CL/sycl/detail/common.hpp>
#include <CL/sycl/detail/pi.hpp>
#include <CL/sycl/detail/spinlock.hpp>
#include <CL/sycl/detail/type_traits.hpp>
#include <CL/sycl/stl.hpp>
#include <detail/plugin_printers.hpp>
Expand Down Expand Up @@ -92,7 +93,7 @@ class plugin {
plugin(const std::shared_ptr<RT::PiPlugin> &Plugin, backend UseBackend,
void *LibraryHandle)
: MPlugin(Plugin), MBackend(UseBackend), MLibraryHandle(LibraryHandle),
TracingMutex(std::make_shared<std::mutex>()),
MTracingSLock(std::make_shared<SpinLock>()),
MPluginMutex(std::make_shared<std::mutex>()) {}

plugin &operator=(const plugin &) = default;
Expand Down Expand Up @@ -163,7 +164,13 @@ class plugin {
#endif
RT::PiResult R;
if (pi::trace(pi::TraceLevel::PI_TRACE_CALLS)) {
std::lock_guard<std::mutex> Guard(*TracingMutex);
// MShutdownModeEnabled signals if process is terminating now. It covers
// normal case and undefined behavior when it happens by initiative from
// underlying modules (calling exit(1)). In case of unexpected exit(1)
// call sync primitive can be still locked.
auto Guard = MShutdownModeEnabled
? std::unique_lock<SpinLock>()
: std::unique_lock<SpinLock>(*MTracingSLock);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, you avoid locking if tracing called at shutdown, right? Can you explain what is the UB being addressed here? Also, is this really the only lock that is problematic in case of process termination?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, you understanding is correct. If anyone inside pi call calls exit(1) (e.g. assert failure) and traces are enabled we will have a hang because we have:

  • mutex.lock
  • pi call that calls exit() and never returns
  • unreachable mutex.unlock;
    Exit() causes normal process termination and provides opportunity to release resources. In this case we do shutdown and call piTearDown that is also wrapped by the same mutex.
    In this case right now RT just hangs.

Copy link
Contributor Author

@KseniyaTikhomirova KseniyaTikhomirova Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the problematic places:
the situation with exit in backends may affect two places that I know:

  1. PI calls tracing - environment variable is documented; addressed by this PR
  2. ze calls tracing if it is called in piTearDown (?) - ZE_SERIALIZE=1, not documented, fully internal, not addressed. https://github.com/intel/llvm/blob/sycl/sycl/plugins/level_zero/pi_level_zero.cpp#L51

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double checked - ze calls tracing is not affected, piTearDown doesn't call any

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If anyone inside pi call calls exit(1)

Which code does that? Maybe we should disallow exit() from plugins? It should only return error codes to SYCL RT or terminate() in case of fatal error (and no shutdown() in this case).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should just turn the mutex around PI call to be "timed_mutex"? Also, holding the mutex around some PI calls (e.g. piProgramBuild) may be too expensive and freeze other threads for no good reasons. The "timed_mutex" will only hold it for, say 1 sec. Yes, some lengthy calls will have it's output mixed with other threads output. That would be addressed in long-term by using XPTI for tracing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a need for mutex at all?
Is it only due multiline output for PI tracing?
Is it feasible to switch to single-line output like e.g. strace?
Any implications for such a change?

@alexbatashev , do you have anything to advise here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mutex is to make trace readable, i.e. such that output from multiple threads is not mixed. Remember, some prints are before PI call, some after, and some other are from within the PI call.

Copy link
Contributor

@alexbatashev alexbatashev Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@s-kanaev I'm not sure if we need to change anything at all. The original hang was caused by a bug in the underlying runtime. This workaround is for that specific case, there's no guarantee that it won't fire in another case. XPTI tracing apparently is not affected by this issue, and it is almost ready to be merged: #5389, so, we should probably focus on improving that tool, rather than putting some questionable w/as all over runtime. If we still want to fix SYCL_PI_TRACE, @smaslov-intel has a good advice - timed mutex with long enough timeout. That'll be less of a hack than what's proposed by this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://en.cppreference.com/w/cpp/thread/timed_mutex/~timed_mutex

timed mutex has the same UB conditions as std::mutex. Not sure if it would be safe enough to use it.

The behavior is undefined if the mutex is owned by any thread or if any thread terminates while holding any ownership of the mutex.

const char *FnName = PiCallInfo.getFuncName();
std::cout << "---> " << FnName << "(" << std::endl;
RT::printArgs(Args...);
Expand Down Expand Up @@ -245,11 +252,13 @@ class plugin {

std::shared_ptr<std::mutex> getPluginMutex() { return MPluginMutex; }

void enableShutdownMode() { MShutdownModeEnabled = true; }

private:
std::shared_ptr<RT::PiPlugin> MPlugin;
backend MBackend;
void *MLibraryHandle; // the handle returned from dlopen
std::shared_ptr<std::mutex> TracingMutex;
std::shared_ptr<SpinLock> MTracingSLock;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How using SpinLock here is better than std::mutex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our case the thread terminates while owning lock - undefined behavior for std::mutex; or if we skip mutex lock/unlock as I did - undefined behavior for std::mutex.

https://en.cppreference.com/w/cpp/thread/mutex/~mutex

"Destroys the mutex.

The behavior is undefined if the mutex is owned by any thread or if any thread terminates while holding any ownership of the mutex."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what is the behavior of SpinLock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in common, lock/unlock just change stored value in std::atomic_flag. no limitations as with mutex is documented.

// Mutex to guard PiPlatforms and LastDeviceIds.
// Note that this is a temporary solution until we implement the global
// Device/Platform cache later.
Expand All @@ -259,6 +268,8 @@ class plugin {
// represents the unique ids of the last device of each platform
// index of this vector corresponds to the index in PiPlatforms vector.
std::vector<int> LastDeviceIds;

bool MShutdownModeEnabled = false;
}; // class plugin
} // namespace detail
} // namespace sycl
Expand Down