std::this_thread::sleep_until may block forever #43183

Open
@llvmbot

Description

Bugzilla Link 43838
Version 7.0
OS Linux
Attachments Example code to hit the race condition. May need adjustment of loop_delay. Compile with e.g. clang++ -o test -O2 -stdlib=libc++ main.cpp on a machine with an affected libcxx version.
Reporter LLVM Bugzilla Contributor
CC @zmodem,@mclow,@shubhdev

Extended Description

There is a race condition in condition_variable::wait_until in __mutex_base that can result in an infinite wait.
Potentially all released libcxx versions are affected.

The issue surfaces easily and often with high-resolution (nanosecond) clocks, and it made us ban all uses of standard library functions that can reach the code in question.

Code in question is (__mutex_base, version 7.0):

378 template <class _Clock, class _Duration>
379 cv_status
380 condition_variable::wait_until(unique_lock<mutex>& __lk,
381 const chrono::time_point<_Clock, _Duration>& __t)
382 {
383 using namespace chrono;
384 wait_for(__lk, __t - _Clock::now());
385 return _Clock::now() < __t ? cv_status::no_timeout : cv_status::timeout;
386 }

where, depending on the clock's representation type, an underflow can occur in line 384: time keeps advancing, so if __t is too close to the current time point, _Clock::now() may already be past __t when the subtraction runs, and with an unsigned representation the difference wraps around to a huge value. All released versions (up to 9.0) appear to contain this code. A commit on master prevents the underflow, but it is not yet in any released version.
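To make the underflow concrete, here is a minimal sketch of the subtraction in line 384, assuming a duration type with a uint64_t representation like the one in the attached example. The names `unsigned_ns` and `remaining` are hypothetical and are not part of libc++.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical duration type mirroring the reporter's clock:
// nanoseconds stored in an unsigned 64-bit integer.
using unsigned_ns = std::chrono::duration<std::uint64_t, std::nano>;

// Mimics the `__t - _Clock::now()` subtraction: time left until `deadline`.
// With an unsigned rep the result wraps modulo 2^64 once `now` has
// passed `deadline`, instead of becoming negative.
constexpr unsigned_ns remaining(unsigned_ns deadline, unsigned_ns now) {
    return deadline - now;
}

// A deadline 100 ns in the future gives the expected small value.
static_assert(remaining(unsigned_ns{1100}, unsigned_ns{1000}).count() == 100, "");

// A deadline missed by 1 ns gives 2^64 - 1 ns rather than -1 ns.
static_assert(remaining(unsigned_ns{1000}, unsigned_ns{1001}).count() == UINT64_MAX, "");
```

The second case is the one that matters here: a wait that should have timed out immediately is instead handed an enormous positive duration.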

Example stack traces based on the example in the attachment:

(lldb) bt
* thread #1, name = 'test', stop reason = signal SIGSTOP
    frame #0: 0x00007fb5875bff85 libpthread.so.0`__pthread_cond_timedwait at futex-internal.h:205
    frame #1: 0x00007fb5875bff60 libpthread.so.0`__pthread_cond_timedwait at pthread_cond_wait.c:539
    frame #2: 0x00007fb5875bfe00 libpthread.so.0`__pthread_cond_timedwait(cond=0x00007ffdcca62780, mutex=0x00007ffdcca627b0, abstime=0x00007ffdcca61a50) at pthread_cond_wait.c:667
    frame #3: 0x00007fb5883e7f15 libc++.so.1`std::__1::condition_variable::__do_timed_wait(std::__1::unique_lock<std::__1::mutex>&, std::__1::chrono::time_point<std::__1::chrono::system_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >) + 101
  * frame #4: 0x00000000004036da test`std::__1::cv_status std::__1::condition_variable::wait_for<unsigned long long, std::__1::ratio<1l, 1000000000l> >(this=0x00007ffdcca62780, __lk=0x00007ffdcca62770, __d=0x00007ffdcca624b8) at __mutex_base:418
    frame #5: 0x000000000040286c test`std::__1::cv_status std::__1::condition_variable::wait_until<tai_clock, std::__1::chrono::duration<unsigned long long, std::__1::ratio<1l, 1000000000l> > >(this=0x00007ffdcca62780, __lk=0x00007ffdcca62770, __t=0x00007ffdcca629a0) at __mutex_base:384
    frame #6: 0x0000000000401a20 test`void std::__1::this_thread::sleep_until<tai_clock, std::__1::chrono::duration<unsigned long long, std::__1::ratio<1l, 1000000000l> > >(__t=0x00007ffdcca629a0) at thread:461
    frame #7: 0x00000000004015a5 test`main(argc=1, argv=0x00007ffdcca62c88) at main.cpp:41
    frame #8: 0x00007fb5877f2b97 libc.so.6`__libc_start_main(main=(test`main at main.cpp:36), argc=1, argv=0x00007ffdcca62c88, init=, fini=, rtld_fini=, stack_end=0x00007ffdcca62c78) at libc-start.c:310
    frame #9: 0x000000000040113a test`_start + 42

With the following stack frame variables (in wait_for, called by wait_until):

(lldb) frame info
frame #4: 0x00000000004036da test`std::__1::cv_status std::__1::condition_variable::wait_for<unsigned long long, std::__1::ratio<1l, 1000000000l> >(this=0x00007ffdcca62780, __lk=0x00007ffdcca62770, __d=0x00007ffdcca624b8) at __mutex_base:418

(lldb) frame variable
(std::__1::condition_variable *) this = 0x00007ffdcca62780
(std::__1::unique_lock<std::__1::mutex> &) __lk = 0x00007ffdcca62770: {
_m = 0x00007ffdcca627b0
_owns = true
}
(const std::__1::chrono::duration<unsigned long long, std::__1::ratio<1, 1000000000> > &) __d = 0x00007ffdcca624b8 (_rep = 18446744073709549768)
(__sys_tpf) _Max = {
_d = (_rep = 9223372036854775807)
}
(std::__1::chrono::steady_clock::time_point) __c_now = {
_d = (_rep = 141408845845668)
}
(std::__1::chrono::system_clock::time_point) __s_now = {
_d = (_rep = 1572277999316767)
}
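The `__d` value in the frame variables above is consistent with a wrapped unsigned subtraction. The following sketch decodes it. The constant is copied from the lldb output, while the names `observed_rep`, `overshoot_ns`, and `wrapped_hours` are introduced here for illustration only.

```cpp
#include <chrono>
#include <cstdint>

// The `__d` representation value reported by `frame variable`.
constexpr std::uint64_t observed_rep = 18446744073709549768ULL;

// Distance of that value below 2^64, i.e. how far the clock had already
// moved past the deadline when the subtraction ran (unsigned wraparound).
constexpr std::uint64_t overshoot_ns = std::uint64_t{0} - observed_rep;
static_assert(overshoot_ns == 1848, "deadline was missed by 1848 ns");

// The same value read as a timeout in nanoseconds, converted to hours.
constexpr auto wrapped_hours = std::chrono::duration_cast<std::chrono::hours>(
    std::chrono::duration<std::uint64_t, std::nano>(observed_rep));

// More than 500 years' worth of hours: effectively an infinite wait.
static_assert(wrapped_hours.count() > 500LL * 365 * 24, "");
```

In other words, the deadline was missed by under two microseconds, and the resulting timeout is on the order of centuries.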

This occurs quite frequently in our setup, which uses a nanosecond-resolution clock with a uint64_t representation type (see the example code), and we had to ban every function that can eventually reach wait_until.

The standard system_clock and steady_clock appear to be microsecond-resolution clocks. For those clocks the probability of the underflow is low, but non-zero.
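Until a fixed libcxx is available, one possible caller-side workaround is to compare against the deadline before subtracting, so that the difference is only formed when it is known to be positive. The sketch below is an assumption about how such a guard could look and is not the upstream fix. `unsigned_ns_clock` and `guarded_sleep_until` are hypothetical names, and the sketch assumes that std::this_thread::sleep_for does not itself go through the affected wait_until path.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical clock with an unsigned nanosecond representation, built on
// steady_clock purely for illustration (the reporter's tai_clock differs).
struct unsigned_ns_clock {
    using rep = std::uint64_t;
    using period = std::nano;
    using duration = std::chrono::duration<rep, period>;
    using time_point = std::chrono::time_point<unsigned_ns_clock, duration>;
    static constexpr bool is_steady = true;

    static time_point now() noexcept {
        return time_point(std::chrono::duration_cast<duration>(
            std::chrono::steady_clock::now().time_since_epoch()));
    }
};

// Hypothetical replacement for std::this_thread::sleep_until.
// The clock is read once per iteration, and `deadline - now` is only
// evaluated after checking `now < deadline`, so it cannot wrap.
template <class Clock, class Duration>
void guarded_sleep_until(const std::chrono::time_point<Clock, Duration>& deadline) {
    auto now = Clock::now();
    while (now < deadline) {
        std::this_thread::sleep_for(deadline - now);
        now = Clock::now();
    }
    assert(now >= deadline);  // postcondition: the deadline has passed
}
```

A call such as `guarded_sleep_until(unsigned_ns_clock::now() + std::chrono::milliseconds(2))` sleeps for about two milliseconds, and a deadline that is already in the past returns immediately instead of wrapping.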
