Skip to content

Sporadic deadlock in HostAccessorDeadLockTest.CheckThreadOrder unit test #6944

Closed
@AlexeySachkov

Description

@AlexeySachkov

Describe the bug

HostAccessorDeadLockTest.CheckThreadOrder unit test sporadically hangs when launched on M1 hardware.

To Reproduce

Just build the compiler and launch the test on M1 hardware. The hang is not stable and the test sometimes passes:

asachkov@Alexeys-MacBook-Pro build % ./tools/sycl/unittests/thread_safety/ThreadSafetyTests
[==========] Running 1 test from 1 test suite.                                             
[----------] Global test environment set-up.                                               
[----------] 1 test from HostAccessorDeadLockTest                                          
[ RUN      ] HostAccessorDeadLockTest.CheckThreadOrder                                     
^C                                                                                         
asachkov@Alexeys-MacBook-Pro build % ./tools/sycl/unittests/thread_safety/ThreadSafetyTests
[==========] Running 1 test from 1 test suite.                                             
[----------] Global test environment set-up.                                               
[----------] 1 test from HostAccessorDeadLockTest                                          
[ RUN      ] HostAccessorDeadLockTest.CheckThreadOrder                                     
[       OK ] HostAccessorDeadLockTest.CheckThreadOrder (1 ms)                              
[----------] 1 test from HostAccessorDeadLockTest (1 ms total)                             
                                                                                           
[----------] Global test environment tear-down                                             
[==========] 1 test from 1 test suite ran. (1 ms total)                                    
[  PASSED  ] 1 test.                                                                       
asachkov@Alexeys-MacBook-Pro build % ./tools/sycl/unittests/thread_safety/ThreadSafetyTests
[==========] Running 1 test from 1 test suite.                                             
[----------] Global test environment set-up.                                               
[----------] 1 test from HostAccessorDeadLockTest                                          
[ RUN      ] HostAccessorDeadLockTest.CheckThreadOrder                                     
^C                                                                                         
asachkov@Alexeys-MacBook-Pro build % ./tools/sycl/unittests/thread_safety/ThreadSafetyTests
[==========] Running 1 test from 1 test suite.                                             
[----------] Global test environment set-up.                                               
[----------] 1 test from HostAccessorDeadLockTest                                          
[ RUN      ] HostAccessorDeadLockTest.CheckThreadOrder                                     
[       OK ] HostAccessorDeadLockTest.CheckThreadOrder (1 ms)                              
[----------] 1 test from HostAccessorDeadLockTest (1 ms total)                             
                                                                                           
[----------] Global test environment tear-down                                             
[==========] 1 test from 1 test suite ran. (2 ms total)                                    
[  PASSED  ] 1 test.                                                                       
asachkov@Alexeys-MacBook-Pro build % ./tools/sycl/unittests/thread_safety/ThreadSafetyTests
[==========] Running 1 test from 1 test suite.                                             
[----------] Global test environment set-up.                                               
[----------] 1 test from HostAccessorDeadLockTest                                          
[ RUN      ] HostAccessorDeadLockTest.CheckThreadOrder                                     
[       OK ] HostAccessorDeadLockTest.CheckThreadOrder (1 ms)                              
[----------] 1 test from HostAccessorDeadLockTest (1 ms total)                             
                                                                                           
[----------] Global test environment tear-down                                             
[==========] 1 test from 1 test suite ran. (1 ms total)                                    
[  PASSED  ] 1 test.                                                                       
asachkov@Alexeys-MacBook-Pro build %                                                       
asachkov@Alexeys-MacBook-Pro build % ./tools/sycl/unittests/thread_safety/ThreadSafetyTests
[==========] Running 1 test from 1 test suite.                                             
[----------] Global test environment set-up.                                               
[----------] 1 test from HostAccessorDeadLockTest                                          
[ RUN      ] HostAccessorDeadLockTest.CheckThreadOrder                                     
^C                                                                                         
asachkov@Alexeys-MacBook-Pro build % ./tools/sycl/unittests/thread_safety/ThreadSafetyTests
[==========] Running 1 test from 1 test suite.                                             
[----------] Global test environment set-up.                                               
[----------] 1 test from HostAccessorDeadLockTest                                          
[ RUN      ] HostAccessorDeadLockTest.CheckThreadOrder                                     
[       OK ] HostAccessorDeadLockTest.CheckThreadOrder (1 ms)                              
[----------] 1 test from HostAccessorDeadLockTest (1 ms total)                             
                                                                                           
[----------] Global test environment tear-down                                             
[==========] 1 test from 1 test suite ran. (1 ms total)                                    
[  PASSED  ] 1 test.                                                                       
asachkov@Alexeys-MacBook-Pro build % ./tools/sycl/unittests/thread_safety/ThreadSafetyTests
[==========] Running 1 test from 1 test suite.                                             
[----------] Global test environment set-up.                                               
[----------] 1 test from HostAccessorDeadLockTest                                          
[ RUN      ] HostAccessorDeadLockTest.CheckThreadOrder                                     
^C                                                                                         
asachkov@Alexeys-MacBook-Pro build % ./tools/sycl/unittests/thread_safety/ThreadSafetyTests
[==========] Running 1 test from 1 test suite.                                             
[----------] Global test environment set-up.                                               
[----------] 1 test from HostAccessorDeadLockTest                                          
[ RUN      ] HostAccessorDeadLockTest.CheckThreadOrder                                     
[       OK ] HostAccessorDeadLockTest.CheckThreadOrder (0 ms)                              
[----------] 1 test from HostAccessorDeadLockTest (0 ms total)                             
                                                                                           
[----------] Global test environment tear-down                                             
[==========] 1 test from 1 test suite ran. (0 ms total)                                    
[  PASSED  ] 1 test.                                                                       

As you can see from the log above, it failed in ~50% cases for me. If you reduce the number of threads in the test down to two, it almost always passes (if not always).

Environment (please complete the following information):

  • OS: MacOS 12.6
  • Target device and vendor: Apple M1 Max
  • DPC++ version: nearly the most up-to-date commit at the time of writing this

Additional context

Per-thread backtrace:

Thread 1:

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP                                 
  * frame #0: 0x000000018d4b0834 libsystem_kernel.dylib`__ulock_wait + 8                                   
    frame #1: 0x000000018d4ee5a0 libsystem_pthread.dylib`_pthread_join + 444                               
    frame #2: 0x000000018d447a14 libc++.1.dylib`std::__1::thread::join() + 36                              
    frame #3: 0x00000001000050ac ThreadSafetyTests`(anonymous namespace)::HostAccessorDeadLockTest_CheckThr
eadOrder_Test::TestBody() + 1028                                                                           
    frame #4: 0x00000001000c5444 ThreadSafetyTests`testing::Test::Run() + 488                              
    frame #5: 0x00000001000c62e4 ThreadSafetyTests`testing::TestInfo::Run() + 528                          
    frame #6: 0x00000001000c6a48 ThreadSafetyTests`testing::TestSuite::Run() + 280                         
    frame #7: 0x00000001000d3714 ThreadSafetyTests`testing::internal::UnitTestImpl::RunAllTests() + 1740   
    frame #8: 0x00000001000d2fd0 ThreadSafetyTests`testing::UnitTest::Run() + 124                          
    frame #9: 0x00000001000c106c ThreadSafetyTests`main + 148                                              
    frame #10: 0x00000001001b908c dyld`start + 520  

Thread 2:

  thread #2                                                                                                
    frame #0: 0x000000018d4b2270 libsystem_kernel.dylib`__psynch_cvwait + 8                                
    frame #1: 0x000000018d4ec83c libsystem_pthread.dylib`_pthread_cond_wait + 1236                         
    frame #2: 0x000000018d43b284 libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<st
d::__1::mutex>&) + 28                                                                                      
    frame #3: 0x000000018d43df2c libc++.1.dylib`std::__1::__shared_mutex_base::lock_shared() + 92          
    frame #4: 0x000000010007c370 ThreadSafetyTests`sycl::_V1::detail::Scheduler::releaseHostAccessor(sycl::
_V1::detail::AccessorImplHost*) + 52                                                                       
    frame #5: 0x000000010000b67c ThreadSafetyTests`sycl::_V1::detail::AccessorImplHost::~AccessorImplHost()
 + 44                                                                                                      
    frame #6: 0x0000000100089fd4 ThreadSafetyTests`std::__1::__shared_ptr_pointer<sycl::_V1::detail::Access
orImplHost*, std::__1::shared_ptr<sycl::_V1::detail::AccessorImplHost>::__shared_ptr_default_delete<sycl::_
V1::detail::AccessorImplHost, sycl::_V1::detail::AccessorImplHost>, std::__1::allocator<sycl::_V1::detail::
AccessorImplHost> >::__on_zero_shared() + 20                                                               
    frame #7: 0x0000000100005928 ThreadSafetyTests`void* std::__1::__thread_proxy<std::__1::tuple<std::__1:
:unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, (anonymous na
mespace)::HostAccessorDeadLockTest_CheckThreadOrder_Test::TestBody()::$_0, unsigned long> >(void*) + 392   
    frame #8: 0x000000018d4ec26c libsystem_pthread.dylib`_pthread_start + 148 

Thread 3:

  thread #3                                                                                                
    frame #0: 0x000000018d4b2270 libsystem_kernel.dylib`__psynch_cvwait + 8                                
    frame #1: 0x000000018d4ec83c libsystem_pthread.dylib`_pthread_cond_wait + 1236                         
    frame #2: 0x000000018d43b284 libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<st
d::__1::mutex>&) + 28                                                                                      
    frame #3: 0x000000018d43de18 libc++.1.dylib`std::__1::__shared_mutex_base::lock() + 100                
    frame #4: 0x000000010007c0d0 ThreadSafetyTests`sycl::_V1::detail::Scheduler::addHostAccessor(sycl::_V1:
:detail::AccessorImplHost*) + 60                                                                           
    frame #5: 0x000000010000b748 ThreadSafetyTests`sycl::_V1::detail::addHostAccessorAndWait(sycl::_V1::det
ail::AccessorImplHost*) + 36                                                                               
    frame #6: 0x0000000100005cd4 ThreadSafetyTests`sycl::_V1::accessor<unsigned long, 1, (sycl::_V1::access
::mode)1026, (sycl::_V1::access::target)2018, (sycl::_V1::access::placeholder)0, sycl::_V1::ext::oneapi::ac
cessor_property_list<> >::accessor<unsigned long, 1, sycl::_V1::detail::aligned_allocator<unsigned long>, v
oid>(sycl::_V1::buffer<unsigned long, 1, sycl::_V1::detail::aligned_allocator<unsigned long>, std::__1::ena
ble_if<(((1) > (0))) && ((1) <= (3)), void>::type>&, sycl::_V1::property_list const&, sycl::_V1::detail::co
de_location) + 488                                                                                         
    frame #7: 0x0000000100005a4c ThreadSafetyTests`sycl::_V1::accessor<unsigned long, 1, (sycl::_V1::access
::mode)1026, (sycl::_V1::access::target)2018, (sycl::_V1::access::placeholder)0, sycl::_V1::ext::oneapi::ac
cessor_property_list<> > sycl::_V1::buffer<unsigned long, 1, sycl::_V1::detail::aligned_allocator<unsigned 
long>, void>::get_access<(sycl::_V1::access::mode)1026>(sycl::_V1::detail::code_location) + 64             
    frame #8: 0x0000000100005888 ThreadSafetyTests`void* std::__1::__thread_proxy<std::__1::tuple<std::__1:
:unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, (anonymous na
mespace)::HostAccessorDeadLockTest_CheckThreadOrder_Test::TestBody()::$_0, unsigned long> >(void*) + 232   
    frame #9: 0x000000018d4ec26c libsystem_pthread.dylib`_pthread_start + 148 

Thread 4:

  thread #4                                                                                                
    frame #0: 0x0000000100060508 ThreadSafetyTests`sycl::_V1::detail::Command::enqueue(sycl::_V1::detail::E
nqueueResultT&, sycl::_V1::detail::BlockingT, std::__1::vector<sycl::_V1::detail::Command*, std::__1::alloc
ator<sycl::_V1::detail::Command*> >&) + 476                                                                
    frame #1: 0x000000010007e900 ThreadSafetyTests`sycl::_V1::detail::Scheduler::GraphProcessor::enqueueCom
mand(sycl::_V1::detail::Command*, sycl::_V1::detail::EnqueueResultT&, std::__1::vector<sycl::_V1::detail::C
ommand*, std::__1::allocator<sycl::_V1::detail::Command*> >&, sycl::_V1::detail::BlockingT) + 272          
    frame #2: 0x000000010007e704 ThreadSafetyTests`sycl::_V1::detail::Scheduler::GraphProcessor::waitForEve
nt(std::__1::shared_ptr<sycl::_V1::detail::event_impl>, std::__1::shared_lock<std::__1::shared_timed_mutex>
&, std::__1::vector<sycl::_V1::detail::Command*, std::__1::allocator<sycl::_V1::detail::Command*> >&, bool)
 + 76                                                                                                      
    frame #3: 0x000000010007bbc0 ThreadSafetyTests`sycl::_V1::detail::Scheduler::waitForEvent(std::__1::sha
red_ptr<sycl::_V1::detail::event_impl>) + 84                                                               
    frame #4: 0x0000000100035d88 ThreadSafetyTests`sycl::_V1::detail::event_impl::wait(std::__1::shared_ptr
<sycl::_V1::detail::event_impl>) + 476                                                                     
    frame #5: 0x000000010000b768 ThreadSafetyTests`sycl::_V1::detail::addHostAccessorAndWait(sycl::_V1::det
ail::AccessorImplHost*) + 68                                                                               
    frame #6: 0x0000000100005cd4 ThreadSafetyTests`sycl::_V1::accessor<unsigned long, 1, (sycl::_V1::access
::mode)1026, (sycl::_V1::access::target)2018, (sycl::_V1::access::placeholder)0, sycl::_V1::ext::oneapi::ac
cessor_property_list<> >::accessor<unsigned long, 1, sycl::_V1::detail::aligned_allocator<unsigned long>, v
oid>(sycl::_V1::buffer<unsigned long, 1, sycl::_V1::detail::aligned_allocator<unsigned long>, std::__1::ena
ble_if<(((1) > (0))) && ((1) <= (3)), void>::type>&, sycl::_V1::property_list const&, sycl::_V1::detail::co
de_location) + 488                                                                                         
    frame #7: 0x0000000100005a4c ThreadSafetyTests`sycl::_V1::accessor<unsigned long, 1, (sycl::_V1::access
::mode)1026, (sycl::_V1::access::target)2018, (sycl::_V1::access::placeholder)0, sycl::_V1::ext::oneapi::ac
cessor_property_list<> > sycl::_V1::buffer<unsigned long, 1, sycl::_V1::detail::aligned_allocator<unsigned 
long>, void>::get_access<(sycl::_V1::access::mode)1026>(sycl::_V1::detail::code_location) + 64             
    frame #8: 0x0000000100005888 ThreadSafetyTests`void* std::__1::__thread_proxy<std::__1::tuple<std::__1:
:unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, (anonymous na
mespace)::HostAccessorDeadLockTest_CheckThreadOrder_Test::TestBody()::$_0, unsigned long> >(void*) + 232   
    frame #9: 0x000000018d4ec26c libsystem_pthread.dylib`_pthread_start + 148  

So it looks like at least two of those threads are blocked on the following locks:

ReadLockT ReadLock(MGraphLock);

ReadLockT Lock(MGraphLock);

We already had this problem reported and fixed in #889, but the lock was returned back (with additional changes) in #1597 and it was later refactored to operated on separate read/write locks in #2292

Metadata

Metadata

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions