-
Notifications
You must be signed in to change notification settings - Fork 701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: In Reactor Add A Mutex To Protect HandlerList Internal Array #1229
Conversation
Prompted by @tyler92's suggested PR#1228 "fix: data race on client/server shutdown": Looking at reactor.cc HandlerList, it does not attempt any multithread protection for its underlying std::array. In this experimental branch, we have added a mutex to HandlerList, and required that HandlerList members that access the array claim that mutex. Other related changes: Removed HandlerList default move constructors; mutex has a deleted move constructor. Removed HandlerList::clone(), an unused method of HandlerList that required a move constructor. Added a non-const form of "at" in HandlerList; further explanation in the comments. Guard the mutex for HandlerList::add, removeAll, at, and forEachHandler
src/common/reactor.cc
Outdated
std::shared_ptr<Handler> at(size_t index) const | ||
{ | ||
std::shared_ptr<Handler> at(size_t index) | ||
{ // It's const, except that the mutex is locked and unlocked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dgreatwood, if you want to guarantee to the caller that the method is logically const, you could try leaving it with a const
qualifier, but setting the mutex as mutable
.
src/common/reactor.cc
Outdated
@@ -412,15 +412,18 @@ namespace Pistache::Aio | |||
} | |||
|
|||
template <typename Func> | |||
void forEachHandler(Func func) const | |||
{ | |||
void forEachHandler(Func func) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here. Up to you, but you could optionally keep the const
qualifier.
Thanks @dgreatwood. I've added a minor thought. Otherwise it looks fine to me, notwithstanding any feedback @Tachi107 might have. |
Good point.
Let's see what @tyler92 <https://github.com/tyler92> makes of it - if it
fixes his issue. Then we can make any final changes. I have run it on macOS
and Linux, but I should exercise it a bit more before we accept it as well.
Best.
…On Tue, Aug 20, 2024 at 4:35 PM Kip ***@***.***> wrote:
Thanks @dgreatwood <https://github.com/dgreatwood>. I've added a minor
thought. Otherwise it looks fine to me, notwithstanding any feedback
@Tachi107 <https://github.com/Tachi107> might have.
—
Reply to this email directly, view it on GitHub
<#1229 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAFMA23FNSEVNKY3QO26PVLZSPHCVAVCNFSM6AAAAABM23IYVWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOJZHEZTAMRQGY>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
NOTICE: This email and its attachments may contain privileged and
confidential information, only for the viewing and use of the intended
recipient. If you are not the intended recipient, you are hereby notified
that any disclosure, copying, distribution, acting upon, or use of the
information contained in this email and its attachments is strictly
prohibited and that this email and its attachments must be immediately
returned to the sender and deleted from your system. If you received this
email erroneously, please notify the sender immediately. Xage Security,
Inc. and its affiliates will never request personal information (e.g.,
passwords, Social Security numbers) via email. Report suspicious emails to
***@***.*** ***@***.***>
|
@dgreatwood Thank you, I will check your branch and let you know |
src/common/reactor.cc
Outdated
// is const in that the class instance is in the same state | ||
// after the call vs. before (mutex is unlocked again before | ||
// non-const "at" exits) - so this is OK. | ||
return(((HandlerList *)this)->at(index)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please avoid using C-style casts. In this case I think const_cast
would be more appropriate. Also, you could consider @kiplingw's suggestion; making a mutex mutable is a common pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good eye @Tachi107. I missed that.
@dgreatwood ASAN is happy with your branch. Checking it with TSAN will be more correct, but I think the fix works for the original issue. |
Btw, probably someday we can try to enable TSAN here https://github.com/pistacheio/pistache/blob/master/.github/workflows/linux.yaml#L30 to verify if everything is OK |
Probably there is a deadlock between two mutexes - |
OK. I'll look for it once you have it :-)
…On Wed, Aug 21, 2024 at 12:52 PM tyler92 ***@***.***> wrote:
Probably there is a deadlock between two mutexes - handlers_arr_mutex_
and reg_unreg_mutex_. I don't have any evidence yet, probably later
—
Reply to this email directly, view it on GitHub
<#1229 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAFMA2527NOID42TELPD2OTZSTVXJAVCNFSM6AAAAABM23IYVWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMBSHA4TSMZSHE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
NOTICE: This email and its attachments may contain privileged and
confidential information, only for the viewing and use of the intended
recipient. If you are not the intended recipient, you are hereby notified
that any disclosure, copying, distribution, acting upon, or use of the
information contained in this email and its attachments is strictly
prohibited and that this email and its attachments must be immediately
returned to the sender and deleted from your system. If you received this
email erroneously, please notify the sender immediately. Xage Security,
Inc. and its affiliates will never request personal information (e.g.,
passwords, Social Security numbers) via email. Report suspicious emails to
***@***.*** ***@***.***>
|
Added debug logging for poller.reg_unreg_mutex_ using the GUARD_AND_DBG_LOG macro. Made "std::mutex handlers_arr_mutex_" mutable for easier const function handling.
Hi @tyler92 <https://github.com/tyler92> -
I have made a small update to the branch for debug logging.
https://github.com/dgreatwood/pistachefork/tree/HandlerListMultiThread
Presuming you're on Linux, you can use the convenience
script bldscripts/mesbuilddebug.sh to create a debug build, or you can
look at that script to see which options to set for creating a debug build.
If you're interested, once you have the debug pistache installed/running,
if you can generate a deadlock case, you can look in syslog (typically
/var/log/syslog of course) and you'll see entries for the locking and
unlocking that contain these strings respectively as follows (for
"reg_unreg_mutex", note there is no dot and no trailing underscore):
1. poller_reg_unreg_mutex
2. handlers_arr_mutex_
If you have a deadlock between the two mutexes, you should be able to see
one thread where they're being locked in one order, and another thread
where they're being locked in the opposite order.
Or you can just send me the log :-)
In any case, when I have a little more time I will review the code for a
possible deadlock.
Thanks once more.
On Wed, Aug 21, 2024 at 4:21 PM Duncan Greatwood ***@***.***>
wrote:
… OK. I'll look for it once you have it :-)
On Wed, Aug 21, 2024 at 12:52 PM tyler92 ***@***.***> wrote:
> Probably there is a deadlock between two mutexes - handlers_arr_mutex_
> and reg_unreg_mutex_. I don't have any evidence yet, probably later
>
> —
> Reply to this email directly, view it on GitHub
> <#1229 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AAFMA2527NOID42TELPD2OTZSTVXJAVCNFSM6AAAAABM23IYVWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMBSHA4TSMZSHE>
> .
> You are receiving this because you were mentioned.Message ID:
> ***@***.***>
>
--
NOTICE: This email and its attachments may contain privileged and
confidential information, only for the viewing and use of the intended
recipient. If you are not the intended recipient, you are hereby notified
that any disclosure, copying, distribution, acting upon, or use of the
information contained in this email and its attachments is strictly
prohibited and that this email and its attachments must be immediately
returned to the sender and deleted from your system. If you received this
email erroneously, please notify the sender immediately. Xage Security,
Inc. and its affiliates will never request personal information (e.g.,
passwords, Social Security numbers) via email. Report suspicious emails to
***@***.*** ***@***.***>
|
Hi @dgreatwood thank you for the information, I have TSAN report about the deadlock which is quite clear I think:
Just FYI: this is a fuzzing test (you can see it on the bottom of the stacks) but it doesn't matter right now. |
Very good. I’m out for the next few days but will review this on my return.
Shouldn’t be too difficult 🤞
…On Thu, Aug 22, 2024 at 12:07 PM tyler92 ***@***.***> wrote:
Hi @dgreatwood <https://github.com/dgreatwood> thank you for the
information, I have TSAN report about the deadlock which is quite clear I
think:
WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=2069737)
Cycle in lock order graph: M0 (0x7b8400002438) => M1 (0x7b8400002408) => M0
Mutex M1 acquired here while holding mutex M0 in thread T4:
#0 pthread_mutex_lock <null> (fuzz_server+0x199b8b) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#1 __gthread_mutex_lock(pthread_mutex_t*) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/gthr-default.h:749:12 (fuzz_server+0x320130) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#2 std::mutex::lock() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_mutex.h:100:17 (fuzz_server+0x31ff78) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_mutex.h:229:19 (fuzz_server+0x431d3c) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#4 Pistache::Aio::SyncImpl::HandlerList::at(unsigned long) /pistache/src/common/reactor.cc:364:17 (fuzz_server+0x6255ba) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#5 Pistache::Aio::SyncImpl::HandlerList::at(unsigned long) const /pistache/src/common/reactor.cc:377:47 (fuzz_server+0x6252e8) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#6 Pistache::Aio::SyncImpl::handleFds(std::vector<Pistache::Polling::Event, std::allocator<Pistache::Polling::Event>>) const /pistache/src/common/reactor.cc:289:27 (fuzz_server+0x625d9b) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#7 Pistache::Aio::SyncImpl::runOnce() /pistache/src/common/reactor.cc:245:25 (fuzz_server+0x620036) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#8 Pistache::Aio::SyncImpl::run() /pistache/src/common/reactor.cc:258:17 (fuzz_server+0x6203a8) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#9 Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()::operator()() const /pistache/src/common/reactor.cc:662:27 (fuzz_server+0x6415e8) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#10 void std::__invoke_impl<void, Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()>(std::__invoke_other, Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:61:14 (fuzz_server+0x6413e8) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#11 std::__invoke_result<Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()>::type std::__invoke<Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()>(Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:96:14 (fuzz_server+0x641238) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#12 void std::thread::_Invoker<std::tuple<Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()>>::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_thread.h:279:13 (fuzz_server+0x641150) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#13 std::thread::_Invoker<std::tuple<Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()>>::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_thread.h:286:11 (fuzz_server+0x641068) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#14 std::thread::_State_impl<std::thread::_Invoker<std::tuple<Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()>>>::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_thread.h:231:13 (fuzz_server+0x640cdc) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#15 <null> <null> (libstdc++.so.6+0xdc252) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2)
Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative warning message
Mutex M0 acquired here while holding mutex M1 in main thread:
#0 pthread_mutex_lock <null> (fuzz_server+0x199b8b) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#1 __gthread_mutex_lock(pthread_mutex_t*) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/gthr-default.h:749:12 (fuzz_server+0x320130) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#2 std::mutex::lock() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_mutex.h:100:17 (fuzz_server+0x31ff78) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_mutex.h:229:19 (fuzz_server+0x431d3c) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#4 Pistache::Aio::SyncImpl::detachFromReactor(std::shared_ptr<Pistache::Aio::Handler> const&) /pistache/src/common/reactor.cc:122:41 (fuzz_server+0x61f046) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#5 Pistache::Aio::SyncImpl::detachAndRemoveAllHandlers()::'lambda'(std::shared_ptr<Pistache::Aio::Handler>)::operator()(std::shared_ptr<Pistache::Aio::Handler>) const /pistache/src/common/reactor.cc:135:17 (fuzz_server+0x622acf) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#6 void Pistache::Aio::SyncImpl::HandlerList::forEachHandler<Pistache::Aio::SyncImpl::detachAndRemoveAllHandlers()::'lambda'(std::shared_ptr<Pistache::Aio::Handler>)>(Pistache::Aio::SyncImpl::detachAndRemoveAllHandlers()::'lambda'(std::shared_ptr<Pistache::Aio::Handler>)) /pistache/src/common/reactor.cc:420:21 (fuzz_server+0x6225e2) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#7 Pistache::Aio::SyncImpl::detachAndRemoveAllHandlers() /pistache/src/common/reactor.cc:133:23 (fuzz_server+0x61f27b) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#8 Pistache::Aio::AsyncImpl::detachAndRemoveAllHandlers() /pistache/src/common/reactor.cc:520:28 (fuzz_server+0x636a3d) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#9 Pistache::Aio::Reactor::detachAndRemoveAllHandlers() /pistache/src/common/reactor.cc:732:21 (fuzz_server+0x61aac2) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#10 Pistache::Aio::Reactor::~Reactor() /pistache/src/common/reactor.cc:691:9 (fuzz_server+0x61a8cc) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#11 void std::_Destroy<Pistache::Aio::Reactor>(Pistache::Aio::Reactor*) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_construct.h:151:19 (fuzz_server+0x4cca08) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#12 void std::allocator_traits<std::allocator<void>>::destroy<Pistache::Aio::Reactor>(std::allocator<void>&, Pistache::Aio::Reactor*) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/alloc_traits.h:648:4 (fuzz_server+0x4cc7f4) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#13 std::_Sp_counted_ptr_inplace<Pistache::Aio::Reactor, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:613:2 (fuzz_server+0x4cc359) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#14 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release_last_use() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:175:2 (fuzz_server+0x22b210) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#15 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:361:4 (fuzz_server+0x22b12f) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#16 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:1071:11 (fuzz_server+0x22ae13) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#17 std::__shared_ptr<Pistache::Aio::Reactor, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:1524:31 (fuzz_server+0x4cd76c) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#18 std::shared_ptr<Pistache::Aio::Reactor>::~shared_ptr() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr.h:175:11 (fuzz_server+0x4b3c78) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#19 Pistache::Tcp::Listener::~Listener() /pistache/src/server/listener.cc:274:5 (fuzz_server+0x4b474a) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#20 Pistache::Http::Endpoint::~Endpoint() /pistache/src/server/endpoint.cc:360:41 (fuzz_server+0x4a0dd2) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#21 LLVMFuzzerTestOneInput /pistache/build/fuzzing/../../tests/fuzzers/fuzz_server.cpp:64:1 (fuzz_server+0x221d5f) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
#22 fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) <null> (fuzz_server+0x14f4e0) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
Just FYI: this is a fuzzing test (you can see it on the bottom of the
stacks) but it doesn't matter right now.
—
Reply to this email directly, view it on GitHub
<#1229 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAFMA27ADHGRGCWGEITK23LZSYZGBAVCNFSM6AAAAABM23IYVWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMBVGQ2DOOBXHE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
NOTICE: This email and its attachments may contain privileged and
confidential information, only for the viewing and use of the intended
recipient. If you are not the intended recipient, you are hereby notified
that any disclosure, copying, distribution, acting upon, or use of the
information contained in this email and its attachments is strictly
prohibited and that this email and its attachments must be immediately
returned to the sender and deleted from your system. If you received this
email erroneously, please notify the sender immediately. Xage Security,
Inc. and its affiliates will never request personal information (e.g.,
passwords, Social Security numbers) via email. Report suspicious emails to
***@***.*** ***@***.***>
|
…_ Instead Reason for Reversion ==================== handlers_arr_mutex_ and poller.reg_unreg_mutex_ were getting into a deadlock with each other. - In Reactor::~Reactor(), HandlerList::forEachHandler (called from detachAndRemoveAllHandlers) would lock handlers_arr_mutex_, and then detachFromReactor would lock poller.reg_unreg_mutex_. - Meanwhile, SyncImpl::runOnce() locks poller.reg_unreg_mutex_ and then calls SyncImpl::handleFds which then calls HandlerList::at, locking handlers_arr_mutex_. Since the two locks are in opposite order, of course we have a deadlock if they happen simultaneously. Further review suggests that handlers_arr_mutex_ can be removed, provided we use poller.reg_unreg_mutex_ to guard the dangerous accesses to the handlers_ list. That's the strategy we've implemented on this commit. Specifics ========= handlers_arr_mutex_ removed In os.h, reg_unreg_mutex_ made mutable so it can be locked in const functions. In reactor.cc: In SyncImpl::addHandler we lock poller.reg_unreg_mutex_ (addHandler calls handlers_.add, which of course modifies handlers_ and so requires protection). In SyncImpl::detachFromReactor, remove lock of poller.reg_unreg_mutex_. We are now locking poller.reg_unreg_mutex_ in detachAndRemoveAllHandlers instead, detachAndRemoveAllHandlers being the function that calls detachFromReactor. As above, in detachAndRemoveAllHandlers, lock poller.reg_unreg_mutex_. reg_unreg_mutex_ must be locked here since detachAndRemoveAllHandlers calls handlers_.forEachHandler, which requires handlers_ to be protected. Remove mutex locks from HandlerList functions "add", "removeAll", "at", and "forEachHandler". Also removed the HandlerList function "operator[]" (a synonym of "at") since it is not used and, I found, it made it harder to browse the code.
Hi @tyler92 <https://github.com/tyler92> -
I have made a further update to the branch.
https://github.com/dgreatwood/pistachefork/tree/HandlerListMultiThread
Thanks for your TSAN report, which as you said was quite clear.
You can see the reasoning of this new change in the commit notice:
382de53
Since I have removed handlers_arr_mutex_, I have presumably removed the
deadlock between handlers_arr_mutex_ and poller.reg_unreg_mutex_.
More interesting will be if your originally reported issue (per PR #1228
<#1228>) is fixed, again without
using your "join" code.
**** Could you please try it again?*
Also, please keep an eye out for any newly introduced deadlock.
Thanks once more!
On Thu, Aug 22, 2024 at 2:24 PM Duncan Greatwood ***@***.***>
wrote:
… Very good. I’m out for the next few days but will review this on my
return. Shouldn’t be too difficult 🤞
On Thu, Aug 22, 2024 at 12:07 PM tyler92 ***@***.***> wrote:
> Hi @dgreatwood <https://github.com/dgreatwood> thank you for the
> information, I have TSAN report about the deadlock which is quite clear I
> think:
>
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=2069737)
> Cycle in lock order graph: M0 (0x7b8400002438) => M1 (0x7b8400002408) => M0
>
> Mutex M1 acquired here while holding mutex M0 in thread T4:
> #0 pthread_mutex_lock <null> (fuzz_server+0x199b8b) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #1 __gthread_mutex_lock(pthread_mutex_t*) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/gthr-default.h:749:12 (fuzz_server+0x320130) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #2 std::mutex::lock() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_mutex.h:100:17 (fuzz_server+0x31ff78) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_mutex.h:229:19 (fuzz_server+0x431d3c) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #4 Pistache::Aio::SyncImpl::HandlerList::at(unsigned long) /pistache/src/common/reactor.cc:364:17 (fuzz_server+0x6255ba) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #5 Pistache::Aio::SyncImpl::HandlerList::at(unsigned long) const /pistache/src/common/reactor.cc:377:47 (fuzz_server+0x6252e8) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #6 Pistache::Aio::SyncImpl::handleFds(std::vector<Pistache::Polling::Event, std::allocator<Pistache::Polling::Event>>) const /pistache/src/common/reactor.cc:289:27 (fuzz_server+0x625d9b) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #7 Pistache::Aio::SyncImpl::runOnce() /pistache/src/common/reactor.cc:245:25 (fuzz_server+0x620036) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #8 Pistache::Aio::SyncImpl::run() /pistache/src/common/reactor.cc:258:17 (fuzz_server+0x6203a8) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #9 Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()::operator()() const /pistache/src/common/reactor.cc:662:27 (fuzz_server+0x6415e8) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #10 void std::__invoke_impl<void, Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()>(std::__invoke_other, Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:61:14 (fuzz_server+0x6413e8) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #11 std::__invoke_result<Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()>::type std::__invoke<Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()>(Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:96:14 (fuzz_server+0x641238) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #12 void std::thread::_Invoker<std::tuple<Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()>>::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_thread.h:279:13 (fuzz_server+0x641150) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #13 std::thread::_Invoker<std::tuple<Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()>>::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_thread.h:286:11 (fuzz_server+0x641068) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #14 std::thread::_State_impl<std::thread::_Invoker<std::tuple<Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()>>>::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_thread.h:231:13 (fuzz_server+0x640cdc) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #15 <null> <null> (libstdc++.so.6+0xdc252) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2)
>
> Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative warning message
>
> Mutex M0 acquired here while holding mutex M1 in main thread:
> #0 pthread_mutex_lock <null> (fuzz_server+0x199b8b) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #1 __gthread_mutex_lock(pthread_mutex_t*) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/gthr-default.h:749:12 (fuzz_server+0x320130) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #2 std::mutex::lock() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_mutex.h:100:17 (fuzz_server+0x31ff78) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_mutex.h:229:19 (fuzz_server+0x431d3c) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #4 Pistache::Aio::SyncImpl::detachFromReactor(std::shared_ptr<Pistache::Aio::Handler> const&) /pistache/src/common/reactor.cc:122:41 (fuzz_server+0x61f046) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #5 Pistache::Aio::SyncImpl::detachAndRemoveAllHandlers()::'lambda'(std::shared_ptr<Pistache::Aio::Handler>)::operator()(std::shared_ptr<Pistache::Aio::Handler>) const /pistache/src/common/reactor.cc:135:17 (fuzz_server+0x622acf) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #6 void Pistache::Aio::SyncImpl::HandlerList::forEachHandler<Pistache::Aio::SyncImpl::detachAndRemoveAllHandlers()::'lambda'(std::shared_ptr<Pistache::Aio::Handler>)>(Pistache::Aio::SyncImpl::detachAndRemoveAllHandlers()::'lambda'(std::shared_ptr<Pistache::Aio::Handler>)) /pistache/src/common/reactor.cc:420:21 (fuzz_server+0x6225e2) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #7 Pistache::Aio::SyncImpl::detachAndRemoveAllHandlers() /pistache/src/common/reactor.cc:133:23 (fuzz_server+0x61f27b) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #8 Pistache::Aio::AsyncImpl::detachAndRemoveAllHandlers() /pistache/src/common/reactor.cc:520:28 (fuzz_server+0x636a3d) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #9 Pistache::Aio::Reactor::detachAndRemoveAllHandlers() /pistache/src/common/reactor.cc:732:21 (fuzz_server+0x61aac2) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #10 Pistache::Aio::Reactor::~Reactor() /pistache/src/common/reactor.cc:691:9 (fuzz_server+0x61a8cc) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #11 void std::_Destroy<Pistache::Aio::Reactor>(Pistache::Aio::Reactor*) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_construct.h:151:19 (fuzz_server+0x4cca08) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #12 void std::allocator_traits<std::allocator<void>>::destroy<Pistache::Aio::Reactor>(std::allocator<void>&, Pistache::Aio::Reactor*) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/alloc_traits.h:648:4 (fuzz_server+0x4cc7f4) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #13 std::_Sp_counted_ptr_inplace<Pistache::Aio::Reactor, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:613:2 (fuzz_server+0x4cc359) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #14 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release_last_use() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:175:2 (fuzz_server+0x22b210) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #15 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:361:4 (fuzz_server+0x22b12f) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #16 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:1071:11 (fuzz_server+0x22ae13) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #17 std::__shared_ptr<Pistache::Aio::Reactor, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:1524:31 (fuzz_server+0x4cd76c) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #18 std::shared_ptr<Pistache::Aio::Reactor>::~shared_ptr() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr.h:175:11 (fuzz_server+0x4b3c78) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #19 Pistache::Tcp::Listener::~Listener() /pistache/src/server/listener.cc:274:5 (fuzz_server+0x4b474a) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #20 Pistache::Http::Endpoint::~Endpoint() /pistache/src/server/endpoint.cc:360:41 (fuzz_server+0x4a0dd2) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #21 LLVMFuzzerTestOneInput /pistache/build/fuzzing/../../tests/fuzzers/fuzz_server.cpp:64:1 (fuzz_server+0x221d5f) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
> #22 fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) <null> (fuzz_server+0x14f4e0) (BuildId: dfa875f2a84736d29effb9b4566398b47d5efdbb)
>
> Just FYI: this is a fuzzing test (you can see it on the bottom of the
> stacks) but it doesn't matter right now.
>
> —
> Reply to this email directly, view it on GitHub
> <#1229 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AAFMA27ADHGRGCWGEITK23LZSYZGBAVCNFSM6AAAAABM23IYVWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMBVGQ2DOOBXHE>
> .
> You are receiving this because you were mentioned.Message ID:
> ***@***.***>
>
--
NOTICE: This email and its attachments may contain privileged and
confidential information, only for the viewing and use of the intended
recipient. If you are not the intended recipient, you are hereby notified
that any disclosure, copying, distribution, acting upon, or use of the
information contained in this email and its attachments is strictly
prohibited and that this email and its attachments must be immediately
returned to the sender and deleted from your system. If you received this
email erroneously, please notify the sender immediately. Xage Security,
Inc. and its affiliates will never request personal information (e.g.,
passwords, Social Security numbers) via email. Report suspicious emails to
***@***.*** ***@***.***>
|
@dgreatwood Hi! I will try to check new commit nearest 1-2 days |
@dgreatwood now it is much better. ASAN and TSAN are much more happy than before. I have a couple of warnings from TSAN that are unrelated to the current issue/fix so I will report them as separate issues. |
EventMethEpollEquivImpl::ctlEx now locks interest_mutex_ when performing its add/mod/del actions. Previously, the interest_mutex_ lock was released before these add/mod/del actions were performed. This was confirmed to occasionally cause a "use after free" in findEmEventInAnInterestSet (called out of getReadyEmEvents) in another thread. It might also result in corruption of the interest_ std::set.
Until the 1.0 release, setting the soname to version_major.version_minor gives us more freedom to make changes without being too restricted by ABI compatibility. We never actually really commited to ABI stability before the 1.0 relase - the README states that at the bottom - but I made this decision clearer by updating the "versioning" section.
Prompted by @tyler92's suggested PR#1228 "fix: data race on client/server shutdown":
Looking at reactor.cc HandlerList, it does not attempt any multithread protection for its underlying std::array. In this experimental branch, we have added a mutex to HandlerList, and required that HandlerList members that access the array claim that mutex.
Other related changes:
Removed HandlerList default move constructors; mutex has a deleted move constructor.
Removed HandlerList::clone(), an unused method of HandlerList that required a move constructor.
Added a non-const form of "at" in HandlerList; further explanation in the comments.
Guard the mutex for HandlerList::add, removeAll, at, and forEachHandler