fix: data race on client/server shutdown #1228
Conversation
Hey @tyler92. Thanks for the PR. Question for you. Isn't |
Hi. Yes, you are right. So in my example, I launch the server with |
Might the solution then be to change the unit tests to use the blocking version instead? |
As a workaround - yes, it's possible. But we still have a problem with the current solution: how to stop the server after |
But isn't that the expected behaviour though? If the user requested to start the server asynchronously? |
If a user called … But even in this scenario, the user should be able to stop and destroy the server correctly, without data races or other memory issues. Right now that's not possible, as far as I could see |
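The distinction under discussion, a blocking serve() versus an asynchronous serveThreaded(), and why a safe stop() must join the loop thread, can be sketched with a minimal mock. All names here are hypothetical illustrations, not Pistache's actual API:

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Minimal mock of the two start modes discussed above: serve() blocks the
// caller, serveThreaded() returns immediately and runs the event loop on a
// background thread. For the threaded mode to be safe, stop() must both
// signal the loop AND join the thread before the object's state can be
// destroyed. Hypothetical names, not Pistache's real API.
class MockServer {
public:
    void serve() { runLoop(); }  // blocking mode: returns only after stop()

    void serveThreaded() {       // async mode: returns immediately
        loopThread_ = std::thread([this] { runLoop(); });
    }

    void stop() {
        stopFlag_.store(true);
        if (loopThread_.joinable())
            loopThread_.join();  // without this join, ~MockServer could free
                                 // state the loop thread is still reading
    }

    ~MockServer() { stop(); }

private:
    void runLoop() {
        // Stand-in for the reactor loop.
        while (!stopFlag_.load())
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }

    std::atomic<bool> stopFlag_{false};
    std::thread loopThread_;
};
```

Without the join inside stop(), destroying the object after serveThreaded() could free state the loop thread is still reading, which is exactly the kind of use-after-free the sanitizer reports.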
Ok, that makes sense. I'll wait to hear @Tachi107 and @dgreatwood's thoughts. |
OK. Just for clarification: I don't insist on my solution; there is probably a more correct way to achieve the same result |
I’ll try and take a look in the next few days.
|
Hi @tyler92,
Thanks for raising this.
I had a look at the code in reactor.cc. I was wondering if there is a
somewhat more general issue, namely that "handlers" (which is a
HandlerList) in SyncImpl is not protected in the event of multithreaded
access.
We do already have a mutex poller.reg_unreg_mutex_ which protects against a
handler getting deregistered while in use. But this is more like protection
of individual handlers, rather than of the HandlerList as a whole.
Accordingly, as an experiment I have created a branch, HandlerListMultiThread, and opened PR #1229 for it so you can see it easily. In the HandlerListMultiThread branch there is a mutex, handlers_arr_mutex_, private within HandlerList, which is used to guard (aka "lock") the std::array of handlers whenever the array is accessed within HandlerList's methods.
Looking at your asan.txt:
- In thread T148321, which does the access-after-free,
the HandlerList::forEachHandler function is in use, called out
of SyncImpl::run()
- In thread T0, the accessed address was previously freed by a call
to HandlerList::removeAll
In this new HandlerListMultiThread branch code, HandlerList::forEachHandler
and removeAll both have to acquire the mutex
HandlerList::handlers_arr_mutex_. I believe then that the new mutex should
stop the "removeAll" from stamping on the list before SyncImpl::run has
finished with it.
Could you please try out this new branch with your test scenario? (Without your "join" code, of course.)
https://github.com/dgreatwood/pistachefork/tree/HandlerListMultiThread
(and see #1229)
Thanks once more!
Duncan
P.S. The above is concerned just with changes to reactor.cc. I haven't
looked at the other proposed changes in tests/tcp_client.h, though they
look logical at first glance.
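The locking scheme described in this comment can be sketched roughly as follows. This is a simplified illustration built from the names used in the description (Handler, handlers_arr_mutex_, forEachHandler, removeAll), not the actual Pistache code:

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>

struct Handler { int id; };  // stand-in for a real reactor handler

// Sketch of the idea in the HandlerListMultiThread branch: a mutex owned by
// the list guards the underlying std::array, so that forEachHandler cannot
// race with removeAll. Names are illustrative.
class HandlerList {
public:
    static constexpr std::size_t MaxHandlers = 255;

    void add(std::shared_ptr<Handler> h) {
        std::lock_guard<std::mutex> guard(handlers_arr_mutex_);
        for (auto& slot : handlers_) {
            if (!slot) { slot = std::move(h); return; }
        }
    }

    // Called from the reactor's run loop; holds the lock while iterating.
    void forEachHandler(
        const std::function<void(const std::shared_ptr<Handler>&)>& fn) {
        std::lock_guard<std::mutex> guard(handlers_arr_mutex_);
        for (const auto& slot : handlers_)
            if (slot) fn(slot);
    }

    // Called on shutdown; cannot stamp on the array while forEachHandler is
    // mid-iteration, because both must acquire the same mutex.
    void removeAll() {
        std::lock_guard<std::mutex> guard(handlers_arr_mutex_);
        for (auto& slot : handlers_) slot.reset();
    }

private:
    std::mutex handlers_arr_mutex_;
    std::array<std::shared_ptr<Handler>, MaxHandlers> handlers_{};
};
```

Because both the iteration and the teardown serialize on the same lock, removeAll can only run between complete passes of forEachHandler, never in the middle of one.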
|
Yes, it's a "bonus" change that fixes an FD leak, but I can PR it separately |
There is a data race in a quite simple scenario:
Address sanitizer report: asan.txt
As far as I understand, it's not legal to modify SyncImpl::handlers_ before SyncImpl::run is finished. So the fix joins the worker threads before SyncImpl::handlers_ is invalidated. I'm not sure the fix is correct from the original design point of view, but on my local machine ASAN is happy with the provided example.
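The join-before-invalidation ordering of the fix can be illustrated generically. This is a sketch with made-up names standing in for SyncImpl::run and handlers_, not the actual patch:

```cpp
#include <atomic>
#include <memory>
#include <thread>
#include <vector>

// Generic illustration of the fix described above: a worker thread (standing
// in for SyncImpl::run) reads a shared handler structure, so that structure
// may only be invalidated AFTER the worker has been joined.
struct SharedHandlers {
    std::vector<int> entries{1, 2, 3};
};

long runAndShutdown() {
    auto handlers = std::make_unique<SharedHandlers>();
    std::atomic<bool> stop{false};
    std::atomic<long> iterations{0};

    std::thread worker([&] {
        // Stand-in for the reactor loop: repeatedly walks the handler list.
        while (!stop.load()) {
            for (int e : handlers->entries)
                (void)e;  // pretend to dispatch to each handler
            ++iterations;
        }
    });

    stop.store(true);
    worker.join();     // 1) join the worker first (this is the fix) ...
    handlers.reset();  // 2) ... only then invalidate the shared state
    return iterations.load();
}
```

Reversing steps 1 and 2 (freeing handlers while the worker may still be looping) is the heap-use-after-free pattern shown in the attached asan.txt.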
I think the following issues are related: #842 #1018 #539