-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Fix infinite wait bug in test_pthread_proxying_cpp.cpp #18358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The tests in test_pthread_proxying_cpp proxy functions that increment local variables. For the async proxying functions, the proxying is followed by a wait on a condition variable so the test can assert that the proxied work was completed. However, the incrementing of the local variables was not protected by the lock used with the condition variable, so it was possible for the variable to be incremented and the condition variable notified after checking the wait condition but before waiting. Since the condition variable would not be notified again, the test would wait on the condition variable forever. Fix the bug by taking the lock before incrementing the variable in tests where this could cause problems. Fixes #18353.
Current dependencies on/for this PR:
This comment was auto-generated by Graphite. |
It had previously been disabled because of flakes on wasm2js, but those flakes are probably fixed by #18358.
{ | ||
std::unique_lock<std::mutex> lock(mutex); | ||
i = 3; | ||
} | ||
executor = std::this_thread::get_id(); | ||
cond.notify_one(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to hold the lock when calling notify too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's actually a pessimization to hold the lock while calling notify because then the other thread wakes up just to immediately block on acquiring the lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL!
{ | ||
std::unique_lock<std::mutex> lock(mutex); | ||
i = 3; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Despite reading you description I've having trouble seeing how putting a lock around this one assignment fixes the issue. Would making i
and atomic also fix the issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading https://stackoverflow.com/questions/17101922/do-i-have-to-acquire-lock-before-calling-condition-variable-notify-one I think i understand.. I had never known this about cond vars..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the lock is important because of the condition variable. The condition variable needs to atomically check the condition and block if the condition is not met. The mechanism for combining those into an atomic action is the lock, so if the condition can change without the lock being held, the condition variable doesn't work as intended.
I thought the idea was that the outside holds the mutex so that the thread blocks until the right time, but reading it again, I think I misunderstood it before as the outside grabs the mutex after firing off the async event, so they actually race. So I'm also not sure about this change, and not sure why it was failing before... |
Right, since the proxying is async, the proxied work races with the following code on the proxying thread. To resolve the race, we use explicit condition variables to wait for the proxied work to be completed. The bug was that we weren't using the condition variables properly. Condition variables only work if the condition is protected by the same lock passed to |
You can go ahead and land without waiting for the restarted sockets tests if you like.. that failure was clearly unrelated. |
It had previously been disabled because of flakes on wasm2js, but those flakes are probably fixed by #18358.
It had previously been disabled because of flakes on wasm2js, but those flakes are probably fixed by #18358.
The tests in test_pthread_proxying_cpp proxy functions that increment local
variables. For the async proxying functions, the proxying is followed by a wait
on a condition variable so the test can assert that the proxied work was
completed. However, the incrementing of the local variables was not protected by
the lock used with the condition variable, so it was possible for the variable
to be incremented and the condition variable notified after checking the wait
condition but before waiting. Since the condition variable would not be notified
again, the test would wait on the condition variable forever.
Fix the bug by taking the lock before incrementing the variable in tests where
this could cause problems.
Fixes #18353.