Avoid acquiring another read lock while holding one to avoid potential deadlock #6200
Conversation
I'd need to check the drop rules, but the compiler may insert a drop for …

Nice find!!
fc0a8b5 to 059e4d7 (Compare)
@mergify queue
🛑 The pull request has been removed from the queue
@mergify requeue
Nice find! But why did this show up in the das branch and not on stable?
@mergify requeue
✅ This pull request will be re-embarked automatically
🛑 The pull request has been removed from the queue
My guess is this is a super rare edge case. It's really hard to hit because there's no long-running operation between the two read locks, only a simple time calculation and some logging, and it can only happen if another thread attempts to acquire a write lock before the second read lock is acquired.
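For illustration, a minimal standalone sketch of that interleaving (not Lighthouse code; it assumes a parking_lot-style `RwLock`, whose fair queueing makes new `read()` calls wait whenever a writer is already queued, so re-acquiring a read lock on the same thread can deadlock):

```rust
// Illustrative repro of the suspected interleaving; this program intentionally hangs.
use parking_lot::RwLock;
use std::sync::Arc;
use std::thread;
use std::time::Duration;

fn main() {
    let cache = Arc::new(RwLock::new(0u64));

    // Writer (playing the role of `do_update`): arrives while the first read guard is held.
    let writer = {
        let cache = Arc::clone(&cache);
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(50));
            let mut guard = cache.write(); // queued behind the outstanding read guard
            *guard += 1;
        })
    };

    let first = cache.read();                  // first read lock, kept in scope
    thread::sleep(Duration::from_millis(100)); // window in which the writer queues up
    let second = cache.read();                 // blocks behind the queued writer: deadlock
    println!("{} {}", *first, *second);        // never reached
    writer.join().unwrap();
}
```

If no writer shows up inside that window, the second `read()` succeeds immediately, which matches why this is so hard to hit in practice.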
@mergify refresh
✅ Pull request refreshed
@mergify requeue
✅ This pull request will be re-embarked automatically
🛑 The pull request has been removed from the queue
@mergify requeue
✅ This pull request will be re-embarked automatically
🛑 The pull request has been removed from the queue
@mergify refresh
✅ Pull request refreshed
@mergify queue
🛑 The pull request has been removed from the queue
@mergify requeue
✅ This pull request will be re-embarked automatically
✅ The pull request has been merged automatically at 9b3b730
…l deadlock (sigp#6200) * Avoid acquiring another read lock to avoid potential deadlock.
Issue Addressed
While testing PeerDAS I ran into a scenario that looked like a deadlock: the Lighthouse process was still alive but had stopped responding and was no longer logging anything.
The backtrace shows that one of the threads is waiting for a read lock, while another one is waiting for a write lock, potentially here.
I'm not sure if I've got to the bottom of the problem yet, but I suspect the code here is causing the deadlock, as it tries to acquire a read lock while we already hold one in scope. If do_update attempts to acquire a write lock while the deposit cache read lock is held, it will wait for that read lock to be released. However, the code below attempts to acquire the read lock again while the old read lock is still in scope, so the write lock stays stuck waiting for the first guard, the second read lock blocks behind the write lock, and this function deadlocks:

lighthouse/beacon_node/eth1/src/service.rs, line 1132 in 19b3ab3
This part of the code hasn't changed in ages though, so I could be wrong or running into an edge case.
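As a sketch of the fix pattern the PR title describes (hold a single read guard and reuse it, or copy what you need out of it, rather than calling `read()` a second time), assuming the same parking_lot-style lock; the names below are illustrative, not the actual `service.rs` code:

```rust
use parking_lot::RwLock;

// Illustrative only: take the read lock once, copy the needed value out of the
// guard, and drop the guard before doing anything else. With no second `read()`,
// a queued writer can no longer wedge this thread against itself.
fn latest_entry(cache: &RwLock<Vec<u64>>) -> Option<u64> {
    let guard = cache.read();           // single read acquisition
    let latest = guard.last().copied(); // copy the value out while the guard is held
    drop(guard);                        // release before any further work
    latest
}

fn main() {
    let cache = RwLock::new(vec![1, 2, 3]);
    assert_eq!(latest_entry(&cache), Some(3));
}
```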