-
Notifications
You must be signed in to change notification settings - Fork 1.2k
llmq: Fix thread handling in CDKGSessionManager and CDKGSessionHandler #3601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Oh, while looking at the open PRs i just noticed this one here seems to replace #3296 as it solves the issue pointed out there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. One suggestion, pls see c25a37c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly tested ACK, seems to be working with no issues. Would really like to hear @codablock's thoughts to make sure we don't throw the baby out with the bathwater here somehow :)
Concept ACK. Just one review comment :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK, waiting for #3609 merged and rebase
Co-Authored-By: UdjinM6 <UdjinM6@users.noreply.github.com>
1bb2c61
to
ef66d06
Compare
Rebased! |
llmqDb(_llmqDb), | ||
blsWorker(_blsWorker) | ||
{ | ||
for (const auto& qt : Params().GetConsensus().llmqs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why's this called qt? maybe quorum
or quorumThread
would be clearer (I'm not actually sure what exactly this variable represents)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea why its called like that i just moved it around a bit. But it represents a key/value pair from the Consensus::Params::llmqs
map which contains LLMQType, LLMQParams
pairs for the available LLMQs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we make it not auto so it's clearer what this is referring to. IMO we should only really be using auto where by the name of it and the surrounding code it's obvious. I guess it's obvious in the sense you can look at llmqs
and find what that is...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guess it's obvious in the sense you can look at
llmqs
and find what
Hm.. so for me that auto
style isn't an issue at all because the declaration of llmqs
is only a click away. But thats obviously a personal preference thing and at the end anyway kind of unrelated to this PR though because it's used like this all over the place :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's fine how it is. Just makes code review easier when it's obvious what variables mean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
dashpay#3601) * llqm: Fix thread handling in CDKGSessionManager and CDKGSessionHandler * llmq: Removed unused thread_pool from CDKGSessionManager * Tweak `CDKGSessionHandler::StartThread()` * llmq: Simplify CDKGSessionHandler's thread naming * llmq: Make sure CDKGSessionHandler uses a valid LLMQ type Co-Authored-By: UdjinM6 <UdjinM6@users.noreply.github.com> Co-authored-by: UdjinM6 <UdjinM6@users.noreply.github.com>
backported in #3670 |
dashpay#3601) * llqm: Fix thread handling in CDKGSessionManager and CDKGSessionHandler * llmq: Removed unused thread_pool from CDKGSessionManager * Tweak `CDKGSessionHandler::StartThread()` * llmq: Simplify CDKGSessionHandler's thread naming * llmq: Make sure CDKGSessionHandler uses a valid LLMQ type Co-Authored-By: UdjinM6 <UdjinM6@users.noreply.github.com> Co-authored-by: UdjinM6 <UdjinM6@users.noreply.github.com>
This is inspired by this CI test failure https://gitlab.com/dashpay/dash/-/jobs/614306340.
p2p-instantsend.py
failed with aSegmentation fault
https://gitlab.com/dashpay/dash/-/jobs/614306340#L3201
My understanding of this issue:
phaseHandlerThread.join()
is only called in the destructor ofCDKGSessionHandler
.dash/src/llmq/quorums_dkgsessionhandler.cpp
Lines 104 to 110 in a7f4f95
This happens here
dash/src/init.cpp
Line 326 in a7f4f95
which is after the networking threads have been stopped already which happens some lines above due to
dash/src/init.cpp
Line 287 in a7f4f95
So it seems like if
CDKGSessionHandler::PhaseHandlerThread
has pending messages its possible that the code before the nextShutdownRequested()
check ends up inCDKGSession::ReceiveMessage
where it tries to relay a message even though the networking threads are stopped already. Unfortunately i couldn't figure the exact line where the segmentation fault happens but as trace ends with aLOCK
afterForEachNode<llmq::CDKGSession::RelayInvToParticipants(const CInv&) const::<lambda(CNode*)> >
it does probably end somewhere in heredash/src/llmq/quorums_dkgsession.cpp
Lines 1314 to 1323 in 3f7553a
If i would have to choose one it would be
dash/src/llmq/quorums_dkgsession.cpp
Line 1322 in 3f7553a
If someone has an exact line and the explanation why please let me know!
No matter what line it is.. i would say this just happens because the dkg threads do run longer than the networking threads and this will be fixed with this PR due to an earlier
join()
for all dkg threads which happens heredash/src/init.cpp
Line 247 in 3f7553a
Questions i still have now
messageHandlerPool
) which i removed fromCDKGSessionManager
in f3db650? For me this looked like a leftover or some unfinished stuff unless this is some crazy magic which has an impact which i can't see yet. So i removed it. If there is some reason to revoke this please tell me!