-
Notifications
You must be signed in to change notification settings - Fork 1.2k
fix: use correct hash of quorum as cycle hash for regtest #6673
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe change modifies the quorum selection logic in the InstantSend module's 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (1)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
PastaPastaPasta
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 4b66448
src/llmq/instantsend.cpp
Outdated
| islock.cycleHash = (llmq_params_opt->useRotation ? quorum->m_quorum_base_block_index->GetAncestor(cycle_height) | ||
| : quorum->m_quorum_base_block_index) | ||
| ->GetBlockHash(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should make no difference because quorum->m_quorum_base_block_index->nHeight % llmq_params_opt->dkgInterval is 0 for non-rotating quorums so cycle_height == quorum->m_quorum_base_block_index->nHeight and m_quorum_base_block_index->GetAncestor(cycle_height) == m_quorum_base_block_index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, you are right; I reverted this part
| nSignHeight = blockIndex->nHeight + dkgInterval - 1; | ||
| } | ||
| // For RegTest non-rotating quorum cycleHash has directly quorum hash | ||
| auto quorum = llmq_params.useRotation ? llmq::SelectQuorumForSigning(llmq_params, m_chainstate.m_chain, qman, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SelectQuorumForSigning should handle this internally already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should handle this internally, but somehow SelectQuorumForSigning returns sometimes is not exactly same quorum -> validation fails (see logs):
node0 2025-06-19T14:49:28.463903Z (mocktime: 2014-12-12T05:56:41Z) [ isman] [llmq/instantsend.cpp:957] [ProcessPendingInstantSendLocks] [instantsend] CInstantSendManager::ProcessPendingInstantSendLocks -- verified locks. count=1, alreadyVerified=0, vt=4, nodes=1
node0 2025-06-19T14:49:28.463934Z (mocktime: 2014-12-12T05:56:41Z) [ isman] [llmq/instantsend.cpp:976] [ProcessPendingInstantSendLocks] [instantsend] CInstantSendManager::ProcessPendingInstantSendLocks -- txid=11fe72d9be3c2fe4a2545429ed88c20928e21ece77de896423dbbb55fe00679d, islock=527c905af5e2f8d7e7be84e57c19ae6fe22b37bcc96735d475c21f1f9ce57927: invalid sig in islock, peer=1
node0 2025-06-19T14:49:28.463951Z (mocktime: 2014-12-12T05:56:41Z) [ isman] [llmq/instantsend.cpp:876] [ProcessPendingInstantSendLocks] [instantsend] CInstantSendManager::ProcessPendingInstantSendLocks -- doing verification on old active set
node0 2025-06-19T14:49:28.468400Z (mocktime: 2014-12-12T05:56:41Z) [ isman] [llmq/instantsend.cpp:957] [ProcessPendingInstantSendLocks] [instantsend] CInstantSendManager::ProcessPendingInstantSendLocks -- verified locks. count=1, alreadyVerified=0, vt=4, nodes=1
node0 2025-06-19T14:49:28.468436Z (mocktime: 2014-12-12T05:56:41Z) [ isman] [net_processing.cpp:1799] [Misbehaving] [net] Misbehaving: peer=1 (0 -> 20)
node0 2025-06-19T14:49:28.468448Z (mocktime: 2014-12-12T05:56:41Z) [ isman] [llmq/instantsend.cpp:976] [ProcessPendingInstantSendLocks] [instantsend] CInstantSendManager::ProcessPendingInstantSendLocks -- txid=11fe72d9be3c2fe4a2545429ed88c20928e21ece77de896423dbbb55fe00679d, islock=527c905af5e2f8d7e7be84e57c19ae6fe22b37bcc96735d475c21f1f9ce57927: invalid sig in islock, peer=1
invalid sig in islock is caused by wrong value signHash (I added extra logs while investigation issue) and that happens due to wrong quorum->qc->quorumHash.
I can not figure what exactly causes indeterminism in SelectQuorumForSigning, but this fix is quite reliable on my localhost (5-10% failures vs ~0% failures with the exactly same log message)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like nSignHeight adjustment above works with rotating quorums only. These calculations could be tweaked to work with non-rotating quorums too I guess but I like simply using GetQuorum instead.
|
I actually reviewed it a month ago initially, just didn't post it cause I couldn't understand why any of this makes any difference. I kept drafts of my review to re-review it again a bit later and... I might be missing smth but I still don't see any changes in behaviour here and why implementing this could affect tests in any way 🙈 |
…ional tests 0d9418e test: reduce delay in wait_until from 0.5s to 0.05s (Konstantin Akimov) 876d6c8 test: enforce 1 second delay for wait_for_sporks helper (Konstantin Akimov) ec6e7bf test: enforce 1s delay for feature_mnehf test (Konstantin Akimov) 6ab3f7c test: reduce spamming quorum list to logs while waiting (Konstantin Akimov) 9d9975f test: simplify wait_for_quorum_list (Konstantin Akimov) Pull request description: ## Issue being fixed or feature implemented Waiting for 0.5s in functional test for every action is a bit excessive, especially for p2p tests that sending messages by localnetwork and waiting at least 0.5 seconds before checking if message is received. ## What was done? Decreasing default delay from 0.5s to 0.05s. It affects mostly p2p tests, but many other tests become faster too. For quorum formation; for sporks and some other dash specific features bigger delays (0.5s, 1s) are used. Further improvements are blocked by #6673, #6672, #6671 and are out of scope this PR. ## How Has This Been Tested? Speed up on CI for 30% and more. [develop] linux64-test https://gitlab.com/dashpay/dash/-/jobs/10049432489 ALL | ✓ Passed | 7241 s (accumulated) Runtime: 1272 s [PR] linux64-test https://gitlab.com/dashpay/dash/-/jobs/10067158169 ALL | ✓ Passed | 5421 s (accumulated) Runtime: 938 s **-25%** [develop] linux64-nowallet https://gitlab.com/dashpay/dash/-/jobs/10049432511 ALL | ✓ Passed | 2739 s (accumulated) Runtime: 488 s [PR] linux64-nowallet https://gitlab.com/dashpay/dash/-/jobs/10067158174 ALL | ✓ Passed | 1232 s (accumulated) Runtime: 243 s **-49%** [develop] linux64-tsan https://gitlab.com/dashpay/dash/-/jobs/10049432499 ALL | ✓ Passed | 10399 s (accumulated) Runtime: 2023 s [PR] linux64-tsan https://gitlab.com/dashpay/dash/-/jobs/10072993489 ALL | ✓ Passed | 8710 s (accumulated) Runtime: 1543 s **-25%** [develop] Functional tests on localhost (-O3, debug, no sanitizers, -j20): ALL | ✓ Passed | 6680 s (accumulated) Runtime: 372 s [PR] Functional tests on localhost (-O3, debug, no sanitizers, -j20): ALL | ✓ Passed | 4609 s (accumulated) Runtime: 365 s **Benefits of running locally in 20 parallel jobs are very slight. Accumulated time is decreased for 32% as expected, but total time is improved less than 2%.** It is because the slowest tests requires many quorums to be formed and they are still slow. ## Breaking Changes N/A ## Checklist: - [x] I have performed a self-review of my own code - [ ] I have commented my code, particularly in hard-to-understand areas - [ ] I have added or updated relevant unit/integration/functional/e2e tests - [ ] I have made corresponding changes to the documentation - [x] I have assigned this pull request to a milestone ACKs for top commit: PastaPastaPasta: utACK 0d9418e; hopefully it doesn't make tests flakey UdjinM6: utACK 0d9418e Tree-SHA512: 32405bd1f229af5146c96aea6031cee3f084d3ebfb3ec515ad743e79c3bc29a5c891d4330688d07b63b0e06ef7cd50240ab8b6d1a3939a56fe3e64a55918edd1
4b66448 to
ff61071
Compare
UdjinM6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
light ACK ff61071
| nSignHeight = blockIndex->nHeight + dkgInterval - 1; | ||
| } | ||
| // For RegTest non-rotating quorum cycleHash has directly quorum hash | ||
| auto quorum = llmq_params.useRotation ? llmq::SelectQuorumForSigning(llmq_params, m_chainstate.m_chain, qman, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like nSignHeight adjustment above works with rotating quorums only. These calculations could be tweaked to work with non-rotating quorums too I guess but I like simply using GetQuorum instead.
… regtest ff61071 fix: use correct hash of quorum as cycle hash to prevent possible misunderstandings (Konstantin Akimov) Pull request description: ## Issue being fixed or feature implemented feature_llmq_singlenode.py works unstable, roughly 10% failure rate. It fails with this error: 2025-05-17T15:03:08.586000Z TestFramework (INFO): InstantSend lock on tx: 202b6698cf0da8adfeff8430daff3d1cfe767d1120751bb7765f6df15d09a3ee is expecting 2025-05-17T15:04:09.033000Z TestFramework.utils (ERROR): wait_until() failed. Predicate: '''' def check_instantlock(): self.bump_mocktime(1) try: return node.getrawtransaction(txid, True)["instantlock"] except: return False ''' 2025-05-17T15:04:09.033000Z TestFramework (ERROR): Assertion failed Deep in logs the failure look like this: ``` node0 2025-05-17T14:40:45.628148Z (mocktime: 2014-12-12T05:56:24Z) [ isman] [llmq/instantsend.cpp:985] [ProcessPendingInstantSendLocks] [instantsend] CInstantSendManager::ProcessPendingInstantSendLocks -- txid=202b6698cf0da8adfeff8430daff3d1cfe767d1120751bb7765f6df15d09a3ee, islock=89b6a8bb50c2a6b07ebd2bbde0727a72ed4fbe4f78247aab509189b50fc7293f: invalid sig in islock, peer=1 ``` Further debug shown that somehow sign-hash sometimes is different; while msgHash is same; signature is valid and bytes-per-bytes matched; just validation code does not work as expected. ## What was done? For non-rotating quorums on RegTest should be used directly quorum hash instead `cycleQuorum`; no quorum choosing procedure is needed. ## How Has This Been Tested? Run multiple times functional tests. ## Breaking Changes N/A; affect only RegTests for special case of non-rotating quorum used for InstantSend. ## Checklist: - [x] I have performed a self-review of my own code - [ ] I have commented my code, particularly in hard-to-understand areas - [ ] I have added or updated relevant unit/integration/functional/e2e tests - [ ] I have made corresponding changes to the documentation - [x] I have assigned this pull request to a milestone ACKs for top commit: UdjinM6: light ACK ff61071 Tree-SHA512: 7be0eb246f13f851de2c6ce6543f0bac8faf64cbc2a8f5fd6763f0aaca602cfb99e6df882d9e9e8a6d65fbdcdd9188e925427578d86729696dc23652856f7e1c
… regtest ff61071 fix: use correct hash of quorum as cycle hash to prevent possible misunderstandings (Konstantin Akimov) Pull request description: ## Issue being fixed or feature implemented feature_llmq_singlenode.py works unstable, roughly 10% failure rate. It fails with this error: 2025-05-17T15:03:08.586000Z TestFramework (INFO): InstantSend lock on tx: 202b6698cf0da8adfeff8430daff3d1cfe767d1120751bb7765f6df15d09a3ee is expecting 2025-05-17T15:04:09.033000Z TestFramework.utils (ERROR): wait_until() failed. Predicate: '''' def check_instantlock(): self.bump_mocktime(1) try: return node.getrawtransaction(txid, True)["instantlock"] except: return False ''' 2025-05-17T15:04:09.033000Z TestFramework (ERROR): Assertion failed Deep in logs the failure look like this: ``` node0 2025-05-17T14:40:45.628148Z (mocktime: 2014-12-12T05:56:24Z) [ isman] [llmq/instantsend.cpp:985] [ProcessPendingInstantSendLocks] [instantsend] CInstantSendManager::ProcessPendingInstantSendLocks -- txid=202b6698cf0da8adfeff8430daff3d1cfe767d1120751bb7765f6df15d09a3ee, islock=89b6a8bb50c2a6b07ebd2bbde0727a72ed4fbe4f78247aab509189b50fc7293f: invalid sig in islock, peer=1 ``` Further debug shown that somehow sign-hash sometimes is different; while msgHash is same; signature is valid and bytes-per-bytes matched; just validation code does not work as expected. ## What was done? For non-rotating quorums on RegTest should be used directly quorum hash instead `cycleQuorum`; no quorum choosing procedure is needed. ## How Has This Been Tested? Run multiple times functional tests. ## Breaking Changes N/A; affect only RegTests for special case of non-rotating quorum used for InstantSend. ## Checklist: - [x] I have performed a self-review of my own code - [ ] I have commented my code, particularly in hard-to-understand areas - [ ] I have added or updated relevant unit/integration/functional/e2e tests - [ ] I have made corresponding changes to the documentation - [x] I have assigned this pull request to a milestone ACKs for top commit: UdjinM6: light ACK ff61071 Tree-SHA512: 7be0eb246f13f851de2c6ce6543f0bac8faf64cbc2a8f5fd6763f0aaca602cfb99e6df882d9e9e8a6d65fbdcdd9188e925427578d86729696dc23652856f7e1c
Issue being fixed or feature implemented
feature_llmq_singlenode.py works unstable, roughly 10% failure rate. It fails with this error:
Deep in logs the failure look like this:
Further debug shown that somehow sign-hash sometimes is different; while msgHash is same; signature is valid and bytes-per-bytes matched; just validation code does not work as expected.
What was done?
For non-rotating quorums on RegTest should be used directly quorum hash instead
cycleQuorum; no quorum choosing procedure is needed.How Has This Been Tested?
Run multiple times functional tests.
Breaking Changes
N/A; affect only RegTests for special case of non-rotating quorum used for InstantSend.
Checklist: