-
Notifications
You must be signed in to change notification settings - Fork 1.2k
perf: improve islock / llmq signing latency #6896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
perf: improve islock / llmq signing latency #6896
Conversation
…nsiveness Adjusted the timing intervals in the WorkThreadMain function of CSigSharesManager to 10 milliseconds for both message sending and work interruption checks. This change enhances the responsiveness of the signing shares manager by allowing more frequent processing of pending signature shares and message sending. This results in ~33% latency improvements in a contrived local latency functional test / benchmark from ~500ms to ~333ms
… CSigSharesManager Added a NotifyWorker function to both CInstantSendManager and CSigSharesManager to signal the worker thread when new work is available. This change improves the responsiveness of the worker threads by allowing them to wake up promptly when there are pending tasks, thus enhancing overall performance and reducing latency in processing instant send locks and signature shares.
✅ No Merge Conflicts DetectedThis PR currently has no conflicts with other open PRs. |
WalkthroughIntroduces epoch/condition-variable wake-up mechanisms across InstantSend, LLMQ signing, and sigshares: adds workMutex, workCv, workEpoch, and a NotifyWorker() helper; InterruptWorkerThread now notifies the CV. InstantSend replaces pair-based pending entries with instantsend::PendingISLockFromPeer (fields node_id, islock) and updates all insertion/lookup/removal call sites. Many state-mutating methods now call NotifyWorker(). Worker threads were refactored from sleep-based polling to epoch-aware condition-variable waits with deadline-based periodic wakeups and steady_clock timing for cleanup/retry scheduling. Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (11)
src/llmq/signing.cpp (3)
434-436
: Move NotifyWorker() outside cs_pending to avoid waking into lock contentionYou notify while still holding cs_pending. Wake-ups can immediately contend on the same lock the worker needs. Release the lock before notifying.
Apply:
- LOCK(cs_pending); - if (pendingReconstructedRecoveredSigs.count(recoveredSig->GetHash())) { + { + LOCK(cs_pending); + if (pendingReconstructedRecoveredSigs.count(recoveredSig->GetHash())) { // no need to perform full verification LogPrint(BCLog::LLMQ, "CSigningManager::%s -- already pending reconstructed sig, signHash=%s, id=%s, msgHash=%s, node=%d\n", __func__, recoveredSig->buildSignHash().ToString(), recoveredSig->getId().ToString(), recoveredSig->getMsgHash().ToString(), from); - return ret; - } - - pendingRecoveredSigs[from].emplace_back(recoveredSig); - workEpoch.fetch_add(1, std::memory_order_acq_rel); - NotifyWorker(); + return ret; + } + pendingRecoveredSigs[from].emplace_back(recoveredSig); + } + workEpoch.fetch_add(1, std::memory_order_release); + NotifyWorker();
651-653
: Also notify after releasing cs_pending in PushReconstructedRecoveredSigSame contention pattern; release cs_pending before bumping epoch/notify.
- LOCK(cs_pending); - pendingReconstructedRecoveredSigs.emplace(std::piecewise_construct, std::forward_as_tuple(recoveredSig->GetHash()), std::forward_as_tuple(recoveredSig)); - workEpoch.fetch_add(1, std::memory_order_acq_rel); - NotifyWorker(); + { + LOCK(cs_pending); + pendingReconstructedRecoveredSigs.emplace(std::piecewise_construct, std::forward_as_tuple(recoveredSig->GetHash()), std::forward_as_tuple(recoveredSig)); + } + workEpoch.fetch_add(1, std::memory_order_release); + NotifyWorker();
837-863
: Epoch-based wait + deadline is solid; optional steady_clock clean-upThe wait logic is correct. Minor optional: store lastCleanupTime as a steady_clock timestamp (or compute deadline fully in steady_clock) to avoid system clock jumps influencing the cadence.
src/instantsend/instantsend.h (1)
66-70
: LGTM; minor nits
- Adding workMutex/workCv/workEpoch and waking on interrupt is correct.
- Optional: mark NotifyWorker() noexcept and use memory_order_release for fetch_add (loads already use acquire).
Also applies to: 112-112, 140-143
src/instantsend/instantsend.cpp (2)
164-167
: Notify after releasing locks to avoid waking into contentionThese NotifyWorker() calls run while cs_pendingLocks (ProcessMessage), cs_nonLocked (RemoveNonLockedTx), or cs_pendingLocks (TryEmplacePendingLock) are still held. Wake-ups can contend immediately on the same locks.
Move notifications after the lock scopes.
@@ - LOCK(cs_pendingLocks); - pendingInstantSendLocks.emplace(hash, std::make_pair(from, islock)); - NotifyWorker(); + { + LOCK(cs_pendingLocks); + pendingInstantSendLocks.emplace(hash, std::make_pair(from, islock)); + } + NotifyWorker(); @@ - LogPrint(BCLog::INSTANTSEND, "CInstantSendManager::%s -- txid=%s, pindexMined=%s\n", __func__, - tx->GetHash().ToString(), pindexMined ? pindexMined->GetBlockHash().ToString() : ""); - NotifyWorker(); + LogPrint(BCLog::INSTANTSEND, "CInstantSendManager::%s -- txid=%s, pindexMined=%s\n", __func__, + tx->GetHash().ToString(), pindexMined ? pindexMined->GetBlockHash().ToString() : ""); + NotifyWorker(); @@ - LOCK(cs_nonLocked); + LOCK(cs_nonLocked); ... - LogPrint(BCLog::INSTANTSEND, "CInstantSendManager::%s -- txid=%s, retryChildren=%d, retryChildrenCount=%d\n", - __func__, txid.ToString(), retryChildren, retryChildrenCount); - NotifyWorker(); + LogPrint(BCLog::INSTANTSEND, "CInstantSendManager::%s -- txid=%s, retryChildren=%d, retryChildrenCount=%d\n", + __func__, txid.ToString(), retryChildren, retryChildrenCount); + // cs_nonLocked is released here (end of function scope) in current layout; + // if kept, wrap the critical section in its own block and notify after. + NotifyWorker(); @@ - LOCK(cs_pendingLocks); - if (!pendingInstantSendLocks.count(hash)) { - pendingInstantSendLocks.emplace(hash, std::make_pair(id, islock)); - } - NotifyWorker(); + { + LOCK(cs_pendingLocks); + if (!pendingInstantSendLocks.count(hash)) { + pendingInstantSendLocks.emplace(hash, std::make_pair(id, islock)); + } + } + NotifyWorker();Also applies to: 561-562, 602-603, 633-634
934-967
: Epoch-based wait loop is correct; optional deadline for periodic workLooping while there’s pending work and otherwise waiting on epoch/interrupt is good. If InstantSend ever needs periodic timeouts without external triggers, consider a bounded wait (like signing.cpp’s next-deadline) to guarantee periodic wakeups.
Please confirm there are no time-based maintenance tasks in InstantSend that require periodic wakeups.
src/llmq/signing.h (1)
241-246
: Worker wake-up API is consistent; minor polish
- Adding workMutex/workCv/workEpoch and exposing NotifyWorker is consistent with usage in signing.cpp.
- Optional: declare NotifyWorker noexcept and consider memory_order_release on fetch_add for clarity.
Also applies to: 251-251
src/llmq/signing_shares.cpp (4)
1502-1556
: Worker loop: event-driven with deadlines; consider two small tweaksThe epoch/CV wait with deadline calculation is sound. Two optional improvements:
- If no deadline is computed, prefer an indefinite wait (with predicate) over 10ms polling to reduce wakeups.
- Use a single time source for recovery timing (either util::GetTimestd::chrono::milliseconds() everywhere or TicksSinceEpoch) for consistency.
Example for the first point:
- if (next_deadline == std::chrono::steady_clock::time_point::max()) { - workCv.wait_for(l, std::chrono::milliseconds(10), [this, startEpoch]{ - return bool(workInterrupt) || workEpoch.load(std::memory_order_acquire) != startEpoch; - }); - } else { + if (next_deadline == std::chrono::steady_clock::time_point::max()) { + workCv.wait(l, [this, startEpoch]{ + return bool(workInterrupt) || workEpoch.load(std::memory_order_acquire) != startEpoch; + }); + } else { workCv.wait_until(l, next_deadline, [this, startEpoch]{ return bool(workInterrupt) || workEpoch.load(std::memory_order_acquire) != startEpoch; }); - } + }
1563-1565
: Notify outside cs_pendingSigns to minimize contentionNotify while not holding cs_pendingSigns, so the worker can grab it immediately.
void CSigSharesManager::AsyncSign(const CQuorumCPtr& quorum, const uint256& id, const uint256& msgHash) { - LOCK(cs_pendingSigns); - pendingSigns.emplace_back(quorum, id, msgHash); - // Wake worker to handle new pending sign immediately - NotifyWorker(); + { + LOCK(cs_pendingSigns); + pendingSigns.emplace_back(quorum, id, msgHash); + } + // Wake worker to handle new pending sign immediately + NotifyWorker(); }
1690-1692
: Release cs before notifyingAvoid notifying while holding cs; release first to reduce contention.
void CSigSharesManager::ForceReAnnouncement(const CQuorumCPtr& quorum, Consensus::LLMQType llmqType, const uint256& id, const uint256& msgHash) { if (IsAllMembersConnectedEnabled(llmqType, m_sporkman)) { return; } - - LOCK(cs); - auto signHash = SignHash(llmqType, quorum->qc->quorumHash, id, msgHash).Get(); - if (const auto *const sigs = sigShares.GetAllForSignHash(signHash)) { - for (const auto& [quorumMemberIndex, _] : *sigs) { - // re-announce every sigshare to every node - sigSharesQueuedToAnnounce.Add(std::make_pair(signHash, quorumMemberIndex), true); - } - } - for (auto& [_, nodeState] : nodeStates) { - auto* session = nodeState.GetSessionBySignHash(signHash); - if (session == nullptr) { - continue; - } - session->knows.SetAll(false); - session->sendSessionId = UNINITIALIZED_SESSION_ID; - } - // Wake worker so announcements are sent promptly - NotifyWorker(); + { + LOCK(cs); + auto signHash = SignHash(llmqType, quorum->qc->quorumHash, id, msgHash).Get(); + if (const auto *const sigs = sigShares.GetAllForSignHash(signHash)) { + for (const auto& [quorumMemberIndex, _] : *sigs) { + sigSharesQueuedToAnnounce.Add(std::make_pair(signHash, quorumMemberIndex), true); + } + } + for (auto& [_, nodeState] : nodeStates) { + auto* session = nodeState.GetSessionBySignHash(signHash); + if (session == nullptr) continue; + session->knows.SetAll(false); + session->sendSessionId = UNINITIALIZED_SESSION_ID; + } + } + // Wake worker so announcements are sent promptly + NotifyWorker(); }
1698-1701
: Release cs before notifyingSame rationale as above; notify after releasing cs.
MessageProcessingResult CSigSharesManager::HandleNewRecoveredSig(const llmq::CRecoveredSig& recoveredSig) { - LOCK(cs); - RemoveSigSharesForSession(recoveredSig.buildSignHash().Get()); - // Cleaning up a session can free resources; wake worker to proceed - NotifyWorker(); + { + LOCK(cs); + RemoveSigSharesForSession(recoveredSig.buildSignHash().Get()); + } + // Cleaning up a session can free resources; wake worker to proceed + NotifyWorker(); return {}; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/instantsend/instantsend.cpp
(11 hunks)src/instantsend/instantsend.h
(3 hunks)src/llmq/signing.cpp
(5 hunks)src/llmq/signing.h
(2 hunks)src/llmq/signing_shares.cpp
(8 hunks)src/llmq/signing_shares.h
(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.{cpp,h,cc,cxx,hpp}
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.{cpp,h,cc,cxx,hpp}
: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction
Files:
src/llmq/signing.cpp
src/llmq/signing.h
src/instantsend/instantsend.h
src/llmq/signing_shares.h
src/instantsend/instantsend.cpp
src/llmq/signing_shares.cpp
🔇 Additional comments (11)
src/llmq/signing.cpp (3)
526-533
: Early-continue on pending queues is correctReturning true when either queue still has items prevents unnecessary sleeping. Good.
585-592
: Good: report more work when queues non-empty or batch threshold hitThis avoids idle sleeps under load. Looks correct.
831-832
: Interrupt path properly wakes the workerCalling workCv.notify_all() on interrupt is appropriate. LGTM.
src/instantsend/instantsend.cpp (1)
462-463
: LGTM: timely worker wake-ups at key state transitionsWaking after mempool adds/removals, block connects/disconnects, ChainLock notifications, and tip updates looks correct.
Also applies to: 480-481, 511-512, 518-519, 639-640, 657-658
src/llmq/signing_shares.h (1)
383-387
: Worker wake-up primitives look good; keep annotations or document; verify implementation exists
- New workMutex/workCv/workEpoch and NotifyWorker are appropriate.
- You removed thread-safety annotations from SignPendingSigShares/WorkThreadMain. Consider restoring them (or documenting internal locking) to keep static checks useful.
- Ensure CSigSharesManager::NotifyWorker is defined in the .cpp and used at all state-mutating sites (message handlers, queue inserts, timers).
Run to verify definition and usages:
Also applies to: 494-496, 497-497
src/llmq/signing_shares.cpp (6)
218-220
: Prompt shutdown wakeup is correctNotifying the worker during interrupt ensures timely exit. Looks good.
301-304
: Wake worker after inbound message processingGood call to notify; inbound messages can create immediate work.
735-736
: Batching announce + deferred notify is solidQueuing under cs and notifying after unlock minimizes contention and spurious wakeups. Nice.
Also applies to: 744-745, 766-769
1496-1497
: Notify after ban to reroute workWaking the worker after marking a node banned is appropriate.
1589-1591
: Post-sign notify to progress announcements/recoveryGood to wake after enqueuing new work.
1703-1709
: Epoch + CV notify is appropriate for lossless wakeupsAtomic increment plus notify_one is correct for a single worker.
} | ||
if (fMoreWork) continue; | ||
std::unique_lock<Mutex> l(workMutex); | ||
workCv.wait(l, [this, startEpoch]{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any chance that event could be missing and thread will wait forever? For example, when app is terminating.
…nager Adjusted the formatting of the NotifyWorker function in CInstantSendManager for improved readability. This change includes consistent spacing and indentation, enhancing code maintainability without altering functionality.
…ingSigShares Modified the WorkThreadMain function in signing.h and SignPendingSigShares in signing_shares.h to specify exclusive lock requirements. This change enhances thread safety by ensuring proper locking mechanisms are in place, preventing potential race conditions during concurrent operations.
dda0aef
to
708fa0d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/llmq/signing_shares.h (2)
383-387
: Concurrency primitives look right; please document lock ordering
- Using Mutex + condition_variable_any is appropriate.
- To avoid deadlocks, define/order: never block on workCv while holding cs or cs_pendingSigns; avoid taking workMutex while holding cs/cs_pendingSigns (or document a single consistent order if needed).
496-498
: Restore thread-safety annotation on WorkThreadMain; mark NotifyWorker noexcept
- The removed EXCLUSIVE_LOCKS_REQUIRED(!cs_pendingSigns) weakens static analysis. If WorkThreadMain still must not be called with cs_pendingSigns held, keep the attribute; otherwise use LOCKS_EXCLUDED(cs_pendingSigns).
- NotifyWorker should be noexcept (only increments and notifies).
Apply this diff:
- void WorkThreadMain(CConnman& connman, PeerManager& peerman); - void NotifyWorker(); + void WorkThreadMain(CConnman& connman, PeerManager& peerman) + EXCLUSIVE_LOCKS_REQUIRED(!cs_pendingSigns); + void NotifyWorker() noexcept;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/llmq/signing.h
(2 hunks)src/llmq/signing_shares.h
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/llmq/signing.h
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.{cpp,h,cc,cxx,hpp}
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.{cpp,h,cc,cxx,hpp}
: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction
Files:
src/llmq/signing_shares.h
🔇 Additional comments (2)
src/llmq/signing_shares.h (2)
29-29
: Include OK for condition_variable_anyCorrect header for std::condition_variable_any with custom Mutex.
494-495
: Good: explicit “not-held” requirementEXCLUSIVE_LOCKS_REQUIRED(!cs_pendingSigns) on SignPendingSigShares matches AsyncSign and prevents calls under that lock.
…hares.h Added exclusive lock requirements to the WorkThreadMain function in signing_shares.h to enhance thread safety. This change ensures that the function operates correctly under concurrent conditions, preventing potential race conditions.
running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
119b81f should fix CPU issue but there is more, pls see below.
SendMessages(connman); | ||
lastSendTime = TicksSinceEpoch<std::chrono::milliseconds>(SystemClock::now()); | ||
} | ||
bool didSend = SendMessages(connman); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we had a cooldown for SendMessages()
for a reason but I don't remember why exactly 🙈 All qsig*
messages contain vectors, so maybe waiting a bit allowed us to utilise them better. Or maybe we simply did not want to push messages too often. Anyway, we need to confirm that there is no regressing in this part.
…antSendManager Updated the CInstantSendManager to use a new struct, PendingISLockFromPeer, for better clarity and type safety. This change replaces the use of std::pair for storing node ID and InstantSendLockPtr, enhancing code readability and maintainability across multiple functions handling instant send locks.
The 'pend' local variable in ProcessPendingInstantSendLocks was previously using the same data structure as pendingInstantSendLocks (a hash map). However, once we're in the processing step, we only iterate sequentially through the locks - there are no hash-based lookups. This commit changes 'pend' to use std::vector for better performance: - Improved cache locality with contiguous memory layout - Better CPU prefetching during iteration (3x through the data) - Eliminates hash map overhead (bucket allocation, pointer chasing) - Filtering step uses build-new-vector approach to maintain O(n) The typical case processes up to 32 locks, making the vector's sequential access pattern ideal for modern CPU cache hierarchies.
Fixes time-mixing bugs where mocked time (controllable in tests) was being used to compute steady_clock deadlines. Since mocked time and system time move independently, this caused incorrect wait behavior in tests. Changes: - Use steady_clock::time_point for all wait deadlines (nextAttemptTime, lastCleanupTimeSteady) - Keep mocked time (GetTime<>()) for business logic only (timeouts, session tracking) - Remove redundant workEpoch increments (NotifyWorker already does this) - Move NotifyWorker() calls to individual message handlers for better control This ensures that: 1. In production: steady_clock provides monotonic, reliable timing 2. In tests: mocked time controls business logic while steady_clock handles waits 3. No double-incrementing of workEpoch that could cause busy-wait issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/instantsend/instantsend.cpp (1)
1-1
: Fix clang-format differences.The pipeline reports clang-format differences in this file. Please run the formatting script to ensure code style consistency.
Run the following command to fix formatting:
clang-format -i src/instantsend/instantsend.cpp
♻️ Duplicate comments (1)
src/llmq/signing_shares.cpp (1)
1510-1517
: Tight loop when didSend/fMoreWork → likely cause of 100% CPU. Add a micro coalescing wait.Continuing immediately after sending or detecting more work can hot‑loop under load. Add a very short wait with the same epoch predicate to batch messages and avoid CPU peg, preserving sub‑ms latency.
- // If there is more work or we just sent something, iterate again without waiting - if (fMoreWork || didSend) { - continue; - } + // If there is more work or we just sent something, briefly coalesce to avoid hot looping + if (fMoreWork || didSend) { + std::unique_lock<Mutex> l(workMutex); + workCv.wait_for(l, std::chrono::milliseconds(1), [this, startEpoch]{ + return bool(workInterrupt) || workEpoch.load(std::memory_order_acquire) != startEpoch; + }); + continue; + }This also addresses the previously raised concern about removing the SendMessages cooldown.
🧹 Nitpick comments (4)
src/llmq/signing.cpp (1)
805-806
: Nit: thread name mismatch.CSigningManager worker is named "sigshares". Prefer a specific name like "recsigs" for profiling.
- workThread = std::thread(&util::TraceThread, "sigshares", [this, &peerman] { WorkThreadMain(peerman); }); + workThread = std::thread(&util::TraceThread, "recsigs", [this, &peerman] { WorkThreadMain(peerman); });src/llmq/signing_shares.cpp (2)
1260-1281
: Message batching cadence: validate no regression without explicit cooldown.Given frequent immediate loops, ensure QSIG* batching still groups efficiently under typical traffic. If not, consider a small per-peer send window (1–2ms) similar to the coalescing wait above.
1472-1498
: Wake on BanNode is good, but double-check request re-routing pressure.After banning, mass re-requests can cascade. Consider limiting per-iteration re-requests to avoid bursts.
src/instantsend/instantsend.cpp (1)
222-229
: Optional: Remove trailing blank line.Line 229 contains a trailing blank line that could be removed for consistency.
Apply this diff:
filteredPend.push_back(std::move(p)); } } - + // Now check against the previous active set and perform banning if this fails
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/instantsend/instantsend.cpp
(20 hunks)src/instantsend/instantsend.h
(5 hunks)src/llmq/signing.cpp
(6 hunks)src/llmq/signing.h
(3 hunks)src/llmq/signing_shares.cpp
(13 hunks)src/llmq/signing_shares.h
(5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.{cpp,h,cc,cxx,hpp}
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.{cpp,h,cc,cxx,hpp}
: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction
Files:
src/llmq/signing.cpp
src/llmq/signing_shares.h
src/llmq/signing.h
src/llmq/signing_shares.cpp
src/instantsend/instantsend.h
src/instantsend/instantsend.cpp
🧬 Code graph analysis (6)
src/llmq/signing.cpp (3)
src/llmq/signing.h (1)
NotifyWorker
(252-252)src/llmq/signing_shares.cpp (7)
NotifyWorker
(1699-1704)NotifyWorker
(1699-1699)WorkThreadMain
(1500-1553)WorkThreadMain
(1500-1500)Cleanup
(1297-1434)Cleanup
(1297-1297)l
(1541-1541)src/instantsend/instantsend.cpp (3)
WorkThreadMain
(933-970)WorkThreadMain
(933-933)l
(965-965)
src/llmq/signing_shares.h (1)
src/llmq/signing_shares.cpp (6)
SignPendingSigShares
(1563-1587)SignPendingSigShares
(1563-1563)WorkThreadMain
(1500-1553)WorkThreadMain
(1500-1500)NotifyWorker
(1699-1704)NotifyWorker
(1699-1699)
src/llmq/signing.h (4)
src/llmq/signing.cpp (8)
WorkThreadMain
(826-856)WorkThreadMain
(826-826)StartWorkerThread
(798-806)StartWorkerThread
(798-798)StopWorkerThread
(808-818)StopWorkerThread
(808-808)InterruptWorkerThread
(820-824)InterruptWorkerThread
(820-820)src/llmq/signing_shares.cpp (10)
WorkThreadMain
(1500-1553)WorkThreadMain
(1500-1500)StartWorkerThread
(182-191)StartWorkerThread
(182-182)StopWorkerThread
(193-203)StopWorkerThread
(193-193)InterruptWorkerThread
(215-220)InterruptWorkerThread
(215-215)NotifyWorker
(1699-1704)NotifyWorker
(1699-1699)src/instantsend/instantsend.cpp (2)
WorkThreadMain
(933-970)WorkThreadMain
(933-933)src/instantsend/instantsend.h (2)
InterruptWorkerThread
(117-117)NotifyWorker
(146-149)
src/llmq/signing_shares.cpp (5)
src/llmq/signing.h (1)
NotifyWorker
(252-252)src/instantsend/instantsend.h (1)
NotifyWorker
(146-149)src/util/time.h (1)
now
(21-55)src/llmq/signing.cpp (5)
WorkThreadMain
(826-856)WorkThreadMain
(826-826)Cleanup
(651-665)Cleanup
(651-651)l
(835-835)src/instantsend/instantsend.cpp (3)
WorkThreadMain
(933-970)WorkThreadMain
(933-933)l
(965-965)
src/instantsend/instantsend.h (5)
src/instantsend/lock.h (1)
instantsend
(18-41)src/llmq/signing.cpp (2)
InterruptWorkerThread
(820-824)InterruptWorkerThread
(820-820)src/llmq/signing_shares.cpp (4)
InterruptWorkerThread
(215-220)InterruptWorkerThread
(215-215)NotifyWorker
(1699-1704)NotifyWorker
(1699-1699)src/instantsend/instantsend.cpp (4)
ProcessPendingInstantSendLocks
(170-235)ProcessPendingInstantSendLocks
(170-170)ProcessPendingInstantSendLocks
(237-347)ProcessPendingInstantSendLocks
(237-240)src/llmq/signing.h (1)
NotifyWorker
(252-252)
src/instantsend/instantsend.cpp (2)
src/llmq/signing.h (1)
NotifyWorker
(252-252)src/instantsend/instantsend.h (1)
NotifyWorker
(146-149)
🪛 GitHub Actions: Clang Diff Format Check
src/llmq/signing.cpp
[error] 1-1: Clang-format differences detected in this file. Run the formatting script to fix: clang-format-diff.py or clang-format -i.
src/llmq/signing_shares.h
[error] 1-1: Clang-format differences detected in this file. Run the formatting script to fix: clang-format-diff.py or clang-format -i.
src/llmq/signing.h
[error] 1-1: Clang-format differences detected in this file. Run the formatting script to fix: clang-format-diff.py or clang-format -i.
src/llmq/signing_shares.cpp
[error] 1-1: Clang-format differences detected in this file. Run the formatting script to fix: clang-format-diff.py or clang-format -i.
src/instantsend/instantsend.h
[error] 1-1: Clang-format differences detected in this file. Run the formatting script to fix: clang-format-diff.py or clang-format -i.
src/instantsend/instantsend.cpp
[error] 1-1: Clang-format differences detected in this file. Run the formatting script to fix: clang-format-diff.py or clang-format -i.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: linux64_nowallet-build / Build source
- GitHub Check: mac-build / Build source
- GitHub Check: linux64_tsan-build / Build source
- GitHub Check: arm-linux-build / Build source
- GitHub Check: linux64_ubsan-build / Build source
- GitHub Check: linux64_sqlite-build / Build source
- GitHub Check: linux64_fuzz-build / Build source
- GitHub Check: linux64-build / Build source
- GitHub Check: win64-build / Build source
🔇 Additional comments (10)
src/llmq/signing_shares.h (2)
358-360
: Good move to steady_clock for retry scheduling.Switching CSignedSession::nextAttemptTime to std::chrono::steady_clock::time_point avoids system-time jumps and fixes mocked-time pitfalls raised earlier.
383-387
: Worker wake primitives and lock annotations look correct.
- workMutex/workCv/workEpoch pattern is consistent with other modules.
- lastCleanupTimeSteady supports deadline waits without mixing clocks.
- EXCLUSIVE_LOCKS_REQUIRED(!cs_pendingSigns) annotation matches usage in the worker.
Please run clang-format as CI flags style diffs for this file.Also applies to: 418-418, 495-499
src/llmq/signing.cpp (1)
433-435
: Event-driven wakeups are wired correctly.NotifyWorker on enqueue and notify_all on interrupt plus steady-clock cleanup deadlines are all consistent with the new model.
Please run clang-format; CI reports diffs for this file.Also applies to: 819-824, 828-855, 664-665
src/llmq/signing.h (1)
239-253
: Wake primitives and inline NotifyWorker look good.API is cohesive with the worker loop and other modules. No issues.
Run clang-format; CI flagged style diffs here too.src/instantsend/instantsend.h (1)
39-43
: IS pending container refactor + wakeups look solid.
- PendingISLockFromPeer clarifies ownership and reduces pair juggling.
- Worker wake plumbing mirrors other modules and should reduce latency.
Please run clang-format to fix the CI failure on this header.Also applies to: 70-76, 117-118, 144-149, 79-82
src/llmq/signing_shares.cpp (4)
1029-1047
: Good: steady_clock for nextAttemptTime eliminates mocked-time mixing.Using steady_clock now for curTime/nextAttemptTime fixes the “mocked vs system time” issue noted earlier.
229-305
: NotifyWorker placement is appropriate; avoids self-wake from worker.Wakes are issued on inbound work and state changes; avoided within worker-owned paths. Looks correct.
Also applies to: 746-769, 1663-1688, 1695-1704
215-220
: Worker interrupt and wait-until deadline logic look right.notify_all on interrupt plus steady deadlines for cleanup/retry are correct.
Also applies to: 1503-1552
1432-1434
: Style: run clang-format.CI reports formatting diffs for this file.
src/instantsend/instantsend.cpp (1)
966-968
: InterruptWorkerThread properly notifies the condition variable—no indefinite wait risk.The implementation at
src/instantsend/instantsend.h:117
calls bothworkInterrupt()
(setting the flag) andworkCv.notify_all()
, so the wait condition at lines 966-968 will always be satisfied. The thread cannot wait indefinitely during shutdown because the lambda checksbool(workInterrupt)
, which becomes true whenInterruptWorkerThread()
is called, triggering the condition variable notification.
if (!fMoreWork && !workInterrupt.sleep_for(std::chrono::milliseconds(100))) { | ||
return; | ||
} | ||
if (fMoreWork) continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Busy-loop causes 100% CPU utilization.
When fMoreWork
is true, the thread immediately continues without any throttling, causing a busy-loop. This explains the reported CPU increase from 5-10% to 100% in the PR comments.
Under sustained load with >32 pending locks, ProcessPendingInstantSendLocks
will consistently return m_pending_work=true
(line 192), causing continuous spinning.
Apply this diff to add throttling even when there's more work:
return more_work;
}();
- if (fMoreWork) continue;
std::unique_lock<Mutex> l(workMutex);
- workCv.wait(l, [this, startEpoch]{
+ workCv.wait_for(l, std::chrono::milliseconds(fMoreWork ? 1 : 100), [this, startEpoch]{
return bool(workInterrupt) || workEpoch.load(std::memory_order_acquire) != startEpoch;
});
This change uses a 1ms timeout when there's more work (allowing rapid processing) and 100ms when idle (reducing unnecessary wakeups).
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/instantsend/instantsend.cpp around line 964, the loop currently continues
immediately when fMoreWork is true causing a busy-spin and 100% CPU; modify the
control flow so that instead of an immediate continue you call a short
sleep/condition wait (e.g., 1ms) when fMoreWork is true to allow rapid
processing without spinning, and use a longer sleep/condition wait (e.g., 100ms)
when idle to reduce wakeups; implement this by replacing the unconditional
continue with a timed wait or SleepFor that uses 1ms if fMoreWork is set,
otherwise 100ms, preserving existing wakeup semantics.
// Only report more work if we processed a full batch, indicating there's likely more | ||
// work from the original collection. Don't check queues for work added by listeners | ||
// during processing, as that would cause busy-wait with epoch-based wake conditions. | ||
return recSigsByNode.size() >= nMaxBatchSize; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect “more work” predicate uses node-count, not work-count.
Use the processed message count (e.g., verifyCount) to decide if batches likely remain; recSigsByNode.size() counts nodes, not total recsigs, and can underreport.
Apply this diff:
- // Only report more work if we processed a full batch, indicating there's likely more
- // work from the original collection. Don't check queues for work added by listeners
- // during processing, as that would cause busy-wait with epoch-based wake conditions.
- return recSigsByNode.size() >= nMaxBatchSize;
+ // Only report more work if we processed a full batch of recovered sigs.
+ return verifyCount >= nMaxBatchSize;
🤖 Prompt for AI Agents
In src/llmq/signing.cpp around lines 580 to 584, the predicate for "more work"
currently compares recSigsByNode.size() (node count) to nMaxBatchSize which
underreports work; change it to use the processed message count (e.g.,
verifyCount) or compute the total number of recsigs processed (sum of sizes of
vectors in recSigsByNode) and compare that total to nMaxBatchSize so the
function returns true when the actual processed-work count >= nMaxBatchSize.
Replaced the previous implementation of the GetOrAdd method with a more efficient approach using emplace. This change simplifies the logic by directly attempting to insert a new entry into the internal map, improving performance and readability while maintaining the same functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/llmq/signing_shares.h (1)
213-216
: Inline comments improve clarity.The added comments accurately describe the two-level map access pattern, which aids maintainability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/llmq/signing_shares.h
(6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.{cpp,h,cc,cxx,hpp}
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.{cpp,h,cc,cxx,hpp}
: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction
Files:
src/llmq/signing_shares.h
🧬 Code graph analysis (1)
src/llmq/signing_shares.h (1)
src/llmq/signing_shares.cpp (6)
SignPendingSigShares
(1563-1587)SignPendingSigShares
(1563-1563)WorkThreadMain
(1500-1553)WorkThreadMain
(1500-1500)NotifyWorker
(1699-1704)NotifyWorker
(1699-1699)
🪛 GitHub Actions: Clang Diff Format Check
src/llmq/signing_shares.h
[error] 1-1: Clang-format-diff detected formatting changes in src/llmq/signing_shares.h (before vs after formatting).
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: mac-build / Build source
- GitHub Check: win64-build / Build source
- GitHub Check: linux64_nowallet-build / Build source
- GitHub Check: linux64_ubsan-build / Build source
- GitHub Check: linux64_sqlite-build / Build source
- GitHub Check: linux64_fuzz-build / Build source
- GitHub Check: linux64-build / Build source
- GitHub Check: linux64_tsan-build / Build source
- GitHub Check: arm-linux-build / Build source
🔇 Additional comments (5)
src/llmq/signing_shares.h (5)
29-29
: LGTM!The
<condition_variable>
include is required for the newstd::condition_variable_any
member added at line 382.
355-355
: LGTM! Migration tosteady_clock
improves robustness.Using
std::chrono::steady_clock::time_point
instead ofint64_t
provides monotonic timing unaffected by system clock adjustments, which is essential for retry scheduling.
380-383
: LGTM! Event-driven synchronization is correctly implemented.The combination of
workMutex
,workCv
, and atomicworkEpoch
implements a robust wake-up mechanism:
- The epoch counter prevents spurious wakeups
condition_variable_any
is appropriate sinceMutex
may not bestd::mutex
- Memory ordering (acq_rel for increment, acquire for load) establishes proper happens-before relationships
414-414
: LGTM! Consistent with the steady_clock migration.Using
steady_clock::time_point
for cleanup timing aligns with the other time-tracking changes and ensures monotonic behavior for periodic task scheduling.
495-495
: LGTM! Public notification API enables event-driven architecture.The
NotifyWorker()
method allows external threads to wake the worker immediately when new work arrives, replacing the previous polling-based approach. The public visibility is correct since this must be called from outside the class.
void SignPendingSigShares(const CConnman& connman, PeerManager& peerman) | ||
EXCLUSIVE_LOCKS_REQUIRED(!cs_pendingSigns); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address the clang-format violation.
The pipeline detected formatting issues in this file. Please run clang-format to fix the formatting.
#!/bin/bash
# Run clang-format on the file to show the expected formatting
clang-format src/llmq/signing_shares.h
🤖 Prompt for AI Agents
In src/llmq/signing_shares.h around lines 492-493 the function declaration line
violates the project's clang-format style; run clang-format on this file (e.g.
clang-format src/llmq/signing_shares.h) or reformat the line so it matches the
surrounding style (adjust spacing/line breaks so the declaration and
EXCLUSIVE_LOCKS_REQUIRED annotation follow the project's formatting rules) and
commit the updated file.
} | ||
|
||
auto curTime = GetTime<std::chrono::milliseconds>().count(); | ||
auto curTime = std::chrono::steady_clock::now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's conceptually wrong to switch from mockable time to steady_clock here. Session expiration, cleanup, re-sending sig shares to recovery members - all of that is business logic, they must either all use mockable time (to work in sync in tests) or none of them.
Consider reverting back to mockable time and either dropping the extra condition to calculate next_deadline
based on nextAttemptTime
completely or implementing it differently e.g. smth like 25e53e5 maybe.
filteredPend.push_back(std::move(p)); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
linter complains about extra whitespaces here
Issue being fixed or feature implemented
InstantSend / LLMQ signing is currently wait / loop driven. This PR converts this to notification driven. This allows us to immediately begin working on new signing or other operations instead of waiting. During very high load, this is unlikely to significantly affect throughput, however it will affect best case latency during low throughput.
📊 Mempool Performance Improvement (15 nodes local functional test measuring latency via ZMQ from mempool notification to islock / recsig (should reflect input lock time) notification)
Currently on main net, we see on average ~1.5-2 second time to lock transactions. Based on the 250ms average reduction locally, this implies that we may see 1.25 - 1.75 second average time to lock with this. Although, the reduction in latency may be more or less with significantly larger quorums.
Checklist:
Go over all the following points, and put an
x
in all the boxes that apply.