Skip to content

Conversation

PastaPastaPasta
Copy link
Member

Issue being fixed or feature implemented

InstantSend / LLMQ signing is currently wait / loop driven. This PR converts this to notification driven. This allows us to immediately begin working on new signing or other operations instead of waiting. During very high load, this is unlikely to significantly affect throughput, however it will affect best case latency during low throughput.

📊 Mempool Performance Improvement (15 nodes local functional test measuring latency via ZMQ from mempool notification to islock / recsig (should reflect input lock time) notification)

Metric Before (avg of 2 runs) After (avg of 2 runs) Improvement (%)
lock - min 307.26 ms 53.30 ms −82.7%
lock - p50 537.28 ms 235.15 ms −56.2%
lock - p90 718.17 ms 388.75 ms −45.9%
lock - p99 813.85 ms 443.19 ms −45.6%
lock - max 846.09 ms 561.26 ms −33.6%
recsig - min 111.26 ms 46.55 ms −58.2%
recsig - p50 367.82 ms 145.53 ms −60.4%
recsig - p90 491.30 ms 246.39 ms −49.9%
recsig - p99 562.76 ms 313.32 ms −44.3%
recsig - max 603.32 ms 359.99 ms −40.3%

Currently on main net, we see on average ~1.5-2 second time to lock transactions. Based on the 250ms average reduction locally, this implies that we may see 1.25 - 1.75 second average time to lock with this. Although, the reduction in latency may be more or less with significantly larger quorums.

Checklist:

Go over all the following points, and put an x in all the boxes that apply.

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation
  • I have assigned this pull request to a milestone (for repository code-owners and collaborators only)

…nsiveness

Adjusted the timing intervals in the WorkThreadMain function of CSigSharesManager to 10 milliseconds for both message sending and work interruption checks. This change enhances the responsiveness of the signing shares manager by allowing more frequent processing of pending signature shares and message sending.

This results in ~33% latency improvements in a contrived local latency functional test / benchmark from ~500ms to ~333ms
… CSigSharesManager

Added a NotifyWorker function to both CInstantSendManager and CSigSharesManager to signal the worker thread when new work is available. This change improves the responsiveness of the worker threads by allowing them to wake up promptly when there are pending tasks, thus enhancing overall performance and reducing latency in processing instant send locks and signature shares.
Copy link

github-actions bot commented Oct 15, 2025

✅ No Merge Conflicts Detected

This PR currently has no conflicts with other open PRs.

Copy link

coderabbitai bot commented Oct 15, 2025

Walkthrough

Introduces epoch/condition-variable wake-up mechanisms across InstantSend, LLMQ signing, and sigshares: adds workMutex, workCv, workEpoch, and a NotifyWorker() helper; InterruptWorkerThread now notifies the CV. InstantSend replaces pair-based pending entries with instantsend::PendingISLockFromPeer (fields node_id, islock) and updates all insertion/lookup/removal call sites. Many state-mutating methods now call NotifyWorker(). Worker threads were refactored from sleep-based polling to epoch-aware condition-variable waits with deadline-based periodic wakeups and steady_clock timing for cleanup/retry scheduling.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "perf: improve islock / llmq signing latency" directly and clearly summarizes the main objective of the changeset. The raw summary confirms that the changes convert InstantSend/LLMQ signing from a wait/loop-driven model to a notification-driven model to improve responsiveness and latency. The title is concise, specific, and appropriately highlights the primary change from the developer's perspective without unnecessary noise.
Description Check ✅ Passed The PR description is directly related to the changeset and accurately describes the conversion from wait/loop-driven to notification-driven architecture. It explains the motivation (allowing immediate work instead of waiting), expected impact (improved best-case latency), and provides performance benchmarks from local testing showing significant latency reductions across all metrics. The description aligns well with the technical changes detailed in the raw summary, such as the addition of NotifyWorker() calls, condition variables, and epoch-based synchronization across multiple files.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (11)
src/llmq/signing.cpp (3)

434-436: Move NotifyWorker() outside cs_pending to avoid waking into lock contention

You notify while still holding cs_pending. Wake-ups can immediately contend on the same lock the worker needs. Release the lock before notifying.

Apply:

-    LOCK(cs_pending);
-    if (pendingReconstructedRecoveredSigs.count(recoveredSig->GetHash())) {
+    {
+        LOCK(cs_pending);
+        if (pendingReconstructedRecoveredSigs.count(recoveredSig->GetHash())) {
             // no need to perform full verification
             LogPrint(BCLog::LLMQ, "CSigningManager::%s -- already pending reconstructed sig, signHash=%s, id=%s, msgHash=%s, node=%d\n", __func__,
                      recoveredSig->buildSignHash().ToString(), recoveredSig->getId().ToString(), recoveredSig->getMsgHash().ToString(), from);
-        return ret;
-    }
-
-    pendingRecoveredSigs[from].emplace_back(recoveredSig);
-    workEpoch.fetch_add(1, std::memory_order_acq_rel);
-    NotifyWorker();
+            return ret;
+        }
+        pendingRecoveredSigs[from].emplace_back(recoveredSig);
+    }
+    workEpoch.fetch_add(1, std::memory_order_release);
+    NotifyWorker();

651-653: Also notify after releasing cs_pending in PushReconstructedRecoveredSig

Same contention pattern; release cs_pending before bumping epoch/notify.

-    LOCK(cs_pending);
-    pendingReconstructedRecoveredSigs.emplace(std::piecewise_construct, std::forward_as_tuple(recoveredSig->GetHash()), std::forward_as_tuple(recoveredSig));
-    workEpoch.fetch_add(1, std::memory_order_acq_rel);
-    NotifyWorker();
+    {
+        LOCK(cs_pending);
+        pendingReconstructedRecoveredSigs.emplace(std::piecewise_construct, std::forward_as_tuple(recoveredSig->GetHash()), std::forward_as_tuple(recoveredSig));
+    }
+    workEpoch.fetch_add(1, std::memory_order_release);
+    NotifyWorker();

837-863: Epoch-based wait + deadline is solid; optional steady_clock clean-up

The wait logic is correct. Minor optional: store lastCleanupTime as a steady_clock timestamp (or compute deadline fully in steady_clock) to avoid system clock jumps influencing the cadence.

src/instantsend/instantsend.h (1)

66-70: LGTM; minor nits

  • Adding workMutex/workCv/workEpoch and waking on interrupt is correct.
  • Optional: mark NotifyWorker() noexcept and use memory_order_release for fetch_add (loads already use acquire).

Also applies to: 112-112, 140-143

src/instantsend/instantsend.cpp (2)

164-167: Notify after releasing locks to avoid waking into contention

These NotifyWorker() calls run while cs_pendingLocks (ProcessMessage), cs_nonLocked (RemoveNonLockedTx), or cs_pendingLocks (TryEmplacePendingLock) are still held. Wake-ups can contend immediately on the same locks.

Move notifications after the lock scopes.

@@
-    LOCK(cs_pendingLocks);
-    pendingInstantSendLocks.emplace(hash, std::make_pair(from, islock));
-    NotifyWorker();
+    {
+        LOCK(cs_pendingLocks);
+        pendingInstantSendLocks.emplace(hash, std::make_pair(from, islock));
+    }
+    NotifyWorker();
@@
-    LogPrint(BCLog::INSTANTSEND, "CInstantSendManager::%s -- txid=%s, pindexMined=%s\n", __func__,
-             tx->GetHash().ToString(), pindexMined ? pindexMined->GetBlockHash().ToString() : "");
-    NotifyWorker();
+    LogPrint(BCLog::INSTANTSEND, "CInstantSendManager::%s -- txid=%s, pindexMined=%s\n", __func__,
+             tx->GetHash().ToString(), pindexMined ? pindexMined->GetBlockHash().ToString() : "");
+    NotifyWorker();
@@
-    LOCK(cs_nonLocked);
+    LOCK(cs_nonLocked);
     ...
-    LogPrint(BCLog::INSTANTSEND, "CInstantSendManager::%s -- txid=%s, retryChildren=%d, retryChildrenCount=%d\n",
-             __func__, txid.ToString(), retryChildren, retryChildrenCount);
-    NotifyWorker();
+    LogPrint(BCLog::INSTANTSEND, "CInstantSendManager::%s -- txid=%s, retryChildren=%d, retryChildrenCount=%d\n",
+             __func__, txid.ToString(), retryChildren, retryChildrenCount);
+    // cs_nonLocked is released here (end of function scope) in current layout;
+    // if kept, wrap the critical section in its own block and notify after.
+    NotifyWorker();
@@
-    LOCK(cs_pendingLocks);
-    if (!pendingInstantSendLocks.count(hash)) {
-        pendingInstantSendLocks.emplace(hash, std::make_pair(id, islock));
-    }
-    NotifyWorker();
+    {
+        LOCK(cs_pendingLocks);
+        if (!pendingInstantSendLocks.count(hash)) {
+            pendingInstantSendLocks.emplace(hash, std::make_pair(id, islock));
+        }
+    }
+    NotifyWorker();

Also applies to: 561-562, 602-603, 633-634


934-967: Epoch-based wait loop is correct; optional deadline for periodic work

Looping while there’s pending work and otherwise waiting on epoch/interrupt is good. If InstantSend ever needs periodic timeouts without external triggers, consider a bounded wait (like signing.cpp’s next-deadline) to guarantee periodic wakeups.

Please confirm there are no time-based maintenance tasks in InstantSend that require periodic wakeups.

src/llmq/signing.h (1)

241-246: Worker wake-up API is consistent; minor polish

  • Adding workMutex/workCv/workEpoch and exposing NotifyWorker is consistent with usage in signing.cpp.
  • Optional: declare NotifyWorker noexcept and consider memory_order_release on fetch_add for clarity.

Also applies to: 251-251

src/llmq/signing_shares.cpp (4)

1502-1556: Worker loop: event-driven with deadlines; consider two small tweaks

The epoch/CV wait with deadline calculation is sound. Two optional improvements:

  • If no deadline is computed, prefer an indefinite wait (with predicate) over 10ms polling to reduce wakeups.
  • Use a single time source for recovery timing (either util::GetTimestd::chrono::milliseconds() everywhere or TicksSinceEpoch) for consistency.

Example for the first point:

-        if (next_deadline == std::chrono::steady_clock::time_point::max()) {
-            workCv.wait_for(l, std::chrono::milliseconds(10), [this, startEpoch]{
-                return bool(workInterrupt) || workEpoch.load(std::memory_order_acquire) != startEpoch;
-            });
-        } else {
+        if (next_deadline == std::chrono::steady_clock::time_point::max()) {
+            workCv.wait(l, [this, startEpoch]{
+                return bool(workInterrupt) || workEpoch.load(std::memory_order_acquire) != startEpoch;
+            });
+        } else {
             workCv.wait_until(l, next_deadline, [this, startEpoch]{
                 return bool(workInterrupt) || workEpoch.load(std::memory_order_acquire) != startEpoch;
             });
-        }
+        }

1563-1565: Notify outside cs_pendingSigns to minimize contention

Notify while not holding cs_pendingSigns, so the worker can grab it immediately.

 void CSigSharesManager::AsyncSign(const CQuorumCPtr& quorum, const uint256& id, const uint256& msgHash)
 {
-    LOCK(cs_pendingSigns);
-    pendingSigns.emplace_back(quorum, id, msgHash);
-    // Wake worker to handle new pending sign immediately
-    NotifyWorker();
+    {
+        LOCK(cs_pendingSigns);
+        pendingSigns.emplace_back(quorum, id, msgHash);
+    }
+    // Wake worker to handle new pending sign immediately
+    NotifyWorker();
 }

1690-1692: Release cs before notifying

Avoid notifying while holding cs; release first to reduce contention.

 void CSigSharesManager::ForceReAnnouncement(const CQuorumCPtr& quorum, Consensus::LLMQType llmqType, const uint256& id, const uint256& msgHash)
 {
     if (IsAllMembersConnectedEnabled(llmqType, m_sporkman)) {
         return;
     }
-
-    LOCK(cs);
-    auto signHash = SignHash(llmqType, quorum->qc->quorumHash, id, msgHash).Get();
-    if (const auto *const sigs = sigShares.GetAllForSignHash(signHash)) {
-        for (const auto& [quorumMemberIndex, _] : *sigs) {
-            // re-announce every sigshare to every node
-            sigSharesQueuedToAnnounce.Add(std::make_pair(signHash, quorumMemberIndex), true);
-        }
-    }
-    for (auto& [_, nodeState] : nodeStates) {
-        auto* session = nodeState.GetSessionBySignHash(signHash);
-        if (session == nullptr) {
-            continue;
-        }
-        session->knows.SetAll(false);
-        session->sendSessionId = UNINITIALIZED_SESSION_ID;
-    }
-    // Wake worker so announcements are sent promptly
-    NotifyWorker();
+    {
+        LOCK(cs);
+        auto signHash = SignHash(llmqType, quorum->qc->quorumHash, id, msgHash).Get();
+        if (const auto *const sigs = sigShares.GetAllForSignHash(signHash)) {
+            for (const auto& [quorumMemberIndex, _] : *sigs) {
+                sigSharesQueuedToAnnounce.Add(std::make_pair(signHash, quorumMemberIndex), true);
+            }
+        }
+        for (auto& [_, nodeState] : nodeStates) {
+            auto* session = nodeState.GetSessionBySignHash(signHash);
+            if (session == nullptr) continue;
+            session->knows.SetAll(false);
+            session->sendSessionId = UNINITIALIZED_SESSION_ID;
+        }
+    }
+    // Wake worker so announcements are sent promptly
+    NotifyWorker();
 }

1698-1701: Release cs before notifying

Same rationale as above; notify after releasing cs.

 MessageProcessingResult CSigSharesManager::HandleNewRecoveredSig(const llmq::CRecoveredSig& recoveredSig)
 {
-    LOCK(cs);
-    RemoveSigSharesForSession(recoveredSig.buildSignHash().Get());
-    // Cleaning up a session can free resources; wake worker to proceed
-    NotifyWorker();
+    {
+        LOCK(cs);
+        RemoveSigSharesForSession(recoveredSig.buildSignHash().Get());
+    }
+    // Cleaning up a session can free resources; wake worker to proceed
+    NotifyWorker();
     return {};
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f170aed and 2f03126.

📒 Files selected for processing (6)
  • src/instantsend/instantsend.cpp (11 hunks)
  • src/instantsend/instantsend.h (3 hunks)
  • src/llmq/signing.cpp (5 hunks)
  • src/llmq/signing.h (2 hunks)
  • src/llmq/signing_shares.cpp (8 hunks)
  • src/llmq/signing_shares.h (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction

Files:

  • src/llmq/signing.cpp
  • src/llmq/signing.h
  • src/instantsend/instantsend.h
  • src/llmq/signing_shares.h
  • src/instantsend/instantsend.cpp
  • src/llmq/signing_shares.cpp
🔇 Additional comments (11)
src/llmq/signing.cpp (3)

526-533: Early-continue on pending queues is correct

Returning true when either queue still has items prevents unnecessary sleeping. Good.


585-592: Good: report more work when queues non-empty or batch threshold hit

This avoids idle sleeps under load. Looks correct.


831-832: Interrupt path properly wakes the worker

Calling workCv.notify_all() on interrupt is appropriate. LGTM.

src/instantsend/instantsend.cpp (1)

462-463: LGTM: timely worker wake-ups at key state transitions

Waking after mempool adds/removals, block connects/disconnects, ChainLock notifications, and tip updates looks correct.

Also applies to: 480-481, 511-512, 518-519, 639-640, 657-658

src/llmq/signing_shares.h (1)

383-387: Worker wake-up primitives look good; keep annotations or document; verify implementation exists

  • New workMutex/workCv/workEpoch and NotifyWorker are appropriate.
  • You removed thread-safety annotations from SignPendingSigShares/WorkThreadMain. Consider restoring them (or documenting internal locking) to keep static checks useful.
  • Ensure CSigSharesManager::NotifyWorker is defined in the .cpp and used at all state-mutating sites (message handlers, queue inserts, timers).

Run to verify definition and usages:

Also applies to: 494-496, 497-497

src/llmq/signing_shares.cpp (6)

218-220: Prompt shutdown wakeup is correct

Notifying the worker during interrupt ensures timely exit. Looks good.


301-304: Wake worker after inbound message processing

Good call to notify; inbound messages can create immediate work.


735-736: Batching announce + deferred notify is solid

Queuing under cs and notifying after unlock minimizes contention and spurious wakeups. Nice.

Also applies to: 744-745, 766-769


1496-1497: Notify after ban to reroute work

Waking the worker after marking a node banned is appropriate.


1589-1591: Post-sign notify to progress announcements/recovery

Good to wake after enqueuing new work.


1703-1709: Epoch + CV notify is appropriate for lossless wakeups

Atomic increment plus notify_one is correct for a single worker.

}
if (fMoreWork) continue;
std::unique_lock<Mutex> l(workMutex);
workCv.wait(l, [this, startEpoch]{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any chance that event could be missing and thread will wait forever? For example, when app is terminating.

…nager

Adjusted the formatting of the NotifyWorker function in CInstantSendManager for improved readability. This change includes consistent spacing and indentation, enhancing code maintainability without altering functionality.
…ingSigShares

Modified the WorkThreadMain function in signing.h and SignPendingSigShares in signing_shares.h to specify exclusive lock requirements. This change enhances thread safety by ensuring proper locking mechanisms are in place, preventing potential race conditions during concurrent operations.
@PastaPastaPasta PastaPastaPasta force-pushed the perf/improve-islock-llmq-signing-latency branch from dda0aef to 708fa0d Compare October 16, 2025 15:34
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/llmq/signing_shares.h (2)

383-387: Concurrency primitives look right; please document lock ordering

  • Using Mutex + condition_variable_any is appropriate.
  • To avoid deadlocks, define/order: never block on workCv while holding cs or cs_pendingSigns; avoid taking workMutex while holding cs/cs_pendingSigns (or document a single consistent order if needed).

496-498: Restore thread-safety annotation on WorkThreadMain; mark NotifyWorker noexcept

  • The removed EXCLUSIVE_LOCKS_REQUIRED(!cs_pendingSigns) weakens static analysis. If WorkThreadMain still must not be called with cs_pendingSigns held, keep the attribute; otherwise use LOCKS_EXCLUDED(cs_pendingSigns).
  • NotifyWorker should be noexcept (only increments and notifies).

Apply this diff:

-    void WorkThreadMain(CConnman& connman, PeerManager& peerman);
-    void NotifyWorker();
+    void WorkThreadMain(CConnman& connman, PeerManager& peerman)
+        EXCLUSIVE_LOCKS_REQUIRED(!cs_pendingSigns);
+    void NotifyWorker() noexcept;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dda0aef and 708fa0d.

📒 Files selected for processing (2)
  • src/llmq/signing.h (2 hunks)
  • src/llmq/signing_shares.h (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/llmq/signing.h
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction

Files:

  • src/llmq/signing_shares.h
🔇 Additional comments (2)
src/llmq/signing_shares.h (2)

29-29: Include OK for condition_variable_any

Correct header for std::condition_variable_any with custom Mutex.


494-495: Good: explicit “not-held” requirement

EXCLUSIVE_LOCKS_REQUIRED(!cs_pendingSigns) on SignPendingSigShares matches AsyncSign and prevents calls under that lock.

…hares.h

Added exclusive lock requirements to the WorkThreadMain function in signing_shares.h to enhance thread safety. This change ensures that the function operates correctly under concurrent conditions, preventing potential race conditions.
@UdjinM6
Copy link

UdjinM6 commented Oct 17, 2025

running feature_llmq_signing.py, monitoring CPU levels:
develop: 5-10%
this PR: 100%

Copy link

@UdjinM6 UdjinM6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

119b81f should fix CPU issue but there is more, pls see below.

SendMessages(connman);
lastSendTime = TicksSinceEpoch<std::chrono::milliseconds>(SystemClock::now());
}
bool didSend = SendMessages(connman);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we had a cooldown for SendMessages() for a reason but I don't remember why exactly 🙈 All qsig* messages contain vectors, so maybe waiting a bit allowed us to utilise them better. Or maybe we simply did not want to push messages too often. Anyway, we need to confirm that there is no regressing in this part.

…antSendManager

Updated the CInstantSendManager to use a new struct, PendingISLockFromPeer, for better clarity and type safety. This change replaces the use of std::pair for storing node ID and InstantSendLockPtr, enhancing code readability and maintainability across multiple functions handling instant send locks.
The 'pend' local variable in ProcessPendingInstantSendLocks was previously
using the same data structure as pendingInstantSendLocks (a hash map).
However, once we're in the processing step, we only iterate sequentially
through the locks - there are no hash-based lookups.

This commit changes 'pend' to use std::vector for better performance:
- Improved cache locality with contiguous memory layout
- Better CPU prefetching during iteration (3x through the data)
- Eliminates hash map overhead (bucket allocation, pointer chasing)
- Filtering step uses build-new-vector approach to maintain O(n)

The typical case processes up to 32 locks, making the vector's sequential
access pattern ideal for modern CPU cache hierarchies.
Fixes time-mixing bugs where mocked time (controllable in tests) was being
used to compute steady_clock deadlines. Since mocked time and system time
move independently, this caused incorrect wait behavior in tests.

Changes:
- Use steady_clock::time_point for all wait deadlines (nextAttemptTime, lastCleanupTimeSteady)
- Keep mocked time (GetTime<>()) for business logic only (timeouts, session tracking)
- Remove redundant workEpoch increments (NotifyWorker already does this)
- Move NotifyWorker() calls to individual message handlers for better control

This ensures that:
1. In production: steady_clock provides monotonic, reliable timing
2. In tests: mocked time controls business logic while steady_clock handles waits
3. No double-incrementing of workEpoch that could cause busy-wait issues
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/instantsend/instantsend.cpp (1)

1-1: Fix clang-format differences.

The pipeline reports clang-format differences in this file. Please run the formatting script to ensure code style consistency.

Run the following command to fix formatting:

clang-format -i src/instantsend/instantsend.cpp
♻️ Duplicate comments (1)
src/llmq/signing_shares.cpp (1)

1510-1517: Tight loop when didSend/fMoreWork → likely cause of 100% CPU. Add a micro coalescing wait.

Continuing immediately after sending or detecting more work can hot‑loop under load. Add a very short wait with the same epoch predicate to batch messages and avoid CPU peg, preserving sub‑ms latency.

-        // If there is more work or we just sent something, iterate again without waiting
-        if (fMoreWork || didSend) {
-            continue;
-        }
+        // If there is more work or we just sent something, briefly coalesce to avoid hot looping
+        if (fMoreWork || didSend) {
+            std::unique_lock<Mutex> l(workMutex);
+            workCv.wait_for(l, std::chrono::milliseconds(1), [this, startEpoch]{
+                return bool(workInterrupt) || workEpoch.load(std::memory_order_acquire) != startEpoch;
+            });
+            continue;
+        }

This also addresses the previously raised concern about removing the SendMessages cooldown.

🧹 Nitpick comments (4)
src/llmq/signing.cpp (1)

805-806: Nit: thread name mismatch.

CSigningManager worker is named "sigshares". Prefer a specific name like "recsigs" for profiling.

-    workThread = std::thread(&util::TraceThread, "sigshares", [this, &peerman] { WorkThreadMain(peerman); });
+    workThread = std::thread(&util::TraceThread, "recsigs", [this, &peerman] { WorkThreadMain(peerman); });
src/llmq/signing_shares.cpp (2)

1260-1281: Message batching cadence: validate no regression without explicit cooldown.

Given frequent immediate loops, ensure QSIG* batching still groups efficiently under typical traffic. If not, consider a small per-peer send window (1–2ms) similar to the coalescing wait above.


1472-1498: Wake on BanNode is good, but double-check request re-routing pressure.

After banning, mass re-requests can cascade. Consider limiting per-iteration re-requests to avoid bursts.

src/instantsend/instantsend.cpp (1)

222-229: Optional: Remove trailing blank line.

Line 229 contains a trailing blank line that could be removed for consistency.

Apply this diff:

             filteredPend.push_back(std::move(p));
         }
     }
-    
+
     // Now check against the previous active set and perform banning if this fails
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5177f80 and ae2dc7a.

📒 Files selected for processing (6)
  • src/instantsend/instantsend.cpp (20 hunks)
  • src/instantsend/instantsend.h (5 hunks)
  • src/llmq/signing.cpp (6 hunks)
  • src/llmq/signing.h (3 hunks)
  • src/llmq/signing_shares.cpp (13 hunks)
  • src/llmq/signing_shares.h (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction

Files:

  • src/llmq/signing.cpp
  • src/llmq/signing_shares.h
  • src/llmq/signing.h
  • src/llmq/signing_shares.cpp
  • src/instantsend/instantsend.h
  • src/instantsend/instantsend.cpp
🧬 Code graph analysis (6)
src/llmq/signing.cpp (3)
src/llmq/signing.h (1)
  • NotifyWorker (252-252)
src/llmq/signing_shares.cpp (7)
  • NotifyWorker (1699-1704)
  • NotifyWorker (1699-1699)
  • WorkThreadMain (1500-1553)
  • WorkThreadMain (1500-1500)
  • Cleanup (1297-1434)
  • Cleanup (1297-1297)
  • l (1541-1541)
src/instantsend/instantsend.cpp (3)
  • WorkThreadMain (933-970)
  • WorkThreadMain (933-933)
  • l (965-965)
src/llmq/signing_shares.h (1)
src/llmq/signing_shares.cpp (6)
  • SignPendingSigShares (1563-1587)
  • SignPendingSigShares (1563-1563)
  • WorkThreadMain (1500-1553)
  • WorkThreadMain (1500-1500)
  • NotifyWorker (1699-1704)
  • NotifyWorker (1699-1699)
src/llmq/signing.h (4)
src/llmq/signing.cpp (8)
  • WorkThreadMain (826-856)
  • WorkThreadMain (826-826)
  • StartWorkerThread (798-806)
  • StartWorkerThread (798-798)
  • StopWorkerThread (808-818)
  • StopWorkerThread (808-808)
  • InterruptWorkerThread (820-824)
  • InterruptWorkerThread (820-820)
src/llmq/signing_shares.cpp (10)
  • WorkThreadMain (1500-1553)
  • WorkThreadMain (1500-1500)
  • StartWorkerThread (182-191)
  • StartWorkerThread (182-182)
  • StopWorkerThread (193-203)
  • StopWorkerThread (193-193)
  • InterruptWorkerThread (215-220)
  • InterruptWorkerThread (215-215)
  • NotifyWorker (1699-1704)
  • NotifyWorker (1699-1699)
src/instantsend/instantsend.cpp (2)
  • WorkThreadMain (933-970)
  • WorkThreadMain (933-933)
src/instantsend/instantsend.h (2)
  • InterruptWorkerThread (117-117)
  • NotifyWorker (146-149)
src/llmq/signing_shares.cpp (5)
src/llmq/signing.h (1)
  • NotifyWorker (252-252)
src/instantsend/instantsend.h (1)
  • NotifyWorker (146-149)
src/util/time.h (1)
  • now (21-55)
src/llmq/signing.cpp (5)
  • WorkThreadMain (826-856)
  • WorkThreadMain (826-826)
  • Cleanup (651-665)
  • Cleanup (651-651)
  • l (835-835)
src/instantsend/instantsend.cpp (3)
  • WorkThreadMain (933-970)
  • WorkThreadMain (933-933)
  • l (965-965)
src/instantsend/instantsend.h (5)
src/instantsend/lock.h (1)
  • instantsend (18-41)
src/llmq/signing.cpp (2)
  • InterruptWorkerThread (820-824)
  • InterruptWorkerThread (820-820)
src/llmq/signing_shares.cpp (4)
  • InterruptWorkerThread (215-220)
  • InterruptWorkerThread (215-215)
  • NotifyWorker (1699-1704)
  • NotifyWorker (1699-1699)
src/instantsend/instantsend.cpp (4)
  • ProcessPendingInstantSendLocks (170-235)
  • ProcessPendingInstantSendLocks (170-170)
  • ProcessPendingInstantSendLocks (237-347)
  • ProcessPendingInstantSendLocks (237-240)
src/llmq/signing.h (1)
  • NotifyWorker (252-252)
src/instantsend/instantsend.cpp (2)
src/llmq/signing.h (1)
  • NotifyWorker (252-252)
src/instantsend/instantsend.h (1)
  • NotifyWorker (146-149)
🪛 GitHub Actions: Clang Diff Format Check
src/llmq/signing.cpp

[error] 1-1: Clang-format differences detected in this file. Run the formatting script to fix: clang-format-diff.py or clang-format -i.

src/llmq/signing_shares.h

[error] 1-1: Clang-format differences detected in this file. Run the formatting script to fix: clang-format-diff.py or clang-format -i.

src/llmq/signing.h

[error] 1-1: Clang-format differences detected in this file. Run the formatting script to fix: clang-format-diff.py or clang-format -i.

src/llmq/signing_shares.cpp

[error] 1-1: Clang-format differences detected in this file. Run the formatting script to fix: clang-format-diff.py or clang-format -i.

src/instantsend/instantsend.h

[error] 1-1: Clang-format differences detected in this file. Run the formatting script to fix: clang-format-diff.py or clang-format -i.

src/instantsend/instantsend.cpp

[error] 1-1: Clang-format differences detected in this file. Run the formatting script to fix: clang-format-diff.py or clang-format -i.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: linux64_nowallet-build / Build source
  • GitHub Check: mac-build / Build source
  • GitHub Check: linux64_tsan-build / Build source
  • GitHub Check: arm-linux-build / Build source
  • GitHub Check: linux64_ubsan-build / Build source
  • GitHub Check: linux64_sqlite-build / Build source
  • GitHub Check: linux64_fuzz-build / Build source
  • GitHub Check: linux64-build / Build source
  • GitHub Check: win64-build / Build source
🔇 Additional comments (10)
src/llmq/signing_shares.h (2)

358-360: Good move to steady_clock for retry scheduling.

Switching CSignedSession::nextAttemptTime to std::chrono::steady_clock::time_point avoids system-time jumps and fixes mocked-time pitfalls raised earlier.


383-387: Worker wake primitives and lock annotations look correct.

  • workMutex/workCv/workEpoch pattern is consistent with other modules.
  • lastCleanupTimeSteady supports deadline waits without mixing clocks.
  • EXCLUSIVE_LOCKS_REQUIRED(!cs_pendingSigns) annotation matches usage in the worker.
    Please run clang-format as CI flags style diffs for this file.

Also applies to: 418-418, 495-499

src/llmq/signing.cpp (1)

433-435: Event-driven wakeups are wired correctly.

NotifyWorker on enqueue and notify_all on interrupt plus steady-clock cleanup deadlines are all consistent with the new model.
Please run clang-format; CI reports diffs for this file.

Also applies to: 819-824, 828-855, 664-665

src/llmq/signing.h (1)

239-253: Wake primitives and inline NotifyWorker look good.

API is cohesive with the worker loop and other modules. No issues.
Run clang-format; CI flagged style diffs here too.

src/instantsend/instantsend.h (1)

39-43: IS pending container refactor + wakeups look solid.

  • PendingISLockFromPeer clarifies ownership and reduces pair juggling.
  • Worker wake plumbing mirrors other modules and should reduce latency.
    Please run clang-format to fix the CI failure on this header.

Also applies to: 70-76, 117-118, 144-149, 79-82

src/llmq/signing_shares.cpp (4)

1029-1047: Good: steady_clock for nextAttemptTime eliminates mocked-time mixing.

Using steady_clock now for curTime/nextAttemptTime fixes the “mocked vs system time” issue noted earlier.


229-305: NotifyWorker placement is appropriate; avoids self-wake from worker.

Wakes are issued on inbound work and state changes; avoided within worker-owned paths. Looks correct.

Also applies to: 746-769, 1663-1688, 1695-1704


215-220: Worker interrupt and wait-until deadline logic look right.

notify_all on interrupt plus steady deadlines for cleanup/retry are correct.

Also applies to: 1503-1552


1432-1434: Style: run clang-format.

CI reports formatting diffs for this file.

src/instantsend/instantsend.cpp (1)

966-968: InterruptWorkerThread properly notifies the condition variable—no indefinite wait risk.

The implementation at src/instantsend/instantsend.h:117 calls both workInterrupt() (setting the flag) and workCv.notify_all(), so the wait condition at lines 966-968 will always be satisfied. The thread cannot wait indefinitely during shutdown because the lambda checks bool(workInterrupt), which becomes true when InterruptWorkerThread() is called, triggering the condition variable notification.

if (!fMoreWork && !workInterrupt.sleep_for(std::chrono::milliseconds(100))) {
return;
}
if (fMoreWork) continue;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Busy-loop causes 100% CPU utilization.

When fMoreWork is true, the thread immediately continues without any throttling, causing a busy-loop. This explains the reported CPU increase from 5-10% to 100% in the PR comments.

Under sustained load with >32 pending locks, ProcessPendingInstantSendLocks will consistently return m_pending_work=true (line 192), causing continuous spinning.

Apply this diff to add throttling even when there's more work:

         return more_work;
     }();
-    if (fMoreWork) continue;
     std::unique_lock<Mutex> l(workMutex);
-    workCv.wait(l, [this, startEpoch]{
+    workCv.wait_for(l, std::chrono::milliseconds(fMoreWork ? 1 : 100), [this, startEpoch]{
         return bool(workInterrupt) || workEpoch.load(std::memory_order_acquire) != startEpoch;
     });

This change uses a 1ms timeout when there's more work (allowing rapid processing) and 100ms when idle (reducing unnecessary wakeups).

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/instantsend/instantsend.cpp around line 964, the loop currently continues
immediately when fMoreWork is true causing a busy-spin and 100% CPU; modify the
control flow so that instead of an immediate continue you call a short
sleep/condition wait (e.g., 1ms) when fMoreWork is true to allow rapid
processing without spinning, and use a longer sleep/condition wait (e.g., 100ms)
when idle to reduce wakeups; implement this by replacing the unconditional
continue with a timed wait or SleepFor that uses 1ms if fMoreWork is set,
otherwise 100ms, preserving existing wakeup semantics.

Comment on lines +580 to 584
// Only report more work if we processed a full batch, indicating there's likely more
// work from the original collection. Don't check queues for work added by listeners
// during processing, as that would cause busy-wait with epoch-based wake conditions.
return recSigsByNode.size() >= nMaxBatchSize;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Incorrect “more work” predicate uses node-count, not work-count.

Use the processed message count (e.g., verifyCount) to decide if batches likely remain; recSigsByNode.size() counts nodes, not total recsigs, and can underreport.

Apply this diff:

-    // Only report more work if we processed a full batch, indicating there's likely more
-    // work from the original collection. Don't check queues for work added by listeners
-    // during processing, as that would cause busy-wait with epoch-based wake conditions.
-    return recSigsByNode.size() >= nMaxBatchSize;
+    // Only report more work if we processed a full batch of recovered sigs.
+    return verifyCount >= nMaxBatchSize;
🤖 Prompt for AI Agents
In src/llmq/signing.cpp around lines 580 to 584, the predicate for "more work"
currently compares recSigsByNode.size() (node count) to nMaxBatchSize which
underreports work; change it to use the processed message count (e.g.,
verifyCount) or compute the total number of recsigs processed (sum of sizes of
vectors in recSigsByNode) and compare that total to nMaxBatchSize so the
function returns true when the actual processed-work count >= nMaxBatchSize.

Replaced the previous implementation of the GetOrAdd method with a more efficient approach using emplace. This change simplifies the logic by directly attempting to insert a new entry into the internal map, improving performance and readability while maintaining the same functionality.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/llmq/signing_shares.h (1)

213-216: Inline comments improve clarity.

The added comments accurately describe the two-level map access pattern, which aids maintainability.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae2dc7a and a80a8df.

📒 Files selected for processing (1)
  • src/llmq/signing_shares.h (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction

Files:

  • src/llmq/signing_shares.h
🧬 Code graph analysis (1)
src/llmq/signing_shares.h (1)
src/llmq/signing_shares.cpp (6)
  • SignPendingSigShares (1563-1587)
  • SignPendingSigShares (1563-1563)
  • WorkThreadMain (1500-1553)
  • WorkThreadMain (1500-1500)
  • NotifyWorker (1699-1704)
  • NotifyWorker (1699-1699)
🪛 GitHub Actions: Clang Diff Format Check
src/llmq/signing_shares.h

[error] 1-1: Clang-format-diff detected formatting changes in src/llmq/signing_shares.h (before vs after formatting).

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: mac-build / Build source
  • GitHub Check: win64-build / Build source
  • GitHub Check: linux64_nowallet-build / Build source
  • GitHub Check: linux64_ubsan-build / Build source
  • GitHub Check: linux64_sqlite-build / Build source
  • GitHub Check: linux64_fuzz-build / Build source
  • GitHub Check: linux64-build / Build source
  • GitHub Check: linux64_tsan-build / Build source
  • GitHub Check: arm-linux-build / Build source
🔇 Additional comments (5)
src/llmq/signing_shares.h (5)

29-29: LGTM!

The <condition_variable> include is required for the new std::condition_variable_any member added at line 382.


355-355: LGTM! Migration to steady_clock improves robustness.

Using std::chrono::steady_clock::time_point instead of int64_t provides monotonic timing unaffected by system clock adjustments, which is essential for retry scheduling.


380-383: LGTM! Event-driven synchronization is correctly implemented.

The combination of workMutex, workCv, and atomic workEpoch implements a robust wake-up mechanism:

  • The epoch counter prevents spurious wakeups
  • condition_variable_any is appropriate since Mutex may not be std::mutex
  • Memory ordering (acq_rel for increment, acquire for load) establishes proper happens-before relationships

414-414: LGTM! Consistent with the steady_clock migration.

Using steady_clock::time_point for cleanup timing aligns with the other time-tracking changes and ensures monotonic behavior for periodic task scheduling.


495-495: LGTM! Public notification API enables event-driven architecture.

The NotifyWorker() method allows external threads to wake the worker immediately when new work arrives, replacing the previous polling-based approach. The public visibility is correct since this must be called from outside the class.

Comment on lines +492 to +493
void SignPendingSigShares(const CConnman& connman, PeerManager& peerman)
EXCLUSIVE_LOCKS_REQUIRED(!cs_pendingSigns);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Address the clang-format violation.

The pipeline detected formatting issues in this file. Please run clang-format to fix the formatting.

#!/bin/bash
# Run clang-format on the file to show the expected formatting
clang-format src/llmq/signing_shares.h
🤖 Prompt for AI Agents
In src/llmq/signing_shares.h around lines 492-493 the function declaration line
violates the project's clang-format style; run clang-format on this file (e.g.
clang-format src/llmq/signing_shares.h) or reformat the line so it matches the
surrounding style (adjust spacing/line breaks so the declaration and
EXCLUSIVE_LOCKS_REQUIRED annotation follow the project's formatting rules) and
commit the updated file.

}

auto curTime = GetTime<std::chrono::milliseconds>().count();
auto curTime = std::chrono::steady_clock::now();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's conceptually wrong to switch from mockable time to steady_clock here. Session expiration, cleanup, re-sending sig shares to recovery members - all of that is business logic, they must either all use mockable time (to work in sync in tests) or none of them.

Consider reverting back to mockable time and either dropping the extra condition to calculate next_deadline based on nextAttemptTime completely or implementing it differently e.g. smth like 25e53e5 maybe.

filteredPend.push_back(std::move(p));
}
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

linter complains about extra whitespaces here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants