solBanknStgTx00 and solBanknStgTx01 threads peg the CPU #636
I see now where the re-forwarding of packets is prevented. In LatestUnprocessedVotes::get_and_insert_forwardable_packets, only votes for which is_forwarded() returns false are included. However, the loop is still wasteful: it does a lot of work re-evaluating already-forwarded packets in a tight loop. I do not know how much this contributes to the high CPU usage.
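To make the point above concrete, here is a minimal sketch of the pattern, not the actual Agave code. `VotePacket` and `collect_forwardable` are hypothetical names invented for illustration; only the `is_forwarded()` check comes from the comment above.

```rust
/// Hypothetical stand-in for a buffered vote packet; not the real Agave type.
#[derive(Debug, Clone)]
struct VotePacket {
    id: u64,
    forwarded: bool,
}

impl VotePacket {
    fn is_forwarded(&self) -> bool {
        self.forwarded
    }
}

/// Selects packets that have not been forwarded yet and marks them as forwarded.
/// Returns (ids selected for forwarding, number of packets examined).
fn collect_forwardable(packets: &mut [VotePacket]) -> (Vec<u64>, usize) {
    let mut selected = Vec::new();
    let mut examined = 0;
    for packet in packets.iter_mut() {
        // Every buffered packet is visited on every pass, forwarded or not.
        examined += 1;
        if !packet.is_forwarded() {
            packet.forwarded = true;
            selected.push(packet.id);
        }
    }
    (selected, examined)
}
```

Calling `collect_forwardable` twice on the same buffer selects every packet on the first pass and none on the second, yet both passes examine the full buffer. That is the wasted work: no duplicate forwards, but repeated scanning.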
Yeah, looking at the vote threads, they do seem wasteful in their usage. I think we could probably make a few simple changes here to reduce CPU usage in the short term:
Long-term, I think we may be able to consolidate the two threads into a single thread and also remove the holding period before gossip votes are released to the thread.
The above statement is incorrect; see #636 (comment)
The new scheduler doesn't change the vote threads, though, so I wouldn't expect it to have any effect here.
I went back and revisited the data that led me to make this claim, and it looks like the time window I sampled was "abnormal" in that it had lower utilization on these threads. The output below shows thread name (
Tip-of-master with default (new scheduler)
Tip-of-master with old scheduler (
So yeah, my mistake, and I agree with you there.
#743 is related to this; it kept skipping on a length check.
While investigating performance issues during high packet ingestion rates, I noticed that on all validators I have access to (two on mainnet and one on testnet, running versions 1.17.28, 1.17.28, and 1.18.9 respectively), the threads solBanknStgTx00 and solBanknStgTx01 each appear to use very nearly 100% CPU continuously. Here's a simple process listing:
By my calculation, the other banking threads use approximately 2.5% of the CPU that those first two threads use.
solBanknStgTx00 is the gossip vote ingestion thread, and solBanknStgTx01 is the TPU vote ingestion thread. It is not entirely clear what causes this high CPU usage, although code analysis seems to indicate that nothing prevents these threads from looping continuously as long as there are unprocessed packets in their packet storage queues.
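The suspected behaviour can be modelled with a small sketch. This is a simplified model under my own assumptions, not the banking stage code: `Outcome`, `run_passes`, and the `max_passes` cap are hypothetical, and the cap exists only so the model terminates.

```rust
use std::collections::VecDeque;

/// Hypothetical outcome of handling one buffered packet.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Outcome {
    /// The packet leaves the buffer.
    Consumed,
    /// The packet stays buffered and is seen again on the next pass.
    Held,
}

/// Runs passes over the buffer until it is empty or `max_passes` is reached,
/// and returns the number of passes performed. Nothing in the loop waits or
/// sleeps, so a buffer that never drains keeps the loop running.
fn run_passes(
    buffer: &mut VecDeque<u64>,
    decide: impl Fn(u64) -> Outcome,
    max_passes: usize,
) -> usize {
    let mut passes = 0;
    while !buffer.is_empty() && passes < max_passes {
        passes += 1;
        // Visit each packet that was buffered at the start of this pass once.
        for _ in 0..buffer.len() {
            let packet = buffer.pop_front().expect("buffer is non-empty");
            if decide(packet) == Outcome::Held {
                buffer.push_back(packet);
            }
        }
    }
    passes
}
```

With a `decide` closure that always returns `Outcome::Held`, the loop runs until the cap; with one that always returns `Outcome::Consumed`, it stops after a single pass. If the real threads behave like the first case while packets are held, that would be consistent with the near-100% CPU usage reported here.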
In addition, from reading the code, it looks to me like any packet that arrives in the 20 slots leading up to the validator's leader slots will invoke the ForwardAndHold logic, and from that point I cannot see any way for the packet to leave the queue until the Consume phase is reached up to 20 slots later. In the meantime, it looks to me like the vote TPU thread will continuously re-evaluate the packets it has forwarded and held, repeating the forwarding process over and over again. I cannot see how this doesn't lead to a spew of duplicate forwards out of the leader for all packets received in the 20 slots leading up to its leader slots.
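As a rough sketch of the reading described above: `ForwardAndHold` and `Consume` are the names used in this issue, while `Forward`, `HOLD_WINDOW_SLOTS`, `decide`, and `forwarding_evaluations` are hypothetical and only illustrate the 20-slot window as I understand it. The count below is of evaluations that reach the forwarding path, not of packets actually sent; the is_forwarded() check discussed elsewhere in this thread is what prevents duplicate sends.

```rust
/// Decisions for a buffered packet. `Forward` is a hypothetical placeholder
/// for the case where the leader slot is far away.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Decision {
    Consume,
    ForwardAndHold,
    Forward,
}

/// Window size taken from the issue text; not verified against the code.
const HOLD_WINDOW_SLOTS: u64 = 20;

/// Picks a decision from the number of slots remaining until the leader slot.
fn decide(slots_until_leader: u64) -> Decision {
    match slots_until_leader {
        0 => Decision::Consume,
        n if n <= HOLD_WINDOW_SLOTS => Decision::ForwardAndHold,
        _ => Decision::Forward,
    }
}

/// Counts how many times a single held packet reaches the ForwardAndHold path
/// between its arrival and the leader slot, given `passes_per_slot` loop
/// passes in each slot.
fn forwarding_evaluations(arrival_slots_until_leader: u64, passes_per_slot: u64) -> u64 {
    let mut evaluations = 0;
    for slots_left in (1..=arrival_slots_until_leader).rev() {
        if decide(slots_left) == Decision::ForwardAndHold {
            evaluations += passes_per_slot;
        }
    }
    evaluations
}
```

Under this model, a packet that arrives at the start of the window with 100 loop passes per slot is evaluated 2000 times before it can be consumed. The passes-per-slot figure is made up; a real number would have to come from profiling.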
I do not fully understand the code and may be wrong, but the high CPU usage of these threads, combined with my reading of the code, may warrant a deeper analysis of the code involved.