solBanknStgTx00 and solBanknStgTx01 threads peg the CPU #636

bji · 2024-04-07T02:30:34Z

On investigation of issues around performance during high packet ingestion rates, I noticed that on all validators I have access to (including two on mainnet and one on testnet; versions 1.17.28, 1.17.28, and 1.18.9 respectively), the threads solBanknStgTx00 and solBanknStgTx01 appear to use very nearly 100% CPU each continuously. Here's a simple process listing:

$ ps -eL | grep BanknS
2363290 2364914 ?        11-23:49:41 solBanknStgTx00
2363290 2364915 ?        11-22:12:47 solBanknStgTx01
2363290 2364916 ?        07:15:09 solBanknStgTx02
2363290 2364917 ?        07:26:36 solBanknStgTx03
2363290 2364918 ?        07:15:41 solBanknStgTx04
2363290 2364919 ?        07:25:15 solBanknStgTx05

By my calculation, the other banking threads use approximately 2.5% the CPU of those first two threads.

solBanknStgTx00 is the gossip vote ingestion thread; and solBanknStgTx01 is the TPU vote ingestion thread. It is not entirely clear what causes this high CPU usage although code analysis would seem to indicate that there is nothing preventing these threads from looping continuously as long as there are unprocessed packets in their packet storage queues.

In addition, on reading the code, it looks to me like any packet which arrives in the 20 slots leading up to the validator's leader slots will invoke the ForwardAndHold logic, and at that point I cannot see any way that the packet would leave the queue until the Consume phase is reached 20 slots later. In the meantime it looks to me like the vote TPU thread will continuously re-evaluate the packets it has Forwarded and Held, repeating the forwarding process over and over again. I cannot see how this doesn't lead to a spew of duplicate forwards out of the leader for all packets received in the 20 slots leading up to its leader slots.

I do not fully understand the code and may be wrong; but the high CPU usage of these threads combined with my evaluations might necessitate some deeper analysis of the code involved.

The text was updated successfully, but these errors were encountered:

bji · 2024-04-07T03:06:54Z

I see now where the re-forwarding of packets is prevented. In LatestUnprocessedVotes::get_and_insert_forwardable_packets, only votes for which is_forwarded() returns false are included.

However the flow of the loop is definitely wasteful; it does an awful lot of work re-evaluating already forwarded packets in a tight loop. I do not know the degree to which this contributes to the high CPU usage.

apfitzge · 2024-04-08T14:40:51Z

Yeah looking at the vote threads, they do seem wasteful in their usage. I think we could probably make a few simple changes here to reduce CPU usage in short-term:

Gossip thread never does any actual work. Only receives/buffers threads. However it's receive timeout is still chosen the same as the other threads. In PacketReceiver::get_receive_timeout. This should be adjusted so the gossip thread will sleep for a much longer period.
TpuVote thread might keep track of the unforwarded packet count and only loop/process again if there are actually unforwarded packets. If no unforwarded packets, should also try receiving for a longer period of time.

Long-term I think we may be able to consolidate the 2 threads into single thread, also remove the holding period before gossip votes are released to the thread.

steviez · 2024-04-09T22:29:59Z

~~FWIW - I think this issue is tied to the old scheduler (default in v1.17 and v1.18); the new scheduler (default in master) looks to have much lower usage when the leader is not near its' leader slots~~

Above statement is incorrect - see #636 (comment)

apfitzge · 2024-04-10T13:32:49Z

New scheduler doesn't change the vote threads though, so I'd not expect new scheduler to have any effect here.

steviez · 2024-04-10T18:25:52Z

New scheduler doesn't change the vote threads though, so I'd not expect new scheduler to have any effect here.

I went back and revisited my data that made me make this claim, and it looks like the time window that I sampled was "abnormal" in that it had lower utilization on these threads. The output below show thread name (comm) and cumulative scheduled time divided by cumulative process time (pcpu).

Tip-of-master with default (new scheduler)

$ ps -p 3753527 -T -o comm,pcpu | grep BanknStg
solBanknStgTx00 99.6
solBanknStgTx01 99.3

Tip-of-master with old scheduler (--block-production-method thread-local-multi-iterator):

$ ps -p 953856 -T -o comm,pcpu | grep Bankn
solBanknStgTx00 99.6
solBanknStgTx01 99.4
solBanknStgTx02  0.4
solBanknStgTx03  0.4
solBanknStgTx04  0.4
solBanknStgTx05  0.5

So yeah, my mistake and agreed with you there

apfitzge · 2024-04-11T15:29:54Z

#743 related to this, kept skipping on a length check.

bji added the community label Apr 7, 2024

bji assigned apfitzge Apr 7, 2024

apfitzge mentioned this issue Apr 9, 2024

BankingStage - Gossip vote thread always wait #681

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

solBanknStgTx00 and solBanknStgTx01 threads peg the CPU #636

solBanknStgTx00 and solBanknStgTx01 threads peg the CPU #636

bji commented Apr 7, 2024

bji commented Apr 7, 2024

apfitzge commented Apr 8, 2024

steviez commented Apr 9, 2024 •

edited

Loading

apfitzge commented Apr 10, 2024

steviez commented Apr 10, 2024

apfitzge commented Apr 11, 2024 •

edited

Loading

solBanknStgTx00 and solBanknStgTx01 threads peg the CPU #636

solBanknStgTx00 and solBanknStgTx01 threads peg the CPU #636

Comments

bji commented Apr 7, 2024

bji commented Apr 7, 2024

apfitzge commented Apr 8, 2024

steviez commented Apr 9, 2024 • edited Loading

apfitzge commented Apr 10, 2024

steviez commented Apr 10, 2024

apfitzge commented Apr 11, 2024 • edited Loading

steviez commented Apr 9, 2024 •

edited

Loading

apfitzge commented Apr 11, 2024 •

edited

Loading