This repository was archived by the owner on Jan 22, 2025. It is now read-only.

Conversation


@apfitzge apfitzge commented Apr 29, 2023

Problem

Wanted to get some idea of the performance of the processing function without interference from other threads or work.

Essentially this creates ideal conditions for single-threaded bank tx consume performance: no possible conflicts and essentially no overhead.
This will help gauge how much overhead the scheduler/worker threads add on top of the raw consume path.

Summary of Changes

Add benchmarks for Consumer::process_and_record_transactions for different batch sizes.
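For a rough idea of the shape of such a microbenchmark, here is a self-contained sketch (not the PR's code): `process_batch` is a hypothetical stand-in for `Consumer::process_and_record_transactions`, which in the real bench is driven with pre-built, conflict-free transaction batches over a bank; `std::hint::black_box` keeps the compiler from optimizing the work away.

```rust
use std::time::Instant;

// Hypothetical stand-in for the benched call. The real bench invokes
// Consumer::process_and_record_transactions on a fully set-up bank.
fn process_batch(batch: &[u64]) -> u64 {
    batch.iter().copied().sum()
}

// Time `iters` calls over a batch of `batch_size` items and return the
// average nanoseconds per iteration.
fn bench_ns_per_iter(batch_size: usize, iters: u32) -> f64 {
    let batch: Vec<u64> = (0..batch_size as u64).collect();
    let mut sink = 0u64;
    let start = Instant::now();
    for _ in 0..iters {
        // black_box forces the call to actually run each iteration.
        sink = sink.wrapping_add(process_batch(std::hint::black_box(&batch)));
    }
    let elapsed = start.elapsed().as_nanos() as f64;
    std::hint::black_box(sink);
    elapsed / iters as f64
}

fn main() {
    // Mirror the batch sizes used by the real benches.
    for &n in &[1usize, 32, 64] {
        println!("batch {:>2}: {:.0} ns/iter", n, bench_ns_per_iter(n, 10_000));
    }
}
```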

Fixes #

@codecov

codecov bot commented May 5, 2023

Codecov Report

Merging #31414 (bf0e56b) into master (4b0e16d) will decrease coverage by 0.1%.
The diff coverage is n/a.

@@            Coverage Diff            @@
##           master   #31414     +/-   ##
=========================================
- Coverage    81.5%    81.5%   -0.1%     
=========================================
  Files         734      734             
  Lines      207159   207159             
=========================================
- Hits       168959   168940     -19     
- Misses      38200    38219     +19     

@apfitzge apfitzge marked this pull request as ready for review May 8, 2023 15:31
@apfitzge apfitzge requested review from ryoqun and tao-stones May 8, 2023 15:31
Contributor

@tao-stones tao-stones left a comment


lgtm - nice work benching the consumer. Do you have results to share in the comments? Just curious, also do you think it'd be a good idea to add a 128 batch_size?

@apfitzge

apfitzge commented May 12, 2023

Do you have results to share in comments?

EDIT: THE FOLLOWING RESULTS ARE NOT VALID. SEE #31625 (comment)

results (hidden due to above invalidation)

lol I made the PR and guess I didn't actually press "Comment" to paste my table

running 3 tests
test bench_process_and_record_transactions_full_batch ... bench:     692,216 ns/iter (+/- 28,212)
test bench_process_and_record_transactions_half_batch ... bench:     354,768 ns/iter (+/- 15,125)
test bench_process_and_record_transactions_unbatched  ... bench:      20,376 ns/iter (+/- 454)
| batch size | ns/batch | TPS | txs/slot |
| --- | --- | --- | --- |
| 1 | 20,376 | 47,792 | 19,117 |
| 32 | 354,768 | 92,748 | 37,099 |
| 64 | 692,216 | 94,271 | 37,709 |

i.e. there's a ton of room for improvement without even improving the performance of these consuming functions. One big thing missing from these benches is slot transitions, which I suspect account for a significant portion of the inefficiency. Probably worthwhile for me to create a modified bench that shows that!

Just curious, also do you think if good idea to add 128 batch_size ?

So this is actually directly calling the inner function, which only handles batches of up to 64; in the current call path, process_transactions calls it after creating sub-slices of <=64, which is what the chunk_offset is for. However, I did just look through the call stack for this...I'm not sure there's anything explicitly preventing us from doing larger batches - I'm not sure whether we impose the entry size limit somewhere on the replay side. There are diminishing returns on batch size, though - there's already not much difference between 32 and 64, and I'd expect even less for 64 vs 128. Quickly checked it - 1,349,943 ns/iter => 94,818 TPS, so ~0.5% difference.
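For reference, the 94,818 TPS figure in that quick check follows directly from integer arithmetic on the bench output. A minimal sketch (not the PR's code; the 400 ms figure is Solana's nominal slot duration, used here only to derive a txs/slot estimate in the same spirit as the table above):

```rust
/// TPS implied by a bench result: batch_size transactions
/// processed in ns_per_batch nanoseconds.
fn tps(batch_size: u64, ns_per_batch: u64) -> u64 {
    batch_size * 1_000_000_000 / ns_per_batch
}

/// Transactions per slot, assuming Solana's nominal 400 ms slot time.
fn txs_per_slot(tps: u64) -> u64 {
    tps * 400 / 1000
}

fn main() {
    // The 128-tx quick check from this comment: 1,349,943 ns/iter.
    let t = tps(128, 1_349_943);
    assert_eq!(t, 94_818); // matches the figure quoted above
    println!("128-batch: {} TPS, {} txs/slot", t, txs_per_slot(t));
}
```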

@apfitzge apfitzge merged commit 6adbb12 into solana-labs:master May 12, 2023
@apfitzge apfitzge deleted the bench/banking_stage_consumer branch May 12, 2023 16:44