core: concurrent background transaction sender ecrecover #16882
Merged
A single `ecrecover` operation on my laptop (Core i7) takes approximately 165us according to our benchmarks (it can go up to 220us if the machine is hot and the CPU gets throttled). That seems fairly low and insignificant (5000-6000 ecrecovers / second), until we realize there are 243M transactions on mainnet, which take approximately 12-14 hours to ecrecover on a single thread when running a full sync.
Post-sync block propagation also suffers on mainnet when we consider that a block contains about 250 transactions. That amounts to about 50ms of processing time just for recovering the senders from the transactions.
Uncles further worsen the issue, because they usually arrive at the same time as the canonical block, so they incur an extra 50ms processing hit. Furthermore, any transaction reorged out of the canonical chain is loaded back into the pool, incurring ecrecover costs once more.
A functionally correct solution would be to introduce ecrecover threading to all places where we might see performance gains, and properly synchronize and process the results similar to PoW checks. Unfortunately that would be way too expensive and messy, because:
Launching goroutines when there isn't enough work would cost more than it saves, waiting for the results could introduce stalls and unneeded synchronization overhead, and the overall complexity would easily become prohibitive.
Luckily, our transaction objects support caching the senders for later retrieval. This means that we don't need to implement fancy scheduling to ensure the recoveries are correct but still fast. Rather, we can create the fastest and simplest way to ecrecover the signers, and any occasional hiccups (didn't recover in time when needed, recovered with the wrong signer during a fork block, etc.) auto-correct themselves: a missing or mismatched cache entry simply causes the sender to be recovered again at the point of use.
This PR solves the concurrent ecrecovery task by launching a global pool of `numcpu` recovery threads, each waiting for transaction lists to recover. Whenever a list of transactions is in need of recovery, it is split between all the workers, which execute the operations and cache the results back into the originating transactions. To maximize performance, the same underlying buffer is sent to all workers, not in contiguous chunks but strided (thread 1 processes tx 1, 1+N, 1+2N, etc.).
Recovery tasks are pushed by the blockchain and the tx pool during block importing and tx rescheduling. This ensures that we relieve sync/propagation ecrecover as well as reorg ecrecover slowdowns.
Performance stats:
master
ecrecover