Description
System information
Geth version: 1.10.19 (the behaviour appears unchanged in 1.10.21)
OS & Version: Linux
Expected behaviour
A node that is only queried for "recent" logs/receipts via eth_getLogs should be able to handle a high volume of that RPC without repeated trips to the raw LevelDB.
Actual behaviour
In #17610 we added a receiptsCache LRU to BlockChain to prevent redundant DB lookups and RLP decodes of receipts and logs. However, it appears that changes introduced in #23147 bypass that LRU cache completely, leading to what looks to me like a performance regression in the eth_getLogs RPC.
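To make the caching idea concrete, here is a minimal sketch of a read-through LRU cache of decoded logs keyed by block hash. It is not geth's receiptsCache or ReadLogs code: Hash, Log, logsCache, and the load callback are hypothetical stand-ins, and the capacity of 128 is an assumption chosen to match the 128-block query window described below. The point is only that repeated requests for the same block hit the slow path (raw DB read plus RLP decode) once, which is the property that appears to be lost when the cache is bypassed.

```go
package main

import (
	"container/list"
	"fmt"
)

// Hash is a hypothetical stand-in for a 32-byte block hash.
type Hash [32]byte

// Log is a hypothetical, heavily simplified decoded log entry.
type Log struct {
	BlockHash Hash
	Index     uint
}

// entry is what the LRU list stores: the key plus the decoded logs.
type entry struct {
	key  Hash
	logs []Log
}

// logsCache is a minimal read-through LRU cache of decoded logs keyed by
// block hash. Hypothetical sketch, not geth's receiptsCache; it is not safe
// for concurrent use.
type logsCache struct {
	capacity int
	ll       *list.List             // front = most recently used
	items    map[Hash]*list.Element // block hash -> list element
	load     func(Hash) []Log       // slow path: raw DB read + RLP decode
}

func newLogsCache(capacity int, load func(Hash) []Log) *logsCache {
	return &logsCache{
		capacity: capacity,
		ll:       list.New(),
		items:    make(map[Hash]*list.Element),
		load:     load,
	}
}

// Get returns the logs for block h, calling the slow loader only on a miss.
func (c *logsCache) Get(h Hash) []Log {
	if el, ok := c.items[h]; ok {
		c.ll.MoveToFront(el) // cache hit: mark as most recently used
		return el.Value.(*entry).logs
	}
	logs := c.load(h) // cache miss: go to the database once
	c.items[h] = c.ll.PushFront(&entry{key: h, logs: logs})
	if c.ll.Len() > c.capacity {
		// Evict the least recently used block.
		old := c.ll.Remove(c.ll.Back()).(*entry)
		delete(c.items, old.key)
	}
	return logs
}

func main() {
	dbReads := 0
	cache := newLogsCache(128, func(h Hash) []Log {
		dbReads++ // count trips to the (simulated) database
		return []Log{{BlockHash: h, Index: 0}}
	})
	head := Hash{1} // pretend this is the current head block
	for i := 0; i < 1000; i++ {
		cache.Get(head) // 1000 eth_getLogs-style lookups of the same block
	}
	fmt.Println("db reads:", dbReads) // db reads: 1
}
```

With the cache in front of the loader, 1000 lookups of the head block cost one database read; without it, every lookup would pay the read and decode again.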
With a recent version of geth, under even moderate eth_getLogs load that only accesses logs from the last 128 blocks, we quickly see that a significant portion of the node's flamegraph is spent in the new ReadLogs method.
Steps to reproduce the behaviour
It's pretty easy to reproduce:
- sync a node to head, preferably on mainnet, since its blocks contain the most logs.
- generate some load on the geth node, for example using hey (https://github.com/rakyll/hey):
    hey -c 5 -z 30s -t 0 -m POST -T application/json -d '{"jsonrpc":"2.0", "id": 1123123, "method": "eth_getLogs", "params": [{}]}' http://0.0.0.0:8545/
- even with only 5 concurrent eth_getLogs requests for the latest logs, as above, note that a non-trivial amount of time is spent in ReadLogs.
- raise the number of concurrent requests to 50 (-c 50) and note that the amount of time spent in ReadLogs increases as well.
One would expect a recent block's receipts/logs to be read from the raw DB only once per block, and thus the relative amount of time spent in ReadLogs to go down rather than up under increased load.