Description
This is intended to be a migration of this issue originally filed on web3. However, I dug all the way into the W3CWebSocket used by web3 and found that the requests are being sent but no responses are being received from geth, so I don't think it's a web3 issue.
Additionally, I have an odd observation from Geth which might help in debugging.
System information
Geth version: 1.9.18
OS & Version: Windows 10
Steps to reproduce the behaviour
Follow the setup here. Use a WebSocket URL when connecting to geth (in truffle-config); in migrations/3_increment_counter.js you can set batchSize to 100 without an issue. This should produce a blockchain with >100K transactions in just over 1000 blocks, with most blocks containing 100 transactions (sometimes that's split across a couple).
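To check for dropped responses without going through web3, the request payloads and the bookkeeping can be sketched in Go roughly as follows. This is only a minimal sketch: it assumes the queries are eth_getBlockByNumber calls with sequential ids starting at 1, the helper names blockRequest and missingIDs are hypothetical, and the WebSocket transport itself is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest is a JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string        `json:"jsonrpc"`
	ID      int           `json:"id"`
	Method  string        `json:"method"`
	Params  []interface{} `json:"params"`
}

// blockRequest (hypothetical helper) encodes one eth_getBlockByNumber call.
// The block number is sent as a hex string, followed by the fullTx flag.
func blockRequest(id int, number uint64, fullTx bool) ([]byte, error) {
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "eth_getBlockByNumber",
		Params:  []interface{}{fmt.Sprintf("0x%x", number), fullTx},
	})
}

// missingIDs (hypothetical helper) returns the request ids in [1, sent]
// for which no response was received. It assumes ids were assigned
// sequentially starting at 1.
func missingIDs(sent int, responses [][]byte) ([]int, error) {
	seen := make(map[int]bool, len(responses))
	for _, raw := range responses {
		var r struct {
			ID int `json:"id"`
		}
		if err := json.Unmarshal(raw, &r); err != nil {
			return nil, err
		}
		seen[r.ID] = true
	}
	var missing []int
	for id := 1; id <= sent; id++ {
		if !seen[id] {
			missing = append(missing, id)
		}
	}
	return missing, nil
}

func main() {
	req, err := blockRequest(1, 255, true)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(req))

	// Pretend three requests were sent and only ids 1 and 3 were answered.
	got := [][]byte{[]byte(`{"jsonrpc":"2.0","id":1,"result":null}`), []byte(`{"jsonrpc":"2.0","id":3,"result":null}`)}
	missing, err := missingIDs(3, got)
	if err != nil {
		panic(err)
	}
	fmt.Println("unanswered ids:", missing)
}
```

Writing each payload to the socket and feeding every received frame into missingIDs would show exactly which requests were silently dropped.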
The expected and actual behavior are as listed there. If the requests are not going to be answered, some error is expected; failing that, at least some logging, plus documentation on how to configure the threshold, would be better than the status quo of silent failure.
Now here's the twist. If I make all three of the following changes (from the code as of the 1.9.18 release tag), I get the expected results. The changes are all Geth console logging and shouldn't affect correctness of operation, but in combination, these three log statements do. The changes are:
- In internal/ethapi/api.go, the GetBlockByNumber function, add at the start of the function:
log.Info("In api.go GetBlockByNumber", "number", number)
- In core/blockchain.go, the GetBlock function, add at the start of the function:
log.Info("In getBlock for number", "number", number)
- In core/blockchain.go, the GetBlock function, add at the start of the if block that checks the cache:
log.Info("Returning from getBlock (cached) for number", "number", number)
The third one gets lots of use because of how the transaction generation and querying process works; if your workload does not hit the cache that often, you could add a similar log statement (without "cached") just before the return statement.
Comment out any one of those three and Geth only returns data for roughly 325 blocks and 0 transactions out of an expected 100K of each. Include all three, and you get the expected 100K blocks and 100K transactions returned.
I suspect that there may be some buffer or limit that is being overwhelmed, causing some glitch that leads Geth to stop sending responses to queries for at least a little while, and that these log statements slow down processing just enough that the issue threshold is never exceeded. However, I'm not yet quite sure where that threshold is set.
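To make the hypothesis concrete, here is a toy Go model of it. This is not Geth code, and I have not located any such buffer in Geth; the function simulate and its parameters are made up for illustration. It assumes a fixed-size buffer whose producer silently drops messages when the buffer is full, and a consumer that takes one message for every drainEvery messages produced.

```go
package main

import "fmt"

// simulate is a hypothetical, single-threaded model of a bounded buffer.
// The producer offers `total` messages; a full buffer causes a silent drop.
// The consumer removes one message after every `drainEvery` offers, so a
// larger drainEvery means the producer is faster relative to the consumer.
func simulate(total, capacity, drainEvery int) (delivered, dropped int) {
	buf := make(chan int, capacity)
	for i := 1; i <= total; i++ {
		// Non-blocking send: if the buffer is full, the message is lost
		// without any error being reported.
		select {
		case buf <- i:
		default:
			dropped++
		}
		// The consumer gets a turn only every drainEvery offers.
		if i%drainEvery == 0 {
			select {
			case <-buf:
				delivered++
			default:
			}
		}
	}
	// Whatever is still buffered at the end is eventually delivered.
	close(buf)
	for range buf {
		delivered++
	}
	return delivered, dropped
}

func main() {
	for _, drainEvery := range []int{1, 4} {
		d, x := simulate(1000, 10, drainEvery)
		fmt.Printf("drainEvery=%d delivered=%d dropped=%d\n", drainEvery, d, x)
	}
}
```

With drainEvery set to 1 the consumer keeps pace and nothing is dropped; with drainEvery set to 4 the buffer fills and messages are lost silently. In this model, the extra log statements would play the role of slowing the producer until it matches the consumer.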