-
Notifications
You must be signed in to change notification settings - Fork 278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pool: Use a shorter loop to get hosts for refilling peers #221
Conversation
Currently, pool::getHost() uses a fixed loop of 0-99 regardless of how many hosts it has in it's lists so it can use a percentage to factor in whether it should use an entry or not. Using network regtest and a single host of 127.0.0.1:14038, hsd burns a lot of CPU when refilling peers and this blocks incoming requests so much it times out. This is less noticeable the more hosts you have. Instead, calculate how many hosts we have in our list and then the percentage of each index for it. We can then iterate a much smmaller list using the same percentage calculations. This brings CPU utlisation on my NetBSD Xen DOMU on regtest with one host from 100% to 6% and connecting 4 to peers on testnet from 85% to 8%, allowing RPC commands to now work. Fixes handshake-org#220.
We might want to reduce max from 100 to 10, otherwise we'll just see the same issue if we really have >99 hosts in the list. |
If I'm reading this correctly, this loop should only loop 100 times if there are 100 invalid addresses in the host list right? As once it finds one that actually works, it returns it? Would it be possible for you to share your hosts.json with us, then we can see why it's not finding a valid address within the first 100 iterations. Just from comparing in Bitcoin Core's behavior it looks like the max value they set on this loop (it's slightly different config, but I think comes out to be roughly the same behavior) is 2500. So I think there might be something else going on here that is causing issues. Additionally, I wonder what would happen if you wipe your hosts.json file, and start afresh. I'd make a backup of it for debugging purposes, but that could give us some interesting data as to what's going on. |
No, there is only one entry in the hosts - localhost - and it still loops 100 times.
It's a clean install, not touched the hosts file.
I can wipe the whole hsd folder and start afresh and it can be replicated with --network=regtest because it cannot contact 127.0.0.1:14038 because nothing is running on that port. |
The corresponding code in I'm going to apply your patch to my mainnet |
Seeing significant CPU performance improvements (10x) running this patch. My only concern is that the termination condition on the loop is based on a variable. That variable should never be less than or equal to 0, so I think its ok. If we are feeling extra paranoid, I recommend:
ACK edited post b821704
|
100 is too high, it can cause local DoS for RPC calls. Tie the maximal value to maxOutbound as there is currently nothing better to tie it to, and the default of 8 is low enough to keep thing running smoothly.
Codecov Report
@@ Coverage Diff @@
## master #221 +/- ##
==========================================
- Coverage 52.99% 52.98% -0.01%
==========================================
Files 129 129
Lines 35773 35780 +7
Branches 6032 6033 +1
==========================================
+ Hits 18957 18958 +1
- Misses 16816 16822 +6
Continue to review full report at Codecov.
|
I'm also happy with setting I would argue that 100 is far too high and 10 seems a reasonable default. I'll make this change over the next few days if I don't hear any other suggestion. |
I toook the path of least resistance and just united it. |
Merged in 6213460 |
Currently, pool::getHost() uses a fixed loop of 0-99 regardless
of how many hosts it has in it's lists so it can use a percentage
to factor in whether it should use an entry or not.
Using network regtest and a single host of 127.0.0.1:14038, hsd
burns a lot of CPU when refilling peers and this blocks incoming
requests so much it times out.
This is less noticeable the more hosts you have.
Instead, calculate how many hosts we have in our list and then
the percentage of each index for it. We can then iterate a much
smmaller list using the same percentage calculations.
This brings CPU utlisation on my NetBSD Xen DOMU on regtest with
one host from 100% to 6% and connecting 4 to peers on testnet
from 85% to 8%, allowing RPC commands to now work.
Fixes #220.