Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pool: Use a shorter loop to get hosts for refilling peers #221

Closed
wants to merge 4 commits into from

Conversation

rsmarples
Copy link
Contributor

Currently, pool::getHost() uses a fixed loop of 0-99 regardless
of how many hosts it has in it's lists so it can use a percentage
to factor in whether it should use an entry or not.
Using network regtest and a single host of 127.0.0.1:14038, hsd
burns a lot of CPU when refilling peers and this blocks incoming
requests so much it times out.

This is less noticeable the more hosts you have.

Instead, calculate how many hosts we have in our list and then
the percentage of each index for it. We can then iterate a much
smmaller list using the same percentage calculations.

This brings CPU utlisation on my NetBSD Xen DOMU on regtest with
one host from 100% to 6% and connecting 4 to peers on testnet
from 85% to 8%, allowing RPC commands to now work.

Fixes #220.

Currently, pool::getHost() uses a fixed loop of 0-99 regardless
of how many hosts it has in it's lists so it can use a percentage
to factor in whether it should use an entry or not.
Using network regtest and a single host of 127.0.0.1:14038, hsd
burns a lot of CPU when refilling peers and this blocks incoming
requests so much it times out.

This is less noticeable the more hosts you have.

Instead, calculate how many hosts we have in our list and then
the percentage of each index for it. We can then iterate a much
smmaller list using the same percentage calculations.

This brings CPU utlisation on my NetBSD Xen DOMU on regtest with
one host from 100% to 6% and connecting 4 to peers on testnet
from 85% to 8%, allowing RPC commands to now work.

Fixes handshake-org#220.
@rsmarples
Copy link
Contributor Author

We might want to reduce max from 100 to 10, otherwise we'll just see the same issue if we really have >99 hosts in the list.
I elected to keep it at 100 in-case there was a reason for it that I've missed.

@kilpatty
Copy link
Contributor

If I'm reading this correctly, this loop should only loop 100 times if there are 100 invalid addresses in the host list right? As once it finds one that actually works, it returns it?

Would it be possible for you to share your hosts.json with us, then we can see why it's not finding a valid address within the first 100 iterations.

Just from comparing in Bitcoin Core's behavior it looks like the max value they set on this loop (it's slightly different config, but I think comes out to be roughly the same behavior) is 2500. So I think there might be something else going on here that is causing issues.

Additionally, I wonder what would happen if you wipe your hosts.json file, and start afresh. I'd make a backup of it for debugging purposes, but that could give us some interesting data as to what's going on.

@rsmarples
Copy link
Contributor Author

If I'm reading this correctly, this loop should only loop 100 times if there are 100 invalid addresses in the host list right? As once it finds one that actually works, it returns it?

No, there is only one entry in the hosts - localhost - and it still loops 100 times.
This patch is trying to get it to loop one time as there's only one address.

Would it be possible for you to share your hosts.json with us, then we can see why it's not finding a valid address within the first 100 iterations.

It's a clean install, not touched the hosts file.

Just from comparing in Bitcoin Core's behavior it looks like the max value they set on this loop (it's slightly different config, but I think comes out to be roughly the same behavior) is 2500. So I think there might be something else going on here that is causing issues.

Additionally, I wonder what would happen if you wipe your hosts.json file, and start afresh. I'd make a backup of it for debugging purposes, but that could give us some interesting data as to what's going on.

I can wipe the whole hsd folder and start afresh and it can be replicated with --network=regtest because it cannot contact 127.0.0.1:14038 because nothing is running on that port.

@tynes
Copy link
Contributor

tynes commented Jul 11, 2019

The corresponding code in bitcoind is here:
https://github.com/bitcoin/bitcoin/blob/4882040182ea1109ef9befde93c2f49a98aba391/src/net.cpp#L1657 - they have some optimizations in this part of the codebase that bcoin never got.

I'm going to apply your patch to my mainnet bcoin node to test the scenario with many peers and also a testnet hsd node.

@tynes
Copy link
Contributor

tynes commented Jul 12, 2019

Seeing significant CPU performance improvements (10x) running this patch.

My only concern is that the termination condition on the loop is based on a variable. That variable should never be less than or equal to 0, so I think its ok.

If we are feeling extra paranoid, I recommend:

    if (max > 100 || max <= 0)
      max = 100;

ACK


edited post b821704

maxOutbound isn't the correct config option for this. If its going to be configurable, it should be a new config option

100 is too high, it can cause local DoS for RPC calls.
Tie the maximal value to maxOutbound as there is currently
nothing better to tie it to, and the default of 8 is low enough
to keep thing running smoothly.
@codecov-io
Copy link

codecov-io commented Jul 12, 2019

Codecov Report

Merging #221 into master will decrease coverage by <.01%.
The diff coverage is 90.9%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #221      +/-   ##
==========================================
- Coverage   52.99%   52.98%   -0.01%     
==========================================
  Files         129      129              
  Lines       35773    35780       +7     
  Branches     6032     6033       +1     
==========================================
+ Hits        18957    18958       +1     
- Misses      16816    16822       +6
Impacted Files Coverage Δ
lib/net/pool.js 24.32% <90.9%> (+0.14%) ⬆️
lib/net/hostlist.js 42.13% <0%> (-0.55%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cc1ef7a...b6a70aa. Read the comment docs.

@rsmarples
Copy link
Contributor Author

Seeing significant CPU performance improvements (10x) running this patch.

My only concern is that the termination condition on the loop is based on a variable. That variable should never be less than or equal to 0, so I think its ok.

If we are feeling extra paranoid, I recommend:

    if (max > 100 || max <= 0)
      max = 100;

ACK

edited post b821704

maxOutbound isn't the correct config option for this. If its going to be configurable, it should be a new config option

I'm also happy with setting const maxGetHostIterations = 10; near the top of the file and using that.
It doesn't have to be configurable per se, just not magic and definitely obvious.

I would argue that 100 is far too high and 10 seems a reasonable default.
I'm not entirely sure if it should be user configurable. If we want it higher for whatever reason then the hostfile::getHost() needs to be optimised as well as any calls to misc functions within the loop because we don't want any possible config option to be a self DoS.

I'll make this change over the next few days if I don't hear any other suggestion.

@rsmarples
Copy link
Contributor Author

maxOutbound isn't the correct config option for this. If its going to be configurable, it should be a new config option

I toook the path of least resistance and just united it.

@nodech nodech added enhancement general - improving existing feature needs rebase modifier net/p2p part of the codebase patch Release version labels Dec 10, 2021
@nodech nodech self-assigned this Dec 13, 2021
@nodech nodech mentioned this pull request Feb 24, 2022
@nodech nodech closed this Apr 7, 2022
@nodech
Copy link
Contributor

nodech commented Apr 7, 2022

Merged in 6213460

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement general - improving existing feature needs rebase modifier net/p2p part of the codebase patch Release version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pool.js::getHost() is CPU hungry
5 participants