pool: Use a shorter loop to get hosts for refilling peers #221

rsmarples · 2019-07-11T17:00:21Z

Currently, pool::getHost() uses a fixed loop of 0-99 regardless
of how many hosts it has in it's lists so it can use a percentage
to factor in whether it should use an entry or not.
Using network regtest and a single host of 127.0.0.1:14038, hsd
burns a lot of CPU when refilling peers and this blocks incoming
requests so much it times out.

This is less noticeable the more hosts you have.

Instead, calculate how many hosts we have in our list and then
the percentage of each index for it. We can then iterate a much
smmaller list using the same percentage calculations.

This brings CPU utlisation on my NetBSD Xen DOMU on regtest with
one host from 100% to 6% and connecting 4 to peers on testnet
from 85% to 8%, allowing RPC commands to now work.

Fixes #220.

Currently, pool::getHost() uses a fixed loop of 0-99 regardless of how many hosts it has in it's lists so it can use a percentage to factor in whether it should use an entry or not. Using network regtest and a single host of 127.0.0.1:14038, hsd burns a lot of CPU when refilling peers and this blocks incoming requests so much it times out. This is less noticeable the more hosts you have. Instead, calculate how many hosts we have in our list and then the percentage of each index for it. We can then iterate a much smmaller list using the same percentage calculations. This brings CPU utlisation on my NetBSD Xen DOMU on regtest with one host from 100% to 6% and connecting 4 to peers on testnet from 85% to 8%, allowing RPC commands to now work. Fixes handshake-org#220.

rsmarples · 2019-07-11T17:41:12Z

We might want to reduce max from 100 to 10, otherwise we'll just see the same issue if we really have >99 hosts in the list.
I elected to keep it at 100 in-case there was a reason for it that I've missed.

kilpatty · 2019-07-11T21:02:54Z

If I'm reading this correctly, this loop should only loop 100 times if there are 100 invalid addresses in the host list right? As once it finds one that actually works, it returns it?

Would it be possible for you to share your hosts.json with us, then we can see why it's not finding a valid address within the first 100 iterations.

Just from comparing in Bitcoin Core's behavior it looks like the max value they set on this loop (it's slightly different config, but I think comes out to be roughly the same behavior) is 2500. So I think there might be something else going on here that is causing issues.

Additionally, I wonder what would happen if you wipe your hosts.json file, and start afresh. I'd make a backup of it for debugging purposes, but that could give us some interesting data as to what's going on.

rsmarples · 2019-07-11T21:55:55Z

If I'm reading this correctly, this loop should only loop 100 times if there are 100 invalid addresses in the host list right? As once it finds one that actually works, it returns it?

No, there is only one entry in the hosts - localhost - and it still loops 100 times.
This patch is trying to get it to loop one time as there's only one address.

Would it be possible for you to share your hosts.json with us, then we can see why it's not finding a valid address within the first 100 iterations.

It's a clean install, not touched the hosts file.

Just from comparing in Bitcoin Core's behavior it looks like the max value they set on this loop (it's slightly different config, but I think comes out to be roughly the same behavior) is 2500. So I think there might be something else going on here that is causing issues.

Additionally, I wonder what would happen if you wipe your hosts.json file, and start afresh. I'd make a backup of it for debugging purposes, but that could give us some interesting data as to what's going on.

I can wipe the whole hsd folder and start afresh and it can be replicated with --network=regtest because it cannot contact 127.0.0.1:14038 because nothing is running on that port.

tynes · 2019-07-11T22:39:23Z

The corresponding code in bitcoind is here:
https://github.com/bitcoin/bitcoin/blob/4882040182ea1109ef9befde93c2f49a98aba391/src/net.cpp#L1657 - they have some optimizations in this part of the codebase that bcoin never got.

I'm going to apply your patch to my mainnet bcoin node to test the scenario with many peers and also a testnet hsd node.

tynes · 2019-07-12T01:32:29Z

Seeing significant CPU performance improvements (10x) running this patch.

My only concern is that the termination condition on the loop is based on a variable. That variable should never be less than or equal to 0, so I think its ok.

If we are feeling extra paranoid, I recommend:

    if (max > 100 || max <= 0)
      max = 100;

ACK

edited post b821704

maxOutbound isn't the correct config option for this. If its going to be configurable, it should be a new config option

100 is too high, it can cause local DoS for RPC calls. Tie the maximal value to maxOutbound as there is currently nothing better to tie it to, and the default of 8 is low enough to keep thing running smoothly.

codecov-io · 2019-07-12T01:49:17Z

Codecov Report

Merging #221 into master will decrease coverage by <.01%.
The diff coverage is 90.9%.

@@            Coverage Diff             @@
##           master     #221      +/-   ##
==========================================
- Coverage   52.99%   52.98%   -0.01%     
==========================================
  Files         129      129              
  Lines       35773    35780       +7     
  Branches     6032     6033       +1     
==========================================
+ Hits        18957    18958       +1     
- Misses      16816    16822       +6

Impacted Files	Coverage Δ
lib/net/pool.js	`24.32% <90.9%> (+0.14%)`	⬆️
lib/net/hostlist.js	`42.13% <0%> (-0.55%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cc1ef7a...b6a70aa. Read the comment docs.

rsmarples · 2019-07-14T19:20:59Z

Seeing significant CPU performance improvements (10x) running this patch.

My only concern is that the termination condition on the loop is based on a variable. That variable should never be less than or equal to 0, so I think its ok.

If we are feeling extra paranoid, I recommend:
    if (max > 100 || max <= 0)
      max = 100;
ACK

edited post b821704

maxOutbound isn't the correct config option for this. If its going to be configurable, it should be a new config option

I'm also happy with setting const maxGetHostIterations = 10; near the top of the file and using that.
It doesn't have to be configurable per se, just not magic and definitely obvious.

I would argue that 100 is far too high and 10 seems a reasonable default.
I'm not entirely sure if it should be user configurable. If we want it higher for whatever reason then the hostfile::getHost() needs to be optimised as well as any calls to misc functions within the loop because we don't want any possible config option to be a self DoS.

I'll make this change over the next few days if I don't hear any other suggestion.

rsmarples · 2019-07-19T09:29:45Z

maxOutbound isn't the correct config option for this. If its going to be configurable, it should be a new config option

I toook the path of least resistance and just united it.

nodech · 2022-04-07T07:42:34Z

Merged in 6213460

rsmarples added 2 commits July 11, 2019 17:56

let -> const

a322c0c

pool: Limit number of times getHost() will iterate

b821704

100 is too high, it can cause local DoS for RPC calls. Tie the maximal value to maxOutbound as there is currently nothing better to tie it to, and the default of 8 is low enough to keep thing running smoothly.

Untie pool::getHosts from maxOutbound config option.

b6a70aa

nodech added enhancement general - improving existing feature needs rebase modifier net/p2p part of the codebase patch Release version labels Dec 10, 2021

nodech self-assigned this Dec 13, 2021

nodech mentioned this pull request Feb 24, 2022

Network tests & updates #692

Merged

nodech closed this Apr 7, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pool: Use a shorter loop to get hosts for refilling peers #221

pool: Use a shorter loop to get hosts for refilling peers #221

rsmarples commented Jul 11, 2019

rsmarples commented Jul 11, 2019

kilpatty commented Jul 11, 2019

rsmarples commented Jul 11, 2019

tynes commented Jul 11, 2019

tynes commented Jul 12, 2019 •

edited

Loading

codecov-io commented Jul 12, 2019 •

edited

Loading

rsmarples commented Jul 14, 2019

rsmarples commented Jul 19, 2019

nodech commented Apr 7, 2022

pool: Use a shorter loop to get hosts for refilling peers #221

pool: Use a shorter loop to get hosts for refilling peers #221

Conversation

rsmarples commented Jul 11, 2019

rsmarples commented Jul 11, 2019

kilpatty commented Jul 11, 2019

rsmarples commented Jul 11, 2019

tynes commented Jul 11, 2019

tynes commented Jul 12, 2019 • edited Loading

codecov-io commented Jul 12, 2019 • edited Loading

Codecov Report

rsmarples commented Jul 14, 2019

rsmarples commented Jul 19, 2019

nodech commented Apr 7, 2022

tynes commented Jul 12, 2019 •

edited

Loading

codecov-io commented Jul 12, 2019 •

edited

Loading