
pool.js::getHost() is CPU hungry #220

Closed
rsmarples opened this issue Jul 11, 2019 · 1 comment
@rsmarples
Contributor

This is easiest to observe on regtest, where constant connection attempts to 127.0.0.1:14038 are made and always fail.

Near the top of the function we have this code:

    for (let i = 0; i < 100; i++) {
      const entry = this.hosts.getHost();
      // ...

All 100 iterations are made. This takes on average almost 2 seconds on my machine. Multiply this by the need to fill 8 peer slots and the CPU pegs at 100%, causing many RPC calls to fail. My test case is wallet creation - generally the hsw RPC will time out well before hsd even logs the request to create the wallet.

this.hosts.getHost() looks expensive to call 100 times. A better approach might be a function such as this.hosts.getRandomList(maxHosts) that does a similar job but returns a list capped at a maximum number of hosts, and to iterate over that instead of a forced 100-iteration loop every time.
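As a rough sketch of what such a helper might look like (getRandomList is not an existing hsd API, and the use of this.map below assumes the host list keeps its entries in a Map, which may differ from the real HostList internals):

    // Hypothetical helper on the host list: return up to `maxHosts` distinct
    // entries chosen at random from the known hosts.
    getRandomList(maxHosts) {
      const entries = [...this.map.values()];

      // Fisher-Yates shuffle, then keep the first `maxHosts` entries.
      for (let i = entries.length - 1; i > 0; i--) {
        const j = Math.floor(Math.random() * (i + 1));
        [entries[i], entries[j]] = [entries[j], entries[i]];
      }

      return entries.slice(0, maxHosts);
    }

The pool could then iterate this bounded list instead of making a fixed 100 calls:

    for (const entry of this.hosts.getRandomList(8)) {
      // ...same suitability checks as the current loop body...
    }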

rsmarples added a commit to rsmarples/hsd that referenced this issue Jul 11, 2019
Currently, pool::getHost() uses a fixed loop of 0-99 regardless
of how many hosts it has in its lists, so that it can use a percentage
to factor in whether it should use an entry or not.
Using network regtest and a single host of 127.0.0.1:14038, hsd
burns a lot of CPU when refilling peers, and this blocks incoming
requests so much that they time out.

This is less noticeable the more hosts you have.

Instead, calculate how many hosts we have in our list and then
the percentage each index represents. We can then iterate a much
smaller list using the same percentage calculations.

This brings CPU utilisation on my NetBSD Xen DOMU on regtest with
one host from 100% to 6%, and connecting to 4 peers on testnet
from 85% to 8%, allowing RPC commands to work again.

Fixes handshake-org#220.
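A minimal sketch of the idea described above, assuming the host list exposes a size() count of known hosts (the attempt formula and acceptance check here are illustrative, not the actual patch):

    // Illustrative only: bound the number of attempts by the number of known
    // hosts instead of a fixed 100, keeping a probabilistic accept per attempt.
    const total = this.hosts.size();           // assumed count of known hosts
    const attempts = Math.min(100, total * 2); // e.g. a couple of tries per host

    for (let i = 0; i < attempts; i++) {
      const entry = this.hosts.getHost();

      if (!entry)
        break;

      // The longer we have searched, the more willing we are to accept a
      // lower-quality entry (the same percentage idea as the fixed 100 loop).
      if (Math.random() * attempts < i + 1)
        return entry;
    }

With a single regtest host, this caps the work at a couple of calls per refill instead of 100, which is the CPU saving the commit message reports.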
@tynes
Contributor

tynes commented Jul 25, 2019

Definitions

  • Pool - hsd/lib/net/pool
  • Peer - Another node that implements the Handshake Protocol
  • Loader Peer - The peer used to download blocks
  • hosts.json - The file that contains the peer info
  • Identity Key - Brontide p2p Identity key

Summary

In its current form, the Pool will continuously try to connect to peers when it does not
have a minimum number of peers. When the network is sparse or running on a local regtest,
the node will be in an infinite loop trying to add more peers. This loop has caused
problems where the machine's full resources are consumed, making requests time out.
If the node is also trying to stay in consensus, this adds to the problem.

A way to fix this problem is to use fewer iterations in Pool.getHost. That function is used
when adding an outbound connection and also when selecting the loader peer. The iteration count
is currently hard-coded at 100, and larger values are better when there are more byzantine
entries in the hosts.json file.

Getting an entry into the hosts.json file requires an IP address. If peers were instead tracked
as IP + Identity key, it would be much cheaper to get byzantine entries into the hosts.json
file, as doing so would be as easy as generating a new secp256k1 keypair.

Based on personal experience, I would never see more than ~25 iterations on Bitcoin mainnet
using bcoin.

Solutions

  • Config option to set the number of iterations in the loop (see the sketch after this list)
  • Dynamically calculate the number of iterations in the loop
  • Hard code in a smaller number of iterations
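For the first option, a hedged sketch of how a configurable cap might be wired through the pool options (the option name maxHostIterations is invented for illustration and is not an existing hsd option):

    // Hypothetical wiring in PoolOptions: keep today's 100 attempts as the default.
    this.maxHostIterations = 100;

    if (options.maxHostIterations != null)
      this.maxHostIterations = options.maxHostIterations >>> 0;

    // Pool.getHost() would then use the configured cap instead of a literal 100:
    for (let i = 0; i < this.options.maxHostIterations; i++) {
      const entry = this.hosts.getHost();
      // ...unchanged selection logic...
    }

Operators on sparse networks or regtest could lower the cap, while mainnet nodes could keep the larger value to better tolerate byzantine entries.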

@pinheadmz self-assigned this Mar 21, 2021
@nodech closed this as completed in 721dd9a Apr 7, 2022