Skip to content

Conversation

@vasild
Copy link
Contributor

@vasild vasild commented Feb 19, 2019

getRandomDownloadBlocksPeer() would end up in an "infinite" recursion
loop if the chain of the node is >10 blocks forked from the chain of all
other nodes. This is because __getRecentBlockIds() returns just the
highest 10 blocks of the node's chain.

getRandomDownloadBlocksPeer() is not necessary with peer verification
because during peer verification we check for common blocks in a more
robust way (not just the highest 10), going back to the genesis block if
necessary. So, for sure, if a peer is verified, we know that he has
common blocks with us. If a peer could not be verified, then he would
not be in the list of our peers. Thus, just use getRandomPeer() instead
of getRandomDownloadBlocksPeer().

Also, improve some diagnostic messages to contain a reason why a
failure occurred.

Proposed changes

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactoring (improve a current implementation without adding a new feature or fixing a bug)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Build (changes that affect the build system)
  • Docs (documentation only changes)
  • Test (adding missing tests or fixing existing tests)
  • Other... Please describe:

Checklist

  • I have read the CONTRIBUTING documentation
  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

getRandomDownloadBlocksPeer() would end up in an "infinite" recursion
loop if the chain of the node is >10 blocks forked from the chain of all
other nodes. This is because __getRecentBlockIds() returns just the
highest 10 blocks of the node's chain.

getRandomDownloadBlocksPeer() is not necessary with peer verification
because during peer verification we check for common blocks in a more
robust way (not just the highest 10), going back to the genesis block if
necessary. So, for sure, if a peer is verified, we know that he has
common blocks with us. If a peer could not be verified, then he would
not be in the list of our peers. Thus, just use getRandomPeer() instead
of getRandomDownloadBlocksPeer().

Also, improve some diagnostic messages to contain a reason why a
failure occurred.
@ghost
Copy link

ghost commented Feb 19, 2019

@air1one @faustbrian @supaiku0 - please review this in the next few days. Be sure to explicitly select labels so I know what's going on.

If no reviewer appears after a week, a reminder will be sent out.

@ghost ghost requested a review from air1one February 19, 2019 18:10
@vasild
Copy link
Contributor Author

vasild commented Feb 19, 2019

Notice: this patch will re-enable peer verification

@codecov-io
Copy link

codecov-io commented Feb 19, 2019

Codecov Report

Merging #2121 into develop will decrease coverage by 0.06%.
The diff coverage is 51.85%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2121      +/-   ##
===========================================
- Coverage    79.35%   79.28%   -0.07%     
===========================================
  Files          329      329              
  Lines         7889     7898       +9     
  Branches      1127     1105      -22     
===========================================
+ Hits          6260     6262       +2     
- Misses        1597     1604       +7     
  Partials        32       32
Impacted Files Coverage Δ
packages/core-p2p/src/monitor.ts 52.21% <40%> (-1.14%) ⬇️
packages/core-p2p/src/peer.ts 84.21% <58.82%> (-3.79%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a102716...a63cd65. Read the comment docs.

@spkjp
Copy link
Contributor

spkjp commented Feb 20, 2019

Looks good to me.

spkjp
spkjp previously requested changes Feb 20, 2019
Copy link
Contributor

@spkjp spkjp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs some changes, I ran into a new issue on my node.

const peerVerifier = new PeerVerifier(this);

if (deadline <= new Date().getTime()) {
throw new PeerPingTimeoutError(delay);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When my node is trying to fetch a peer list, this exception is thrown and then it fails to get any new peers.

2019-02-20 03:06:31][INFO]: Fetching a fresh peer list from http://167.114.29.54:4002
[2019-02-20 03:06:33][DEBUG]: Request to http://167.114.29.54:4002/peer/blocks/common?ids=7650603120054377288,15579271131030939594,6475133262905089782,15240575364085420699,3955350625842386444,8592197812490173822,14213463318813299626,17826610844628864089,1106400051818429549 failed: timeout of 138ms exceeded

Happens everytime. If increase the ping timeout in getPeers it manages to get a peer list.

Copy link
Contributor Author

@vasild vasild Feb 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message you posted does not originate from throw new PeerPingTimeoutError(delay), it comes from somewhere deeper - where we do try to access /peer/blocks/.... However, more importantly, I noticed that this ping() call from getPeers() is not necessary because the latter is only interested in /peer/list. As removing it is a separate, somewhat unrelated change, I created another PR for it: #2123

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[2019-02-20 03:06:33][DEBUG]: Request to http://167.114.29.54:4002/peer/blocks/common?ids=7650603120054377288,15579271131030939594,6475133262905089782,15240575364085420699,3955350625842386444,8592197812490173822,14213463318813299626,17826610844628864089,1106400051818429549 failed: timeout of 138ms exceeded
[2019-02-20 03:06:33][INFO]: Peer verify 167.114.29.54: failure: could not get blocks starting from height 1566489 from peer: exception: timeout elapsed before successful completion of the verification

getPeers -> peer.ping() -> peerVerifier.checkState(body, deadline)) -> boom

The nsect calls probe which does hasCommonBlocks(Object.keys(probesHeightById), msRemaining) and msRemaining is reduced in each probe iteration... until it is really small (like 138ms) and then it is bound to fail.

The other PR removes ping from getPeers so this won't pose a problem in this scenario anymore. Though the behavior is still odd. 🤔

@ghost
Copy link

ghost commented Feb 20, 2019

@vasild Your pull request needs some changes. Please wait for a comment from one of our developers for more information.

@faustbrian faustbrian dismissed spkjp’s stale review February 20, 2019 14:56

Resolved in another PR

@faustbrian faustbrian merged commit d963957 into develop Feb 20, 2019
@ghost ghost deleted the remove-getRandomDownloadBlocksPeer branch February 20, 2019 14:56
@ghost ghost removed the Status: Needs Review label Feb 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants