-
Notifications
You must be signed in to change notification settings - Fork 523
go-algorand v3.1.3-beta #3139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
go-algorand v3.1.3-beta #3139
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
## Summary During fast catchup, we restart the transaction sync service very quickly. This can cause a network message being sent, and the response would be returned to the "restarted" txnsync. Since we don't want to disconnect the network connection itself ( which could have some messages enqueued ), the transaction sync would need to store the "returned" channel before sending the message. That would avoid the data race ( and safely ignore the incoming message ). ## Test Plan Use existing testing, and confirm against that.
## Summary This PR adds the node ( both client and server ), the ability to measure the time it takes to establish an outgoing connection ( excluding the TCP connection time ). This duration is captured as the initial latency, which would need to get updated via a pingpong style logic. ## Test Plan - [x] Extend existing unit tests - [x] mainnet-model testing is needed as well to confirm correctness
## Summary
I noticed log messages like this:
```
{"file":"txHandler.go","function":"github.com/algorand/go-algorand/data.(*solicitedAsyncTxHandler).loop","level":"info","line":541,"msg":"solicitedAsyncTxHandler was unable to relay transaction message : %!v(MISSING)","time":"2021-10-20T23:18:40.220089Z"}
{"file":"txHandler.go","function":"github.com/algorand/go-algorand/data.(*solicitedAsyncTxHandler).loop","level":"info","line":541,"msg":"solicitedAsyncTxHandler was unable to relay transaction message : %!v(MISSING)","time":"2021-10-20T23:18:40.220226Z"}
{"file":"txHandler.go","function":"github.com/algorand/go-algorand/data.(*solicitedAsyncTxHandler).loop","level":"info","line":541,"msg":"solicitedAsyncTxHandler was unable to relay transaction message : %!v(MISSING)","time":"2021-10-20T23:18:40.220300Z"}
{"file":"txHandler.go","function":"github.com/algorand/go-algorand/data.(*solicitedAsyncTxHandler).loop","level":"info","line":541,"msg":"solicitedAsyncTxHandler was unable to relay transaction message : %!v(MISSING)","time":"2021-10-20T23:18:40.228731Z"}
{"file":"txHandler.go","function":"github.com/algorand/go-algorand/data.(*solicitedAsyncTxHandler).loop","level":"info","line":541,"msg":"solicitedAsyncTxHandler was unable to relay transaction message : %!v(MISSING)","time":"2021-10-20T23:18:40.228828Z"}
{"file":"txHandler.go","function":"github.com/algorand/go-algorand/data.(*solicitedAsyncTxHandler).loop","level":"info","line":541,"msg":"solicitedAsyncTxHandler was unable to relay transaction message : %!v(MISSING)","time":"2021-10-20T23:18:40.228893Z"}
{"file":"txHandler.go","function":"github.com/algorand/go-algorand/data.(*solicitedAsyncTxHandler).loop","level":"info","line":541,"msg":"solicitedAsyncTxHandler was unable to relay transaction message : %!v(MISSING)","time":"2021-10-20T23:18:40.228946Z"}
{"file":"txHandler.go","function":"github.com/algorand/go-algorand/data.(*solicitedAsyncTxHandler).loop","level":"info","line":541,"msg":"solicitedAsyncTxHandler was unable to relay transaction message : %!v(MISSING)","time":"2021-10-20T23:18:40.229012Z"}
{"file":"txHandler.go","function":"github.com/algorand/go-algorand/data.(*solicitedAsyncTxHandler).loop","level":"info","line":541,"msg":"solicitedAsyncTxHandler was unable to relay transaction message : %!v(MISSING)","time":"2021-10-20T23:18:40.229080Z"}
```
## Test Plan
No test changes.
## Summary The previous web socket version was not flushing the write buffer when writing a Close control message. As a result, we ended up disconnection the connection on one side correctly, while having it in a zombie state on the other side. ## Test Plan Use existing tests. The `TestSlowPeerDisconnection` was already observing the issue being resolved.
## Summary Improve the bandwidth estimation within the transaction sync by having the estimation account for latency, transaction compression time, and time spent waiting in the incoming queue. ## Test Plan Wrote unit tests for correctness, ran network on mainnet model and observed measured bandwidths. Before the bandwidth would converge to the minimum over time as well have erratic inaccuracies. Now the numbers look much more in range, at most a factor of 2 off.
## Summary Running betanet, we've seen errors such as ``` unable to enqueue incoming message into peer incoming message backlog. disconnecting from peer. ``` as well as multiple ``` Peer 162.202.32.72:56543 disconnected: SlowConnection ``` The goal of this PR is to remove from the pending incoming message queue entries that are associated with network peers that have been disconnected ( by the network package ), or have been scheduled for disconnection ( by the transaction synch package ). Removing these would increase the "available" space in the incoming message queue and would prevent redundant disconnections. ## Test Plan - [x] unit tests updated. - [x] ran s1 network with and without load, successfully.
## Summary The catchup was occasionally reporting ``` (1): fetchAndWrite(13932148): ledger write failed: block evaluation for round 13932148 requires sequential evaluation while the latest round is 13932148 ``` This issue indicates that the catchup was attempting to validate a block which is not the latest+1, but rather newer. In this case, we can safely ignore this error, and skip applying this block, since the block was already added to the ledger. ## Test Plan Tested manually.
Codecov Report
@@ Coverage Diff @@
## rel/beta #3139 +/- ##
============================================
+ Coverage 43.84% 43.91% +0.07%
============================================
Files 385 386 +1
Lines 86323 86471 +148
============================================
+ Hits 37849 37978 +129
- Misses 42519 42520 +1
- Partials 5955 5973 +18
Continue to review full report at Codecov.
|
tsachiherman
approved these changes
Oct 25, 2021
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
catchup: ignore benign evaluator failures (#3135)
txnsync: delete-able incoming message queue (#3126)
txnsync: Improve Bandwidth Estimation in Txnsync (#3096)
network: update the websocket library. (#3131)
txnsync: Add err to solicitedAsyncTxHandler.loop log message (#3123)
network: add initial support for latency tracking (#3028)
txnsync: fix potential race during TestBasicCatchpointCatchup (#3033)