Skip to content

Conversation

@tsachiherman
Copy link
Contributor

Summary

While running a longevity tests on the transaction sync feature branch, the following issue showed up on some of the participating nodes :

{"file":"actions.go","function":"github.com/algorand/go-algorand/agreement.pseudonodeAction.do","level":"error","line":386,"msg":"pseudonode.MakeVotes call failed(attest) pseudonode input channel is full","time":"2021-07-19T13:28:35.078410Z"}

Digging further, it seems that both pseudonodeVotesTask.execute as well as pseudonodeProposalsTask.execute could lead to this situation ( same issue ). There are two distinct issues in the current implementation that have been addressed in this PR:

  1. The call to verifyVote might silently fail. In that case, the caller ( i.e. the execute method ) would still expect to find the result in the passed-in output channel. When this does not happen, the method could be blocked indefinitely.
  2. When the execute method tries to send the output result back to the output channel, it might fail doing so due to an issue on the "other" side. When that happens, we have had no error indication. The changes in the codebase would now allow a timeout for these messages to ensure it won't be "stuck" forever.

Test Plan

Use existing unit tests. Run a longevity test.

@tsachiherman tsachiherman requested a review from a user July 19, 2021 16:31
@tsachiherman tsachiherman self-assigned this Jul 19, 2021
@tsachiherman tsachiherman requested a review from cce July 19, 2021 16:33
@ghost
Copy link

ghost commented Jul 19, 2021

Is there a reason why we are not trying to merge this to master but only to the txnsync branch?

@tsachiherman
Copy link
Contributor Author

Is there a reason why we are not trying to merge this to master but only to the txnsync branch?

Yes - I want to verify the fix on this branch ( where I was able to reproduce it ), and then port it to master.

It will take few days of runtime to reproduce it, so I figured that I'll start as soon as I can on this branch.

@codecov-commenter
Copy link

codecov-commenter commented Jul 19, 2021

Codecov Report

Merging #2582 (79bb444) into feature/txnsync (dd84666) will increase coverage by 0.02%.
The diff coverage is 41.02%.

Impacted file tree graph

@@                 Coverage Diff                 @@
##           feature/txnsync    #2582      +/-   ##
===================================================
+ Coverage            42.93%   42.96%   +0.02%     
===================================================
  Files                  369      370       +1     
  Lines                83744    83808      +64     
===================================================
+ Hits                 35958    36007      +49     
- Misses               41978    41999      +21     
+ Partials              5808     5802       -6     
Impacted Files Coverage Δ
agreement/cryptoVerifier.go 73.04% <16.66%> (-2.69%) ⬇️
agreement/asyncVoteVerifier.go 79.66% <33.33%> (-2.16%) ⬇️
agreement/pseudonode.go 65.43% <43.93%> (-6.60%) ⬇️
ledger/blockqueue.go 81.03% <0.00%> (-2.88%) ⬇️
network/requestTracker.go 70.25% <0.00%> (-0.87%) ⬇️
network/wsNetwork.go 61.09% <0.00%> (ø)
libgoal/lockedFileLinux.go
util/db/fullfsync_darwin.go 100.00% <0.00%> (ø)
libgoal/lockedFileUnix.go 0.00% <0.00%> (ø)
util/db/dbutil.go 39.54% <0.00%> (+0.56%) ⬆️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dd84666...79bb444. Read the comment docs.

@tsachiherman tsachiherman merged commit 4836ea4 into algorand:feature/txnsync Jul 19, 2021
@tsachiherman tsachiherman deleted the tsachi/aggreement_full_backlog branch July 19, 2021 18:58
@cce cce mentioned this pull request Aug 11, 2021
@tsachiherman tsachiherman restored the tsachi/aggreement_full_backlog branch August 16, 2021 15:21
@cce
Copy link
Contributor

cce commented Apr 19, 2022

follow-up: this was split off and released in #2741

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants