e2e: Increase all ANR timeouts to 2m to ensure CI reliability. #1733
Thanks for spotting this!
// start is async, so wait some time for cluster health
time.Sleep(time.Minute)
Was this sleep just never needed? A minute-long sleep is a long time.
Maybe it was needed in the past, but it doesn't seem to be needed now.
Not needed anymore.
And to be clear, this sleep ensured each e2e job was wasting 30-40s. At least with a timeout the check can complete as soon as the nodes are ready, but this sleep will never exit early if the nodes are healthy earlier than expected.
The missing piece here is understanding that Health actually blocks until Healthy.
Yup, that was very old ANR tech debt. With the first gRPC server implementation, start (with blockchain creation; I know that is not the case here, but this was probably copied) broke the subsequent health call unless it slept for some time first.
Why this should be merged
CI seems to be exceeding many ANR-related timeouts. Rather than bumping timeouts piecemeal, all ANR timeouts are set to the same constant of 2 minutes.
How this works
How this was tested