-
Notifications
You must be signed in to change notification settings - Fork 773
Healthcheck for zero ingress connection count #3719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
a5a21ff
to
b8ad75a
Compare
de6b79f
to
d7fdf19
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks great. I've added few comments.
Thanks, addressed/responded to your comments. |
Thanks. Will re-review shortly.
…On Tue, Feb 18, 2025, 2:31 PM yacovm ***@***.***> wrote:
Overall, looks great. I've added few comments.
Thanks, addressed/responded to your comments.
—
Reply to this email directly, view it on GitHub
<#3719 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AF2OOHY34J6BCLZ6OD4ZA532QODB7AVCNFSM6AAAAABXDA2QMCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNRWG42DSNJRG4>
.
You are receiving this because you commented.Message ID:
***@***.***>
[image: yacovm]*yacovm* left a comment (ava-labs/avalanchego#3719)
<#3719 (comment)>
Overall, looks great. I've added few comments.
Thanks, addressed/responded to your comments.
—
Reply to this email directly, view it on GitHub
<#3719 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AF2OOHY34J6BCLZ6OD4ZA532QODB7AVCNFSM6AAAAABXDA2QMCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNRWG42DSNJRG4>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
tsachiherman
approved these changes
Feb 19, 2025
ceyonur
reviewed
Feb 24, 2025
maru-ava
approved these changes
Feb 26, 2025
cd6d014
to
a52673e
Compare
maru-ava
approved these changes
Feb 27, 2025
d820099
to
2005233
Compare
Parking this as a draft before I manually test it again after this code change |
4169d09
to
1cf5056
Compare
@StephenButtolph the PR is ready again for re-review. |
ff2344e
to
d1b4d92
Compare
This commit adds a healthcheck that fails if: - The node is a validator of the primary network - The node has zero ingress connections - Enough time (defaults to 20 min) has passed since the node finished bootstrapping Signed-off-by: Yacov Manevich <yacov.manevich@avalabs.org>
Signed-off-by: Yacov Manevich <yacov.manevich@avalabs.org>
Signed-off-by: Yacov Manevich <yacov.manevich@avalabs.org>
Signed-off-by: Yacov Manevich <yacov.manevich@avalabs.org>
Signed-off-by: Yacov Manevich <yacov.manevich@avalabs.org>
cf8c3d6
to
9c9ca56
Compare
Signed-off-by: Yacov Manevich <yacov.manevich@avalabs.org>
StephenButtolph
approved these changes
Mar 12, 2025
cam-schultz
pushed a commit
that referenced
this pull request
Mar 24, 2025
Signed-off-by: Yacov Manevich <yacov.manevich@avalabs.org>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Why this should be merged
This PR adds a new health check that monitors if the node is a primary network validator and has no node connecting to it.
Our current health checks only monitor whether the node is connected to enough stake. However, a node may be connected to enough stake by opening egress connections to all other validators, but also at the same time, not be reachable from any validator or non-validator node.
This health check does just that - it alerts when:
How this works
A node can establish a connection through two pathways:
We count the number of connections of the latter type via an atomic counter, and add a health check that checks the three conditions mentioned earlier.
How this was tested
Added unit tests and also did a manual test:
I launched a validator node on Fuji (testnet) and I added it to the validators:
I then caused all other nodes to disconnect from it by blocking port 9651:
After a short time, the health check started failing:
and probing the health check API showed it:
I then restored connectivity via deleting the
iptables
rule:and observed that the health check recovered:
Need to be documented in RELEASES.md?
Added a section about the health check added.