Implement periodic writing of alertmanager state to storage. #4031
The logic makes sense to me! I left a couple of nits. Waiting for the final PR to do a deeper review but I haven't seen any issue so far 👏
When ring-based/sharding replication is enabled, the alertmanager state (silences, notification log) is periodically written to object storage so that it can be used to recover from an all-replica outage. Only one of the replicas is responsible for writing the state (position 0). Signed-off-by: Steve Simpson <steve.simpson@grafana.com>
Good job! LGTM (modulo a couple of nits)
Everything looks good to me, apart from the minor wording discussion about the documentation.
Thanks!
LGTM. Nice small readable PR!
Thanks for addressing my feedback! 🙏
What this PR does:
When ring-based/sharding replication is enabled, the alertmanager state
(silences, notification log) is periodically written to object storage
so that it can be used to recover from an all-replica outage. Only one
of the replicas is responsible for writing the state (position 0).
Checklist
- Documentation added
- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]