-
Notifications
You must be signed in to change notification settings - Fork 820
Add gossip interval flag for alertmanger #3667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add gossip interval flag for alertmanger #3667
Conversation
dbc2fe0
to
924699a
Compare
pkg/alertmanager/multitenant.go
Outdated
@@ -117,6 +118,7 @@ func (cfg *MultitenantAlertmanagerConfig) RegisterFlags(f *flag.FlagSet) { | |||
|
|||
f.StringVar(&cfg.ClusterBindAddr, "cluster.listen-address", defaultClusterAddr, "Listen address for cluster.") | |||
f.StringVar(&cfg.ClusterAdvertiseAddr, "cluster.advertise-address", "", "Explicit address to advertise in cluster.") | |||
f.DurationVar(&cfg.ClusterGossipInterval, "cluster.gossip-interval", cluster.DefaultGossipInterval, "The interval between sending messages that need to be gossiped that haven't been able to piggyback on probing messages. If this is set to zero, non-piggyback gossip is disabled. By lowering this value (more frequent) gossip messages are propagated across cluster more quickly at the expense of increased bandwidth.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not too sure why we are not using an -alertmanager.
prefix for all of those config flags, as currently it might mislead users to think it is affecting other parts of the system to (e.g. memberlist)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is non-piggyback gossip?
If we allow configuring this value (which I think is already pretty low for gossip, and wouldn't recommend going to lower values), we should also enable configuration of pull/push interval for faster state convergence. (In addition to that, configuring GossipNodes would be useful too, but that doesn't seem configurable in AM).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not too sure why we are not using an -alertmanager. prefix for all of those config flags, as currently it might mislead users to think it is affecting other parts of the system to (e.g. memberlist)
I think it's a bug and we should fix it (but need to address it in a separate PR). I've opened an issue in the meanwhile: #3670
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is non-piggyback gossip?
That was a copy paste from the memberlist code comment. I think it is confusing more than helping so I reverted to the alertmanager description.
pull/push interval
Added that
Gossip Nodes
Yes that seems to be fixed at quorum or 3 whatever is bigger: https://github.com/prometheus/alertmanager/blob/ce108378d4c8d580804452dcc3a31222067793b2/cluster/cluster.go#L184-L187
Aliging to #3677
I have added some comments here (but meanwhile I used HA in the same way)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
pkg/alertmanager/multitenant.go
Outdated
@@ -117,6 +118,7 @@ func (cfg *MultitenantAlertmanagerConfig) RegisterFlags(f *flag.FlagSet) { | |||
|
|||
f.StringVar(&cfg.ClusterBindAddr, "cluster.listen-address", defaultClusterAddr, "Listen address for cluster.") | |||
f.StringVar(&cfg.ClusterAdvertiseAddr, "cluster.advertise-address", "", "Explicit address to advertise in cluster.") | |||
f.DurationVar(&cfg.ClusterGossipInterval, "cluster.gossip-interval", cluster.DefaultGossipInterval, "The interval between sending messages that need to be gossiped that haven't been able to piggyback on probing messages. If this is set to zero, non-piggyback gossip is disabled. By lowering this value (more frequent) gossip messages are propagated across cluster more quickly at the expense of increased bandwidth.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is non-piggyback gossip?
If we allow configuring this value (which I think is already pretty low for gossip, and wouldn't recommend going to lower values), we should also enable configuration of pull/push interval for faster state convergence. (In addition to that, configuring GossipNodes would be useful too, but that doesn't seem configurable in AM).
924699a
to
c238f33
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I've a comment about config naming.
@simonswine We merged #3677. Could you rebase |
The interval between sending messages that need to be gossiped that haven't been able to piggyback on probing messages. If this is set to zero, non-piggyback gossip is disabled. By lowering this value (more frequent) gossip messages are propagated across cluster more quickly at the expense of increased bandwidth. Signed-off-by: Christian Simon <simon@swine.de>
c238f33
to
714f10c
Compare
@pracucci this is ready for another review now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you did great. Thanks!
Signed-off-by: Christian Simon simon@swine.de
What this PR does:
This allows to configure a different gossip interval for alertmanager. This allow to adapt the default of
200ms
to different values, which can help to address delayed gossip notifications in high volume and latency deployments.Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]