Alertmanager: Replicate state using the Ring #3839
Conversation
Force-pushed from a833fdc to f2ece22
Force-pushed from da19b35 to 5e83714
Force-pushed from 4ebdf3f to a67a6d7
Fixes #2650
Force-pushed from 08c3a32 to 27686f0
Force-pushed from 796ab51 to e747559
pkg/alertmanager/alertmanager.go
Outdated
@@ -189,14 +239,16 @@ func New(cfg *Config, reg *prometheus.Registry) (*Alertmanager, error) {
	}

	am.dispatcherMetrics = dispatch.NewDispatcherMetrics(am.registry)

	am.state.WaitReady()
Two questions:
- Why is it needed now, while it wasn't before?
- New() is called while holding am.alertmanagersMtx.Lock() in MultitenantAlertmanager.setConfig(). What are the implications of waiting for ready? How long could it take before it's ready?

Generally speaking I believe it's bad design to "wait for something" in a constructor function (like this New()), but I would like to better understand why we need it.
We may remove the WaitReady() call from this PR to unblock it and address it separately.
Generally speaking I believe it's bad design to "wait for something" in a constructor function (like this New()), but I would like to better understand why we need it.

100%.

(Personally I would like to see this follow our Services model, but I am not sure it makes sense here.)
type State interface {
	AddState(string, cluster.State, prometheus.Registerer) cluster.ClusterChannel
	Position() int
	WaitReady()
WaitReady() is a blocking call, and should take a context as an argument so that the caller can cancel or time out the wait if needed. That also implies returning an error to communicate success or failure.
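For illustration, a possible shape of the revised interface (a sketch only; only the WaitReady signature changes from the snippet above, and parameter names are assumed):

```go
import (
	"context"

	"github.com/prometheus/alertmanager/cluster"
	"github.com/prometheus/client_golang/prometheus"
)

// State with a context-aware WaitReady.
type State interface {
	AddState(key string, s cluster.State, reg prometheus.Registerer) cluster.ClusterChannel
	Position() int
	// WaitReady blocks until the state is ready, or returns the context's
	// error if ctx is cancelled or times out first.
	WaitReady(ctx context.Context) error
}
```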
The implementations of WaitReady and Settle are part of the next PR; if you don't mind, I'd like to leave them out of this PR for now.
partialMerges: prometheus.NewDesc(
	"cortex_alertmanager_partial_state_merges_total",
	"Number of times we have received a partial state to merge for a key.",
	[]string{"key"}, nil),
By using key as a label, we will have at least 2 label values per user, right? (one for notifications, one for silences). Do we need so many new metrics? Would it make sense to use "user" only?
We don't have user as part of these labels because key already includes the user: it's a combination of prefix + userID. So by using key we "technically get both", even though it breaks the nomenclature (of always using user).
We have per-user registries, so we could do per-user output, aggregating over all keys. WDYT?
Is that different from the current approach? An aggregation across all keys is not possible if the keys are per user, e.g. sil:user-3 or nfl:user-2.
If I understand it correctly, we have per-user registries, which only use the "key" label. During alertmanagerMetrics.Collect, we then call SendSumOfCountersWithLabels with the key label. We could instead call SendSumOfCountersPerUser (e.g. data.SendSumOfCountersPerUser(out, m.partialMerges, "alertmanager_partial_state_merges_total")), which would return sum(alertmanager_partial_state_merges_total) per user registry, and then add the user label to the output. I think that would be enough. WDYT?
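Concretely, that suggestion might look roughly like this (a sketch only; it assumes data is the util.MetricFamiliesPerUser built from the per-user registries inside the existing Collect method, and m.partialMerges holds the descriptor shown):

```go
// Describe the aggregated metric with a "user" label instead of "key".
partialMerges := prometheus.NewDesc(
	"cortex_alertmanager_partial_state_merges_total",
	"Number of times we have received a partial state to merge for a key.",
	[]string{"user"}, nil)

// In alertmanagerMetrics.Collect: sum over all keys within each per-user
// registry and emit one series per user.
data.SendSumOfCountersPerUser(out, m.partialMerges, "alertmanager_partial_state_merges_total")
```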
// Settle waits until the alertmanagers are ready (and sets the appropriate internal state when it is).
// The idea is that we don't want to start working before we get a chance to know most of the notifications and/or silences.
func (s *state) Settle(ctx context.Context, _ time.Duration) {
Settle currently ignores the context. It should not do that, and should report an error back if the context is done before settling has finished.
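A minimal sketch of what respecting the context could look like (the readyc channel is hypothetical, not from the PR; it stands in for whatever signals that the initial sync finished):

```go
// Settle blocks until the initial state sync has completed, or until the
// context is done, whichever happens first.
func (s *state) Settle(ctx context.Context, _ time.Duration) error {
	select {
	case <-s.readyc: // hypothetical channel closed once the initial sync completes
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}
```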
I'm leaving the implementation of WaitReady and Settle to the next PR - is it OK if we skip it for now? These two involve full state replication and I fear we need the implementation of those to make sense of the big picture.
}

s.stateReplicationTotal.WithLabelValues(p.Key).Inc()
ctx := context.Background() // TODO: I probably need a better context
Hint: If state was a services.Service (= object with lifecycle in Cortex), it would already have its own context. :)
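For context, a rough sketch of what that could look like with Cortex's services package (the constructor name, fields, and method bodies here are placeholders, not the PR's actual code):

```go
import (
	"context"

	"github.com/cortexproject/cortex/pkg/util/services"
)

// state embeds services.Service so it gets a managed lifecycle and a context
// that is cancelled when the service stops.
type state struct {
	services.Service
	// ... existing fields ...
}

func newState( /* ... */ ) *state {
	s := &state{}
	s.Service = services.NewBasicService(s.starting, s.running, s.stopping)
	return s
}

func (s *state) starting(ctx context.Context) error { return nil }
func (s *state) running(ctx context.Context) error  { <-ctx.Done(); return nil }
func (s *state) stopping(_ error) error             { return nil }
```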
Can you help me break this down?
I have not seen us use services within services (that's different from services with sub-services), and even if we do, at its current state it seems like overkill. It's more code to manage the service, particularly because State itself is satisfied by two different things (our state or the upstream Peer), which in my eyes brings us little benefit.
I'm only talking about state being a service (and not Peer), since it runs a background process and its lifecycle needs to be managed.
Another possibility is to use a custom context initialized in New, and cancelled when stopping.
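The second option could look roughly like this (a sketch; the field names and the Stop method are assumed, not from the PR):

```go
import "context"

type state struct {
	ctx    context.Context
	cancel context.CancelFunc
	// ... existing fields ...
}

func newState( /* ... */ ) *state {
	s := &state{}
	// Context for background work and broadcasts, created in the constructor
	// instead of calling context.Background() at each call site...
	s.ctx, s.cancel = context.WithCancel(context.Background())
	return s
}

// ...and cancelled when the state is stopped, tearing down in-flight work.
func (s *state) Stop() {
	s.cancel()
}
```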
partialStateMergesTotal: promauto.With(r).NewCounterVec(prometheus.CounterOpts{
	Name: "alertmanager_partial_state_merges_total",
	Help: "Number of times we have received a partial state to merge for a key.",
}, []string{"key"}),
Do we need per-key metrics or would per-user (i.e. per-state) be enough?
Per-key includes both user and state, so it would always be at most 2 series per user.
Force-pushed from 3c92a47 to ff857d1
Force-pushed from ff857d1 to dfc210a
func (s *state) WaitReady() {
	// TODO: At the moment, we settle in a separate go-routine (see multitenant.go as we create the Peer); we should
	// mimic that behaviour here once we have full state replication.
	s.Settle(context.Background(), time.Second)
We don't need to copy WaitReady from upstream. We can define our own interface with what makes sense for us, and adapt the upstream type to our interface via a wrapper.
Introducing a context to upstream's WaitReady seems pretty straightforward too.
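A sketch of such a wrapper, assuming upstream's cluster.Peer.WaitReady() is still the blocking, no-context variant (the type name here is illustrative):

```go
import (
	"context"

	"github.com/prometheus/alertmanager/cluster"
)

// peerReadyWaiter adapts the upstream cluster.Peer to a context-aware
// WaitReady without copying the upstream implementation.
type peerReadyWaiter struct {
	peer *cluster.Peer
}

func (w peerReadyWaiter) WaitReady(ctx context.Context) error {
	done := make(chan struct{})
	go func() {
		w.peer.WaitReady() // blocking upstream call
		close(done)
	}()
	select {
	case <-done:
		return nil
	case <-ctx.Done():
		// Note: the goroutine above keeps waiting until the peer is ready,
		// but the caller is unblocked.
		return ctx.Err()
	}
}
```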
Force-pushed from 7a640d1 to 57f6670
Needs #3903
Force-pushed from 230147a to 884245e
Alertmanager typically uses the memberlist gossip-based protocol to replicate state across replicas. In Cortex, we used the same fundamentals to provide some sort of high availability mode. Now that we have support for sharding instances across many machines, we can leverage the ring to find the corresponding instances and send the updates via gRPC.

Signed-off-by: gotjosh <josue@grafana.com>
Force-pushed from 884245e to 36286e5
Signed-off-by: gotjosh <josue@grafana.com>
Force-pushed from a5de453 to ce7b3cb
Signed-off-by: gotjosh <josue@grafana.com>
LGTM in its current state, with the caveat that sharding support is still work in progress (don't use it yet! :)). Non-sharding code seems to be unaffected, though.
* Alertmanager: Replicate state using the Ring

Alertmanager typically uses the memberlist gossip-based protocol to replicate state across replicas. In Cortex, we used the same fundamentals to provide some sort of high availability mode. Now that we have support for sharding instances across many machines, we can leverage the ring to find the corresponding instances and send the updates via gRPC.

Signed-off-by: gotjosh <josue@grafana.com>

* Appease the linter and wordsmithing

Signed-off-by: gotjosh <josue@grafana.com>

* Always wait for the missing metrics

Signed-off-by: gotjosh <josue@grafana.com>
What this PR does:
Alertmanager typically uses the memberlist gossip-based protocol to replicate state across replicas. In Cortex, we used the same fundamentals to provide some sort of high availability mode.

Now that we have support for sharding instances across many machines, we can leverage the ring to find the corresponding instances and send the updates via gRPC.
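Not the PR's actual code, but a rough sketch of the idea; the replicasFor and send callbacks stand in for the ring lookup and the gRPC client, both of which are assumptions here:

```go
import "context"

// replicateState pushes a tenant's partial state update to every instance
// that owns it according to the ring.
func replicateState(
	ctx context.Context,
	userID string,
	part []byte,
	replicasFor func(userID string) ([]string, error), // e.g. backed by a ring lookup
	send func(ctx context.Context, addr, userID string, part []byte) error, // e.g. a gRPC call
) error {
	addrs, err := replicasFor(userID)
	if err != nil {
		return err
	}
	for _, addr := range addrs {
		if err := send(ctx, addr, userID, part); err != nil {
			return err
		}
	}
	return nil
}
```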
This is part of proposal #3574 and follows up on #3664 and #3671.

Marking it as a draft as I have a few TODOs left to address, but the logic is pretty much what you see on the tin at the moment.
Checklist
- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]