-
Notifications
You must be signed in to change notification settings - Fork 820
Add a metric for the compaction interval config value. #4040
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a metric for the compaction interval config value. #4040
Conversation
@@ -302,6 +303,10 @@ func newCompactor( | |||
Name: "cortex_compactor_tenants_processing_failed", | |||
Help: "Number of tenants failed processing during the current compaction run. Reset to 0 when compactor is idle.", | |||
}), | |||
compactionRunInterval: promauto.With(registerer).NewGauge(prometheus.GaugeOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be a good candidate for a GaugeFunc metric?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I supposed it would be if we currently, or plan to, support live reloading of this config value (and others obviously) on the compactor. If not, would GaugeFunc get us anything over the way I'm doing it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No big preference, but I would set the metric value here in newCompactor()
and not at the end of starting()
. If I'm not missing anything, metrics can be scraped while starting as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No preference here either, I don't think a GaugeFunc would buy you anything beyond the config reload thing you mentioned. My thinking was really just to remove the call to set the value of the gauge which seemed a bit far away from the place the metric was defined 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (modulo a nit)
@@ -302,6 +303,10 @@ func newCompactor( | |||
Name: "cortex_compactor_tenants_processing_failed", | |||
Help: "Number of tenants failed processing during the current compaction run. Reset to 0 when compactor is idle.", | |||
}), | |||
compactionRunInterval: promauto.With(registerer).NewGauge(prometheus.GaugeOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No big preference, but I would set the metric value here in newCompactor()
and not at the end of starting()
. If I'm not missing anything, metrics can be scraped while starting as well.
pkg/compactor/compactor.go
Outdated
@@ -302,6 +303,10 @@ func newCompactor( | |||
Name: "cortex_compactor_tenants_processing_failed", | |||
Help: "Number of tenants failed processing during the current compaction run. Reset to 0 when compactor is idle.", | |||
}), | |||
compactionRunInterval: promauto.With(registerer).NewGauge(prometheus.GaugeOpts{ | |||
Name: "cortex_compactor_run_interval_seconds", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe cortex_compactor_compaction_interval_seconds
to keep it consistent with the config option name compaction_interval
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cstyan Before merging, could you add a CHANGELOG entry, please? Looks like an [ENHANCEMENT]
.
Changelog entry added @pracucci @pstibrany @56quarters thanks for the reviews |
Signed-off-by: Callum Styan <callumstyan@gmail.com>
based on review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Adds a metric to the compactor that exposes the configured
CompactionInterval
as a gauge, in seconds. The intention is to be able to use this metric, the last successful compaction timestamp metric, andtime()
to more accurately calculate the # of compactions in a row that have failed. This would improve this alert in cortex-jsonnet which is hardcoded to look for > 1 failures in a 2h time period.I added a quick check to the first test I found that actually started the compactor, just to confirm the metric was being set. If you'd like a more indepth test, or to move that check to it's own test, just let me know.
Signed-off-by: Callum Styan callumstyan@gmail.com
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]