Metrics framework #926
Conversation
Would be good to validate if we can migrate an existing component to use the metrics framework. Or is the plan for only the newer V2 components to utilize it?
common/metrics/config.go (outdated)
```go
	// for this list, use the format "metricName:metricLabel" if the metric has a label, or just "metricLabel"
	// if the metric does not have a label. Any fully qualified metric name that matches exactly with an entry
	// in this list will be blacklisted (i.e. it will not be reported).
	MetricsBlacklist []string
```
Should metrics filtering be something that is taken care of by the application or the component that is collecting the metrics?
When we were preparing for the (since abandoned) traffic generator, this seemed like a useful concept. I had added a large number of metrics, but there was concern that some of them were low bang for the buck (since it costs us $$$ to store them). We were planning on disabling a bunch of metrics with configuration changes, and turning them on in the future if we ever had an issue where they would be useful for debugging.
That being said, if people don't think this is a useful feature, it would be fairly straightforward to remove. What do others think?
Yeah, all I'm saying is that it's probably possible to filter the metrics and logs in a similar manner with the grafana agent.
This is pretty useful for now, though, because we haven't figured out how to do that in the grafana agent. Regardless, if we want to save more money we would need to learn how to do it in the grafana agent, because we're not always running applications that allow metrics filtering in this manner.
Makes sense. I have removed the metrics blacklisting feature from this framework.
My plan was to leave existing non-v2 code alone. This doesn't really change the core framework and backend; it's mostly just a convenience wrapper. Are there any existing things that use metrics that will persist after v2 is enabled and v1 is deprecated?
The churner would be one example.
@dmanc I don't think modifying the churner belongs as part of this PR. Here's a draft PR that contains the changes to the churner; if we merge this PR, I will open another with just the changes to the churner's metrics: #934

The only difference between the existing churner metrics and the new churner metrics, after this change, will be that the metric

Note that there may be some deviations in the metrics framework code between this PR and the churner PR I link to above (since these branches are being worked on in parallel). Please treat this PR as the source of truth for the metrics framework code.
```go
	v := reflect.ValueOf(labelTemplate)
	t := v.Type()
	labeler.templateType = t
	for i := 0; i < t.NumField(); i++ {
```
This panics if `labelTemplate` is not a struct. We should check if `v.Kind() == reflect.Struct`.
Good idea, done:

```go
if v.Kind() != reflect.Struct {
	return nil, fmt.Errorf("label template must be a struct")
}
```

As an aside, add the reflection library to the list of things that make me annoyed at the people who designed golang. One of the core principles is that things should return errors, not panic. This is exactly the sort of situation where returning errors would be way better than panicking.
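A minimal, self-contained sketch of the idea under discussion: turning a struct template into label keys via reflection while rejecting non-structs up front. The names (`RequestLabels`, `labelKeys`) are illustrative, not taken from the PR.

```go
package main

import (
	"fmt"
	"reflect"
)

// RequestLabels is a hypothetical label template; each field becomes a label key.
type RequestLabels struct {
	Origin string
	Method string
}

// labelKeys extracts the field names of a struct template, rejecting non-structs
// instead of letting reflect panic later.
func labelKeys(template any) ([]string, error) {
	v := reflect.ValueOf(template)
	if v.Kind() != reflect.Struct {
		return nil, fmt.Errorf("label template must be a struct")
	}
	t := v.Type()
	keys := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		keys = append(keys, t.Field(i).Name)
	}
	return keys, nil
}

func main() {
	keys, err := labelKeys(RequestLabels{})
	fmt.Println(keys, err) // [Origin Method] <nil>

	_, err = labelKeys("not a struct")
	fmt.Println(err) // label template must be a struct
}
```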
```go
	if !m.isAlive.Load() {
		return errors.New("metrics server already stopped")
```
Should it still call `m.server.Close()`?
Unless I'm missing something, if we trigger this statement the server will have already been closed.
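For illustration, one self-contained way such an invariant can be enforced; note this sketch uses an atomic compare-and-swap rather than the plain `Load` shown in the diff, and the type and field names are stand-ins rather than the PR's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"sync/atomic"
)

// metricsServer is a stand-in for the framework's server type.
type metricsServer struct {
	isAlive atomic.Bool
	server  *http.Server
}

// Stop is idempotent: only the call that flips isAlive from true to false
// closes the server, so a second Stop never reaches Close again.
func (m *metricsServer) Stop() error {
	if !m.isAlive.CompareAndSwap(true, false) {
		return errors.New("metrics server already stopped")
	}
	return m.server.Close()
}

func main() {
	m := &metricsServer{server: &http.Server{Addr: ":0"}}
	m.isAlive.Store(true)
	fmt.Println(m.Stop()) // <nil>
	fmt.Println(m.Stop()) // metrics server already stopped
}
```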
```go
// labelMaker encapsulates logic for creating labels for metrics.
type labelMaker struct {
	keys        []string
	emptyValues []string
```
How is `emptyValues` used?
If a label is set up with a non-nil template, but no labels are provided at runtime, then this `emptyValues` list is passed to Prometheus. Prometheus returns an error if you don't pass in the expected number of label values.

```go
func (l *labelMaker) extractValues(label any) ([]string, error) {
	// ...
	if label == nil {
		return l.emptyValues, nil
	}
```

We could create a new empty list each time, but I thought it would be more resource efficient to just reuse the same empty list over and over.
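A compact sketch of that preallocation; the struct fields mirror the diff above, while the constructor and everything else are hypothetical.

```go
package main

import "fmt"

// labelMaker mirrors the fields shown in the diff above.
type labelMaker struct {
	keys        []string
	emptyValues []string
}

// newLabelMakerFromKeys is a hypothetical constructor illustrating the reuse:
// the slice of empty values is allocated once, sized to the number of keys.
func newLabelMakerFromKeys(keys []string) *labelMaker {
	return &labelMaker{
		keys: keys,
		// Returned verbatim whenever no label value is supplied, so Prometheus
		// still receives the expected number of (empty) values without a fresh
		// allocation on every observation.
		emptyValues: make([]string, len(keys)),
	}
}

func main() {
	l := newLabelMakerFromKeys([]string{"origin", "method"})
	fmt.Println(len(l.emptyValues)) // 2
}
```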
```go
	description string,
	labelTemplate any) (CountMetric, error) {

	labeler, err := newLabelMaker(labelTemplate)
```
Should we only create the labeler when `labelTemplate` is not nil?
The labeler becomes a no-op when the label template is nil. The purpose of using this pattern was to simplify the business logic a little. Instead of wrapping each use of the labeler in an if statement depending on whether the labeler is enabled or not, we can instead use the labeler in the same way regardless of whether or not we have a non-nil template.
This being said, if you don't like this pattern, let me know and I'll make the suggested change.
```go
		l = label[0]
	}

	values, err := m.labeler.extractValues(l)
```
What if the metric has no labels?
The labeler handles this edge case:

- When `l` is nil, `m.labeler.extractValues(l)` returns a list of empty strings with length equal to the number of fields in the template.
- If the template is nil, `m.labeler.extractValues(l)` returns an empty list.
I see. Does `m.vec.WithLabelValues(values...)` handle empty values gracefully?
`m.vec.WithLabelValues()` requires that the number of provided values be exactly equal to the number of registered keys. It's OK if a value is an empty string, but the number of strings must match.
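For reference, a small self-contained illustration of that contract using `client_golang` (the metric and label names are made up):

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	vec := prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "demo_total", Help: "illustrative counter"},
		[]string{"origin", "method"}, // two registered keys
	)

	// Two values, both empty: accepted, the series simply has empty label values.
	vec.WithLabelValues("", "").Inc()

	// One value for two keys: WithLabelValues would panic; the error-returning
	// variant reports the mismatch instead.
	if _, err := vec.GetMetricWithLabelValues("only-one"); err != nil {
		fmt.Println("label count mismatch:", err)
	}
}
```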
common/metrics/latency_metric.go (outdated)
```go
	nanoseconds := float64(latency.Nanoseconds())
	milliseconds := nanoseconds / 1e6
```
Why don't we use `Milliseconds()`?
Converted to use `float64(time.Millisecond)`.
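For reference, the precision difference behind that choice (standard library only; nothing here is from the PR):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	latency := 1500 * time.Microsecond

	// Integer truncation: sub-millisecond precision is lost.
	fmt.Println(latency.Milliseconds()) // 1

	// Float conversion keeps the fractional part, which is what a latency
	// metric usually wants to report.
	fmt.Println(float64(latency) / float64(time.Millisecond)) // 1.5
}
```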
common/metrics/metrics.go (outdated)
```go
	//
	// The label parameter accepts zero or one label. If the label type does not match the template label type provided
	// when creating the metric, an error will be returned.
	Increment(label ...any) error
```
Ideally, metrics methods don't return any errors. Applications shouldn't need to consider errors from methods like this.
I've removed the error; it will now log when it encounters a problem. Is that OK? Or should we instead panic?
That's probably sufficient. We shouldn't panic over logging issues.
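As a sketch of that "log, don't return" shape (the wrapper type, field names, and logger wiring below are illustrative stand-ins, not the framework's actual code):

```go
package main

import (
	"log/slog"

	"github.com/prometheus/client_golang/prometheus"
)

// countMetric is an illustrative wrapper: Increment returns nothing and logs
// instead of surfacing errors to the caller.
type countMetric struct {
	vec    *prometheus.CounterVec
	keys   []string
	logger *slog.Logger
}

func (m *countMetric) Increment(values ...string) {
	if len(values) != len(m.keys) {
		// Don't propagate an error (or panic); record the misuse and move on.
		m.logger.Error("wrong number of label values",
			"expected", len(m.keys), "got", len(values))
		return
	}
	m.vec.WithLabelValues(values...).Inc()
}

func main() {
	vec := prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "requests_total", Help: "illustrative"},
		[]string{"origin"},
	)
	m := &countMetric{vec: vec, keys: []string{"origin"}, logger: slog.Default()}
	m.Increment("api") // recorded
	m.Increment()      // logged, not returned as an error
}
```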
Why are these changes needed?
This is a small wrapper around Prometheus that will significantly reduce boilerplate code when dealing with metrics. As an added perk, it is also capable of automatically generating a markdown document that describes all metrics the system is using.
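To ground the boilerplate claim, this is roughly what a single labeled counter plus an exposition endpoint looks like when wired up with `client_golang` directly; the wrapper aims to collapse this kind of setup into a constructor call plus `Increment`, and to generate the metric documentation automatically. All names and the port below are illustrative.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	registry := prometheus.NewRegistry()

	// One labeled counter, registered by hand. Documentation for this metric
	// lives only in its Help string unless it is written up separately.
	requests := prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Namespace: "eigenda",
			Name:      "request_count",
			Help:      "number of requests received",
		},
		[]string{"origin"},
	)
	registry.MustRegister(requests)

	requests.WithLabelValues("api").Inc()

	// Exposition endpoint, also wired up by hand.
	http.Handle("/metrics", promhttp.HandlerFor(registry, promhttp.HandlerOpts{}))
	_ = http.ListenAndServe(":9101", nil)
}
```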
Checks