-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix prom client output #3337
Fix prom client output #3337
Conversation
ae9866f
to
9778252
Compare
To prevent rejection, does the HELP string need to be the same or would it be enough if the TYPE string was equal? |
Yes, unfortunately the HELP string needs to be the same. |
@jdoupe I used your ideas and opened an alternative pull request #3351, it does both more and less than this pull request, but is more in line with how I want to address the issue (using the existing metric value types). There is more info over on that pull request, can you take a look and tell me what you think? |
I agree with your approach, and amazingly I did the same thing yesterday (make the prom input not drop the counter and gauge types) - you beat me to the punch. However, my main goal was to make the histogram and summary types "work" as they are my current problem. |
If you want you can force push your new PR here, or just close this and open a new one. |
9778252
to
bf7b1a9
Compare
Force pushed... yup, needs some clean up. I'll try to get some more time on it later or tomorrow. |
03c734c
to
771832f
Compare
Rebased after merge of #3350... |
accumulator.go
Outdated
@@ -28,6 +28,18 @@ type Accumulator interface { | |||
tags map[string]string, | |||
t ...time.Time) | |||
|
|||
// AddCounter is the same as AddFields, but will add the metric as a "Counter" type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment needs updated
accumulator.go
Outdated
tags map[string]string, | ||
t ...time.Time) | ||
|
||
// AddCounter is the same as AddFields, but will add the metric as a "Counter" type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment needs updated
@@ -226,6 +254,36 @@ func CreateSampleID(tags map[string]string) SampleID { | |||
return SampleID(strings.Join(pairs, ",")) | |||
} | |||
|
|||
func (p *PrometheusClient) createFamily(pvt prometheus.ValueType, point telegraf.Metric, sample *Sample, mname string, sampleID SampleID) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function doesn't always do what it says it does "createFamily", and its does a lot more. I think it probably needs to be something like func addMetricFamily(name string) MetricFamily
and only do the first half and then another function to addSample(
.
@@ -39,7 +45,10 @@ type MetricFamily struct { | |||
// Samples are the Sample belonging to this MetricFamily. | |||
Samples map[SampleID]*Sample | |||
// Type of the Value. | |||
ValueType prometheus.ValueType | |||
PromValueType prometheus.ValueType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can remove the prometheus.ValueType then?
@@ -242,86 +300,123 @@ func (p *PrometheusClient) Write(metrics []telegraf.Metric) error { | |||
labels[sanitize(k)] = v | |||
} | |||
|
|||
// Prometheus doesn't have a string value type, so convert string | |||
// fields to labels. | |||
for fn, fv := range point.Fields() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this loop can stay, since it shouldn't interact with the different types and its theoretically possible to have a string added to a summary.
Side note: This is a symptom of how the ValueType was added later to our metrics, and how the types really should be per field not per metric. We will probably try to redo the Metrics class in the next year but for now we have to deal with it.
for fn, fv := range point.Fields() { | ||
switch fn { | ||
case "sum": | ||
sum = fv.(float64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to guard against this because we don't have full control that it will always be true, and in the other spots where we type assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one really got me. Type assertions suck, and switching on .(type) seemed to be the only solution, but as you noted, we don't have control what the type would be which would result in a pretty long set of cases. After rolling it around awhile (and not being a real in depth go guy), would doing something like strconv.ParseFloat(fmt.Sprint(fv), 64)
be too silly or performance degrading? (Convert the value to a string because you don't need to know the type for that, and then convert the string to a float64. If it's not a number, presumably we've already converted it to a label.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't do the string conversion, but you could do the two value return form: f, ok := i.(float64)
. I guess we need to decide what we are going to do if a field is not a float (do we just drop it?).
You could also try switching over the type instead of the field name, this might end up reading better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current functionality is to drop anything but an int64 or float64. (https://github.com/influxdata/telegraf/blob/master/plugins/outputs/prometheus_client/prometheus_client.go#L254)
That seems to have worked thus far, perhaps we just keep going with it until someone finds otherwise. I've gone ahead and made the summary and histogram value handling the same as the gauge and counter handling.
default: | ||
metric, err = prometheus.NewConstMetric(desc, family.PromValueType, sample.Value, labels...) | ||
if err != nil { | ||
log.Printf("E! Error creating prometheus metric, "+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could share the error handling and log message after the switch
771832f
to
54fc42d
Compare
Tests failed. Wasn't me! |
54fc42d
to
e7ce89b
Compare
Working on fixing / adding tests... |
64c7f7b
to
cad142c
Compare
cad142c
to
ce88bab
Compare
Ready for review once again! |
Required for all PRs:
UPDATE: After the merge of #3351, this PR now only addresses the handling of history and summary types.
This contains a couple fixes for the prometheus_client output.
For example, when using the prometheus input plugin, field names of "counter" and "gauge" were created (instead of just "value"), and then the metrics would have "counter" and "gauge" appended to them on the output.
Upon de-mangling the metric names, it became apparent that when using the prometheus input in association with the prometheus output that the metrics created by the prometheus client library were in direct conflict with the same metrics from other scraped prometheus clients. (go_* and process_*). The conflict is not from the metric name itself, but the combination of the metric and the "HELP" text. The internal prometheus client library would generate the metrics with its default (and appropriate) HELP text, and then when a prometheus target was scraped and output by the prometheus client library in the plugin, it gives those same metric names a generic HELP text (because that text isn't saved anywhere in the telegraf data structures). Since the original HELP text doesn't match the generic HELP text for the same metric, the prometheus client library rejects it.
The fix presented is to disable to the collection of the internal client library metrics. They would be a bit conflated with the main telegraf process anyway, and not sure if they'd be directly meaningful.
However, I've put the unregistering of the metrics under the control of a configuration variable, which is false by default so as not to adversely affect existing installations any more than necessary.