-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to Prometheus decoders for Prometheus custom metric endpoint parsing #1473
Switch to Prometheus decoders for Prometheus custom metric endpoint parsing #1473
Conversation
|
For reference, the parser is located in https://github.com/prometheus/common/tree/master/expfmt |
e3f72e9
to
80e3f7b
Compare
@fabxc Would be you be able to take a look if you have some time? Switched to expfmt decoders & seems to work nicely (few tweaks to tests but nothing fundamental changing in functionality). |
@jimmidyson I'm aware of the issue. I'll try to have it sorted out by EOD |
@k8s-bot test this |
80e3f7b
to
d5c95e3
Compare
FloatValue: metVal, | ||
Timestamp: currentTime, | ||
FloatValue: float64(sample.Value), | ||
Timestamp: sample.Timestamp.Time(), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I entirely understand this. We seem to ignore the labels of the samples entire and just append timestamp/value pairs for the metric name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah I forgot to ask what the purpose of this label was. Considering there are likely to be multiple labels on each metric, what should this be set to? @timstclair?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timstclair @DirectXMan12 What should the value of label be in this case? A concatenation of all Prometheus labels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A concatenation of all Prometheus labels?
Yeah, seems like it. That seems... poor. It works ok for one label, but multiple labels could come in any order. If we wanted to make sure this was comparable, we could sort them first. Doesn't look like the label is actually carried through to the Kube summary API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree this is not ideal. Unfortunately this requires an API change to fix properly. Can you file an issue? Sorting + concatenating sounds good for now. Check the allowed symbols on prometheus labels to decide how to concatenate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In label values all unicode runes are allowed. In Prometheus itself we hence us \0xff
as an invalid rune as a separator for unambiguous concatenation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hope what I've done it right... always confused by unicode stuff like this. Added tests too.
"time" | ||
|
||
dto "github.com/prometheus/client_model/go" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does dto
stand for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Data transfer object. Kinda nasty, but better than go
and kinda what we use everywhere else in Prometheus land.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
none-the-less, something more expressive might be in order (perhaps rawmodel
or rawtypes
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
for _, line := range lines { | ||
if line == "" { | ||
var ( | ||
decSamples = make(model.Vector, 0, 50) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How'd you pick 50? What happens if this overflows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just the initial capacity. The slice is appended to, so it just allocates more if we go beyond.
50 SGTM. It's a number of metrics we can expect from virtually any endpoint, grow allocations beyond that are fine. But we basically save 6 re-allocations for the general case.
Plus, the slice is just reset at the end and re-used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should probably add a comment noting that it's just a guess at a useful initial capacity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while you're at it, the affected functions should probably have proper godoc.
case dto.MetricType_GAUGE: | ||
return v1.MetricGauge | ||
default: | ||
return v1.MetricType(t.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uhh..., if people are expecting this to be only one of these set values, this is going to throw them for a loop.
v1.MetricType
seems to be an implicit enumeration (since Go doesn't have explicit enumerations :-/), so I really don't think it's a good idea to arbitrarily add extra values to the enum at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Types aren't arbitrary, it's just that cadvisor doesn't support the same metric types as Prometheus & there's no way to map e.g. a Prometheus histogram to a cadvisor metric type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not sure how to handle this nicely... This PR doesn't fix the existing problem, but doesn't make it worse so will raise a separate issue to figure out how to handle it, but assume that to be in a separate PR.
} | ||
if _, ok := collector.metricsSet[name]; collector.metricsSet != nil && !ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could really use a comment to indicate why it's checking this (e.g. // skip any metrics not on our list to collect
). I realize that the old code doesn't have a comment, but we should fix that (I blame myself :-P).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
for _, line := range lines { | ||
if line == "" { | ||
var ( | ||
decSamples = make(model.Vector, 0, 50) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should probably add a comment noting that it's just a guess at a useful initial capacity
FloatValue: metVal, | ||
Timestamp: currentTime, | ||
FloatValue: float64(sample.Value), | ||
Timestamp: sample.Timestamp.Time(), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A concatenation of all Prometheus labels?
Yeah, seems like it. That seems... poor. It works ok for one label, but multiple labels could come in any order. If we wanted to make sure this was comparable, we could sort them first. Doesn't look like the label is actually carried through to the Kube summary API.
"time" | ||
|
||
dto "github.com/prometheus/client_model/go" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
none-the-less, something more expressive might be in order (perhaps rawmodel
or rawtypes
?)
d5c95e3
to
078eed5
Compare
078eed5
to
923dbc5
Compare
Ready for a second round review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this LGTM.
Some of the work that this PR is doing is reversing or working around the work done by the SampleDecoder (e.g. the SampleDecoder converts labels from pairs to a map, and then we convert back to pairs, and the whole "placing samples into a slice and then into a map" is a bit of extra overhead). Unfortunately, just using the Decoder directly requires a decent bit of boilerplate, so the code in this PR seems reasonable.
case rawmodel.MetricType_GAUGE: | ||
return v1.MetricGauge | ||
default: | ||
return v1.MetricType(t.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please file a follow-up issue about issue about this, because people probably won't expect to see any values beyond what's defined in v1
(whether that means defining more types, or an "other" type, or something else)
@timstclair What's left to get this merged? |
LGTM, feel free to merge. |
Fixes #1472
@fabxc This is a hand-rolled parser to collect application metrics from containers. Forgetting the question or application metrics & cadvisor, I assume the better approach for this would be to use Prometheus' own parser? If so, can you point me to it please?