-
Notifications
You must be signed in to change notification settings - Fork 624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement LabelSet for metrics #258
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few questions, primarily from lack of context.
if len(labels) == 0: | ||
return EMPTY_LABEL_SET | ||
sorted_labels = OrderedDict(sorted(labels.items())) | ||
# Uses statsd encoding for labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any particular motivation for encoding this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! I am following the Go implementation which apparently uses what statsd is using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might need more eyes. I don't know why we'd use the statsd encoding here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type of encoding is used for potential optimizations that can occur for corresponding exporters that might be used. From the discussion in the metrics-spec gitter, the statsd encoding is a default implementation that is used. Until we implement exporters, the benefits of this won't be that clear. As well in the Go SDK implementation, there is a concept of a LabelEncoder
, which allows plugging in custom encoding for labelsets. Take a look at labelencoder.go
in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it would be much simpler to make LabelSet
hashable and just return the LabelSet
itself without doing any encoding here. The meter shouldn't have an opinion on labelset encoding, that's a problem for exporters.
If there's no benefit of doing the encoding up-front until we have a statsd exporter, I think we should wait to add this until we add the exporter.
"""See `opentelemetry.metrics.LabelSet.""" | ||
|
||
def __init__(self, labels: Dict[str, str] = None, encoded: str = ""): | ||
self.labels = labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it required to use a particular LabelSet implementation with a specific Meter implementation? I notice that there's no methods or properties for the API, but this is exposing the "labels" and "encoded" properties which are being utilized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good question. Yes, a particular LabelSet implementation should only work with a specific Meter implementation. I've included validation in the metric methods for this. If they do not match, we return the empty label set.
"%s:%s" % (key, value) for (key, value) in sorted_labels.items() | ||
) | ||
# If LabelSet exists for this meter in memory, use existing one | ||
if not self.labels.get(encoded): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this code would be better in LabelSet itself. The encoded field could be implemented as a property which lazy-populates the value in the same way that this is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that mean the LabelSet
instances would need to be able to access the cache of label encodings labels
unique per Meter
instance somehow? Would that really be cleaner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that a lazy LabelSet.encoded
wouldn't work since the point is to prevent instantiating the LabelSet
, but the encoding code shouldn't be here either.
counter = metrics.Counter("name", "desc", "unit", float, label_keys) | ||
counter.add(label_values, 1.0) | ||
handle = counter.get_handle(label_values) | ||
counter.add(label_set, 1.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will these methods also support standard dicts as well as LabelSets? I think we should.
Looking at https://github.com/open-telemetry/oteps/pull/49/files it looks like the primary motivation for LabelSet is performance. I don't see a lot here that will enable better performance, and it's adding another layer of abstraction which introduces complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The primary motivation for LabelSet is performance, yes. It is the canonicalization of the label keys/values into unique identifiers that is very expensive. This is why LabelSets are created once using specific label keys/values, which are then canonicalized once, and then are able to be used multiple times without having to perform this expensive action. In the implementation, this is demonstrated by using a dictionary of encoded strings, to find the corresponding metric handle that was created from the canonicalized label keys/values.
Your suggestion of supporting standard dicts is valid though, the underlying logic will just take in the dict and construct a Labelset. This is more for convenience rather than an performance improvement. However, the Go and JavaScript implementations only allow passing in LabelSets for now so I'd like to stay consistent. Good idea for the future though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not to get into the project approach, but I think half the purpose of implementing in multiple languages is to identify patterns that work or don't for various languages. I think we should feel empowered to contribute feedback around the approach upstream.
That all said, I understand the underlying philosophy that using an existing labelset will omit some processing that needs to happen every time. If I understand correctly, you're saying we save this by pre-encoding the strings.
I'm not really sure that pre-encoding the strings saves us any time here. Maybe feedback for the statsite encoding side, but that encoding makes the assumption that whatever is consuming the key-value pairs understands the statsite encoding and can optimize on that, which I don't believe is true.
For example, if we implement some sort of processor that needs to parse and filter the key-value pairs, then work will have to occur to parse out the key-value pairs, and reconstruct them into a dictionary or some secondary data structure to parse out the values. That negates the performance benefit when compared to just using a dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this was addressed with the exporters utilizing specific encodings for optimization. With introducing of an "encoding layer", where users can configure their own encodings I believe this would save some time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toumorokoshi: out of curiosity, how would you implement this? Make label_set
a Union[Dict[str, str], LabelSet]
or have separate methods for each type?
@lzchen and I talked about it, and this seems to be a case where the conventions of the language shape the API. Java makes it easy to support both arg types with overloading, python doesn't.
That said, I'd personally prefer that we didn't require LabelSets
anywhere in the API. This looks like a bit of premature optimization on the part of the spec, at the expense of the simplicity of the API. Especially when we consider applications that don't reuse labels. But this is a problem for the spec, not this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No complaints about the implementation, but I don't have the context yet to stamp the changes. This is a good reminder for me and other reviewers to check in on the prototype changes happening in go.
"""See `opentelemetry.metrics.Metric.get_handle`.""" | ||
handle = self.handles.get(label_values) | ||
handle = self.handles.get(label_set.encoded) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's clear that I'm missing a lot of context here too. I thought LabelSet
s were supposed to be opaque, but here we're relying on the statsd encoding?
Why not just make LabelSet
s hashable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comments on statsd encoding. We might want to have different encoding implementations.
if len(labels) == 0: | ||
return EMPTY_LABEL_SET | ||
sorted_labels = OrderedDict(sorted(labels.items())) | ||
# Uses statsd encoding for labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might need more eyes. I don't know why we'd use the statsd encoding here.
Codecov Report
@@ Coverage Diff @@
## master #258 +/- ##
==========================================
+ Coverage 85.76% 86.05% +0.28%
==========================================
Files 33 33
Lines 1609 1628 +19
Branches 180 182 +2
==========================================
+ Hits 1380 1401 +21
+ Misses 182 181 -1
+ Partials 47 46 -1
Continue to review full report at Codecov.
|
""" | ||
|
||
|
||
class DefaultLabelSet(LabelSet): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this required? Should LabelSet be made an ABC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just like how metric
and handle
have default implementations, we don't want to have the meter
functions to return None objects (it would be get_label_set
in this case). Also a LabelSet has no methods so I didn't think it was necessary to use an ABC (unless my understanding of the benefits is mistaken).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point here is that since LabelSet
is not an ABC we technically don't need DefaultLabelSet
since we could return a plain LabelSet
instance from Meter.get_label_set
.
sorted_labels = OrderedDict(sorted(labels.items())) | ||
# Uses statsd encoding for labels | ||
encoded = "|#" + ",".join( | ||
"%s:%s" % (key, value) for (key, value) in sorted_labels.items() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this encoding is unambiguous: E.g. get_label_set({"foo:bar": "baz"})
and get_label_set({"foo", "bar:baz"})
would produce the same encoded value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the implementation to follow the Go default implementation of the label encoder (although it will still have the same unambiguous problem with key/values with '=' in it). I am not sure if it is a safe assumption to assume that the "=" will not occur. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rise this issue in the Go repository? Or ask in the spec repository? My assumption would be that while we might want to restrict the key strings, we should not restrict values to not contain e.g. |#
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've filed an issue here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving as this is an evolving specification, and smaller changes are easier to merge in and amend rather than large changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two blocking comments here:
- Defaulting to
EMPTY_LABEL_SET
is likely to cause some problems if people also use the empty label set to record real metrics - The statsd encoding here seems like an unnecessary complication considering we don't actually use it except as an internal key
@@ -141,6 +176,7 @@ def __init__( | |||
description: str, | |||
unit: str, | |||
value_type: Type[metrics_api.ValueT], | |||
meter: "Meter", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the only reason to include the meter here to prevent using labelsets created for other meter instances? As I understand the spec, we don't want to guarantee that labelsets created via one implementation can be used in another (i.e. they're defined in the SDK, not the API), but that's seems like a very different concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the go prototype is doing this: https://github.com/open-telemetry/opentelemetry-go/blob/e6d725626d4629220a2de0112570adf80d50be21/sdk/metric/sdk.go#L283. @c24t to take this up in specs.
if len(labels) == 0: | ||
return EMPTY_LABEL_SET | ||
sorted_labels = OrderedDict(sorted(labels.items())) | ||
# Uses statsd encoding for labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it would be much simpler to make LabelSet
hashable and just return the LabelSet
itself without doing any encoding here. The meter shouldn't have an opinion on labelset encoding, that's a problem for exporters.
If there's no benefit of doing the encoding up-front until we have a statsd exporter, I think we should wait to add this until we add the exporter.
"%s:%s" % (key, value) for (key, value) in sorted_labels.items() | ||
) | ||
# If LabelSet exists for this meter in memory, use existing one | ||
if not self.labels.get(encoded): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that a lazy LabelSet.encoded
wouldn't work since the point is to prevent instantiating the LabelSet
, but the encoding code shouldn't be here either.
Changes open-telemetry#258 to remove meter attr from LabelSet and Metic, and remove statsd encoding.
TODO:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One typo and one test that needs to be updated/improved, but otherwise it LGTM. Thanks for slogging through this long review cycle @lzchen!
counter = metrics.Counter("name", "desc", "unit", float, label_keys) | ||
counter.add(label_values, 1.0) | ||
handle = counter.get_handle(label_values) | ||
counter.add(label_set, 1.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toumorokoshi: out of curiosity, how would you implement this? Make label_set
a Union[Dict[str, str], LabelSet]
or have separate methods for each type?
@lzchen and I talked about it, and this seems to be a case where the conventions of the language shape the API. Java makes it easy to support both arg types with overloading, python doesn't.
That said, I'd personally prefer that we didn't require LabelSets
anywhere in the API. This looks like a bit of premature optimization on the part of the spec, at the expense of the simplicity of the API. Especially when we consider applications that don't reuse labels. But this is a problem for the spec, not this PR.
Address [#222]
Specs: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/api-metrics-user.md#label-set-calling-convention
Part of Go implemetation: open-telemetry/opentelemetry-go#172
The primary purpose of LabelSets are to have an optimal way of re-using handles with the same label values. We achieve this by having the keys and values of the labels encoded and stored in each LabelSet instance, so we can have an easy lookup to the corresponding handle for each metric instrument. The encoding method used follows what the Go implementation is doing, which is apparently taken from statsd.
For label keys that are missing or extra in LabelSets that differ from the label keys specified in the creation of a metric, the exporters will deal with those use cases.