Implement LabelSet for metrics #258

lzchen · 2019-11-01T19:18:02Z

Addresses #222
Specs: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/api-metrics-user.md#label-set-calling-convention

Part of Go implementation: open-telemetry/opentelemetry-go#172

The primary purpose of LabelSets is to provide an efficient way of reusing handles with the same label values. We achieve this by encoding the label keys and values and storing the result in each LabelSet instance, so each metric instrument can easily look up the corresponding handle. The encoding method follows what the Go implementation does, which is apparently borrowed from statsd.
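A minimal sketch of the handle-reuse idea described above, assuming a simplified SDK: `Meter.get_label_set` canonicalizes the labels once, caches the resulting LabelSet by its encoded string, and a counter keeps one handle per encoded string. The class shapes and the `key=value` encoding are illustrative assumptions, not the code in this PR.

```python
from typing import Dict


def encode(labels: Dict[str, str]) -> str:
    # Illustrative encoding: sorted "key=value" pairs joined by commas.
    return ",".join(f"{k}={v}" for k, v in sorted(labels.items()))


class LabelSet:
    """Hypothetical SDK LabelSet: the labels plus their precomputed encoding."""

    def __init__(self, labels: Dict[str, str], encoded: str) -> None:
        self.labels = labels
        self.encoded = encoded


class Meter:
    def __init__(self) -> None:
        # Per-Meter cache so identical labels map to one LabelSet instance.
        self._label_sets: Dict[str, LabelSet] = {}

    def get_label_set(self, labels: Dict[str, str]) -> LabelSet:
        encoded = encode(labels)  # canonicalization happens once per call here
        if encoded not in self._label_sets:
            self._label_sets[encoded] = LabelSet(dict(labels), encoded)
        return self._label_sets[encoded]


class CounterHandle:
    def __init__(self) -> None:
        self.data = 0.0

    def add(self, value: float) -> None:
        self.data += value


class Counter:
    def __init__(self) -> None:
        # One handle per distinct label combination, keyed by the encoding.
        self.handles: Dict[str, CounterHandle] = {}

    def get_handle(self, label_set: LabelSet) -> CounterHandle:
        handle = self.handles.get(label_set.encoded)
        if handle is None:
            handle = CounterHandle()
            self.handles[label_set.encoded] = handle
        return handle

    def add(self, label_set: LabelSet, value: float) -> None:
        self.get_handle(label_set).add(value)


meter = Meter()
counter = Counter()
ls1 = meter.get_label_set({"env": "prod", "host": "a"})
ls2 = meter.get_label_set({"host": "a", "env": "prod"})  # same labels, other order
counter.add(ls1, 1.0)
counter.add(ls2, 2.0)  # reuses the handle created for ls1
```

Sorting before encoding is what makes label order irrelevant; after that, a handle lookup is a single dict access on a string.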

Exporters will handle the cases where a LabelSet's keys differ from the label keys specified when the metric was created, i.e. keys that are missing from or extra to that list.

toumorokoshi

I have a few questions, primarily from lack of context.

toumorokoshi · 2019-11-06T04:56:26Z

opentelemetry-sdk/src/opentelemetry/sdk/metrics/__init__.py

+        if len(labels) == 0:
+            return EMPTY_LABEL_SET
+        sorted_labels = OrderedDict(sorted(labels.items()))
+        # Uses statsd encoding for labels


Any particular motivation for encoding this way?

Good question! I am following the Go implementation, which apparently uses the statsd encoding.

This might need more eyes. I don't know why we'd use the statsd encoding here.

The choice of encoding enables potential optimizations in the exporters that consume it. From the discussion in the metrics-spec Gitter, the statsd encoding serves as a default implementation. Until we implement exporters, the benefits won't be very clear. Also, the Go SDK implementation has a concept of a LabelEncoder, which allows plugging in custom encodings for label sets. Take a look at labelencoder.go in this PR.
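To make the "pluggable encoder" idea concrete, here is a rough Python sketch loosely inspired by the LabelEncoder concept mentioned above. `LabelEncoder`, `StatsdLabelEncoder`, `KeyValueLabelEncoder`, and `Meter.encode_labels` are hypothetical names; this is not the Go API nor an OpenTelemetry Python API.

```python
import abc
from typing import Dict


class LabelEncoder(abc.ABC):
    """Hypothetical pluggable encoder (not an OpenTelemetry API)."""

    @abc.abstractmethod
    def encode(self, labels: Dict[str, str]) -> str:
        """Return a canonical string for the given labels."""


class StatsdLabelEncoder(LabelEncoder):
    # Same shape as the statsd-style encoding in the diff: "|#key:value,...".
    def encode(self, labels: Dict[str, str]) -> str:
        return "|#" + ",".join(f"{k}:{v}" for k, v in sorted(labels.items()))


class KeyValueLabelEncoder(LabelEncoder):
    # Alternative: sorted "key=value" pairs joined by commas.
    def encode(self, labels: Dict[str, str]) -> str:
        return ",".join(f"{k}={v}" for k, v in sorted(labels.items()))


class Meter:
    """Hypothetical Meter that delegates encoding to a configured encoder."""

    def __init__(self, encoder: LabelEncoder) -> None:
        self.encoder = encoder

    def encode_labels(self, labels: Dict[str, str]) -> str:
        return self.encoder.encode(labels)
```

The design point is that the Meter itself has no opinion on the format; an exporter-friendly encoding can be swapped in at construction time.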

It seems like it would be much simpler to make LabelSet hashable and just return the LabelSet itself without doing any encoding here. The meter shouldn't have an opinion on labelset encoding, that's a problem for exporters.

If there's no benefit of doing the encoding up-front until we have a statsd exporter, I think we should wait to add this until we add the exporter.
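A sketch of the "just make LabelSet hashable" alternative, assuming labels are plain string-to-string dicts. Storing a `frozenset` of the items makes equality and hashing order-independent without choosing any string format, so the LabelSet itself can be the dict key for handles. This class is illustrative, not the PR's implementation.

```python
from typing import Dict, FrozenSet, Tuple


class LabelSet:
    """Hypothetical hashable LabelSet: equal labels give equal hashes."""

    def __init__(self, labels: Dict[str, str]) -> None:
        # A frozenset of (key, value) pairs is hashable and ignores order.
        self._items: FrozenSet[Tuple[str, str]] = frozenset(labels.items())

    @property
    def labels(self) -> Dict[str, str]:
        return dict(self._items)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, LabelSet) and self._items == other._items

    def __hash__(self) -> int:
        return hash(self._items)


# The LabelSet is used directly as the handle key; no encoding step.
handles = {LabelSet({"env": "prod", "host": "a"}): "handle-1"}
```

Because no delimiters are involved, label values containing characters like `:` or `=` cannot collide; exporters remain free to encode however they like.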

toumorokoshi · 2019-11-06T05:03:44Z

opentelemetry-sdk/src/opentelemetry/sdk/metrics/__init__.py

+    """See `opentelemetry.metrics.LabelSet`."""
+
+    def __init__(self, labels: Dict[str, str] = None, encoded: str = ""):
+        self.labels = labels


Is it required to use a particular LabelSet implementation with a specific Meter implementation? I notice that the API defines no methods or properties, but this implementation exposes "labels" and "encoded" properties, which other code relies on.

That is a good question. Yes, a particular LabelSet implementation should only work with a specific Meter implementation. I've included validation in the metric methods for this. If they do not match, we return the empty label set.
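A sketch of the kind of validation described in the reply, assuming each LabelSet remembers the Meter that created it and each metric holds its own Meter. The `meter` attribute, `_validate`, and the `key=value` encoding are assumptions for illustration; the PR's actual check may differ.

```python
from typing import Dict, Optional


class LabelSet:
    """Hypothetical SDK LabelSet that remembers its creating Meter."""

    def __init__(self, labels: Dict[str, str], encoded: str,
                 meter: Optional["Meter"]) -> None:
        self.labels = labels
        self.encoded = encoded
        self.meter = meter


EMPTY_LABEL_SET = LabelSet({}, "", None)


class Meter:
    def get_label_set(self, labels: Dict[str, str]) -> LabelSet:
        encoded = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
        return LabelSet(dict(labels), encoded, self)


class Counter:
    def __init__(self, meter: Meter) -> None:
        self.meter = meter
        self.handles: Dict[str, float] = {}

    def _validate(self, label_set: object) -> LabelSet:
        # Fall back to the empty label set if the LabelSet is not an SDK
        # LabelSet or was created by a different Meter.
        if not isinstance(label_set, LabelSet) or label_set.meter is not self.meter:
            return EMPTY_LABEL_SET
        return label_set

    def add(self, label_set: object, value: float) -> None:
        key = self._validate(label_set).encoded
        self.handles[key] = self.handles.get(key, 0.0) + value


meter_a, meter_b = Meter(), Meter()
counter = Counter(meter_a)
counter.add(meter_a.get_label_set({"env": "prod"}), 1.0)
counter.add(meter_b.get_label_set({"env": "prod"}), 2.0)  # other Meter
```

Note that the second measurement ends up under the empty encoding rather than under `env=prod`.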

toumorokoshi · 2019-11-06T05:40:45Z

opentelemetry-sdk/src/opentelemetry/sdk/metrics/__init__.py

+            "%s:%s" % (key, value) for (key, value) in sorted_labels.items()
+        )
+        # If LabelSet exists for this meter in memory, use existing one
+        if not self.labels.get(encoded):


I think this code would be better in LabelSet itself. The encoded field could be implemented as a property which lazy-populates the value in the same way that this is done.

Wouldn't that mean LabelSet instances would somehow need access to the per-Meter cache of label encodings? Would that really be cleaner?

I agree that a lazy LabelSet.encoded wouldn't work since the point is to prevent instantiating the LabelSet, but the encoding code shouldn't be here either.
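For reference, the lazy `encoded` property suggested above might look like the sketch below (hypothetical class; the encoding format is illustrative). It also shows the objection: the encoding is only computed after a LabelSet already exists, so a Meter cannot use it to decide whether to create a new instance in the first place.

```python
from typing import Dict, Optional


class LabelSet:
    """Hypothetical LabelSet whose encoding is computed on first access."""

    def __init__(self, labels: Dict[str, str]) -> None:
        self.labels = dict(labels)
        self._encoded: Optional[str] = None

    @property
    def encoded(self) -> str:
        # Populate the cached value the first time it is requested.
        if self._encoded is None:
            self._encoded = ",".join(
                f"{k}={v}" for k, v in sorted(self.labels.items())
            )
        return self._encoded


ls = LabelSet({"b": "2", "a": "1"})
not_yet_computed = ls._encoded is None
value = ls.encoded  # triggers the encoding
# Two calls with the same labels still build two separate instances.
distinct_instances = LabelSet({"a": "1"}) is not LabelSet({"a": "1"})
```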

toumorokoshi · 2019-11-06T05:43:08Z

opentelemetry-sdk/tests/metrics/test_metrics.py

        counter = metrics.Counter("name", "desc", "unit", float, label_keys)
-        counter.add(label_values, 1.0)
-        handle = counter.get_handle(label_values)
+        counter.add(label_set, 1.0)


Will these methods also support standard dicts as well as LabelSets? I think they should.

Looking at https://github.com/open-telemetry/oteps/pull/49/files it looks like the primary motivation for LabelSet is performance. I don't see a lot here that will enable better performance, and it's adding another layer of abstraction which introduces complexity.

The primary motivation for LabelSet is performance, yes. Canonicalizing the label keys/values into unique identifiers is very expensive. This is why a LabelSet is created once from specific label keys/values, which are canonicalized once, and can then be used many times without repeating this expensive step. In the implementation, this is demonstrated by using a dictionary keyed by encoded strings to find the metric handle that was created from the canonicalized label keys/values.

Your suggestion of supporting standard dicts is valid, though; the underlying logic would just take the dict and construct a LabelSet. This is more of a convenience than a performance improvement. However, the Go and JavaScript implementations only allow passing in LabelSets for now, so I'd like to stay consistent. Good idea for the future, though.
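One way the dict convenience could work in Python is a `Union[Dict[str, str], LabelSet]` parameter that normalizes dicts through the Meter. This is a sketch with hypothetical classes; it only illustrates the trade-off: the dict path pays the canonicalization cost on every call, while a reused LabelSet does not.

```python
from typing import Dict, Union


class LabelSet:
    """Hypothetical SDK LabelSet holding labels and their encoding."""

    def __init__(self, labels: Dict[str, str], encoded: str) -> None:
        self.labels = labels
        self.encoded = encoded


class Meter:
    def __init__(self) -> None:
        self._cache: Dict[str, LabelSet] = {}

    def get_label_set(self, labels: Dict[str, str]) -> LabelSet:
        encoded = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
        label_set = self._cache.get(encoded)
        if label_set is None:
            label_set = LabelSet(dict(labels), encoded)
            self._cache[encoded] = label_set
        return label_set


LabelsArg = Union[Dict[str, str], LabelSet]


class Counter:
    def __init__(self, meter: Meter) -> None:
        self._meter = meter
        self.totals: Dict[str, float] = {}

    def add(self, labels: LabelsArg, value: float) -> None:
        # Convenience path: plain dicts are canonicalized on every call.
        if isinstance(labels, LabelSet):
            label_set = labels
        else:
            label_set = self._meter.get_label_set(labels)
        self.totals[label_set.encoded] = self.totals.get(label_set.encoded, 0.0) + value


meter = Meter()
counter = Counter(meter)
counter.add({"env": "prod"}, 1.0)                      # dict: encoded each time
counter.add(meter.get_label_set({"env": "prod"}), 2.0)  # reusable LabelSet
```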

Not to get into the project approach, but I think half the purpose of implementing in multiple languages is to identify patterns that work or don't for various languages. I think we should feel empowered to contribute feedback around the approach upstream.

That all said, I understand the underlying philosophy that using an existing labelset will omit some processing that needs to happen every time. If I understand correctly, you're saying we save this by pre-encoding the strings.

I'm not really sure that pre-encoding the strings saves us any time here. This may be feedback for the statsite encoding side, but that encoding assumes that whatever consumes the key-value pairs understands the statsite encoding and can optimize on it, which I don't believe is true.

For example, if we implement some sort of processor that needs to filter the key-value pairs, it will first have to parse them out of the encoded string and rebuild a dictionary or some other data structure. That negates the performance benefit compared to just using a dict.
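A small sketch of the processor concern, assuming a `key=value,key=value` encoding. `drop_label` is a hypothetical processor that removes one label: because it receives only the encoded string, it must decode, edit, and re-encode, work that would be unnecessary if it received a dict.

```python
from typing import Dict


def encode(labels: Dict[str, str]) -> str:
    return ",".join(f"{k}={v}" for k, v in sorted(labels.items()))


def decode(encoded: str) -> Dict[str, str]:
    # Inverse of encode(); only correct if no key or value contains "," or "=".
    if not encoded:
        return {}
    result: Dict[str, str] = {}
    for pair in encoded.split(","):
        key, value = pair.split("=", 1)
        result[key] = value
    return result


def drop_label(encoded: str, key: str) -> str:
    # Hypothetical processor step: parse, filter, then re-encode.
    labels = decode(encoded)
    labels.pop(key, None)
    return encode(labels)
```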

I believe this is addressed by exporters using specific encodings for optimization. By introducing an "encoding layer" where users can configure their own encodings, I believe this would save some time.

@toumorokoshi: out of curiosity, how would you implement this? Make label_set a Union[Dict[str, str], LabelSet] or have separate methods for each type?

@lzchen and I talked about it, and this seems to be a case where the conventions of the language shape the API. Java makes it easy to support both arg types with overloading; Python doesn't.

That said, I'd personally prefer that we didn't require LabelSets anywhere in the API. This looks like a bit of premature optimization on the part of the spec, at the expense of the simplicity of the API. Especially when we consider applications that don't reuse labels. But this is a problem for the spec, not this PR.

c24t

No complaints about the implementation, but I don't have the context yet to stamp the changes. This is a good reminder for me and other reviewers to check in on the prototype changes happening in go.

c24t · 2019-11-07T03:49:07Z

opentelemetry-sdk/src/opentelemetry/sdk/metrics/__init__.py

        """See `opentelemetry.metrics.Metric.get_handle`."""
-        handle = self.handles.get(label_values)
+        handle = self.handles.get(label_set.encoded)


It's clear that I'm missing a lot of context here too. I thought LabelSets were supposed to be opaque, but here we're relying on the statsd encoding?

Why not just make LabelSets hashable?

See my comments on statsd encoding. We might want to have different encoding implementations.

c24t · 2019-11-07T03:49:52Z

opentelemetry-sdk/src/opentelemetry/sdk/metrics/__init__.py

+        if len(labels) == 0:
+            return EMPTY_LABEL_SET
+        sorted_labels = OrderedDict(sorted(labels.items()))
+        # Uses statsd encoding for labels


This might need more eyes. I don't know why we'd use the statsd encoding here.

codecov-io · 2019-11-15T23:22:07Z

Codecov Report

Merging #258 into master will increase coverage by 0.28%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #258      +/-   ##
==========================================
+ Coverage   85.76%   86.05%   +0.28%     
==========================================
  Files          33       33              
  Lines        1609     1628      +19     
  Branches      180      182       +2     
==========================================
+ Hits         1380     1401      +21     
+ Misses        182      181       -1     
+ Partials       47       46       -1

Impacted Files	Coverage Δ
...elemetry-api/src/opentelemetry/metrics/__init__.py	`86% <100%> (+1.55%)`	⬆️
...etry-sdk/src/opentelemetry/sdk/metrics/__init__.py	`99.05% <100%> (+2.31%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d3bb228...03e072c.

Oberon00 · 2019-11-18T12:01:06Z

opentelemetry-api/src/opentelemetry/metrics/__init__.py

+    """
+
+
+class DefaultLabelSet(LabelSet):


Is this required? Should LabelSet be made an ABC?

Just as metrics and handles have default implementations, we don't want the meter functions (get_label_set in this case) to return None. Also, a LabelSet has no methods, so I didn't think an ABC was necessary (unless I'm mistaken about the benefits).

My point here is that since LabelSet is not an ABC we technically don't need DefaultLabelSet since we could return a plain LabelSet instance from Meter.get_label_set.
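A sketch of the two options discussed, with illustrative class bodies. A shared `DefaultLabelSet` instance keeps `get_label_set` from returning None, but since the API `LabelSet` is a plain class, a plain `LabelSet()` would serve just as well. One Python detail relevant to the ABC question: an `abc.ABC` subclass with no abstract methods can still be instantiated, so making LabelSet an ABC would not by itself force a subclass.

```python
import abc
from typing import Dict


class LabelSet:
    """API-level LabelSet: opaque, no methods."""


class DefaultLabelSet(LabelSet):
    """Label set returned by the default (no-op) Meter."""


DEFAULT_LABEL_SET = DefaultLabelSet()


class Meter:
    """Default no-op Meter, modeled on the get_label_set discussion."""

    def get_label_set(self, labels: Dict[str, str]) -> LabelSet:
        # Never return None; a shared instance is cheap.
        return DEFAULT_LABEL_SET


class AbstractLabelSet(abc.ABC):
    """An ABC with no abstract methods is still instantiable."""


plain = LabelSet()                     # works: LabelSet is not abstract
abstract_instance = AbstractLabelSet()  # also works: nothing is abstract
```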

opentelemetry-api/tests/metrics/test_metrics.py

Oberon00 · 2019-11-18T12:08:00Z

opentelemetry-sdk/src/opentelemetry/sdk/metrics/__init__.py

+        sorted_labels = OrderedDict(sorted(labels.items()))
+        # Uses statsd encoding for labels
+        encoded = "|#" + ",".join(
+            "%s:%s" % (key, value) for (key, value) in sorted_labels.items()


I don't think this encoding is unambiguous: e.g. get_label_set({"foo:bar": "baz"}) and get_label_set({"foo": "bar:baz"}) would produce the same encoded value.
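The collision can be reproduced directly with the statsd-style encoding from the diff (reimplemented here as a standalone function). A `key=value` comma-joined variant, included for comparison, has the same problem when values contain `,` or `=`.

```python
from typing import Dict


def statsd_encode(labels: Dict[str, str]) -> str:
    # Same scheme as the diff: sorted "key:value" pairs behind a "|#" prefix.
    return "|#" + ",".join("%s:%s" % (k, v) for k, v in sorted(labels.items()))


def kv_encode(labels: Dict[str, str]) -> str:
    # Comparison: sorted "key=value" pairs joined by commas.
    return ",".join("%s=%s" % (k, v) for k, v in sorted(labels.items()))


# Different label sets, identical encodings:
statsd_a = statsd_encode({"foo:bar": "baz"})
statsd_b = statsd_encode({"foo": "bar:baz"})
kv_a = kv_encode({"a": "1,b=2"})
kv_b = kv_encode({"a": "1", "b": "2"})
```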

I've changed the implementation to follow the Go default implementation of the label encoder (although it still has the same ambiguity problem with keys/values that contain '='). I am not sure whether it is safe to assume that "=" will not occur. Thoughts?

Maybe raise this issue in the Go repository? Or ask in the spec repository? My assumption would be that while we might want to restrict the key strings, we should not restrict values from containing e.g. |#.

I've filed an issue here
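If values should not be restricted, one option (not what this PR does; an assumption for illustration) is to escape the delimiters. Escaping the escape character first, then `=` and `,`, makes every unescaped `=` a key/value separator and every unescaped `,` a pair separator, so distinct label dicts can no longer share an encoding.

```python
from typing import Dict


def _escape(text: str) -> str:
    # Escape the escape character first, then the two delimiters.
    return text.replace("\\", "\\\\").replace("=", "\\=").replace(",", "\\,")


def encode(labels: Dict[str, str]) -> str:
    # Sorted "key=value" pairs joined by commas, with delimiters escaped.
    return ",".join(
        f"{_escape(k)}={_escape(v)}" for k, v in sorted(labels.items())
    )
```

Labels without special characters encode exactly as before, so the escaping only costs something when it is needed.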

toumorokoshi

Approving as this is an evolving specification, and smaller changes are easier to merge in and amend rather than large changes.

c24t

Two blocking comments here:

Defaulting to EMPTY_LABEL_SET is likely to cause some problems if people also use the empty label set to record real metrics
The statsd encoding here seems like an unnecessary complication considering we don't actually use it except as an internal key
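A sketch of why the first blocking comment matters, using hypothetical classes and an illustrative encoding. If unrecognized label sets silently fall back to the empty label set, their measurements merge with genuine measurements recorded under the empty label set. Raising an error is one possible alternative, shown as an assumption rather than the resolution chosen in this PR.

```python
from typing import Dict


class LabelSet:
    """Hypothetical SDK LabelSet."""

    def __init__(self, labels: Dict[str, str]) -> None:
        self.labels = dict(labels)
        self.encoded = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))


EMPTY_LABEL_SET = LabelSet({})


class ForeignLabelSet:
    """Stands in for a label set from some other implementation."""


class Counter:
    def __init__(self) -> None:
        self.totals: Dict[str, float] = {}

    def add(self, label_set: object, value: float) -> None:
        # Fallback under discussion: unknown label sets become "empty".
        if not isinstance(label_set, LabelSet):
            label_set = EMPTY_LABEL_SET
        key = label_set.encoded
        self.totals[key] = self.totals.get(key, 0.0) + value


class StrictCounter(Counter):
    def add(self, label_set: object, value: float) -> None:
        # Alternative: reject unknown label sets instead of merging them.
        if not isinstance(label_set, LabelSet):
            raise TypeError("label_set was not created by this SDK")
        super().add(label_set, value)


counter = Counter()
counter.add(EMPTY_LABEL_SET, 1.0)    # a real measurement with no labels
counter.add(ForeignLabelSet(), 5.0)  # silently lands in the same bucket
```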

opentelemetry-sdk/src/opentelemetry/sdk/metrics/__init__.py

opentelemetry-sdk/tests/metrics/test_metrics.py

c24t · 2019-11-26T23:24:19Z

opentelemetry-sdk/src/opentelemetry/sdk/metrics/__init__.py

@@ -141,6 +176,7 @@ def __init__(
        description: str,
        unit: str,
        value_type: Type[metrics_api.ValueT],
+        meter: "Meter",


Is the only reason to include the meter here to prevent using labelsets created by other meter instances? As I understand the spec, we don't want to guarantee that labelsets created via one implementation can be used in another (i.e. they're defined in the SDK, not the API), but that seems like a very different concern.

Note that the go prototype is doing this: https://github.com/open-telemetry/opentelemetry-go/blob/e6d725626d4629220a2de0112570adf80d50be21/sdk/metric/sdk.go#L283. @c24t to take this up in specs.

opentelemetry-sdk/src/opentelemetry/sdk/metrics/__init__.py

c24t · 2019-11-26T23:37:56Z

opentelemetry-sdk/src/opentelemetry/sdk/metrics/__init__.py

+        if len(labels) == 0:
+            return EMPTY_LABEL_SET
+        sorted_labels = OrderedDict(sorted(labels.items()))
+        # Uses statsd encoding for labels


It seems like it would be much simpler to make LabelSet hashable and just return the LabelSet itself without doing any encoding here. The meter shouldn't have an opinion on labelset encoding, that's a problem for exporters.

If there's no benefit of doing the encoding up-front until we have a statsd exporter, I think we should wait to add this until we add the exporter.

c24t · 2019-11-26T23:42:49Z

opentelemetry-sdk/src/opentelemetry/sdk/metrics/__init__.py

+            "%s:%s" % (key, value) for (key, value) in sorted_labels.items()
+        )
+        # If LabelSet exists for this meter in memory, use existing one
+        if not self.labels.get(encoded):


I agree that a lazy LabelSet.encoded wouldn't work since the point is to prevent instantiating the LabelSet, but the encoding code shouldn't be here either.

Changes open-telemetry#258 to remove meter attr from LabelSet and Metric, and remove statsd encoding.

lzchen · 2019-11-28T00:52:19Z

TODO:

Implement encoder API and default encoding
Investigate specs for checking meter when using labelsets

c24t

One typo and one test that needs to be updated/improved, but otherwise it LGTM. Thanks for slogging through this long review cycle @lzchen!

opentelemetry-sdk/src/opentelemetry/sdk/metrics/__init__.py

opentelemetry-sdk/tests/metrics/export/test_export.py

c24t · 2019-12-02T17:33:27Z

opentelemetry-sdk/tests/metrics/test_metrics.py

        counter = metrics.Counter("name", "desc", "unit", float, label_keys)
-        counter.add(label_values, 1.0)
-        handle = counter.get_handle(label_values)
+        counter.add(label_set, 1.0)


@toumorokoshi: out of curiosity, how would you implement this? Make label_set a Union[Dict[str, str], LabelSet] or have separate methods for each type?

@lzchen and I talked about it, and this seems to be a case where the conventions of the language shape the API. Java makes it easy to support both arg types with overloading; Python doesn't.

That said, I'd personally prefer that we didn't require LabelSets anywhere in the API. This looks like a bit of premature optimization on the part of the spec, at the expense of the simplicity of the API. Especially when we consider applications that don't reuse labels. But this is a problem for the spec, not this PR.

lzchen added 2 commits October 31, 2019 16:13

Implement labelset

3629f5a

add tests

6a433fe

lzchen requested review from a-feld, c24t, carlosalberto, Oberon00, reyang and toumorokoshi as code owners November 1, 2019 19:18

lzchen added 3 commits November 1, 2019 14:22

fix lint

d816da8

fix lint

df80425

fix docs

c960056

toumorokoshi reviewed Nov 6, 2019

c24t reviewed Nov 7, 2019

lzchen added 6 commits November 14, 2019 18:20

include meter check

893640a

merge

b3985bd

fix test

7e9e7ad

lint

f2842ff

lint

f390f87

lint

4c050d2

Oberon00 reviewed Nov 18, 2019

lzchen added 2 commits November 18, 2019 12:00

Address comments

2fe06ca

abc

30e5e5a

toumorokoshi approved these changes Nov 19, 2019

c24t reviewed Nov 26, 2019

c24t added a commit to c24t/opentelemetry-python that referenced this pull request Nov 28, 2019

Edits from in-person review with @lzchen

6fe2e38

Changes open-telemetry#258 to remove meter attr from LabelSet and Metric, and remove statsd encoding.

c24t mentioned this pull request Nov 28, 2019

Edits from in-person review with @lzchen c24t/opentelemetry-python#2

Open

Apply Chris' changes

f375128

fix lint

ce33371

c24t reviewed Dec 2, 2019

Use mypy==0.740

792449e

c24t force-pushed the labelset branch from 79f38e7 to 792449e December 3, 2019 05:04

Merge branch 'master' into labelset

03e072c

c24t mentioned this pull request Dec 3, 2019

Fix loader type annotations for mypy>=0.750 #313

Closed

c24t approved these changes Dec 3, 2019

c24t merged commit 4ead3f4 into open-telemetry:master Dec 3, 2019

lzchen deleted the labelset branch December 3, 2019 17:34

lzchen mentioned this pull request Dec 3, 2019

Implement "LabelSet" in metrics #222

Closed

ocelotl mentioned this pull request Dec 3, 2019

Set status for ended spans #297

Merged

c24t mentioned this pull request Dec 3, 2019

Add typeguard, use with pytest #316

Closed

srikanthccv pushed a commit to srikanthccv/opentelemetry-python that referenced this pull request Nov 1, 2020

feat: tslint rules for license headers (open-telemetry#258)

b125521
