-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[connector/servicegraph] Unit test failure: TestConnectorConsume/test_fix_failed_label_not_work #33998
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
I talked to @t00mas and he'll be looking into this soon. |
The problem is due to the servicegraph connector unit tests making use of
When the servicegraph connector test fails, it's because the names of consecutive metrics are the same, but the datapoint attributes are different between the actual metric and the expected metric from the golden dataset. Perhaps Alternatively, perhaps |
I think that |
I don't completely understand the In each case it ranges over a key, and creates a new metric each time, even though the name of the metric is the same in all cases. My understanding was that if this were inside an application using the Otel SDK that the SDK would aggregate all those keys (which I think represent different attribute sets) into a single metric entry but with multiple datapoints. The servicegraphconnector code on the other hand, seems to create multiple entries for the same metric, with just one datapoint per metric entry. Perhaps this is breaking the preconditions which |
This still appears to be happening: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9964617070/job/27533331662?pr=34104 |
+1 freq: #33994 |
Given the frequency / flakiness I would suggest to skip this test for now #34120 |
…peMetrics metric entry (#34070) Tests in `connector/servicegraph` were failing because the servicegraph `buildMetrics()` code was creating multiple metric entries for the same metric name within a single MetricScope. Although this may not be forbidden by the Otel specification, I think there is a general assumption that a metric name does not appear more than once within the same MetricScope. Instead, different values (e.g. with different sets of attribute values) should be created as separate datapoints within the same metric. The `pmetrictest.CompareScopeMetrics` test functionality is not designed to handle multiple metric entries with the same `Name()`. Instead, it is assumed that in cases where Order is ignored, the [first entry found in the actual metrics which matches the name of the expected metrics](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/ca9a8d14df471ed6a7dda7562ddfe659ec52127d/pkg/pdatatest/pmetrictest/metrics.go#L186-L195) *must* be the metric to compare. This fix changes the `buildMetrics()` code to create one metric within a scope, and instead create multiple datapoints per metric when there are entries where the datapoint attribute set is unique (i.e. all entries in the internal maps `serviceGraphConnector.req{Total,FailedTotal,ServerDurationSecondsCount}` are coalesced into a single named metric as appropriate.) _Note_: I don't have any past experience working with servicegraphconnector, but just observing that `collectClientLatencyMetrics()` and `collectServerLatencyMetrics()` both range over the same map - `p.reqServerDurationSecondsCount`, although the actual values collected in `collectClientLatencyMetrics()` are from `p.reqServerDurationSeconds{Count,Sum,BucketCounts}` and the values collected in `collectServerLatencyMetrics()` are from `p.reqClientDurationSeconds{Count,Sum,BucketCounts}`. This seems a little asymmetrical, but I don't have enough experience to say whether this is an error or not. **Description:** Fixes #33998 **Testing:** All other unit tests now complete, and the previously failing unit test now works reliably. **Documentation:** No documentation added. This is a unit test fix.
Component(s)
connector/servicegraph
Describe the issue you're reporting
Failing CI/CD action link
Failure occurred in unrelated PR, running on ubuntu.
Failure output:
The text was updated successfully, but these errors were encountered: