spanmetricsconnector produces a lot of exemplars #23872

povilasv · 2023-06-30T12:15:51Z

Component(s)

connector/spanmetrics

What happened?

Description

Every single span sent to the spanmetrics connector adds an Exemplar to the histogram metric, which can amount to a lot of data, since each Exemplar records a span ID, trace ID, latency, and timestamp.

General question: does this defeat the purpose of the histogram? With histograms I mainly expect aggregated data, not every single recorded event.

I added some debug output to the spanmetrics connector and can see the exemplars:

  (v1.Exemplar) time_unix_nano:1688126657105088872 as_double:0.125755 span_id:[31 19 133 30 111 26 52 140] trace_id:[95 19 233 90 229 29 22 134 83 84 4 117 25 77 5 180] ,
  (v1.Exemplar) time_unix_nano:1688126657105088872 as_double:0.125691 span_id:[252 153 209 139 224 166 71 240] trace_id:[173 169 237 196 254 215 126 201 50 55 147 42 111 33 255 52] ,
  (v1.Exemplar) time_unix_nano:1688126657105088872 as_double:0.127795 span_id:[167 212 174 43 46 170 241 103] trace_id:[94 109 78 4 69 159 178 164 62 111 190 199 101 49 196 217] ,
  (v1.Exemplar) time_unix_nano:1688126657105088872 as_double:0.125864 span_id:[110 140 57 72 41 141 239 176] trace_id:[49 16 131 177 44 198 106 255 13 100 16 18 198 125 139 66]
 })

I couldn't find another way to print exemplars to the console, but my test shows that if the histogram's sample count is 248819, then the exemplar count is also 248819.
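
To put a rough number on "a lot of data", here is a back-of-the-envelope sketch in Go. It assumes only the raw field sizes visible in the debug output (16-byte trace ID, 8-byte span ID, 8-byte timestamp, 8-byte double value) and ignores protobuf framing and any exemplar attributes, so it is a lower bound rather than a measurement. The function name is hypothetical.

```go
package main

import "fmt"

// Raw field sizes of a single exemplar, as seen in the debug output above.
// Assumption: protobuf tags, length prefixes and attributes are not counted.
const (
	traceIDBytes   = 16
	spanIDBytes    = 8
	timestampBytes = 8 // time_unix_nano
	valueBytes     = 8 // as_double
)

// minExemplarBytes returns a lower bound on the payload size of n exemplars.
func minExemplarBytes(n int) int {
	return n * (traceIDBytes + spanIDBytes + timestampBytes + valueBytes)
}

func main() {
	// 248819 is the sample (and exemplar) count reported in the test above.
	fmt.Println(minExemplarBytes(248819)) // prints 9952760
}
```

So one exemplar per span means at least roughly 10 MB of exemplar payload for the counts seen in this test.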

Steps to Reproduce

Add some println calls where the aggregated histogram is sent
Run the OTel Collector
./bin/telemetrygen_linux_amd64 traces --otlp-insecure --otlp-endpoint localhost:4317 --duration 30s

Expected Result

Actual Result

Collector version

v0.80.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
      http:


connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
    dimensions:
      - name: http.method
        default: GET
      - name: http.status_code
    dimensions_cache_size: 1000
    aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"    
    metrics_flush_interval: 15s 

exporters:
  logging:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics]
    metrics:
      receivers: [otlp, spanmetrics]
      exporters: [logging]

Log output

No response

Additional context

No response

github-actions · 2023-06-30T12:16:12Z

Pinging code owners:

connector/spanmetrics: @albertteoh @kovrus

See Adding Labels via Comments if you do not have permissions to add labels yourself.

povilasv · 2023-06-30T12:18:01Z

One solution could be having an option to disable exemplars in spanmetricsconnector?

I can work on this if you folks agree.

Frapschen · 2023-07-04T14:46:02Z

I think we can simply have an option to disable exemplars. BTW, can we have some strategies to determine whether a metric event needs an exemplar? The first strategy that comes to mind is a latency threshold, e.g.

exemplar:
  enabled: true
  strategy:
    latencyThreshold: 500ms

albertteoh · 2023-07-07T08:38:19Z

I like @Frapschen's idea of adding configurability to exemplars. I would only suggest using the naming convention of duration (used in this spanmetrics connector) rather than latency (used in the deprecated spanmetrics processor).

I also prefer @Frapschen's enabled config over a disabled config option (so exemplars are disabled by default). It does, however, mean a breaking change. I'm okay with that as long as it's documented in the changelog; better to get it right now while still in alpha than later when it's considered stable.

Thoughts? @povilasv @kovrus

povilasv · 2023-07-09T19:48:47Z

I want to do this step by step, so I pushed a PR which disables Exemplars by default -> #24048

I like the concept of strategies. I was thinking we could have another "OnePerHistogramBucket" strategy, which would collect at most one exemplar per histogram duration bucket.

Since in agent deployment model we might get quite a lot of different services sending spans, and for one service X ms is slow, for another it's fast, so hard to have one threshold :)

So config would be like this:

exemplar:
  enabled: true
  strategy:
    # either durationThreshold or onePerHistogramBucket should be set
    durationThreshold: 500ms
    onePerHistogramBucket: true

Thoughts?

Also should we add the existing "collect all" behaviour as strategy?

**Description:** Breaking change! Allows enabling / disabling Exemplars.

**Link to tracking Issue:** #23872

**Testing:** Added unit test

**Documentation:** Added docs

Co-authored-by: Albert <26584478+albertteoh@users.noreply.github.com>
Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>

povilasv · 2023-07-13T10:36:42Z

FYI we just merged the default exemplars disabled behaviour. What should we do for exemplar "sampling" strategies? Which one should we implement? Let me know, I can find some time to work on it :)

github-actions · 2023-09-12T03:29:16Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

connector/spanmetrics: @albertteoh

See Adding Labels via Comments if you do not have permissions to add labels yourself.

povilasv · 2023-09-12T14:02:37Z

Given that the original issue is fixed, I'm closing this. We can add / discuss strategies in separate issues.

povilasv added bug Something isn't working needs triage New item requiring triage labels Jun 30, 2023

github-actions bot added the connector/spanmetrics label Jun 30, 2023

povilasv mentioned this issue Jul 9, 2023

[connector/spanmetrics] disable exemplars by default #24048

Merged

JaredTan95 removed the needs triage New item requiring triage label Jul 10, 2023

github-actions bot added the Stale label Sep 12, 2023

mx-psi removed the Stale label Sep 12, 2023

povilasv closed this as completed Sep 12, 2023
