-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor ReceiveResourceSpans to share code with otelSpanToDDSpan fro… #29435
Conversation
Regression DetectorRegression Detector ResultsRun ID: 8aca0ace-2fa3-497d-a213-9bdcbf16b408 Metrics dashboard Target profiles Baseline: b755dfa Performance changes are noted in the perf column of each table:
No significant changes in experiment optimization goalsConfidence level: 90.00% There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
|
perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
---|---|---|---|---|---|---|
➖ | basic_py_check | % cpu utilization | +2.75 | [+0.10, +5.41] | 1 | Logs |
➖ | uds_dogstatsd_to_api_cpu | % cpu utilization | +2.64 | [+1.90, +3.37] | 1 | Logs |
➖ | tcp_syslog_to_blackhole | ingress throughput | +0.49 | [+0.43, +0.56] | 1 | Logs |
➖ | idle | memory utilization | +0.01 | [-0.04, +0.05] | 1 | Logs |
➖ | tcp_dd_logs_filter_exclude | ingress throughput | +0.00 | [-0.01, +0.01] | 1 | Logs |
➖ | uds_dogstatsd_to_api | ingress throughput | -0.02 | [-0.10, +0.05] | 1 | Logs |
➖ | pycheck_lots_of_tags | % cpu utilization | -0.04 | [-2.45, +2.37] | 1 | Logs |
➖ | file_tree | memory utilization | -0.06 | [-0.17, +0.04] | 1 | Logs |
➖ | idle_all_features | memory utilization | -0.40 | [-0.49, -0.30] | 1 | Logs |
➖ | otel_to_otel_logs | ingress throughput | -0.56 | [-1.37, +0.24] | 1 | Logs |
Bounds Checks
perf | experiment | bounds_check_name | replicates_passed |
---|---|---|---|
✅ | idle | memory_usage | 10/10 |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
-
Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
-
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
-
Its configuration does not mark it "erratic".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, just requesting a couple of minor edits
releasenotes/notes/refactor-receiveresourcespans-1a7a1b027982f39c.yaml
Outdated
Show resolved
Hide resolved
Go Package Import DifferencesBaseline: b755dfa
|
Tags: info.Tags{ | ||
Lang: lang, | ||
TracerVersion: fmt.Sprintf("otlp-%s", resourceAttributesMap[semconv.AttributeTelemetrySDKVersion]), | ||
EndpointVersion: "opentelemetry_grpc_v1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whats this ? why this value .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a conversation with the trace agent team, it's needed downstream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a comment, where is it need downstream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, lets make performance changes before merging this
…ns and improve performance
/merge |
🚂 MergeQueue: waiting for PR to be ready This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals. Use |
This comment has been minimized.
This comment has been minimized.
/merge |
🚂 MergeQueue: waiting for PR to be ready This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals. Use |
This merge request was unqueued If you need support, contact us on Slack #devflow! |
/merge |
🚂 MergeQueue: waiting for PR to be ready This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals. Use |
/merge -c |
This merge request was unqueued If you need support, contact us on Slack #devflow! |
/gitlab estimate-cost |
🚂 Devflow: Estimated cost is $182.86 for Refactor ReceiveResourceSpans to share code with otelSpanToDDSpan fro… (cumulated CI duration: 94h3m45s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me from trace-agent side of things, Do you have a deprecation / testing plan here to move everyone to v2 or will both implementations need to continue to exist
Stats: info.NewStats(), | ||
} | ||
|
||
tags := tagstats.AsTags() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have (or want) any telemetry here to track adoption of this feature flag? (you might get this for free through the configuration telemetry but I'm not sure how well that will work for a feature flag)
This comment has been minimized.
This comment has been minimized.
📥 📢 Info, this pull request increases the binary size of serverless extension by 24576 bytes. Each MB of binary size increase means about 10ms of additional cold start time, so this pull request would increase cold start time by 0ms. Debug infoIf you have questions, we are happy to help, come visit us in the #serverless slack channel and provide a link to this comment. We suggest you consider adding the |
Serverless Benchmark Results
tl;drUse these benchmarks as an insight tool during development.
What is this benchmarking?The The benchmark is run using a large variety of lambda request payloads. In the charts below, there is one row for each event payload type. How do I interpret these charts?The charts below comes from The benchstat docs explain how to interpret these charts.
I need more helpFirst off, do not worry if the benchmarks are failing. They are not tests. The intention is for them to be a tool for you to use during development. If you would like a hand interpreting the results come chat with us in Benchmark stats
|
/merge |
🚂 MergeQueue: pull request added to the queue The median merge time in Use |
…m Concentrator
What does this PR do?
JIRA Link: https://datadoghq.atlassian.net/browse/OTEL-1726
Refactor
ReceiveResourceSpans
such that it uses the same mapping logic asotelSpanToDDSpan
frompkg/trace/stats/otel_util.go
. Also leverage other helpers frompkg/trace/traceutil/otel_util.go
.Moved “otelSpanToDDSpan” to new
trace/transform
module because putting it intraceutil
module resulted in circular import betweentrace/config
andtrace/traceutil
Add a new feature flag
enable_receive_resource_spans_v2
underapm_config.features
to gate this refactor.Removed functionality compared to the old
ReceiveResourceSpans
:No longer populate these fields in
tagstats
:LangVersion, Interpreter, LangVendor
No longer check for
lang
andcontainerID
in HTTP headerNo longer check for resource-related values (container, env, hostname) in span attributes. We had support for it before, but that behaviour doesn't follow OTel spec
Added functionality:
o.conf.Ignore
to ignore spans with specified resource namesMotivation
The goals of this refactor were to make the code more readable, and to make the logic for setting
span.Resource
andspan.Type
easier to change in a future PR.Describe how to test/QA your changes
Add
enable_receive_resource_spans_v2
flag in apm_config.features in datadog.yaml. Build agent locally, run it, and send through traces (I tested using calendar app). Verify in the frontend that the data received is the same before and after the refactor.Benchmarks run in
otlp_test.go
:Possible Drawbacks / Trade-offs
Additional Notes