[apm] standardize peer tag aggregation #20550
Conversation
…specific peer.service field
…ted configurations
Left a couple of suggestions
Co-authored-by: May Lee <mayl@alumni.cmu.edu>
Thank you May! I have committed your suggestions.
…aDog/datadog-agent into apm/standardize-peer-tag-aggregation
This reverts commit fc01656.
Bloop Bleep... Dogbot Here

**Regression Detector Results**

Run ID: 6decd9ed-9188-4a29-bd96-d3a4746cd3dc

**Explanation**

A regression test is an integrated performance test for …

Because a target's optimization goal performance in each experiment will vary somewhat each time it is run, we can only estimate mean differences in optimization goal relative to the baseline target. We express these differences as a percentage change relative to the baseline target, denoted "Δ mean %". These estimates are made to a precision that balances accuracy and cost control. We represent this precision as a 90.00% confidence interval denoted "Δ mean % CI": there is a 90.00% chance that the true value of "Δ mean %" is in that interval. We decide whether a change in performance is a "regression" -- a change worth investigating further -- if both of the following two criteria are true: …

The table below, if present, lists those experiments that have experienced a statistically significant change in mean optimization goal performance between baseline and comparison SHAs with 90.00% confidence OR have been detected as newly erratic. Negative values of "Δ mean %" mean that baseline is faster, whereas positive values of "Δ mean %" mean that comparison is faster. Results that do not exhibit more than a ±5.00% change in their mean optimization goal are discarded. An experiment is erratic if its coefficient of variation is greater than 0.1. The abbreviated table will be omitted if no interesting change is observed.

No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
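In formula form, a plain reading of the definitions above (assuming μ denotes the mean optimization goal per experiment, σ its standard deviation, and that the goal is higher-is-better, e.g., throughput — the exact estimator Dogbot uses is not shown here):

$$\Delta\,\text{mean}\,\% = \frac{\mu_{\text{comparison}} - \mu_{\text{baseline}}}{\mu_{\text{baseline}}} \times 100, \qquad \text{erratic} \iff \frac{\sigma}{\mu} > 0.1$$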
✅ for agent-shared-component owned files
pkg/config/config_template.yaml
Outdated
## @env DD_APM_PEER_TAGS_AGGREGATION - bool - default: false
## [BETA] Enables aggregation of peer related tags (e.g., `peer.service`, `db.instance`, etc.) in the Agent.
## If disabled, aggregated trace stats will not include these tags as dimensions on trace metrics.
## For the best experience, Datadog also recommends enabling `compute_stats_by_span_kind`.
Presumably we recommend enabling `compute_stats_by_span_kind` if `peer_tags_aggregation` is enabled? Do we recommend enabling `peer_tags_aggregation`? Will people reading this know what it means?
I'll rephrase. The idea here is that if you're using `peer_tags_aggregation`, you will likely also want the span kind flag enabled too.
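For concreteness, a minimal sketch of what that pairing could look like in `datadog.yaml`; the key paths are an assumption inferred from the `DD_APM_PEER_TAGS_AGGREGATION` env var naming in the template excerpt, not something this thread confirms:

```yaml
# Sketch only: apm_config key paths assumed from the env var naming above.
apm_config:
  # [BETA] aggregate peer-related tags (peer.service, db.instance, ...) in the Agent
  peer_tags_aggregation: true
  # recommended alongside peer tag aggregation, per the template text
  compute_stats_by_span_kind: true
```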
pkg/config/config_template.yaml
Outdated
## If disabled, aggregated trace stats will not include these tags as dimensions on trace metrics.
## For the best experience, Datadog also recommends enabling `compute_stats_by_span_kind`.
## If enabling both causes the Agent to consume too many resources, try disabling `compute_stats_by_span_kind` first.
## If the overhead remains high, it will be due to a high cardinality of peer tags from the traces. You may need to check your instrumentation.
What should they be looking for in their instrumentation? Can we link them to any documentation here?
I can provide some more guidance in this comment. We do not have clear documentation that would speak to this specific concern.
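To illustrate the fallback the template text describes, here is a sketch under the same key-path assumption as above:

```yaml
# Sketch: if enabling both features costs too many resources, the template
# suggests disabling compute_stats_by_span_kind first while keeping peer tag
# aggregation on. If overhead stays high, the cause is likely high-cardinality
# peer tags coming from the instrumentation itself.
apm_config:
  peer_tags_aggregation: true
  compute_stats_by_span_kind: false
```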
…r_tags_aggregation (open-telemetry#29089) **Description:** Deprecate peer_service_aggregation in favor of peer_tags_aggregation. Counterpart of DataDog/datadog-agent#20550.
What does this PR do?
Builds on previous work by standardizing on a set of default peer tags over which to aggregate. The tags themselves are an approved list of fields that are then converted to `peer.*` tags on the backend for use with trace metrics.

The previous `peer_tags` configuration can still be used to provide supplementary tags if necessary, but we intend for this flag to be used only in exceptional cases. This field cannot be used to supply arbitrary tags, as all tags are ultimately vetted in the backend.
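As a hypothetical illustration of that escape hatch (the key paths and the tag name below are assumptions for illustration, not part of this PR):

```yaml
# Sketch only: adding a supplementary field on top of the standardized
# defaults via the legacy peer_tags list. "custom.peer.field" is made up;
# any tag supplied here is still vetted on the backend.
apm_config:
  peer_tags_aggregation: true
  peer_tags:
    - "custom.peer.field"
```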
Motivation
To make it easier for customers to adopt peer tags.
Additional Notes
Possible Drawbacks / Trade-offs
Describe how to test/QA your changes
Please ask me for the testing doc.
Reviewer's Checklist
- Triage milestone is set.
- Add the `major_change` label if your change either has a major impact on the code base, impacts multiple teams, or changes important well-established internals of the Agent. This label will be used during QA to make sure each team pays extra attention to the changed behavior. For any customer-facing change, use a release note.
- A `changelog/no-changelog` label has been applied.
- The `qa/skip-qa` label is not applied.
- A `team/..` label has been applied, indicating the team(s) that should QA this change.
- The `need-change/operator` and `need-change/helm` labels have been applied.
- A `k8s/<min-version>` label has been applied, indicating the lowest Kubernetes version compatible with this feature.