[apm] peer.service aggregation for trace stats, option to compute stats based on span.kind #16103

Merged: 53 commits merged on Apr 6, 2023

@jdgumz (Contributor) commented Mar 14, 2023

What does this PR do?

Adds peer.service to stats payloads emitted by the trace agent.

This PR also lets the agent use a span's span.kind field to decide whether the span is eligible for stats. This check complements peer.service aggregation and is arguably necessary: it ensures that spans carrying peer.service (client/producer spans) are picked up for stats. It also ensures that ingress spans (those with kind server or consumer) are included in trace stats computation.
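A minimal Go sketch of the span.kind eligibility check described above, assuming the kind is read from the span's meta under the span.kind key and compared case-insensitively. The Span type and the eligibleBySpanKind function are hypothetical stand-ins for illustration, not the agent's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// Span is a hypothetical, minimal stand-in for the agent's span type;
// only the Meta map is needed for this sketch.
type Span struct {
	Meta map[string]string
}

// eligibleBySpanKind reports whether a span's span.kind makes it eligible
// for stats: egress kinds (client, producer), which typically carry
// peer.service, and ingress kinds (server, consumer).
// The name and the case-insensitive matching are assumptions.
func eligibleBySpanKind(s *Span) bool {
	switch strings.ToLower(s.Meta["span.kind"]) {
	case "client", "producer", "server", "consumer":
		return true
	default:
		return false
	}
}

func main() {
	for _, kind := range []string{"client", "server", "internal", ""} {
		s := &Span{Meta: map[string]string{"span.kind": kind}}
		fmt.Printf("%q -> %v\n", kind, eligibleBySpanKind(s))
	}
}
```

Kinds outside the four listed (for example internal, or a missing span.kind) fall through to false, so they still depend on the usual top-level or measured rules.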

Motivation

Feature enhancement: peer.service lets us compute edge-specific statistics between APM-instrumented services and remote services.
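To illustrate how peer.service turns per-service stats into per-edge stats, here is a sketch that groups spans by a hypothetical aggregation key (service, name, resource, plus peer.service when aggregation is enabled). The Span, aggKey, and groupedStats types and the aggregate function are assumptions for illustration; the agent's real stats pipeline tracks more dimensions and metrics.

```go
package main

import "fmt"

// Span is a hypothetical minimal span with only the fields used for grouping.
type Span struct {
	Service  string
	Name     string
	Resource string
	Meta     map[string]string
	Duration int64 // nanoseconds
}

// aggKey is a hypothetical aggregation key. PeerService is only populated
// when peer.service aggregation is enabled, so leaving the option off keeps
// the key cardinality unchanged.
type aggKey struct {
	Service, Name, Resource, PeerService string
}

// groupedStats holds the per-key counters this sketch tracks.
type groupedStats struct {
	Hits     uint64
	Duration int64
}

// aggregate groups spans by aggKey, optionally splitting groups by peer.service.
func aggregate(spans []Span, peerServiceAggregation bool) map[aggKey]*groupedStats {
	out := make(map[aggKey]*groupedStats)
	for _, s := range spans {
		k := aggKey{Service: s.Service, Name: s.Name, Resource: s.Resource}
		if peerServiceAggregation {
			k.PeerService = s.Meta["peer.service"]
		}
		g, ok := out[k]
		if !ok {
			g = &groupedStats{}
			out[k] = g
		}
		g.Hits++
		g.Duration += s.Duration
	}
	return out
}

func main() {
	spans := []Span{
		{Service: "web", Name: "client.request", Resource: "GET /users", Meta: map[string]string{"peer.service": "users-db"}, Duration: 100},
		{Service: "web", Name: "client.request", Resource: "GET /users", Meta: map[string]string{"peer.service": "cache"}, Duration: 50},
	}
	fmt.Println(len(aggregate(spans, false))) // 1 group
	fmt.Println(len(aggregate(spans, true)))  // 2 groups, one per peer.service
}
```

With aggregation off, both outgoing calls from web land in one group; with it on, they split into one group per remote service. That split is what makes edge-level stats possible, and it is also the source of the extra aggregation work listed under trade-offs.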

Additional Notes

N/A

Possible Drawbacks / Trade-offs

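Additional aggregation work in the agent: peer.service adds another dimension to stats aggregation, and span.kind-based eligibility can increase the number of spans for which stats are computed.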

Describe how to test/QA your changes

  • Enable both apm_config.peer_service_aggregation and apm_config.compute_stats_by_span_kind.
  • Send spans to the trace agent with peer.service set in the meta.
  • OR send tracer client stats payloads that have PeerService set in the grouped stats.
  • Validate that the exported payloads have PeerService set in the grouped stats.
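For the last QA step, a check along these lines could scan exported stats for an expected peer.service value. The ClientStatsBucket and ClientGroupedStats types here are hypothetical, trimmed-down models of the exported payload that keep only the fields the check needs, and the sketch assumes the payload has already been decoded.

```go
package main

import "fmt"

// ClientGroupedStats is a hypothetical, trimmed-down model of one grouped
// stats entry in an exported payload.
type ClientGroupedStats struct {
	Service     string
	Name        string
	Resource    string
	PeerService string
	Hits        uint64
}

// ClientStatsBucket is a hypothetical model of one time bucket of grouped stats.
type ClientStatsBucket struct {
	Stats []ClientGroupedStats
}

// hasPeerService reports whether any grouped stats entry in the buckets
// carries the expected, non-empty PeerService value.
func hasPeerService(buckets []ClientStatsBucket, want string) bool {
	if want == "" {
		return false // an empty value would match ingress entries trivially
	}
	for _, b := range buckets {
		for _, gs := range b.Stats {
			if gs.PeerService == want {
				return true
			}
		}
	}
	return false
}

func main() {
	buckets := []ClientStatsBucket{{Stats: []ClientGroupedStats{
		{Service: "web", Name: "client.request", Resource: "GET /users", PeerService: "users-db", Hits: 2},
		{Service: "web", Name: "web.request", Resource: "GET /", Hits: 5}, // ingress entry without peer.service
	}}}
	fmt.Println(hasPeerService(buckets, "users-db")) // true
	fmt.Println(hasPeerService(buckets, "cache"))    // false
}
```

Checking for a specific value rather than requiring every entry to have PeerService avoids false failures, since ingress (server/consumer) entries normally have no peer.service.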

Reviewer's Checklist

  • If known, an appropriate milestone has been selected; otherwise the Triage milestone is set.
  • Use the major_change label if your change has a major impact on the code base, impacts multiple teams, or changes important, well-established internals of the Agent. This label will be used during QA to make sure each team pays extra attention to the changed behavior. For any customer-facing change, add a release note.
  • A release note has been added or the changelog/no-changelog label has been applied.
  • Changed code has automated tests for its functionality.
  • Adequate QA/testing plan information is provided if the qa/skip-qa label is not applied.
  • At least one team/.. label has been applied, indicating the team(s) that should QA this change.
  • If applicable, docs team has been notified or an issue has been opened on the documentation repo.
  • If applicable, the need-change/operator and need-change/helm labels have been applied.
  • If applicable, the k8s/<min-version> label has been applied, indicating the lowest Kubernetes version compatible with this feature.
  • If applicable, the config template has been updated.

@jdgumz requested a review from a team as a code owner on March 14, 2023 18:48
Comment on lines 535 to 537
sp.Meta = map[string]string{"peer.service": "remote-service"}
spans := []*pb.Span{sp}
traceutil.ComputeTopLevel(spans)

Should we update concentrator.go to compute stats for all peer.service spans, regardless of whether they are _top_level or _dd.measured?

@pkalmakis (Contributor) left a comment

This looks good to me, although it implies that the tracer must explicitly set _dd.measured:1 on any span with peer.service set for that span to get stats. Should we be collecting stats for all peer.service spans? Is there a danger of computing too many stats if we do this?
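The reviewers' alternative (compute stats for every span that sets peer.service, regardless of _top_level or _dd.measured) can be sketched as follows to show where the volume concern comes from. The Span type, the includePeerService switch, and both functions are hypothetical; top-level and measured status are modeled as metrics equal to 1, following the keys named in the review.

```go
package main

import "fmt"

// Span is a hypothetical minimal span. Top-level and measured status are
// modeled as metrics set to 1 under the _top_level and _dd.measured keys.
type Span struct {
	Meta    map[string]string
	Metrics map[string]float64
}

// eligible applies the existing gate (top-level or measured) and, when
// includePeerService is true, also accepts any span that sets peer.service.
// includePeerService is a hypothetical switch for the reviewers' proposal.
func eligible(s Span, includePeerService bool) bool {
	if s.Metrics["_top_level"] == 1 || s.Metrics["_dd.measured"] == 1 {
		return true
	}
	return includePeerService && s.Meta["peer.service"] != ""
}

// countEligible returns how many spans in a trace would produce stats.
func countEligible(trace []Span, includePeerService bool) int {
	n := 0
	for _, s := range trace {
		if eligible(s, includePeerService) {
			n++
		}
	}
	return n
}

func main() {
	trace := []Span{
		{Metrics: map[string]float64{"_top_level": 1}},        // service entry span
		{Meta: map[string]string{"peer.service": "users-db"}}, // outgoing DB call
		{Meta: map[string]string{"peer.service": "users-db"}}, // another DB call
		{Meta: map[string]string{"peer.service": "cache"}, Metrics: map[string]float64{"_dd.measured": 1}}, // measured client span
	}
	fmt.Println(countEligible(trace, false)) // 2
	fmt.Println(countEligible(trace, true))  // 4
}
```

In this small sample trace the alternative doubles the number of spans producing stats. In a real service every outgoing call with peer.service would be counted, which is the cost the second review comment asks about.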

jdgumz and others added 5 commits March 15, 2023 10:50
Co-authored-by: Diana Shevchenko <40775148+dianashevchenko@users.noreply.github.com>
Co-authored-by: Diana Shevchenko <40775148+dianashevchenko@users.noreply.github.com>
….com:DataDog/datadog-agent into apm/peer-service-aggregation-for-trace-stats
@jdgumz requested a review from a team as a code owner on March 21, 2023 18:54
@alai97 (Contributor) left a comment

Looks good, two copy suggestions!

jdgumz and others added 4 commits March 21, 2023 15:09
Co-authored-by: Peter Kalmakis <peter.kalmakis@datadoghq.com>
…627c18.yaml

Co-authored-by: Austin Lai <76412946+alai97@users.noreply.github.com>
…627c18.yaml

Co-authored-by: Austin Lai <76412946+alai97@users.noreply.github.com>
@ahmed-mez (Contributor) left a comment

LGTM, let's just document this in pkg/config/config_template.yaml

@songy23 (Member) left a comment

ecdeaa4 looks good from OTel

7 participants