Collect some of the trace_api.* metrics in the trace flusher
#418
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #418      +/-   ##
==========================================
- Coverage   66.26%   65.92%   -0.34%
==========================================
  Files         189      191       +2
  Lines       23650    23767     +117
==========================================
- Hits        15672    15669       -3
- Misses       7978     8098     +120
#[derive(Debug, Clone)]
pub struct SendData {
    pub tracer_payloads: Vec<pb::TracerPayload>,
Do we really need to expose tracer_payloads publicly? From what I can tell, it's only for the tests in the trace-mini-agent, which is not a good pattern. We shouldn't be exposing internal implementation details just for unit tests in this way.
If we are only exposing tracer_payloads for tests, could we refactor the tests to use httpmock and properly test the outputs? If that is something that we want to explore, I think we could do that in a follow-up PR since we want to get this merged in soon. Maybe we can just add a TODO comment then?
This is a good point. We shouldn't expose internals unnecessarily in this way, but I don't think going all the way to sending through httpmock is necessary. Just adding a test-only method on the SendData implementation that hands out payload number X is fine by me.
I struggled with this because coalesce_send_data needs to mutate the tracer_payloads field, so I thought about implementing a getter that returns a mutable reference, but in the end that is basically the same as making the field public. I also thought about restructuring SendData and TraceFlusher to improve the encapsulation, but I realized it would overcomplicate this PR. If you think the getter is a better solution we can go that way, or if you have a better idea please let me know.
That can be public to the crate, right?
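For illustration, a crate-visible field plus the test-only accessor suggested above could look like this. The pb::TracerPayload type comes from the quoted diff; the import path and the accessor name are assumptions:

```rust
use datadog_trace_protobuf::pb; // assumed import path for the protobuf types

#[derive(Debug, Clone)]
pub struct SendData {
    // Visible inside the crate (e.g. to the trace flusher's coalescing code),
    // but no longer part of the public API.
    pub(crate) tracer_payloads: Vec<pb::TracerPayload>,
}

impl SendData {
    // Hypothetical test-only accessor: compiled only for unit tests and
    // hands out payload number `idx`, as suggested above.
    #[cfg(test)]
    pub fn tracer_payload(&self, idx: usize) -> Option<&pb::TracerPayload> {
        self.tracer_payloads.get(idx)
    }
}
```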
    }
}

async fn update(
@hoolioh How do you envision this working with retry logic? Would we call update for each attempt of a request payload? Or should update only be called for a payload once, regardless of how many retries it took?
Since the retry logic tries to minimize errors, I'd choose the latter, because it would be confusing to report error or dropped-trace statistics for traces that will end up being sent.
@hoolioh Ah, that's a great point. Maybe retry metrics should be their own thing at some point? It may be useful information.
I think that the retries can be deduced from the number of API requests, the number of failures of different types, as well as the number of dropped traces.
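A rough sketch of the accounting described in this thread: update runs once per payload after retries are exhausted, each HTTP attempt bumps a request counter, and the retry count can be deduced from the error totals. All field names here are illustrative, not the PR's actual metrics:

```rust
use std::collections::HashMap;

// Illustrative counters only; the real SendDataResult fields may differ.
#[derive(Debug, Default)]
struct SendDataResult {
    requests_count: u64,                         // every HTTP attempt, retries included
    responses_count_per_code: HashMap<u16, u64>, // successful responses by status code
    errors_timeout: u64,
    errors_network: u64,
    errors_status_code: u64,
    chunks_sent: u64,
    chunks_dropped: u64,
}

impl SendDataResult {
    // Retries are not tracked as a dedicated metric: every failed attempt
    // (except a final give-up) is followed by another attempt, so the typed
    // error counters give a close upper bound on the number of retries.
    fn approx_retries(&self) -> u64 {
        self.errors_timeout + self.errors_network + self.errors_status_code
    }
}
```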
}

impl SendDataResult {
    fn new() -> SendDataResult {
Since you're not taking in any parameters, should this just be a Default impl?
Agreed
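As a small illustration of the suggestion (field names made up), the parameterless constructor can simply become a Default derive:

```rust
// Instead of a hand-written `fn new() -> SendDataResult`, derive Default;
// this assumes every field's zero/empty value is the right starting state.
#[derive(Debug, Default)]
pub struct SendDataResult {
    requests_count: u64,
    chunks_sent: u64,
    chunks_dropped: u64,
}

fn main() {
    // All counters start at zero, exactly what the explicit `new()` returned.
    let result = SendDataResult::default();
    println!("{result:?}");
}
```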
use serde::{Deserialize, Serialize};
use std::collections::HashMap;

pub use crate::send_data::SendData;
nitpick: Should we move trace_utils to use a mod.rs file like we do in the sidecar to handle exporting like this? Feels a bit awkward / indirect like this. (If we want to do this, we can do in a separate PR).
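For reference, a sketch of the mod.rs layout the nitpick is pointing at; the module names are assumptions and may not match the crate exactly:

```rust
// trace_utils/mod.rs (hypothetical layout)
mod send_data;
mod trace_flusher;

// Re-export the public surface from one place instead of sprinkling
// `pub use` statements through the implementation files.
pub use self::send_data::SendData;
pub use self::trace_flusher::TraceFlusher;
```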
futures.push(self.send(
    self.trace_api_responses,
    *count as f64,
    vec![Tag::new("status_code", status_code.to_string().as_str()).unwrap()],
The most minor of nitpicks, feel free to ignore: &status_code.to_string() also works and is a bit less verbose.
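For what it's worth, the two spellings are equivalent because &String deref-coerces to &str; a tiny standalone illustration (takes_str stands in for the real tag constructor):

```rust
// Stand-in for any function taking `&str`, such as a tag constructor.
fn takes_str(value: &str) -> usize {
    value.len()
}

fn main() {
    let status_code: u16 = 200;
    // Explicit conversion to &str:
    let a = takes_str(status_code.to_string().as_str());
    // Equivalent and slightly shorter: `&String` deref-coerces to `&str`.
    let b = takes_str(&status_code.to_string());
    assert_eq!(a, b);
}
```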
    }
}

pub fn collect_metrics(&self) -> TraceFlusherMetrics {
I know it's technically not mutable, but I'm not sure the name collect_metrics conveys the actual mutability of this function on TraceFlusherMetrics? Also, I don't think it's actually collecting metrics at this point (the caller is doing the collecting)?
What about calling the function remove(), similar to the remove function in HashMap? Or something like fetch_and_clear?
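A sketch of the fetch_and_clear idea, assuming the counters sit behind a mutex; the type contents and field names are made up for illustration:

```rust
use std::mem;
use std::sync::Mutex;

#[derive(Debug, Default)]
pub struct TraceFlusherMetrics {
    pub api_requests: u64,
    pub api_errors: u64,
}

pub struct TraceFlusher {
    metrics: Mutex<TraceFlusherMetrics>,
}

impl TraceFlusher {
    // Hands the accumulated metrics to the caller and resets the counters,
    // which the name makes explicit (unlike `collect_metrics`).
    pub fn fetch_and_clear_metrics(&self) -> TraceFlusherMetrics {
        let mut guard = self.metrics.lock().unwrap();
        mem::take(&mut *guard)
    }
}
```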
bantonsson left a comment:
As mentioned internally, it's fine to merge this as is and make changes in the follow-up PR.
What does this PR do?
This PR splits the tracer_utils into smaller parts and collects some of the trace_api.* metrics. These metrics are then sent by the "self telemetry" worker of the sidecar.
Motivation
What inspired you to submit this pull request?
Additional Notes
Anything else we should know when reviewing?
How to test the change?
Describe here in detail how the change can be validated.