Skip to content

Conversation

@iamluc
Copy link
Contributor

@iamluc iamluc commented May 7, 2024

What does this PR do?

This PR splits the tracer_utils into smaller parts and collects metrics for:

  • trace_api.requests.
  • trace_api.errors.
  • trace_api.responses.
    These metrics are then sent by the "self telemetry" worker of the sidecar.

Motivation

What inspired you to submit this pull request?

Additional Notes

Anything else we should know when reviewing?

How to test the change?

Describe here in detail how the change can be validated.

@iamluc iamluc force-pushed the luc/sidecar-collect-trace_api-metrics branch from e0090f8 to f442b4d Compare May 7, 2024 09:57
@iamluc iamluc force-pushed the luc/sidecar-collect-trace_api-metrics branch from f442b4d to 2d2ab5e Compare May 7, 2024 10:06
@hoolioh hoolioh force-pushed the luc/sidecar-collect-trace_api-metrics branch 4 times, most recently from 19a7ade to 6fda63a Compare May 13, 2024 13:24
@codecov-commenter
Copy link

codecov-commenter commented May 13, 2024

Codecov Report

Attention: Patch coverage is 20.93863% with 219 lines in your changes are missing coverage. Please review.

Project coverage is 65.92%. Comparing base (9b531d5) to head (ef3c62e).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #418      +/-   ##
==========================================
- Coverage   66.26%   65.92%   -0.34%     
==========================================
  Files         189      191       +2     
  Lines       23650    23767     +117     
==========================================
- Hits        15672    15669       -3     
- Misses       7978     8098     +120     
Components Coverage Δ
crashtracker 19.34% <ø> (ø)
datadog-alloc 98.43% <ø> (ø)
data-pipeline 51.45% <0.00%> (ø)
data-pipeline-ffi 0.00% <ø> (ø)
ddcommon 81.52% <ø> (ø)
ddcommon-ffi 74.93% <ø> (ø)
ddtelemetry 53.72% <ø> (ø)
ipc 81.27% <ø> (ø)
profiling 77.52% <ø> (ø)
profiling-ffi 60.05% <ø> (ø)
serverless 0.00% <ø> (ø)
sidecar 31.02% <0.00%> (-0.74%) ⬇️
sidecar-ffi 0.00% <ø> (ø)
spawn-worker 54.98% <ø> (ø)
trace-mini-agent 69.12% <66.66%> (ø)
trace-normalization 97.79% <ø> (ø)
trace-obfuscation 95.74% <ø> (ø)
trace-protobuf 25.64% <ø> (ø)
trace-utils 64.07% <28.42%> (-4.79%) ⬇️


#[derive(Debug, Clone)]
pub struct SendData {
pub tracer_payloads: Vec<pb::TracerPayload>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to expose tracer_payloads publicly? From what I can tell, it's only for the tests in the trace-mini-agent, which is not a good pattern. We shouldn't be exposing internal implementation details just for unit tests in this way.

If we are only exposing tracer_payloads for tests, could we refactor the tests to use httpmock and properly test the outputs? If that is something that we want to explore, I think we could do that in a follow-up PR since we want to get this merged in soon. Maybe we can just add a TODO comment then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point. We shouldn't expose internals unnecessarily in this way, but I think that going all the way to sending to the httpmock isn't necessary. Just adding a test only method on the SendData implementation that hands out payload number X, is fine by me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I struggled with this because coalesce_send_data needs to mutate the tracer_payloads fields so I thought about implementing a getter to return a mutable reference but in the end is kind of the same as making the the field public. Also I thought about restructuring the SendData and TraceFlusher to improve the encapsulation but I realized it'd overcomplicate this PR. Anyway if you guys think the getter is a better solution we can go that way or if you can think of a better solution please let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can be public to the crate, right?

}
}

async fn update(
Copy link
Contributor

@ekump ekump May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hoolioh How do you envision this working with retry logic? Would we call update for each attempt of a request payload? Or should update only be called for a payload once, regardless of how many retries it took?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the retry logic tries to minimize errors I'd choose the latter because it'd be confusing to send errors or dropped statistics for traces that will end up being sent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hoolioh Ah, that's a great point. Maybe retry metrics should be it's own thing at some point? It may be useful information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the retries can be deduced from the number of API requests, the number of failures of different types, as well as the number of dropped traces.

}

impl SendDataResult {
fn new() -> SendDataResult {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're not taking in any parameters, should this just be a Default impl?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

use serde::{Deserialize, Serialize};
use std::collections::HashMap;

pub use crate::send_data::SendData;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Should we move trace_utils to use a mod.rs file like we do in the sidecar to handle exporting like this? Feels a bit awkward / indirect like this. (If we want to do this, we can do in a separate PR).

futures.push(self.send(
self.trace_api_responses,
*count as f64,
vec![Tag::new("status_code", status_code.to_string().as_str()).unwrap()],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most minor of nitpicks, feel free to ignore: &status_code.to_string() also works and is a bit less verbose.

}
}

pub fn collect_metrics(&self) -> TraceFlusherMetrics {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's technically not mutable, but I'm not sure the name collect_metrics conveys the actual mutability of this function on TraceFlusherMetrics? Also, I don't think it's actually collecting metrics at this point (the caller is doing the collecting)?

What about calling the function remove(), similar to the remove function in HashMap? Or something like fetch_and_clear?

Copy link
Contributor

@bantonsson bantonsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned internally, it’s fine to merge this as is and make changes in the follow up PR.

@hoolioh hoolioh marked this pull request as ready for review May 21, 2024 10:39
@hoolioh hoolioh requested review from a team as code owners May 21, 2024 10:39
@hoolioh hoolioh force-pushed the luc/sidecar-collect-trace_api-metrics branch from 6fda63a to ef3c62e Compare May 21, 2024 10:39
@hoolioh hoolioh merged commit f44bf09 into main May 21, 2024
@hoolioh hoolioh deleted the luc/sidecar-collect-trace_api-metrics branch May 21, 2024 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants