Collect some of the trace_api.* metrics in the trace flusher
#418
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #418      +/-   ##
==========================================
- Coverage   66.26%   65.92%   -0.34%
==========================================
  Files         189      191       +2
  Lines       23650    23767     +117
==========================================
- Hits        15672    15669       -3
- Misses       7978     8098     +120
#[derive(Debug, Clone)]
pub struct SendData {
    pub tracer_payloads: Vec<pb::TracerPayload>,
Do we really need to expose tracer_payloads publicly? From what I can tell, it's only for the tests in the trace-mini-agent, which is not a good pattern. We shouldn't be exposing internal implementation details just for unit tests in this way.
If we are only exposing tracer_payloads for tests, could we refactor the tests to use httpmock and properly test the outputs? If that is something that we want to explore, I think we could do that in a follow-up PR since we want to get this merged in soon. Maybe we can just add a TODO comment then?
This is a good point. We shouldn't expose internals unnecessarily in this way, but I don't think going all the way to sending through httpmock is necessary. Just adding a test-only method on the SendData implementation that hands out payload number X is fine by me.
I struggled with this because coalesce_send_data needs to mutate the tracer_payloads field, so I thought about implementing a getter that returns a mutable reference, but in the end that is basically the same as making the field public. I also thought about restructuring SendData and TraceFlusher to improve the encapsulation, but I realized it would overcomplicate this PR. If you think the getter is a better solution we can go that way, or if you have a better idea please let me know.
That can be public to the crate, right?
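For illustration, a crate-visible field plus the test-only accessor suggested above could look like this. The pb::TracerPayload type comes from the quoted diff; the import path and the accessor name are assumptions:

```rust
use datadog_trace_protobuf::pb; // assumed import path for the protobuf types

#[derive(Debug, Clone)]
pub struct SendData {
    // Visible inside the crate (e.g. to the trace flusher's coalescing code),
    // but no longer part of the public API.
    pub(crate) tracer_payloads: Vec<pb::TracerPayload>,
}

impl SendData {
    // Hypothetical test-only accessor: compiled only for unit tests and
    // hands out payload number `idx`, as suggested above.
    #[cfg(test)]
    pub fn tracer_payload(&self, idx: usize) -> Option<&pb::TracerPayload> {
        self.tracer_payloads.get(idx)
    }
}
```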
    }
}

async fn update(
@hoolioh How do you envision this working with retry logic? Would we call update for each attempt of a request payload? Or should update only be called for a payload once, regardless of how many retries it took?
Since the retry logic tries to minimize errors, I'd choose the latter, because it would be confusing to report error or dropped-trace statistics for traces that will end up being sent.
@hoolioh Ah, that's a great point. Maybe retry metrics should be their own thing at some point? It may be useful information.
I think that the retries can be deduced from the number of API requests, the number of failures of different types, as well as the number of dropped traces.
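A rough sketch of the accounting described in this thread: update runs once per payload after retries are exhausted, each HTTP attempt bumps a request counter, and the retry count can be deduced from the error totals. All field names here are illustrative, not the PR's actual metrics:

```rust
use std::collections::HashMap;

// Illustrative counters only; the real SendDataResult fields may differ.
#[derive(Debug, Default)]
struct SendDataResult {
    requests_count: u64,                         // every HTTP attempt, retries included
    responses_count_per_code: HashMap<u16, u64>, // successful responses by status code
    errors_timeout: u64,
    errors_network: u64,
    errors_status_code: u64,
    chunks_sent: u64,
    chunks_dropped: u64,
}

impl SendDataResult {
    // Retries are not tracked as a dedicated metric: every failed attempt
    // (except a final give-up) is followed by another attempt, so the typed
    // error counters give a close upper bound on the number of retries.
    fn approx_retries(&self) -> u64 {
        self.errors_timeout + self.errors_network + self.errors_status_code
    }
}
```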
}

impl SendDataResult {
    fn new() -> SendDataResult {
Since you're not taking in any parameters, should this just be a Default impl?
Agreed
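As a small illustration of the suggestion (field names made up), the parameterless constructor can simply become a Default derive:

```rust
// Instead of a hand-written `fn new() -> SendDataResult`, derive Default;
// this assumes every field's zero/empty value is the right starting state.
#[derive(Debug, Default)]
pub struct SendDataResult {
    requests_count: u64,
    chunks_sent: u64,
    chunks_dropped: u64,
}

fn main() {
    // All counters start at zero, exactly what the explicit `new()` returned.
    let result = SendDataResult::default();
    println!("{result:?}");
}
```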
use serde::{Deserialize, Serialize};
use std::collections::HashMap;

pub use crate::send_data::SendData;
nitpick: Should we move trace_utils to use a mod.rs file like we do in the sidecar to handle exporting like this? Feels a bit awkward / indirect like this. (If we want to do this, we can do in a separate PR).
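For reference, a sketch of the mod.rs layout the nitpick is pointing at; the module names are assumptions and may not match the crate exactly:

```rust
// trace_utils/mod.rs (hypothetical layout)
mod send_data;
mod trace_flusher;

// Re-export the public surface from one place instead of sprinkling
// `pub use` statements through the implementation files.
pub use self::send_data::SendData;
pub use self::trace_flusher::TraceFlusher;
```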
futures.push(self.send(
    self.trace_api_responses,
    *count as f64,
    vec![Tag::new("status_code", status_code.to_string().as_str()).unwrap()],
The most minor of nitpicks, feel free to ignore: &status_code.to_string() also works and is a bit less verbose.
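For what it's worth, the two spellings are equivalent because &String deref-coerces to &str; a tiny standalone illustration (takes_str stands in for the real tag constructor):

```rust
// Stand-in for any function taking `&str`, such as a tag constructor.
fn takes_str(value: &str) -> usize {
    value.len()
}

fn main() {
    let status_code: u16 = 200;
    // Explicit conversion to &str:
    let a = takes_str(status_code.to_string().as_str());
    // Equivalent and slightly shorter: `&String` deref-coerces to `&str`.
    let b = takes_str(&status_code.to_string());
    assert_eq!(a, b);
}
```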
    }
}

pub fn collect_metrics(&self) -> TraceFlusherMetrics {
I know it's technically not mutable, but I'm not sure the name collect_metrics conveys the actual mutability of this function on TraceFlusherMetrics? Also, I don't think it's actually collecting metrics at this point (the caller is doing the collecting)?
What about calling the function remove(), similar to the remove function in HashMap? Or something like fetch_and_clear?
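A sketch of the fetch_and_clear idea, assuming the counters sit behind a mutex; the type contents and field names are made up for illustration:

```rust
use std::mem;
use std::sync::Mutex;

#[derive(Debug, Default)]
pub struct TraceFlusherMetrics {
    pub api_requests: u64,
    pub api_errors: u64,
}

pub struct TraceFlusher {
    metrics: Mutex<TraceFlusherMetrics>,
}

impl TraceFlusher {
    // Hands the accumulated metrics to the caller and resets the counters,
    // which the name makes explicit (unlike `collect_metrics`).
    pub fn fetch_and_clear_metrics(&self) -> TraceFlusherMetrics {
        let mut guard = self.metrics.lock().unwrap();
        mem::take(&mut *guard)
    }
}
```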
bantonsson left a comment:
As mentioned internally, it's fine to merge this as is and make changes in the follow-up PR.
What does this PR do?
This PR splits the tracer_utils into smaller parts and collects some of the trace_api.* metrics. These metrics are then sent by the "self telemetry" worker of the sidecar.
Motivation
What inspired you to submit this pull request?
Additional Notes
Anything else we should know when reviewing?
How to test the change?
Describe here in detail how the change can be validated.