Add new histogram to generator - messaging system latency #3453
Conversation
Thanks for putting some time into this. It's a fairly small change, so it should be a pretty easy review, but we do need this to be configurable. We are currently trying to keep metrics-generator series down, and histograms tend to generate a lot of series. I will give this more time tomorrow.
Just pushed some changes to allow controlling the metric from config. Also, the metric is initialized even if the config is set to
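For context, a minimal sketch of what that config toggle could look like on the processor's config struct - the field and tag names here are assumptions based on the PR description, not the exact Tempo source:

```go
package servicegraphs

// Config holds the service-graphs processor options relevant here.
// This is a sketch; Tempo's real struct has many more fields.
type Config struct {
	// EnableMessagingSystemLatencyHistogram gates registration and
	// recording of the new histogram, keeping series count down when off.
	EnableMessagingSystemLatencyHistogram bool `yaml:"enable_messaging_system_latency_histogram"`
}
```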
a few questions, but looking good!
also, we will need a changelog
```diff
@@ -315,8 +329,14 @@ func (p *Processor) spanFailed(span *v1_trace.Span) bool {
 	return span.GetStatus().GetCode() == v1_trace.Status_STATUS_CODE_ERROR
 }
 
+func unixNanosDiffSec(unixNanoStart uint64, unixNanoEnd uint64) float64 {
+	// handle potential underflow of uint64 subtraction
+	diff := int64(unixNanoEnd) - int64(unixNanoStart)
```
instead of casting we could check to see if diff > end for overflow. is that better?
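For reference, a sketch of that alternative - detect wraparound after the unsigned subtraction instead of casting (helper name hypothetical):

```go
// With uint64, end-start wraps around when start > end, and the wrapped
// value is always larger than end itself, so `diff > end` detects underflow.
func unixNanosDiffSecWrapCheck(unixNanoStart, unixNanoEnd uint64) float64 {
	diff := unixNanoEnd - unixNanoStart
	if diff > unixNanoEnd {
		// The subtraction wrapped: start was after end.
		return 0
	}
	return float64(diff) / 1e9
}
```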
also, we recently made a change to protect span metrics from negative latencies:
https://github.com/grafana/tempo/pull/3412/files
@mdisibio should we make sure the same protections are here?
> instead of casting we could check to see if diff > end for overflow. is that better?
I agree we can make it cleaner.
Really liked your idea, very clever 🙃 My only concern is how clear it will be. After making the diff > end check, if underflow did happen - meaning the subtraction conceptually yields a negative number - don't we want to return it?
If not, maybe we could make this function a bit more specific, move the negative-values "protection" into it, and then return 0?
It could also be as simple as:

```go
func unixNanosDiffSec(unixNanoStart uint64, unixNanoEnd uint64) float64 {
	if unixNanoStart > unixNanoEnd {
		// To prevent underflow, return 0.
		return 0
	}
	// Safe subtraction.
	return float64(unixNanoEnd-unixNanoStart) / 1e9
}
```
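A quick illustration of how that clamping behaves with skewed timestamps (values made up):

```go
package main

import "fmt"

func unixNanosDiffSec(unixNanoStart uint64, unixNanoEnd uint64) float64 {
	if unixNanoStart > unixNanoEnd {
		return 0
	}
	return float64(unixNanoEnd-unixNanoStart) / 1e9
}

func main() {
	start := uint64(1_700_000_000_000_000_000)
	end := uint64(1_700_000_001_000_000_000) // one second later
	fmt.Println(unixNanosDiffSec(start, end)) // 1
	// Clock skew can put the "start" after the "end"; the helper clamps
	// this to 0 instead of letting the uint64 subtraction underflow.
	fmt.Println(unixNanosDiffSec(end, start)) // 0
}
```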
let's follow suit with the linked PR for consistency and return 0. i think your suggested `unixNanosDiffSec` in this PR is perfect.
so this one is resolved? 🤔
Added to changelog.
Will update docs after making a decision about the "negative diffs"
Thanks for the work. This is looking good. Heads up: you will need this commit in your branch (merge or rebase) to get CI to pass:
Synced from upstream + added docs.
Looking good, just have a few small thoughts.
Also, what do you think about adding the ConnectionType to the expired edge metric? If someone wants to do service graph metrics through queues, I'm concerned it will create a lot of expired edges. It might be nice for an operator to know whether their expired edges are largely queues or synchronous calls.
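As a rough sketch of that idea, using plain client_golang rather than Tempo's own registry - the metric and label names here are illustrative, not Tempo's actual ones:

```go
package main

import "github.com/prometheus/client_golang/prometheus"

// Partitioning expired edges by connection type lets an operator see
// whether expirations are dominated by queues or by synchronous calls.
var expiredEdges = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "service_graph_expired_edges_total",
		Help: "Edges that expired before both sides were observed.",
	},
	[]string{"connection_type"},
)

func main() {
	prometheus.MustRegister(expiredEdges)
	// When an edge expires, record its connection type, e.g.:
	expiredEdges.WithLabelValues("messaging_system").Inc()
}
```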
Ready to merge this one? @joe-elliott
Just rebased 🙃
Thank you for updating the docs
@joe-elliott I think we're ready 🥳
running CI!
Thanks, Adir!
* introduce new service-graph metric for messaging-system latency
* added tests for new histogram values
* fix linting
* make new metric optional via config
* fix typo
* fix failing tests
* add feature to changelog
* negative times diff consistency - return 0 instead of negative
* update docs
* Update docs/sources/tempo/metrics-generator/service_graphs/estimate-cardinality.md: use present when possible (Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>)
* change 1e9 to time const
* added a reference to the "wait" config of the processor
* fixed indentations and formatting stuff from rebasing
* removed mistaken println found by linter

Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>
What this PR does:
This PR adds a new metric calculated by the metrics-generator service-graph processor.
The metric is a histogram of the latency between two services communicating through a messaging system.
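Conceptually - names here are hypothetical, not Tempo's internals - the observed value is the gap between the producer side and the consumer side of the queue, reusing the clamped helper discussed above:

```go
package servicegraphs

import "github.com/prometheus/client_golang/prometheus"

// observeMessagingLatency records how long a message spent in the
// messaging system: from the producer span's end to the consumer span's
// start, clamped to 0 when timestamps are skewed.
func observeMessagingLatency(h prometheus.Observer, producerEndUnixNano, consumerStartUnixNano uint64) {
	h.Observe(unixNanosDiffSec(producerEndUnixNano, consumerStartUnixNano))
}

// unixNanosDiffSec is the version agreed on in the review thread above.
func unixNanosDiffSec(unixNanoStart, unixNanoEnd uint64) float64 {
	if unixNanoStart > unixNanoEnd {
		return 0 // clamp negative diffs
	}
	return float64(unixNanoEnd-unixNanoStart) / 1e9
}
```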
Which issue(s) this PR fixes:
Fixes #3232
Checklist
- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]