-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bugfix: native histogram: data race in addExemplar #1609
Conversation
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Instead of checking non thread safe slice capacity Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
mdIdx was redundant when len(exemplars)>1, so got rid of it, rIdx is enough. Don't compare timestamp of incoming exemplar to timestamp of minimal distance exemplar. Most of the time the incoming exemplar will be newer. And if not, the previous code just replaced an exemplar one index after the minimal distance exemplar. Which had an index out of range bug, plus is essentially random. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> # Conflicts: # prometheus/histogram.go
Code looks good, but I'm not quite sure if we should do this kind of optimization via ttl as an atomic int. Those atomic operations still have a cost (and it could be significant on systems with many cores). The usual code path is to call |
Well, TTL is constant, so in theory we don't need to use atomics. Pushed that version. |
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
5b005fe
to
6c6aa53
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating the test in the first commit to show the bug. Helps a lot with the review :)
I do understand that using ttl
instead of cap(n.exemplars)
solves the data race condition, so LGTM now that tests pass.
About the index out of range problem. I'm having a hard time understanding the codebase, but I believe in Beorn's expertise here. Out of scope for this PR but I think more descriptive variable names could help
md = -1.0 // Logarithm of the delta of the closest pair of exemplars. | ||
rIdx = -1 // Index of the older exemplar within the closest pair and where we need to insert the new exemplar. | ||
cLog float64 // Logarithm of the current exemplar. | ||
pLog float64 // Logarithm of the previous exemplar. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, totally out of scope for this PR, but more descriptive variable names would remove the necessity for commenting on their meaning 😕
I'm not sure if I'm missing something, but is there a reason to name these variables rIdx
, cLog
, pLog
instead of removeIndex
, currentLog
or previousLog
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Urgh, reading again replaceIndex
is probably a better name for rIdx
. Does it make nIdx
obsolete? (Do I understand correctly that nIdx
is newIndex
? 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Historically, we had very different "tastes" playing out between various Prometheus developers, all the way from very long and descriptive names to "use one letter names whenever possible" (slightly exaggerated).
My observation is that Go developers tend towards shorter variable names (have a look at the standard library, for example). But we never explicitly stated a preference for the Prometheus project. In lack of a specific guideline, I would go with generally good practices of programming like "the wider the scope of a variable, the more descriptive its name".
In this case, the variables are local to a (fairly long) function. So I wouldn't be too demanding about the descriptiveness of their names. Furthermore, even previousLog
wouldn't really be self explanatory. (Previous log line?) And a name like logarithmOfPreviousExemplar
would be clearly too clunky. Given that the comment is needed anyway, I actually like the names as they are. Still very short, but better than just c
, p
, and r
(which others might even prefer).
Should we close #1608 in favor of this one? |
|
Ack, I'll update the other PR and we can just close this if we merge that. |
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Fixes: #1605
Slices
cap
function is not thread safe. Reuse instead thettl
to store the ttl atomically with -1 indicating exemplars turned off.Includes fix for #1607 from #1608 .