-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[processor/tailsampling] probabilistic policy hash distribution not good enough #27044
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
@jiekun, do you have a proposal to improve the current one? |
@jpkrohling I am thinking about using:
But I think it's important to know the limitations(/Why) of the previous design. |
I don't think that's going to cut either. We should probably wait for open-telemetry/oteps#235. I think @jmacd is keeping track of this and will work on the collector side of that. |
Thanks for mentioning that. Let me discuss this with @jmacd in Slack later. I think that OTEP of sampling algorithm does help on this issue. It should be easy to reuse some code and align with the OTEP implementation. Next update ETA: before Oct 14th. |
Do you have an update on this, @jiekun? |
apologize for the delay, I am still tracking this issue, and will have update soon. @jpkrohling |
I am not sure why this "auto-incrementing" trace ID policy would be used, ever. What is the use-case for auto-incrementing trace IDs? |
In the sampling SIG we often refer to https://github.com/rurban/smhasher, which is an extensive test for hashing functions which shows, unfortunately, that most of them fail at the task you're setting up for them, when you use auto-incrementing IDs. I also want to correct the statement I made -- I know about uses of auto-incrementing TraceIDs -- there is a performance problem when TraceIDs require so many bits of randomness. That is why the above-referenced specification (working with W3C) has settled on only 56 bits of randomness. I've seen cases where thread-local TraceID generators are used that would auto-increment let's say 12 bits while using 44 bits of randomness. You could experiment with that approach and see how far you can take it. Also, one could argue that the tests here are the problem, they're not good enough. The sampler is definitely outside of its error bounds, so to speak, but then it has hard-coded assumptions--it relies on the randomness from Thanks. I suggest closing this issue, but I'll leave it for the original author to decide. |
@jmacd Hi thanks for your comment! I feel like I need to clearify a few things.
So what I'm questioning about is the design of current impl:
I think the hash algo is fine but what's the best practice after calculated the hash value? For example:
If we agree that the impl is fine, I can close this issue and leave it as is. Thanks for all your point of views. |
A few additional points to consider:
For the current implementation, I think the acceptable behaviour is:
|
I'd be interested in seeing the metrics showing that this is the case. Small variations on big workloads are expected. For small workloads, a higher variance is expected. |
I have some further feedback from @jmacd via Slack. I would like to move this issue to pending status and see if we got further feedback from the original user. If not, then I will close it later. |
I'm closing this issue now. But if someone come up with new issue / idea (and more importantly the example and metrics), feel free to re-open and discuss here. |
Component(s)
No response
What happened?
Description
When testing probabilistic with auto increasing
trace_id
, the sampling rate is not match the expected probability.Steps to Reproduce
The
trace_id
we used inprobabilitistic_test.go
for probabilistic policy is random:However, if we use special (auto increasing)
trace_id
, most test cases will failed:Obviously, the hash result is strongly relevent to the original input in this case. It works well when the
trace_id
is fully random, but failed when using special trace_id.It's still uncleared that if it will affect other
trace_id
pattern (which is auto increasing). But I think we need to discuss if we should keep this algorithm or switch to antoher one.ping original author @kvrhdn as well #3876
Expected Result
Actual Result
TraceID examples
Format:
Collector version
since #3876
Environment information
No response
OpenTelemetry Collector configuration
No response
Log output
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: