Probabilistic Data Structures #95
Comments
Sure! That looks very interesting. I'll give the paper a closer look, but I'm not sure if I'll have the bandwidth to work on implementing it for a while.
Let me see if I can get somewhere quickly from my Java version.
So I have a working version of the cdf function. The quantile function isn't there yet. It still needs more tests, and it needs to conform to your API expectations. I would also like to add a logHistogram implementation. How did you imagine this should work? Right now I have a I currently have a Is this what you are expecting?
Check out the
As long as Broadcast.broadcastable(o::OnlineStat) = Ref(o) |
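To illustrate what the `broadcastable` one-liner buys you, here is a minimal sketch. `ToySketch` and `toycdf` are hypothetical stand-ins invented for this example; they are not part of OnlineStats.jl or TDigest.jl. Wrapping the object in `Ref` tells broadcasting to treat it as a scalar, so a dotted call iterates over the query points only.

```julia
# Hypothetical stand-in for a sketch type; not a real OnlineStat.
struct ToySketch
    sorted::Vector{Float64}   # stored values, assumed sorted ascending
end

# Hypothetical empirical cdf: fraction of stored values <= x.
toycdf(s::ToySketch, x::Real) = searchsortedlast(s.sorted, x) / length(s.sorted)

# Same pattern as the snippet above: treat the sketch as a scalar when broadcasting.
Base.Broadcast.broadcastable(s::ToySketch) = Ref(s)

s = ToySketch([1.0, 2.0, 3.0, 4.0])
result = toycdf.(s, [0.5, 2.0, 10.0])   # broadcasts over the points, not over `s`
```

Without the `broadcastable` method, the dotted call would try to iterate over `s` itself and fail, since `ToySketch` is not iterable.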
That's very helpful. In the short term, even before I step up to integrating I will add the broadcastable snippet directly. Your comment about cdf being a collision makes me wonder whether I should make MergingDigest a Distribution somehow. There is no real reason I couldn't support sampling from the empirical distribution that way.
In fact, anything that supports a quantile operation can be sampled.
A package that implements this would be pretty cool, e.g.
x = QuantileSampler(thing_with_quantile_method)
rand(x)
But for all I know this already exists in Distributions.jl or elsewhere.
Edit: Maybe never mind. You're thinking of this?
quantile(thing, rand())
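The idea in the last few comments is inverse transform sampling: draw a uniform number in [0, 1) and push it through the quantile function. A minimal sketch follows, assuming only that `quantile(thing, p)` is defined for the object. `sample_from` is a hypothetical helper name, and a plain vector stands in for a digest here because `Statistics.quantile` already works on vectors.

```julia
using Random
using Statistics

# Hypothetical helper: draw one sample from anything that supports quantile(thing, p).
# rand(rng) is uniform on [0, 1), so the result follows the distribution described by `thing`.
sample_from(thing; rng = Random.default_rng()) = quantile(thing, rand(rng))

data = randn(1_000)                          # stand-in for a digest or other sketch
draws = [sample_from(data) for _ in 1:5]     # five draws from the empirical distribution
```

Any type with its own `quantile` method could be passed in place of `data`. Whether a dedicated `QuantileSampler` wrapper type is worth having on top of this is a design choice the thread leaves open.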
See https://github.com/tdunning/TDigest/ for a beginning. I think this does everything needed, but it isn't tested hard enough to make me super confident of that.
Comments and criticisms are VERY welcome.
There are still some issues in TDigest.jl, largely around sort stability. A secondary problem is lack of mind share on my part due to my day job. Not surprisingly, the Julia implementation is simpler than the Java version.