BallTree KDE is a performance-oriented implementation of kernel density estimation.
While predicting probabilities on a large number of entries,
I realized the classical implementation does not scale well with the number of points.
This implementation takes advantage of the sparsity of the distribution to address this problem.
This implementation uses the BallTree structure to 'fuzzy index' the distribution with a custom metric, clustering nearby points together.
This reduces the number of queries and weights the result by the size of each cluster.
This approximation is valid only if the points are dense enough relative to the kernel bandwidth.
It is especially suited to prediction on sparse distributions.
Only the Gaussian kernel is implemented at the moment.
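For reference, here is a minimal sketch of the classical Gaussian KDE that the approximation is compared against. The function name and shapes are illustrative, not the project's actual API:

```python
import numpy as np

def naive_gaussian_kde(points, queries, bandwidth):
    """Classical 1D KDE: every query is evaluated against every point, O(n*m)."""
    # Pairwise squared distances between queries and distribution points.
    d2 = (queries[:, None] - points[None, :]) ** 2
    # Normalized 1D Gaussian kernel, averaged over all points.
    k = np.exp(-0.5 * d2 / bandwidth**2) / (bandwidth * np.sqrt(2 * np.pi))
    return k.mean(axis=1)

points = np.random.default_rng(0).normal(size=1000)
density = naive_gaussian_kde(points, np.array([0.0, 1.0]), bandwidth=0.2)
```

Every query pays the full cost of all n points, which is the scaling problem the clustering approximation addresses.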
Performing a KDE is equivalent to convolving the distribution with the kernel.
When two points of the distribution are close enough relative to the kernel bandwidth,
we can compute a single weighted kernel and assume the result is almost equal to the sum of the two kernels.
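This claim can be checked numerically. The sketch below (Gaussian kernel, illustrative values) replaces two kernels centered on nearby points with one kernel of weight 2 at their midpoint, and measures the absolute error:

```python
import numpy as np

def gaussian(x, mu, h):
    return np.exp(-0.5 * ((x - mu) / h) ** 2) / (h * np.sqrt(2 * np.pi))

h = 1.0                       # kernel bandwidth
x = np.linspace(-5, 5, 1001)  # evaluation grid
p1, p2 = -0.05, 0.05          # two points close relative to h

exact = gaussian(x, p1, h) + gaussian(x, p2, h)  # sum of the 2 kernels
approx = 2 * gaussian(x, (p1 + p2) / 2, h)       # one weighted kernel
max_abs_error = np.max(np.abs(exact - approx))
```

With a spacing of 0.1 against a bandwidth of 1.0, the error stays well below 1% of the peak density; it grows quickly as the spacing approaches the bandwidth.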
- Green: exact curve
- Blue: approximated curve
- Red: absolute error
Note: this is a 1D example; in N dimensions, the closeness of points is evaluated with an ellipse equation.
This allows a different bandwidth on each dimension.
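A minimal sketch of such an ellipse test (function name and tolerance are illustrative): two offsets with the same Euclidean length are judged differently once the distance is scaled per dimension.

```python
import numpy as np

def are_close(a, b, bandwidths, tol=1.0):
    """Ellipse test: sum_i ((a_i - b_i) / h_i)^2 <= tol^2.
    With equal bandwidths this reduces to a scaled Euclidean ball."""
    return np.sum(((a - b) / bandwidths) ** 2) <= tol**2

h = np.array([1.0, 0.1])  # loose bandwidth on dim 1, tight on dim 2
a = np.array([0.0, 0.0])

close1 = are_close(a, np.array([0.5, 0.0]), h)  # True: inside the ellipse
close2 = are_close(a, np.array([0.0, 0.5]), h)  # False: dim 2 bandwidth is tight
```

Both candidate points lie 0.5 away from `a` in Euclidean distance, but only the offset along the loose-bandwidth dimension counts as close.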
This also holds for the aggregation of N points; in that case, the kernel is weighted by N.
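Putting the two ideas together, here is a hedged end-to-end sketch in 1D (the greedy `cluster_1d` helper is a stand-in for the BallTree fuzzy indexing, not the project's code): nearby points are merged into weighted clusters, and the KDE is evaluated with one kernel per cluster instead of one per point.

```python
import numpy as np

def gaussian(x, mu, h):
    return np.exp(-0.5 * ((x - mu) / h) ** 2) / (h * np.sqrt(2 * np.pi))

def cluster_1d(points, h, tol=0.1):
    """Greedy aggregation: a point within tol*h of the current cluster
    center joins that cluster; otherwise it starts a new one."""
    centers, counts = [], []
    for p in np.sort(points):
        if centers and abs(p - centers[-1]) <= tol * h:
            counts[-1] += 1
        else:
            centers.append(p)
            counts.append(1)
    return np.array(centers), np.array(counts)

rng = np.random.default_rng(0)
points = rng.normal(size=2000)
h = 0.5
x = np.linspace(-3, 3, 301)

# Exact KDE: one kernel per point.
exact = sum(gaussian(x, p, h) for p in points) / len(points)
# Approximate KDE: one kernel per cluster, weighted by cluster size N.
centers, counts = cluster_1d(points, h)
approx = sum(n * gaussian(x, c, h) for c, n in zip(centers, counts)) / len(points)
max_abs_error = np.max(np.abs(exact - approx))
```

The number of kernel evaluations drops from the number of points to the number of clusters, while the error stays small as long as each cluster's radius is small relative to the bandwidth.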
As always, the tradeoff we are making here depends on the nature of the data,
but a few concentrated clusters make this approximation suitable.
Depending on the precision needed, it is also possible to tweak the metric so it aggregates smaller clusters.
At that cost, performance drops by a factor of ~20 (more details in the demo notebook).
This experiment highlights the versatility of BallTree, which enables many applications through custom metrics. (PS: it is awesome!)