Skip to content

pelodelfuego/bt-kde

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Bt-kde

BallTree KDE is a perf oriented implementation of kernel density estimation.
While performing prediction of probabilities on a lot en entries,
I realized classical implementation does not scale well with the number of points.

This implementation takes advantage of the sparsity of the distribution to address this problematic.

Algorithm

This implementation use the BallTree property to 'fuzzy index' the distribution with a custom metric in order to cluster points together.
It reduce the number of queries and ponderate the result by the size of each cluster.
This approximation is valid only if the points are dense enough regarding to the kernel bandwidth.

It is especially adapted for prediction on sparse distributions.

Only the Gaussian kernel is implemented at the moment.

Visually speaking

Performing a KDE is actually equivalent to convolve the distribution by the kernel.

When points of the distribution are close enough regarding to the kernel bandwidth,
We can compute only one ponderated kernel and assume the result will be almost equal to the sum of the 2 kernels.

  • Green: exact curve
  • Blue: approximated curve
  • Red: absolute error

Note: Here is a 1D example, in N dimensions, the closeness of points is evaluated by an ellipse equation.
It allows to use different bandwidth on each dimensions.

Formally speaking

This is also true for aggregation of N points. In such a case, the kernel would be ponderated by N.

Conclusion

As always, the tradeoff we are making here depends on the nature of the data,
but having a few concentrated clusters make this approximation suitable.
Depending on the precision needed, it is also possible to tweak the metric so it aggregate smaller clusters.

At this cost, we dropped the perf by a factor ~20 (mode details on the demo notebook).

This experiment highlight the versatility of BallTree which allow bunch of application with a custom metric. (PS: it is awesome !)

We can also note, it is possible to use different bandwidth on each dimensions.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published