
Commit b72d21c

feat: implement TDigest for approx quantile
Adds a [TDigest] implementation providing approximate quantile estimates over large inputs using a small, bounded amount of memory.

A TDigest is most accurate near either "end" of the quantile range (for example 0.001 or 0.999) due to the use of a scaling function that increases resolution at the tails. The paper claims single-digit parts-per-million errors for q ≤ 0.001 or q ≥ 0.999 using 100 centroids, and in practice I have found accuracy to be more than acceptable for an approximate function across the entire quantile range.

The implementation is a modified copy of https://github.com/MnO2/t-digest, itself a Rust port of [Facebook's C++ implementation]. Both Facebook's implementation and MnO2's Rust port are Apache 2.0 licensed.

[TDigest]: https://arxiv.org/abs/1902.04023
[Facebook's C++ implementation]: https://github.com/facebook/folly/blob/main/folly/stats/TDigest.h
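To illustrate why a scaling function gives more resolution at the tails, here is a minimal sketch of the k1 scale function described in the t-digest paper, k(q) = δ/(2π) · asin(2q - 1). This is an assumption for illustration only: the commit message does not say which scale function this implementation uses, and the names `k1`, `k1_inv`, and `max_centroid_width` are hypothetical, not identifiers from this commit. The value δ = 100 is chosen to echo the "100 centroids" figure above.

```rust
use std::f64::consts::PI;

/// Scale function k1 from the t-digest paper (illustrative, not this commit's code):
/// maps a quantile q in [0, 1] to a "k" index in [-delta/4, delta/4].
fn k1(q: f64, delta: f64) -> f64 {
    delta / (2.0 * PI) * (2.0 * q - 1.0).asin()
}

/// Inverse of k1: maps a k index back to a quantile.
fn k1_inv(k: f64, delta: f64) -> f64 {
    ((2.0 * PI * k / delta).sin() + 1.0) / 2.0
}

/// Width in quantile space covered by a unit step in k starting at q.
/// A centroid starting at q may span at most this much of the quantile range.
fn max_centroid_width(q: f64, delta: f64) -> f64 {
    // Clamp so the step never runs past the top of the k range.
    let k_next = (k1(q, delta) + 1.0).min(delta / 4.0);
    k1_inv(k_next, delta) - q
}

fn main() {
    let delta = 100.0; // compression parameter (assumed value)
    for q in [0.001, 0.01, 0.1, 0.5] {
        println!("q = {:<6} max centroid width = {:.6}", q, max_centroid_width(q, delta));
    }
}
```

Under these assumptions, the allowed centroid width shrinks as q approaches 0 or 1, so centroids near the tails summarise far fewer points than centroids near the median. That is the source of the higher accuracy at extreme quantiles.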
1 parent b05feda commit b72d21c

2 files changed: 819 additions, 0 deletions


datafusion/src/physical_plan/mod.rs

Lines changed: 1 addition & 0 deletions
@@ -655,6 +655,7 @@ pub mod sort;
 pub mod sort_preserving_merge;
 pub mod stream;
 pub mod string_expressions;
+pub(crate) mod tdigest;
 pub mod type_coercion;
 pub mod udaf;
 pub mod udf;
