- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.7k
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I would like to efficiently aggregate (approximate) quantile values from a column of data - "show me the 99th percentile of the latency column in the requests table"
Describe the solution you'd like
Implement TDigest (or similar algorithm) to provide relatively cheap quantile values/estimations.
Describe alternatives you've considered
I've had a look at some other DBs:
- duckdb - tdigest & reservoir sampling
- timescaledb - tdigest & uddsketch
- snowflake - several options, including tdigest for cheap approximations
- presto - qdigest
- influxdb - tdigest
For approximate results, tdigest seems popular, though the uddsketch paper is relatively new and also interesting.
Additional context
Tdigest provides quantile estimatations, I imagine it would expose an approx_quantile(column, quantile) aggregation keeping with the naming of the approx_distinct() aggregation.
Example:
SELECT approx_quantile(latency, 0.99) AS p99 FROM requests;