Skip to content

Add approx_quantile support #1538

@domodwyer

Description

@domodwyer

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I would like to efficiently aggregate (approximate) quantile values from a column of data - "show me the 99th percentile of the latency column in the requests table"

Describe the solution you'd like
Implement TDigest (or similar algorithm) to provide relatively cheap quantile values/estimations.

Describe alternatives you've considered
I've had a look at some other DBs:

  • duckdb - tdigest & reservoir sampling
  • timescaledb - tdigest & uddsketch
  • snowflake - several options, including tdigest for cheap approximations
  • presto - qdigest
  • influxdb - tdigest

For approximate results, tdigest seems popular, though the uddsketch paper is relatively new and also interesting.

Additional context
Tdigest provides quantile estimatations, I imagine it would expose an approx_quantile(column, quantile) aggregation keeping with the naming of the approx_distinct() aggregation.

Example:

SELECT approx_quantile(latency, 0.99) AS p99 FROM requests;

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions