
[Feature Request] Increase default percentiles agg compression from 100 -> 200 #18458

@peteralfonsi

Description


Is your feature request related to a problem? Please describe

This is a followup to #18124.

It occurred to me that the new MergingDigest implementation uses "much less than half" of the memory of the older AVLTreeDigest, so we can afford to increase accuracy by storing more centroids in the digest. The number of centroids scales roughly linearly with the compression parameter: higher compression --> more centroids stored --> higher accuracy, but also higher memory usage. The default is currently 100. I think we should increase it to 200, which significantly improved accuracy in my tests.

Since the implementation uses less than half the memory, we would still be using less memory than we did before #18124 was merged.

Benchmark numbers for accuracy and latency are in the Additional Context section.

Describe the solution you'd like

We should increase the default compression from 100 to 200 or maybe higher.
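For context, compression can already be overridden per request via the `tdigest` settings object on the percentiles aggregation, so users who need more accuracy today aren't blocked on the default. A minimal request body (field name is illustrative):

```json
{
  "size": 0,
  "aggs": {
    "status_percentiles": {
      "percentiles": {
        "field": "status",
        "tdigest": { "compression": 200 }
      }
    }
  }
}
```

Raising the default just makes this trade-off the out-of-the-box behavior instead of an opt-in.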

Related component

Search:Aggregations

Describe alternatives you've considered

No response

Additional context

I tested this with OpenSearch Benchmark's (OSB) http_logs workload on a c5.2xlarge instance.

Doesn't look like there's a latency impact:

| Field | p50 (AVLTreeDigest, compression=100) | p50 (MergingDigest, compression=100) | p50 (MergingDigest, compression=200) |
| --- | --- | --- | --- |
| timestamp | 13085 | 4910 | 4880 |
| status | 196794 | 5694 | 5710 |

We can check accuracy with status since it's low cardinality, so we can easily get ground truth with a terms aggregation.

| Percentile | True value | Reported (compression=100) | Reported (compression=200) | Reported (compression=300) |
| --- | --- | --- | --- | --- |
| 1 | 200.0 | 200.0 | 200.0 | 200.0 |
| 5 | 200.0 | 200.0 | 200.0 | 200.0 |
| 25 | 200.0 | 200.0 | 200.0 | 200.0 |
| 50 | 200.0 | 200.0 | 200.0 | 200.0 |
| 75 | 200.0 | 203.5288 | 200.0 | 200.0 |
| 85 | 304.0 | 257.8974 | 262.6762 | 262.8854 |
| 90 | 304.0 | 295.2672 | 303.9587 | 304.0 |
| 95 | 304.0 | 304.0 | 304.0 | 304.0 |
| 99 | 304.0 | 307.9816 | 304.0 | 304.0 |
| 99.9 | 404.0 | 404.0 | 404.0 | 404.0 |
| 99.99 | 404.0 | 404.0 | 404.0 | 404.0 |
| 99.999 | 500.0 | 499.4360 | 500.0 | 500.0 |

Note the true value switches from 200 --> 304 at the 84.48th percentile, which is probably why all compression values perform quite badly at the nearby p85. The t-digest is also designed to be most accurate at the extreme low/high percentiles.
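As a sanity check on the "True value" column, ground truth for a low-cardinality field can be derived directly from terms-aggregation bucket counts. A minimal sketch; the counts below are hypothetical (the real ones come from http_logs), chosen so that status <= 200 covers 84.48% of documents, matching the crossover percentile above:

```python
# Hypothetical (value, doc_count) pairs from a terms aggregation on
# `status`, sorted by value. NOT the real http_logs counts; picked so
# that status <= 200 covers exactly 84.48% of 10,000 docs.
buckets = [(200, 8448), (304, 1541), (404, 10), (500, 1)]

def true_percentile(buckets, p):
    """Exact nearest-rank percentile over (value, count) pairs sorted by value."""
    total = sum(count for _, count in buckets)
    rank = p / 100 * total  # fractional rank of the requested percentile
    cumulative = 0
    for value, count in buckets:
        cumulative += count
        if cumulative >= rank:
            return value
    return buckets[-1][0]

# The answer jumps from 200 to 304 just above p84.48, mirroring the table.
assert true_percentile(buckets, 75) == 200
assert true_percentile(buckets, 85) == 304
assert true_percentile(buckets, 99.9) == 404
```

Any estimator error at p85 then comes down to how the digest interpolates across that one large jump between adjacent centroids.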

Overall, the compression=100 result is surprisingly bad for this use case. I think t-digest generally performs worse on low-cardinality data like this, but it seems we can fix most of the issues just by increasing compression to 200; 300 doesn't seem to provide much more accuracy than 200.
