
[Feature Request] Increase default percentiles agg compression from 100 -> 200 #18458

@peteralfonsi

Description


Is your feature request related to a problem? Please describe

This is a followup to #18124.

It occurred to me that the new MergingDigest implementation uses "much less than half" of the memory of the older AVLTreeDigest, so we can afford to increase accuracy by storing more centroids in the digest. The number of centroids scales roughly linearly with the compression parameter: higher compression --> more centroids stored --> higher accuracy, but also higher memory usage. The default is currently 100. I think we should increase it to 200, which significantly improved accuracy in my tests.

Since the implementation uses less than half the memory, we would still be using less memory than we did before #18124 was merged.

Benchmark numbers for accuracy and latency are in the Additional Context section.

Describe the solution you'd like

We should increase the default compression from 100 to 200 or maybe higher.
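For context, compression can already be overridden per request via the `tdigest` settings object on the percentiles aggregation, so users who need more accuracy today aren't blocked on the default. A minimal request body (field name is illustrative):

```json
{
  "size": 0,
  "aggs": {
    "status_percentiles": {
      "percentiles": {
        "field": "status",
        "tdigest": { "compression": 200 }
      }
    }
  }
}
```

Raising the default just makes this trade-off the out-of-the-box behavior instead of an opt-in.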

Related component

Search:Aggregations

Describe alternatives you've considered

No response

Additional context

I tested this with OpenSearch Benchmark's (OSB) http_logs workload on a c5.2xlarge instance.

Doesn't look like there's a latency impact:

| Field | p50 (AVLTreeDigest, compression=100) | p50 (MergingDigest, compression=100) | p50 (MergingDigest, compression=200) |
| --- | --- | --- | --- |
| timestamp | 13085 | 4910 | 4880 |
| status | 196794 | 5694 | 5710 |

We can check accuracy with status since it's low cardinality, so we can easily get ground truth with a terms aggregation.

| Percentile | True value | Reported (compression=100) | Reported (compression=200) | Reported (compression=300) |
| --- | --- | --- | --- | --- |
| 1 | 200.0 | 200.0 | 200.0 | 200.0 |
| 5 | 200.0 | 200.0 | 200.0 | 200.0 |
| 25 | 200.0 | 200.0 | 200.0 | 200.0 |
| 50 | 200.0 | 200.0 | 200.0 | 200.0 |
| 75 | 200.0 | 203.5288 | 200.0 | 200.0 |
| 85 | 304.0 | 257.8974 | 262.6762 | 262.8854 |
| 90 | 304.0 | 295.2672 | 303.9587 | 304.0 |
| 95 | 304.0 | 304.0 | 304.0 | 304.0 |
| 99 | 304.0 | 307.9816 | 304.0 | 304.0 |
| 99.9 | 404.0 | 404.0 | 404.0 | 404.0 |
| 99.99 | 404.0 | 404.0 | 404.0 | 404.0 |
| 99.999 | 500.0 | 499.4360 | 500.0 | 500.0 |

Note the true value switches from 200 --> 304 at the 84.48th percentile, which is probably why all compression values perform quite badly at the nearby p85. The t-digest is also designed to be most accurate at the extreme low/high percentiles.
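As a sanity check on the "True value" column, ground truth for a low-cardinality field can be derived directly from terms-aggregation bucket counts. A minimal sketch; the counts below are hypothetical (the real ones come from http_logs), chosen so that status <= 200 covers 84.48% of documents, matching the crossover percentile above:

```python
# Hypothetical (value, doc_count) pairs from a terms aggregation on
# `status`, sorted by value. NOT the real http_logs counts; picked so
# that status <= 200 covers exactly 84.48% of 10,000 docs.
buckets = [(200, 8448), (304, 1541), (404, 10), (500, 1)]

def true_percentile(buckets, p):
    """Exact nearest-rank percentile over (value, count) pairs sorted by value."""
    total = sum(count for _, count in buckets)
    rank = p / 100 * total  # fractional rank of the requested percentile
    cumulative = 0
    for value, count in buckets:
        cumulative += count
        if cumulative >= rank:
            return value
    return buckets[-1][0]

# The answer jumps from 200 to 304 just above p84.48, mirroring the table.
assert true_percentile(buckets, 75) == 200
assert true_percentile(buckets, 85) == 304
assert true_percentile(buckets, 99.9) == 404
```

Any estimator error at p85 then comes down to how the digest interpolates across that one large jump between adjacent centroids.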

Overall, the compression=100 result is surprisingly bad for this use case. I think t-digest generally performs worse on low-cardinality data like this, but it seems we can fix most of the issues just by increasing compression to 200; 300 doesn't seem to provide much more accuracy than 200.
