### Is your feature request related to a problem? Please describe
This is a follow-up to #18124.
It occurred to me that the new MergingDigest implementation uses "much less than half" of the memory of the older AVLTreeDigest, so we can afford to increase accuracy by storing more centroids in the digest. The number of centroids stored scales roughly linearly with the compression parameter: higher compression --> more centroids stored --> higher accuracy, but higher memory usage. The current default is 100; I think we should increase it to 200, which significantly improved accuracy in my tests.
Since the new implementation uses less than half the memory at the same compression, we would still be using less memory than we did before #18124 was merged, even after doubling the default.
Benchmark numbers for accuracy and latency are in the Additional Context section.
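To make the memory side of this concrete, here's a minimal standalone sketch against the upstream t-digest library (`com.tdunning.math.stats`, where MergingDigest and AVLTreeDigest come from) that compares centroid counts and serialized sizes at different compression values. The data is synthetic and the numbers ignore OpenSearch's per-shard wrapper overhead, so treat it as a rough illustration rather than a measurement of our actual memory usage:

```java
import java.util.Random;

import com.tdunning.math.stats.AVLTreeDigest;
import com.tdunning.math.stats.MergingDigest;
import com.tdunning.math.stats.TDigest;

public class CompressionFootprint {
    public static void main(String[] args) {
        Random random = new Random(42);
        TDigest avlAt100 = new AVLTreeDigest(100);      // old implementation, old default compression
        TDigest mergingAt100 = new MergingDigest(100);  // new implementation, current default compression
        TDigest mergingAt200 = new MergingDigest(200);  // new implementation, proposed default compression

        // Feed all three digests the same stream of values.
        for (int i = 0; i < 1_000_000; i++) {
            double value = random.nextGaussian();
            avlAt100.add(value);
            mergingAt100.add(value);
            mergingAt200.add(value);
        }

        report("AVLTreeDigest, compression=100", avlAt100);
        report("MergingDigest, compression=100", mergingAt100);
        report("MergingDigest, compression=200", mergingAt200);
    }

    private static void report(String label, TDigest digest) {
        digest.compress(); // flush any buffered points so the counts are comparable
        System.out.printf("%-32s centroids=%4d  serialized=%5d bytes%n",
            label, digest.centroidCount(), digest.smallByteSize());
    }
}
```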
### Describe the solution you'd like
We should increase the default compression from 100 to 200 or maybe higher.
### Related component
Search:Aggregations
### Describe alternatives you've considered
No response
### Additional context
I tested this with OSB's http_logs workload on a c5.2xlarge instance.
There doesn't appear to be a latency impact:
| Field | p50 (AVLTreeDigest, compression=100) | p50 (MergingDigest, compression=100) | p50 (MergingDigest, compression=200) |
|---|---|---|---|
| timestamp | 13085 | 4910 | 4880 |
| status | 196794 | 5694 | 5710 |
We can check accuracy with the status field: since it's low-cardinality, we can easily get ground truth from a terms aggregation.
| Percentile | True value | Reported value (compression = 100) | Reported value (compression = 200) | Reported value (compression = 300) |
|---|---|---|---|---|
| 1 | 200.0 | 200.0 | 200.0 | 200.0 |
| 5 | 200.0 | 200.0 | 200.0 | 200.0 |
| 25 | 200.0 | 200.0 | 200.0 | 200.0 |
| 50 | 200.0 | 200.0 | 200.0 | 200.0 |
| 75 | 200.0 | 203.5288 | 200.0 | 200.0 |
| 85 | 304.0 | 257.8974 | 262.6762 | 262.8854 |
| 90 | 304.0 | 295.2672 | 303.9587 | 304.0 |
| 95 | 304.0 | 304.0 | 304.0 | 304.0 |
| 99 | 304.0 | 307.9816 | 304.0 | 304.0 |
| 99.9 | 404.0 | 404.0 | 404.0 | 404.0 |
| 99.99 | 404.0 | 404.0 | 404.0 | 404.0 |
| 99.999 | 500.0 | 499.4360 | 500.0 | 500.0 |
Note that the true value switches from 200 --> 304 at the 84.48th percentile, which is probably why all configurations perform quite badly at the nearby p85; the digest is also designed to be more accurate at extreme low/high percentiles than in the middle of the distribution.
Overall, the compression=100 result is surprisingly bad for this use case. I think t-digest in general performs less well on low-cardinality data like this, but it seems we can fix most of the issues just by increasing compression to 200; 300 doesn't provide much additional accuracy over 200.
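For anyone who wants to reproduce the low-cardinality accuracy check without running OSB, here's a rough standalone sketch against the same upstream t-digest library. The status-code mix below is made up to loosely mimic the http_logs shape described above (the 200 --> 304 switch near the 84th percentile), so the exact errors won't match the table:

```java
import java.util.Arrays;
import java.util.Random;

import com.tdunning.math.stats.MergingDigest;
import com.tdunning.math.stats.TDigest;

public class LowCardinalityAccuracy {
    public static void main(String[] args) {
        // Synthetic status-code mix; the weights are invented, chosen only to roughly
        // match the percentile switch points seen in the table above.
        int[] codes = {200, 304, 404, 500};
        double[] cumulative = {0.845, 0.995, 0.99999, 1.0};

        int n = 5_000_000;
        double[] values = new double[n];
        TDigest at100 = new MergingDigest(100);
        TDigest at200 = new MergingDigest(200);

        Random random = new Random(0);
        for (int i = 0; i < n; i++) {
            double u = random.nextDouble();
            int code = codes[codes.length - 1];
            for (int j = 0; j < codes.length; j++) {
                if (u < cumulative[j]) {
                    code = codes[j];
                    break;
                }
            }
            values[i] = code;
            at100.add(code);
            at200.add(code);
        }

        // Ground truth: exact percentiles from the sorted array.
        Arrays.sort(values);
        double[] percentiles = {50, 75, 85, 90, 95, 99, 99.9};
        System.out.println("pct   true      c=100     c=200");
        for (double p : percentiles) {
            int index = (int) Math.min(n - 1, Math.ceil(p / 100.0 * n) - 1);
            System.out.printf("%-5s %-9.1f %-9.4f %-9.4f%n",
                p, values[index], at100.quantile(p / 100.0), at200.quantile(p / 100.0));
        }
    }
}
```

Sorting the raw values works here only because the synthetic dataset fits in memory; against the real index I used a terms aggregation for ground truth, as noted above.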