Add PercentileKLL aggregation function#10643
Conversation
Codecov Report
@@ Coverage Diff @@
## master #10643 +/- ##
=============================================
- Coverage 64.40% 13.66% -50.75%
+ Complexity 6446 439 -6007
=============================================
Files 2098 2103 +5
Lines 113315 113521 +206
Branches 17204 17255 +51
=============================================
- Hits 72984 15511 -57473
- Misses 35082 96740 +61658
+ Partials 5249 1270 -3979
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 1444 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java
Outdated
Show resolved
Hide resolved
.../java/org/apache/pinot/core/query/aggregation/function/PercentileKLLAggregationFunction.java
Show resolved
Hide resolved
|
|
||
| _percentile = arguments.get(1).getLiteral().getDoubleValue(); | ||
| if (numArguments == 3) { | ||
| _kValue = arguments.get(2).getLiteral().getIntValue(); |
There was a problem hiding this comment.
A similar check should be done on the K value. There's could be some memory implications from setting the K to be too high: https://datasketches.apache.org/docs/KLL/KLLAccuracyAndSize.html
There was a problem hiding this comment.
The library already returns a good exception (SketchesArgumentException: K must be >= 8 and <= 65535), so I didn't see much value in doing another check here.
For the memory implication, I think it should be up to the user to decide if the size of the sketch is going to be a problem.
There was a problem hiding this comment.
Makes sense. Did not know about SketchesArgumentException. That should be good enough if bubbled up properly
...t-segment-local/src/main/java/org/apache/pinot/segment/local/customobject/SerializedKLL.java
Outdated
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/AggregationFunctionType.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionFactory.java
Show resolved
Hide resolved
...c/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionFactory.java
Show resolved
Hide resolved
.../java/org/apache/pinot/core/query/aggregation/function/PercentileKLLAggregationFunction.java
Show resolved
Hide resolved
pinot-core/src/test/java/org/apache/pinot/queries/PercentileKLLQueriesTest.java
Outdated
Show resolved
Hide resolved
|
|
||
| _percentile = arguments.get(1).getLiteral().getDoubleValue(); | ||
| if (numArguments == 3) { | ||
| _kValue = arguments.get(2).getLiteral().getIntValue(); |
There was a problem hiding this comment.
Makes sense. Did not know about SketchesArgumentException. That should be good enough if bubbled up properly
| * Compares two sketches for a specific percentile value. | ||
| */ | ||
| public class SerializedKLL implements Comparable<SerializedKLL> { | ||
| private final double _quantile; |
There was a problem hiding this comment.
why do we need _quantile inside this SerializedKLL?
if we want to return the KllDoublesSketch as byte[], the toString() is for that purpose?
There was a problem hiding this comment.
Good question. That is used when we want to sort (ORDER BY) sketches.
|
@navina, not yet. I'll add it to OSS documentation soon. |
Introducing a new approximate percentile calculation function
PercentileKLLand its variations (MV & Raw), using Apache Datasketches libraries 'KLL'.This is part of a proposal to improve Apache Datasketches support in Pinot:
(Google Docs Link) [Proposal] Improved Apache DataSketches Support in Pinot
Some advantages listed and discussed in the linked document:
Please leave design related comments on the linked document and code related comments in this PR.
Testing
PercentileKLL,PercentileKLLMV,PercentileRawKLL,PercentileRawKLLMVfeatureperformance