-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add PartitionedDistinctCount aggregation function #5786
Add PartitionedDistinctCount aggregation function #5786
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well done
@@ -123,6 +123,8 @@ public static AggregationFunction getAggregationFunction(FunctionContext functio | |||
return new DistinctCountAggregationFunction(firstArgument); | |||
case DISTINCTCOUNTBITMAP: | |||
return new DistinctCountBitmapAggregationFunction(firstArgument); | |||
case PARTITIONEDDISTINCTCOUNT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to start using _ separators. I think the parser now handles both automatically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline. We should still keep no underscore within the aggregation function type, and parser will remove the underscore from the input function name to match the function type.
break; | ||
case LONG: | ||
long[] longValues = blockValSet.getLongValuesSV(); | ||
LongOpenHashSet longSet = aggregationResultHolder.getResult(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought roaring bitmap works for long as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a long version roaring bitmap Roaring64NavigableMap
which is still in early stage within the current version of RoaringBitmap 0.8.0
we imported. We can test its performance when we upgrade the RoaringBitmap library for the latest version and change it later as it does not involve any backward-incompatible changes.
@@ -28,6 +28,7 @@ | |||
MINMAXRANGE("minMaxRange"), | |||
DISTINCTCOUNT("distinctCount"), | |||
DISTINCTCOUNTBITMAP("distinctCountBitmap"), | |||
PARTITIONEDDISTINCTCOUNT("partitionedDistinctCount"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we change the function name to something like SEGMENTEDDISTINCTCOUT
or let PARTITIONEDDISTINCTCOUNT
to take a parameter of MergeLevel to indicate segment level
or instance level
merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Renamed to SEGMENTPARTITIONEDDISTINCTCOUNT
fa12da2
to
c441c8a
Compare
Add a new
PartitionedDistinctCountAggregationFunction
to calculate the number of distinct values when values are partitioned for each segment.This function calculates the exact number of distinct values (using raw value instead of hash code) within the segment, then simply sums up the results from different segments to get the final result.