Conversation
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/CombinedTransformBlock.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/CombinedTransformBlock.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/CombinedFilterBlock.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/CombinedFilterOperator.java
Outdated
Show resolved
Hide resolved
...t-core/src/main/java/org/apache/pinot/core/operator/transform/CombinedTransformOperator.java
Outdated
Show resolved
Hide resolved
richardstartin
left a comment
There was a problem hiding this comment.
I think the code could be more concise in places and this would make the logic easier to follow.
pinot-core/src/main/java/org/apache/pinot/core/operator/SwimLaneDocIdSetOperator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java
Outdated
Show resolved
Hide resolved
pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkFilteredAggregations.java
Outdated
Show resolved
Hide resolved
pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkFilteredAggregations.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/CombinedFilterOperator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/CombinedFilterOperator.java
Outdated
Show resolved
Hide resolved
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7916 +/- ##
============================================
- Coverage 71.24% 64.15% -7.10%
+ Complexity 4262 4180 -82
============================================
Files 1607 1602 -5
Lines 83409 83206 -203
Branches 12458 12441 -17
============================================
- Hits 59426 53380 -6046
- Misses 19941 25979 +6038
+ Partials 4042 3847 -195
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/ProjectionBlock.java
Outdated
Show resolved
Hide resolved
| @@ -31,16 +32,29 @@ | |||
|
|
|||
| public class StarTreeDocIdSetPlanNode implements PlanNode { | |||
There was a problem hiding this comment.
From what I am seeing if the filterOperator is present, the existing implementation is completely overridden (new constructor and if statement in run method) to the point that the old code isn't being touched at all. I am wondering if it will be better to create a new class (for example StarTreeFilteredDocIdSetPlanNode) and doing that will also avoid the null checks?
There was a problem hiding this comment.
This class is pretty brief, and I would honestly avoid adding new plan nodes unless there is a major functionality that is different.
pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/SwimLaneDocIdSetOperator.java
Outdated
Show resolved
Hide resolved
...ain/java/org/apache/pinot/core/query/aggregation/function/FilterableAggregationFunction.java
Outdated
Show resolved
Hide resolved
|
Thanks you @atris ! |
|
@Jackie-Jiang is working on a parallel implementation, so closing this PR to avoid conflict |
|
Discussed with @Jackie-Jiang and we will be convening on this PR itself, so reviving it. Sorry for the confusion. |
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Overall logic looks good. Let's try to further clean up the code
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/FilterBlock.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/ProjectionOperator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationGroupByOrderByPlanNode.java
Outdated
Show resolved
Hide resolved
...re/src/test/java/org/apache/pinot/queries/InnerSegmentAggregationSingleValueQueriesTest.java
Outdated
Show resolved
Hide resolved
...ore/src/test/java/org/apache/pinot/queries/InterSegmentAggregationMultiValueQueriesTest.java
Outdated
Show resolved
Hide resolved
.../apache/pinot/core/query/request/context/utils/BrokerRequestToQueryContextConverterTest.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/CombinedFilterOperator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/query/FilteredAggregationOperator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/query/FilteredAggregationOperator.java
Outdated
Show resolved
Hide resolved
6150f87 to
7d85659
Compare
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Mostly good. Please reformat the changes using the Pinot Style (you might want to checkout master to import the latest checkstyle settings)
pinot-core/src/main/java/org/apache/pinot/core/startree/plan/StarTreeProjectionPlanNode.java
Outdated
Show resolved
Hide resolved
| _groupByExpressions = Collections.emptyList(); | ||
| groupByColumns = null; | ||
| } | ||
|
|
There was a problem hiding this comment.
Let's revert this file since it is not relevant
| boolean hasFilteredPredicates = _queryContext.isHasFilteredAggregations(); | ||
| BaseOperator<IntermediateResultsBlock> aggOperator; | ||
| if (hasFilteredPredicates) { | ||
| aggOperator = buildFilteredAggOperator(); |
There was a problem hiding this comment.
(minor) this part can be more concise by directly return instead of putting the operator in an local variable
There was a problem hiding this comment.
The format still doesn't align with the pinot style
| /** | ||
| * Builds the operator to be used for non filtered aggregations | ||
| */ | ||
| private BaseOperator<IntermediateResultsBlock> buildNonFilteredAggOperator() { |
There was a problem hiding this comment.
What I meant is to move the current code into this method, and implement buildFilteredAggOperator() separately. The reason being:
- The metadata/dictionary based operator and star-tree does not apply to the filtered aggregation
- Sharing
buildOperators()method can bring extra overhead to non-filtered aggregations
pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java
Outdated
Show resolved
Hide resolved
| @Nullable FilterContext havingFilter, @Nullable List<OrderByExpressionContext> orderByExpressions, int limit, | ||
| int offset, Map<String, String> queryOptions, @Nullable Map<String, String> debugOptions, | ||
| BrokerRequest brokerRequest) { | ||
| BrokerRequest brokerRequest, boolean hasFilteredAggregations, |
There was a problem hiding this comment.
hasFilteredAggregations should not be set through the constructor. It is updated in generateAggregationFunctions()
| private List<ExpressionContext> _selectExpressions; | ||
| private List<String> _aliasList; | ||
| private FilterContext _filter; | ||
| private ExpressionContext _filterExpression; |
There was a problem hiding this comment.
The change in the Builder is not necessary
| private void generateAggregationFunctions(QueryContext queryContext) { | ||
| List<AggregationFunction> aggregationFunctions = new ArrayList<>(); | ||
| List<Pair<AggregationFunction, FilterContext>> filteredAggregationFunctions = new ArrayList<>(); | ||
| List<Pair<FilterContext, AggregationFunction>> aggregationFunctionsWithMetadata = new ArrayList<>(); |
There was a problem hiding this comment.
Please rename the variable
| // NOTE: DO NOT deduplicate the aggregation functions in the SELECT clause because that involves protocol change. | ||
| List<FunctionContext> aggregationsInSelect = new ArrayList<>(); | ||
| List<Pair<FunctionContext, FilterContext>> filteredAggregations = new ArrayList<>(); | ||
| List<Pair<Pair<FilterContext, ExpressionContext>, FunctionContext>> aggregationsInSelect = new ArrayList<>(); |
There was a problem hiding this comment.
I don't think we need to keep ExpressionContext here. List<FunctionContext, FilterContext> should be enough for the following computations
| @SuppressWarnings("rawtypes") | ||
| public class FilteredAggregationOperator extends BaseOperator<IntermediateResultsBlock> { | ||
| private static final String OPERATOR_NAME = "FilteredAggregationOperator"; | ||
| private static final String EXPLAIN_NAME = "FILTERED_AGGREGATE"; |
There was a problem hiding this comment.
To match the naming pattern in AggregateGroupByOperator (and other Aggregation*Operator classes), the EXPLAIN_NAME should be set to AGGREGATE_FILTERED.
|
|
||
| @Override | ||
| public List<Operator> getChildOperators() { | ||
| return _aggFunctionsWithTransformOperator.stream().map(Pair::getRight).collect(Collectors.toList()); |
There was a problem hiding this comment.
Unless this has recently changed, I think stream api usage is not consistent with Pinot coding convention.
There was a problem hiding this comment.
I haven't seen a coding convention mentioning the same, yet. Is this documented somewhere?
There was a problem hiding this comment.
I'm unaware of such a convention and have seen plenty of code using the streams API for performance non-critical operations (like this one) recently.
There was a problem hiding this comment.
@Jackie-Jiang Can you please clarify if stream api usage applies?
There was a problem hiding this comment.
We should avoid using stream api for performance critical operations. This one is at query path, but might not be that performance critical (only called once). Using regular api could give slightly better performance, but IMO both way is okay
There was a problem hiding this comment.
I agree with @richardstartin -- we only need to worry about streams when the code is invoked in a tight loop for multiple iterations -- none of which is applicable in this specific case
Jackie-Jiang
left a comment
There was a problem hiding this comment.
LGTM. Only minor and code format comments. Please apply the pinot format and auto-reformat the changed files
pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java
Show resolved
Hide resolved
There was a problem hiding this comment.
(code format) Please apply the pinot code format and use it to auto-reformat this file. Several changes do not comply with the format
pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java
Outdated
Show resolved
Hide resolved
|
@atris This is a great feature. Could you please add some release notes to the PR description which we can refer to when cutting the next release? |
|
Is For example:
I can't seem to make it work, but maybe I'm mistyping something |
|
No it is not supported. For the example you give, |
|
Thanks for the confirmation @Jackie-Jiang Yes that's what we're using right now, but we've got a use case where we may have hundreds of sub query entities, which are currently translated into hundreds of Pinot queries so I was looking for a way to improve that :) |
|
Let me see if I can take a crack on this next week
…On Sat, 9 Apr 2022, 20:54 MrNeocore, ***@***.***> wrote:
Thanks for the confirmation @Jackie-Jiang
<https://github.com/Jackie-Jiang>
Yes that's what we're using right now, but we've got a use case where we
may have hundreds of sub sub query entities, which are currently translated
into hundreds of Pinot queries so I was looking for a way to improve that :)
—
Reply to this email directly, view it on GitHub
<#7916 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AANE5Y2RJWTAR3BY2G4S7S3VEGOKHANCNFSM5KGGHAHA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
|
Thanks for giving it a shot @atris |
|
@atris did we add this to our docs? |
This PR implements support for FILTER clauses in aggregations:
SELECT SUM(COL1) FILTER(WHERE COL2 > 300), AVG(COL2) FILTER (WHERE COL2 < 50) FROM MyTable WHERE COL1 > 50;
The approach implements the swim lane design highlighted in the design document by splitting at the filter operator. The implementation gets the filter block for main predicate and each filter predicate, ANDs them together and returns a combined filter operator.
The main predicate is scanned only once and reused for all filter clauses.
The implementation allows each filter swim lane to use any available indices independently.
If two or more filter clauses have the same predicate, the result will be computed only once and fed to each of the aggregates.
https://docs.google.com/document/d/1ZM-2c0jJkbeJ61m8sJF0qj19t5UYLhnTFvIAz-HCJmk/edit?usp=sharing
Performance benchmark:
3 warm up iterations per run, 5 runs in total. Data set size -- 1.5 million documents. Apple M1 Pro, 32GB RAM
X axis represents number of iterations and Y axis represents latency in MS.
FILTER query, compared to its equivalent CASE query, is 120-140% faster on average.
GROUP BY is not supported yet and will be done in a follow up PR