Do not expand large SEARCH(input, Sarg)#13605
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #13605 +/- ##
============================================
+ Coverage 61.75% 61.98% +0.23%
+ Complexity 207 198 -9
============================================
Files 2436 2555 +119
Lines 133233 140625 +7392
Branches 20636 21822 +1186
============================================
+ Hits 82274 87167 +4893
- Misses 44911 46848 +1937
- Partials 6048 6610 +562
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
|
I've made another attempt in #13614 which is trying to keep the same behavior but avoid the expensive operation. IMO the best solution is to directly keep IN/NOT IN to:
Need more time to explore if this is doable |
|
Even we may need to apply something like this in the future, we are going to let it be in standby for now given we don't know the impact that join main have, specially in NOT IN case |
We have recently upgrade our Calcite dependency from 1.31 to 1.37, which impacted some queries negatively.
Specifically multi-stage queries using
INwith a very large set are spending too much time on broker trying to optimize the query. When using close to 500 entries in the IN set my personal computer was spending close to 6 seconds just to optimize the query.That was due to some new optimizations added in Calcite 1.32 but mainly due to the way we were using Calcite. Specifically, we were expanding all SEARCH expressions into ORs. That may have been needed in the past when PIPELINE_BREAKER was not implemented in Pinot, but right now it doesn't seem to be needed. But even if we need it sometimes, it is not acceptable to spent so much time on optimization phase.
Given it doesn't look like Calcite optimizations can be turned off, this PR changes Pinot to:
inSubQueryThreshold, which is 20.PinotFilterExpandSearchRuleso SEARCH expressions are not expanded when they are not range based and the number of elements is larger than 20.We may add in the future a way to configure that threshold with some config parameter.