adaptive filter partitioning #15838
Conversation
…dexes by feeding selectivity forward
I ran quite a lot of benchmarks to ensure everything was the "same" or "better" performance; I will add them in the comments. Some of them are a bit noisy due to my laptop not being completely silent while taking measurements and a fairly low number of total iterations, but they seemed close enough to be satisfied and get an idea of the numbers.
sql benchmark:
(before/after results attached as images in the original comment)
sql expression benchmark:
(results attached as images in the original comment)
*/
@Deprecated
@SuppressWarnings({"unreachable", "unused"})
default void preFilters(List<Filter> preFilters)
Check notice (Code scanning / CodeQL): Useless parameter
* {@link org.apache.druid.segment.vector.FilteredVectorOffset}
*/
@SuppressWarnings({"unreachable", "unused"})
default void postFilters(List<Filter> postFilters)
Check notice (Code scanning / CodeQL): Useless parameter
final long bitmapConstructionStartNs = System.nanoTime();
for (Filter subfilter : filters) {
boolean needMatcher = true;
final BitmapColumnIndex index = subfilter.getBitmapColumnIndex(columnIndexSelector);
I was wondering if we can delegate to the children's filter bundles, and have a way of combining the filter bundles?
I do that for AndFilter, but in the case of OrFilter it's a bit more complicated because we can only use indexes for a bitmap offset if all of the clauses fully support indexes. For the clauses of the OR that use the default implementation of makeFilterBundle this isn't an issue, since it is effectively the same logic we are doing here; however, for the case where an AndFilter is one of the clauses of an OrFilter, using makeFilterBundle could partition the clauses of the AND into indexes and matchers, which is not what we want for OR because we need all of the AND clauses to be using indexes for the OR to use indexes. Does that make sense?
Fwiw, it should be possible to convert a bitmap index into a matcher by testing the id of the row against the bitmap. It should still be better than loading the value and grabbing the matcher, and you can combine it together with a matcher to get the correct semantics inside of the OR here for the times when you need to do it all via a matcher.
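As a rough illustration of that idea (simplified stand-in interfaces, not Druid's actual ValueMatcher/Offset APIs), a bitmap-backed matcher might look something like this:

import org.roaringbitmap.RoaringBitmap;

// Illustrative sketch only: turn a bitmap index into a row matcher by testing the current
// row id against the bitmap instead of reading the column value and evaluating the filter.
public class BitmapBackedMatcherSketch
{
  interface RowOffset
  {
    int currentRowId();
  }

  interface RowMatcher
  {
    boolean matches();
  }

  static RowMatcher fromBitmap(RoaringBitmap bitmap, RowOffset offset)
  {
    // a bitmap membership test is typically much cheaper than loading the value
    // and running the real matcher, and it can be combined with other matchers
    return () -> bitmap.contains(offset.currentRowId());
  }
}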
The code is actually doing that: if not all of the sub-filters can be converted into indexes, we re-use them to make a matcher. I'll add some comments and such to try to make it clearer.
I did end up doing this to make OrFilter delegate to child bundles. It's a bit more complicated than the initial implementation, but the code comments should explain the possible outcomes for different shapes of child bundles.
@JsonCreator
public DruidProcessingIndexesConfig(
@JsonProperty("skipValueRangeIndexScale") @Nullable Double skipValueRangeIndexScale,
@JsonProperty("skipValuePredicateIndexScale") @Nullable Double skipValuePredicateIndexScale
@JsonProperty("skipValueRangeIndexScale") @Nullable Double skipValueRangeIndexScale
I wonder if we still even need this one?
I left it in place to make it easy for us devs to play around with tunings; I don't ever intend for cluster operators to have to set it.
@SuppressWarnings({"unreachable", "unused"})
default void preFilters(List<Filter> preFilters)
{
// do nothing, nothing calls this
This still exists just so that implementations don't suddenly break? If so, perhaps add some comments either here or in the javadoc that it will be removed in the future?
Yeah, since you currently have to write an extension to customize metrics, I figured it would be less disruptive to deprecate than remove it immediately. Will update the javadocs to indicate that it will go away in the future.
* @deprecated use {@link #filterBundle(FilterBundle.BundleInfo)} instead to collect details about filters which were
* used as value matchers for {@link org.apache.druid.segment.FilteredOffset} or
* {@link org.apache.druid.segment.vector.FilteredVectorOffset}
*/
Same here?
final long bitmapConstructionStartNs = System.nanoTime();
T result = columnIndex.computeBitmapResult(bitmapResultFactory, selectionRowCount, totalRowCount, includeUnknown);
final long totalConstructionTimeNs = System.nanoTime() - bitmapConstructionStartNs;
if (result != null) {
Can you un-invert this?
if (result == null) {
indexBundle = null;
} else {
... do stuff ...
}
Will be a lot smoother for the reader.
if (result != null) {
ImmutableBitmap bitmap = bitmapResultFactory.toImmutableBitmap(result);
indexBundle = new FilterBundle.SimpleIndexBundle(
Collections.singletonList(new FilterBundle.IndexBundleInfo(toString(), bitmap.size(), totalConstructionTimeNs)),
Do you really need to force the toString() here? It's quite a bit of garbage, especially for an IN filter with like 100,000 entries and stuff; I suspect it's gonna have unintended side-effects. Can we force it to something that we know is non-verbose? Perhaps an id: we could assign ids based on position and nesting, so AND(x=y, z=2, OR(a IN (...), b IN (...))) would end up with x=y as id 1, z=2 as id 2, a IN (...) as id 3.1 (first nested under the 3rd predicate), and b IN (...) as id 3.2?
that's a good point about huge filter strings. It is super useful to have the breakdown of filter partitioning, but maybe I can think of a way to have safeguards against this case: either adding a new method to filters that has these safeguards instead of directly relying on toString, or making it lazy so that it only happens if debug logging is enabled, or something else.
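For reference, one way the lazy option could look (a sketch with hypothetical names, not the PR's actual classes) is to hold a Supplier and only materialize the string when the info is actually logged:

import java.util.function.Supplier;

// Sketch: defer building a potentially huge filter description so the cost is only paid
// when debug logging actually asks for it. Names here are hypothetical.
public class LazyFilterDescriptionSketch
{
  static final class IndexInfo
  {
    private final Supplier<String> description; // not materialized until describe() is called
    private final int selectedRows;

    IndexInfo(Supplier<String> description, int selectedRows)
    {
      this.description = description;
      this.selectedRows = selectedRows;
    }

    String describe()
    {
      // only invoke the (possibly expensive) supplier when someone actually asks
      return description.get() + " (" + selectedRows + " rows)";
    }
  }

  public static void main(String[] args)
  {
    Object filter = "a IN (...)";
    IndexInfo info = new IndexInfo(filter::toString, 100); // toString() not called yet
    boolean debugEnabled = false;
    if (debugEnabled) {
      System.out.println(info.describe());
    }
  }
}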
includeUnknown,
allowPartialIndex
);
if (subBundle.getIndex() != null) {
In the code for generating the Cursor you were checking if the FilterBundle was null, but here you are not. Are you missing a check here or are you needlessly checking there?
As mentioned above, Filter.makeFilterBundle cannot return null; it must return a bundle that has at least an index or a matcher. The cursor case is for when there is no filter on the cursor, but as mentioned in the other comment I will try to clean that up a bit.
public <T> FilterBundle makeFilterBundle(
ColumnIndexSelector columnIndexSelector,
BitmapResultFactory<T> bitmapResultFactory,
int selectionRowCount,
int totalRowCount,
boolean includeUnknown,
boolean allowPartialIndex
)
I find the logic in this method a little bit hard to follow. I think it would help to have a bit of a pseudo-code explanation of how the method is trying to do its job.
The pseudo-code that I can imagine for this method (but which I'm pretty sure is currently not implemented by this method) looks like
- Compute all sub-filter bundles, store them all independently in a list
- If any of the sub-filter bundles requires a matcher, then convert all bitmap indexes into bitmap index matchers that compare rowIds against the bitmap index and return a matcher-only bundle.
- If all sub-filters are indexes, then OR them together and return an index-only bundle
This means that an OR filter will never produce a "partial bundle". It will either produce an index or a matcher bundle, but not both.
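A rough sketch of that pseudo-code (plus the "index and matcher" child case discussed elsewhere in this thread), using simplified stand-in types rather than Druid's actual Filter/FilterBundle classes:

import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import org.roaringbitmap.RoaringBitmap;

// Illustrative sketch only, not the PR's implementation.
public class OrPartitionSketch
{
  // a child bundle carries a bitmap index, a row-by-row matcher, or (for AND children) both
  record Child(RoaringBitmap index, IntPredicate matcher) {}

  // exactly one of index / matcher is non-null, mirroring "OR never produces a partial bundle"
  record OrBundle(RoaringBitmap index, IntPredicate matcher) {}

  static OrBundle makeOrBundle(List<Child> children)
  {
    boolean allIndexOnly = children.stream().allMatch(c -> c.index() != null && c.matcher() == null);
    if (allIndexOnly) {
      // every clause is a pure index: union them and return an index-only bundle
      RoaringBitmap union = new RoaringBitmap();
      for (Child c : children) {
        union.or(c.index());
      }
      return new OrBundle(union, null);
    }
    // otherwise return a matcher-only bundle; children that have an index are matched cheaply
    // by testing the row id against their bitmap, and an "index + matcher" child (such as a
    // partitioned AND) must pass both its bitmap test and its matcher
    List<IntPredicate> clauseMatchers = new ArrayList<>();
    for (Child c : children) {
      if (c.index() != null && c.matcher() != null) {
        clauseMatchers.add(row -> c.index().contains(row) && c.matcher().test(row));
      } else if (c.index() != null) {
        clauseMatchers.add(row -> c.index().contains(row));
      } else {
        clauseMatchers.add(c.matcher());
      }
    }
    return new OrBundle(null, row -> clauseMatchers.stream().anyMatch(m -> m.test(row)));
  }
}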
It is more or less doing the pseudo-code you listed here; I will try to update the comments to clear things up.
if (r == null) {
// all or nothing
return null;
}
the javadoc for BitmapColumnIndex.computeBitmapResult() doesn't include a definition of what it means to return null. If a bitmap index cannot be computed, I would've expected index itself to be null, so either that is never null OR null means that it's an empty bitmap. If null means the empty bitmap, then this if statement is wrong (you just skip the clause). If it is never null, then the if statement is unnecessary and potentially causes confusion.
The 2 argument version is not expected to be null; however, the 4 argument version may return null and the javadocs reflect that (https://github.com/apache/druid/pull/15838/files#diff-0524c5910c725c7074f7383e79ced3937827cdaae4170916f7f2ff017b21669dR56), which is how this new stuff can work.
If you remember, building the BitmapColumnIndex is not doing the actual work; rather it returns a thing that you call the 'compute' methods on to do the actual work to build the bitmaps. If BitmapColumnIndex is null then we cannot possibly use an index, while if it is not null, the 2 argument version will always build the index. The 4 argument version is what allows this new functionality to let the BitmapColumnIndex decide if it would have to do too much work to compute the index based on the selected row count so far.
This is a lot cleaner with the way things currently work; otherwise all of the 'semantic' index classes returned by the .as of the index supplier (ArrayElementIndexes, DruidPredicateIndexes, LexicographicalRangeIndexes, NullValueIndex, NumericRangeIndexes, StringValueSetIndexes, Utf8ValueSetIndexes, ValueIndexes) would need to have their methods that build BitmapColumnIndex updated to also take the selected and total row count parameters when building the BitmapColumnIndex. This was a much bigger surface area than just updating BitmapColumnIndex itself, since it is the common interface that all of these implementations spit out. I'm very much in favor of doing it this way because it is both significantly cleaner and a lot simpler to implement.
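In other words, the caller-side contract looks roughly like this (stand-in types, not the actual Druid signatures):

import org.roaringbitmap.RoaringBitmap;

// Sketch of the 4 argument style of index computation described above: the index may
// return null to say "building this would cost more than scanning the remaining rows",
// and the filter then falls back to a value matcher for the cursor scan.
public class AdaptiveIndexCallSketch
{
  interface AdaptiveIndex
  {
    // may return null when the index declines to do the work
    RoaringBitmap compute(int selectionRowCount, int totalRowCount);
  }

  static String buildBundle(AdaptiveIndex index, int selectionRowCount, int totalRowCount)
  {
    RoaringBitmap result = index.compute(selectionRowCount, totalRowCount);
    if (result == null) {
      // too expensive relative to the rows left to scan; use a matcher instead
      return "matcher";
    }
    return "index covering " + result.getCardinality() + " rows";
  }
}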
Do we still need the 2-argument method? :)
well, the search query is still calling the 2 argument version and computing the indexes directly from the filter instead of going through the bundle (also a bunch of benchmarks are calling it), so for now I was thinking about leaving it, for this PR at least. I'll need to switch the search query over to using the bundles anyway though, since it's the real replacement for the 'auto' strategy that we removed, so I can look into removing the 2 argument version at that point.
@@ -29,7 +29,7 @@
public abstract class SimpleImmutableBitmapDelegatingIterableIndex extends SimpleBitmapColumnIndex
{
@Override
public <T> T computeBitmapResult(BitmapResultFactory<T> bitmapResultFactory, boolean includeUnknown)
public final <T> T computeBitmapResult(BitmapResultFactory<T> bitmapResultFactory, boolean includeUnknown)
Why add the final? What are you protecting from?
I have done a partial review.
@Nullable
@Override
public final <T> T computeBitmapResult(
It would be good to describe what selectionRowCount and totalRowCount mean here in this method.
these are described in the javadoc of the base interface, BitmapColumnIndex
private final int dictionarySize;
private final double scaleThreshold;
nit: but you could keep just one double number here - dictionarySize * scaleThreshold (dictionaryThreshold)
@@ -78,7 +78,7 @@ public BitmapColumnIndex forPredicate(DruidPredicateFactory matcherFactory)
final java.util.function.Function<Object, ExprEval<?>> evalFunction =
inputValue -> expr.eval(InputBindings.forInputSupplier(inputColumn, inputType, () -> inputValue));

return new SimpleImmutableBitmapIterableIndex()
return new DictionaryScanningBitmapIndex(inputColumnIndexes.getCardinality())
How do you revert to old behaviour?
the old behavior is no longer available. The approach for predicates is pretty conservative in this PR: we skip indexes when the selection count is smaller than the dictionary size, so it shouldn't really be possible to be worse than the previous behavior. Predicate indexes must test the predicate against all of the dictionary values and then union all of the matching bitmaps together, so if the selection row count is smaller than the dictionary size, it is a smaller number of predicate evaluations and no bitmap operations.
I suspect that the selection size could be a bit smaller than the dictionary size and still come out ahead, but I have not measured that in this PR. We probably want to consider adding a config at that point to allow reverting to the behavior of this PR during tuning, since the break-even point probably depends on how expensive the predicate is.
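The conservative check described here amounts to something like the following sketch (hypothetical names, not the actual DictionaryScanningBitmapIndex):

import org.roaringbitmap.RoaringBitmap;

// Sketch of the conservative predicate-index threshold: skip building the index when the
// rows left to scan are fewer than the dictionary entries the predicate would have to test.
public class DictionaryScanThresholdSketch
{
  // returns null to signal "skip the index and use a value matcher instead"
  static RoaringBitmap computeOrSkip(int dictionarySize, int selectionRowCount)
  {
    if (selectionRowCount < dictionarySize) {
      // evaluating the predicate against every dictionary value (plus unioning bitmaps)
      // costs at least as many predicate evaluations as matching the remaining rows directly
      return null;
    }
    RoaringBitmap result = new RoaringBitmap();
    // ... evaluate the predicate against each dictionary value and union the matching bitmaps ...
    return result;
  }
}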
@JsonCreator
public DruidProcessingIndexesConfig(
@JsonProperty("skipValueRangeIndexScale") @Nullable Double skipValueRangeIndexScale,
@JsonProperty("skipValuePredicateIndexScale") @Nullable Double skipValuePredicateIndexScale
what will be the impact on clusters setting this value?
this parameter is an easy way for myself and other Druid devs to experiment with stuff in case I didn't get the default scale value in the best spot; I don't really intend on cluster operators setting this value or for it to be documented. It controls the ratio of the number of rows to be scanned vs computing indexes, effectively allowing the tuning of filter operation performance without code changes, but ideally it will eventually be removed.
* Itsa me, your filter bundle. This is a container for all the goodies used for producing filtered cursors,
* a {@link ImmutableBitmap} if the filter can use an index, and a {@link MatcherBundle} which contains functions to
* build {@link ValueMatcher} and {@link VectorValueMatcher} for any filters which must be evaluated row by row during
* the cursor scan. Cursors will use everything that is non-null, and at least one of index or matcher bundle MUST be
* set.
* <p>
* There are a few cases where the filter should set both indexes and matchers. For example, if the filter is a
* composite filter which can be partitioned, such as {@link org.apache.druid.segment.filter.AndFilter}, then the filter
* can be partitioned due to the intersection nature of AND, so the index can be set to reduce the number of rows and
* the matcher bundle will build a matcher which will filter the remainder. This can also be set in the case the index
* is an inexact match, and a matcher must be used to ensure that the remaining values actually match the filter.
Suggested change:
* It's me, your filter bundle. This is a container for all the goodies used for producing filtered cursors,
* a {@link ImmutableBitmap} if the filter can use an index, and a {@link MatcherBundle} which contains functions to
* build {@link ValueMatcher} and {@link VectorValueMatcher} for any filters which must be evaluated row by row during
* the cursor scan. Cursors will use everything that is non-null, and at least one of the index or the matcher bundle MUST be
* set.
* <p>
* There are a few cases where the filter should set both indexes and matchers. For example, if the filter is a
* composite filter which can be partitioned, such as {@link org.apache.druid.segment.filter.AndFilter}, then the filter
* can be partitioned due to the intersection nature of AND, so the index can be set to reduce the number of rows and
* the matcher bundle will build a matcher which will filter the remainder. This can also be set in the case the index
* is an inexact match, and a matcher must be used to ensure that the remaining values actually match the filter.
);
}

public interface IndexBundle
I can see why FilterBundle is FilterBundle but IndexBundle doesn't seem like a container.
|| filter.canVectorizeMatcher(this);
// ideally we would allow stuff to vectorize if we can build indexes even if the value matcher cannot be
// vectorized, this used to be true in fact, but changes to filter partitioning (FilterBundle) have caused
// the only way to know this to be building the bitmaps since BitmapColumnIndex can return null.
Suggested change:
// the only way to know this to be building the bitmaps since BitmapColumnIndex can return null with adaptive filter partitioning.

nit
default <T> FilterBundle makeFilterBundle(
ColumnIndexSelector columnIndexSelector,
BitmapResultFactory<T> bitmapResultFactory,
int selectionRowCount,
int totalRowCount,
boolean includeUnknown,
boolean allowPartialIndex
some javadocs about these params would be helpful.
allMatcherBundles.addAll(matcherOnlyBundles);

return new FilterBundle(
null,
in case matcher bundles are present, can we still use the bitmaps? Specifically, instead of the indexBundle being null here, we can use the unioned bitmap index. It will weed out the definite negatives (0's) when filtering, and the caller can use the matcher bundle to test the 1's
I realize that it will run into the same problem that the PR is trying to eliminate. But if that's the case, then why not convert the index-only bundle into a value matcher bundle as well, instead of unioning those?
this doesn't work for an OR filter because a FilterBundle with both an index and matcher is basically an AND operation between the index and matcher: if the bitmap has the row set, the value matcher must also match the actual value at the row. The things that can produce bundles with both indexes and matchers are actual AND filters, and also filters whose ColumnIndexCapabilities.isExact() is false (none exist today, but planning for the future). I tried to explain this in the javadocs for FilterBundle, lmk if it isn't clear enough.
Anyway, because a bundle with both is an AND, an OR filter can only ever produce index-only bundles or matcher-only bundles, which is why we convert the child 'index only' bundles and 'index and matcher' bundles into matcher bundles with convertIndexToMatcherBundle and convertBundleToMatcherOnlyBundle respectively, which use the indexes for cheaper matchers based on just checking the bitmap for a set bit instead of running the matcher on the row value.
The reason to union the index-only bundles together is my gut says that having a single matcher bitmap to iterate would be a bit cheaper when doing the row scan than having a bunch of separate matchers with their own bitmaps to iterate. We can't combine the 'index and matcher' bundles into this because they need to check their own index and matcher before being OR'd with the rest of the matchers.
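A compact sketch of that design choice (stand-in types, not the actual convertIndexToMatcherBundle code): union the index-only clauses into one bitmap up front, then OR that single membership test with the remaining matchers during the scan.

import java.util.List;
import java.util.function.IntPredicate;
import org.roaringbitmap.RoaringBitmap;

// Sketch: the index-only child bundles of an OR are unioned into a single bitmap so the
// row scan checks one bitmap instead of iterating several, OR'd with the other matchers.
public class OrMatcherUnionSketch
{
  static IntPredicate buildOrMatcher(List<RoaringBitmap> indexOnlyChildren, List<IntPredicate> otherMatchers)
  {
    RoaringBitmap union = new RoaringBitmap();
    for (RoaringBitmap child : indexOnlyChildren) {
      union.or(child); // in-place union of the index-only clauses
    }
    return row -> union.contains(row) || otherMatchers.stream().anyMatch(m -> m.test(row));
  }
}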
Thanks for the explanation!
Left a few nits and commentary, but approving anyway as I don't know that any of them are worthy of blocking the PR.
@@ -143,7 +143,7 @@ DimensionMergerV9 makeMerger(
*/
Comparator<ColumnValueSelector> getEncodedValueSelectorComparator();

/**
/**q
Q: q?
qq
lol oops, I suspect this is an artifact of stupid keyboard lag when switching 'spaces' to the one with my terminal on mac
// filterBundle will only be null if the filter itself is null, otherwise check to see if the filter
// can use an index
If the point is to know if the filter was set or not, can we not change it to explicitly check if the filter was set instead of asking for a bundle to be made and believing that null from that means that the filter didn't exist?
// an empty index means the matcher can be skipped
emptyCount++;
} else {
if (!bundle.hasMatcher()) {
Do you really have to invert this? :P
new FilterBundle.SimpleIndexBundle(
indexOnlyBundles.get(0).getIndexInfo(),
index,
indexOnlyBundles.get(0).getIndexCapabilities()
),
Why isn't this just indexOnlyBundles.get(0)? In this case, the index had better be the same thing as the index in the index-only bundle, right?
I'm not sure it really matters, just found myself wondering if there's a reason that this code needs to build a new object.
ah, this should be just re-using the same indexBundle, I think maybe it is stale from before I reworked some stuff and I forgot to clean it up
() -> "OR", | ||
selectionRowCount, | ||
totalBitmapConstructTimeNs, | ||
indexOnlyBundles.stream().map(FilterBundle.IndexBundle::getIndexInfo).collect(Collectors.toList()) |
I'm not sure how often this is going to be called, but I've found that the performance of the Java streaming API is pretty abysmal. If this was forced into a lazy evaluation that only mattered when we are doing debug logging, I wouldn't care, but it looks like it's forced CPU usage here...
Is there a way to shove this into the lambda for the toString instead (so that this isn't done unless it's actually necessary)?
it should only be called once, but I can make it an iterable on the bundles so this can be lazy the same way the string is
Description
This PR reworks how filter partitioning is performed, pushing it into Filter to "bundle" both index and value matcher creation, allowing filter partitioning to be done adaptively based on the selectivity of prior filters. This was alluded to during discussions in #12388, and contains and enhances the functionality added in #13977 (as well as removes the need to merge #15551), and aims to solve the problems originally described in #3878.
The main change centers around a new addition to the Filter interface, for which a default implementation is provided that should be sufficient in most cases, and which produces a new container type, FilterBundle.
Production of the FilterBundle is assisted by a new method on BitmapColumnIndex which allows index computation to factor selectionRowCount and totalRowCount into whether or not an index should be computed, gauged based on the amount of work it needs to do versus the amount of work it would take to just use a ValueMatcher / VectorValueMatcher and scan the 'remaining' rows.
The primary beneficiary of this is AND and OR filters, which can now partition not just on whether or not a sub-filter supports indexes at all, but also on whether or not they should use those indexes given the selectivity of sub-filters whose indexes have already been computed; that decision is itself delegated to the sub-filters via the new BitmapColumnIndex method.
This can have a pretty dramatic performance impact:
(Two example queries with before/after benchmark results were included here as images in the original PR description.)
This is not even getting into doing anything fancy such as re-ordering sub-filters based on expected cost (for example, null and equality filters could always be evaluated first, then IN and range filters, followed by predicate and additional logical filters; I'll save all of this for follow-up work).
Currently only range and predicate based indexes implement this new functionality, but the 'IN' filter could also consider it for cases where it must check an entire set against the dictionary and the set is larger than the number of remaining rows to be scanned. Numeric range thresholds were chosen via benchmarking; given how cheap numerical comparisons are, they skip indexes unless the range is quite a bit smaller than the number of rows to be scanned, and they are wired to the same undocumented config added in #13977.
Predicate based indexes are using a very conservative "dictionary size greater than number of rows to be scanned" threshold, which should be safe enough to not need any config, since the same or larger number of predicate operations would need to be done to compute the indexes (on top of unioning the indexes themselves) as if we just used a matcher for the remaining rows. This can likely be better, but I'll save that for follow-up PRs.
This functionality completely replaces FilterAnalysis, which previously performed filter partitioning externally for the top-level AND filter of a query and otherwise used the index on an all-or-nothing basis when constructing offsets. The new makeFilterBundle method should be significantly more powerful because the AND filter delegates this to its children, meaning that nested AND filters will also be partitioned without doing any additional work or normalization.
OR filters also implement makeFilterBundle, and have been reworked to allow partial index computation for the sake of creating a value matcher that checks the index set bits rather than computing the actual matcher. This functionality itself is not new; it previously lived in FilteredOffset, but now it has been moved inside OrFilter, which constructs this partial index matcher when constructing its FilterBundle.
Visibility into the partitioning is available via debug logs, which will print how filters have been partitioned when enabled (it seemed too noisy to be info by default); the output currently relies largely on the Filter.toString implementation to populate the details.
This feels like a safe change to make since everything is very conservative, so I have not retained the ability to force Druid to always use indexes, though I suppose it should be possible if there is demand for it.
As mentioned earlier, I believe there is further improvement to be made, but I think this is a good start.
Release note
Druid query processing now adaptively determines when children of AND filters should compute indexes and when to simply match rows during the scan based on the selectivity of other filters. We refer to this concept as 'filter partitioning', and it can result in quite a dramatic performance increase depending on the order in which filters are written in the query.
For example, imagine we have a query like SELECT SUM(longColumn) FROM druid.table WHERE stringColumn1 = '1000' AND stringColumn2 LIKE '%1%'. Classically, Druid would always use indexes when processing filters if they are available. This is not always ideal, though: imagine if stringColumn1 = '1000' matches 100 rows. If we always use indexes, we still have to find every value for which stringColumn2 LIKE '%1%' is true in order to compute the indexes for that filter, which, if stringColumn2 has more than 100 values, ends up being worse than if the processing engine had simply checked for a match in those 100 remaining rows.
With the new logic in place, Druid now checks the selectivity of indexes as it processes each clause of the AND filter, and if it determines it would take more work to compute the index than to match the remaining rows, it skips computing the index.
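As a rough illustration of that decision process (simplified stand-in types and no particular thresholds, not Druid's actual classes):

import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import org.roaringbitmap.RoaringBitmap;

// Simplified sketch of adaptive AND partitioning: each clause may decline to build its index
// when doing so would cost more than matching the rows that remain after earlier clauses.
public class AdaptiveAndSketch
{
  interface Clause
  {
    // returns null when computing the index is judged more expensive than scanning
    // 'remainingRows' rows with a value matcher
    RoaringBitmap computeIndexIfWorthwhile(int remainingRows, int totalRows);

    IntPredicate valueMatcher();
  }

  static void partition(List<Clause> clauses, int totalRows)
  {
    RoaringBitmap selection = null;           // intersection of the indexes computed so far
    int remainingRows = totalRows;            // selectivity fed forward to later clauses
    List<IntPredicate> matchers = new ArrayList<>();

    for (Clause clause : clauses) {
      RoaringBitmap index = clause.computeIndexIfWorthwhile(remainingRows, totalRows);
      if (index == null) {
        matchers.add(clause.valueMatcher()); // evaluate this clause row by row during the scan
      } else {
        selection = selection == null ? index : RoaringBitmap.and(selection, index);
        remainingRows = selection.getCardinality();
      }
    }
    // 'selection' (if any) drives the offset; 'matchers' filter the remaining rows during the scan
  }
}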
As an end user, this means that the order you write filters in a WHERE clause of a query can now influence the performance of your queries. Don't worry: since Druid previously always used indexes if available, not ordering your filters cannot cause queries to be any slower than they were before. If you do want to explore this, though, we recommend putting filter clauses with 'cheaper' to compute indexes, such as IS NULL, =, and comparisons such as >, >=, <, and <=, near the front of AND filters to allow Druid to more efficiently process your queries. Future work will look into automatically processing these filters in the best order to take this burden off of the user, but for now feel free to play around with filter ordering to see if you can improve query performance.
Key changed/added classes in this PR
Filter
BitmapColumnIndex
QueryableIndexCursorSequenceBuilder
This PR has: