-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite bloom filters to use contains API #8442
Conversation
17ca888
to
7ea2cb5
Compare
Err(e) => { | ||
log::error!("Error evaluating row group predicate values when using BloomFilterPruningPredicate {e}"); | ||
log::debug!("Ignoring error reading bloom filter: {e}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general, errors in logs I think are meant to be "something bad happened that caused a user error" -- in this case the query will proceed so I think debug is a better level.
If someone needs it in production logs perhaps we can change it to log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debug level looks good to me. non-fatal errors on query path are easy to flow 👍
/// | ||
/// We only checked `BinaryExpr` but it also support `InList`, | ||
/// Because of the `optimizer` will convert `InList` to `BinaryExpr`. | ||
fn prune_expr_with_bloom_filter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7ea2cb5
to
4481afe
Compare
|
||
for column_name in literal_columns { | ||
let Some((column_idx, _field)) = | ||
parquet_column(builder.parquet_schema(), arrow_schema, &column_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also fixes a potential bug similar to #8335 by finding the correct fields
4481afe
to
410ea18
Compare
This PR is now ready for review @haohuaijin and @waynexia |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for a series of excellent pr @alamb 👍, It is LGTM!
"BloomFilterPruningPredicate only support binary expr".to_string(), | ||
)), | ||
} | ||
impl PruningStatistics for BloomFilterStatistics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work 💯 , thanks @alamb
Err(e) => { | ||
log::error!("Error evaluating row group predicate values when using BloomFilterPruningPredicate {e}"); | ||
log::debug!("Ignoring error reading bloom filter: {e}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debug level looks good to me. non-fatal errors on query path are easy to flow 👍
Thank you for the review @waynexia and @haohuaijin |
Which issue does this PR close?
Closes #8376
Closes #8397
Rationale for this change
What changes are included in this PR?
We introduced
contained
API in #8440, and now we can use it to simplify the bloom filter codeThis also makes it possible to do things like Support
IN
lists with more than three constants in predicates for bloom filtersAre these changes tested?
Covered by existing tests (including that added in #8433)
Are there any user-facing changes?
No