-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Fix parquet filter_pushdown: respect parquet filter pushdown config in scan #16646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'm trying to understand why this isn't caught by |
I can confirm this new test fails without the change:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @adriangb 🙏
I don't understand how this fixes wrong results. I am out of time today but will check on this first thing tomorrow
/// Defaults to false | ||
/// | ||
/// [`Expr`]: datafusion_expr::Expr | ||
/// Defaults to false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did this result in wrong results?
Is it because the planner assumed filters would be pushed down but they weren't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well this bit is just a comment 😛
But this produced wrong results because we'd return Supported(filter)
up to the parents -> FilterExec
removed from the plan but ParquetOpener.pushdown_filters = false
-> pruning did not actually happen during the scan.
None => conjunction(allowed_filters.iter().cloned()), | ||
}; | ||
source.predicate = Some(predicate); | ||
source = source.with_pushdown_filters(pushdown_filters); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb this is the relevant LOC that fixes things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm as well that it works when pointing at this branch using my original code. Thanks!
// If both are disabled, we will not push down the filters. | ||
// By default they are both disabled. | ||
// Regardless of pushdown, we will update the predicate to include the filters | ||
// because even if scan pushdown is disabled we can still use the filters for stats pruning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had wondered about this; thanks for updating this comment!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @adriangb and @ianthetechie
With his and several other issues, I think we should make a 48.0.1 release:
I will begin work on that later today
Thank you @adriangb -- I plan to backport this to the |
…n scan (apache#16646) * respect parquet filter pushdown config in scan * Add test
pushdown_filters
is enabled (starting in 48.0.0) #16588