WIP: Test DataFusion with experimental Parquet Filter Pushdown #16222
🤖: Benchmark completed
ClickBench has only a few cases with a real regression of more than 20%, and I believe those cases can be improved by combining this with adaptive filtering. I think we are in a good state.
I agree -- thank you @zhuqi-lucas. I have a few other optimization ideas on #16208 (comment) that will help this case too. It would also be super helpful to profile / review the queries where performance slows down, like Q14 and Q21, and see if those are the ones where the adaptive filtering would help.
Q14: SELECT "SearchEngineID", "SearchPhrase", COUNT(*) AS c FROM hits WHERE "SearchPhrase" <> '' GROUP BY "SearchEngineID", "SearchPhrase" ORDER BY c DESC LIMIT 10;
Q21: SELECT "SearchPhrase", MIN("URL"), COUNT(*) AS c FROM hits WHERE "URL" LIKE '%google%' AND "SearchPhrase" <> '' GROUP BY "SearchPhrase" ORDER BY c DESC LIMIT 10;
apache/arrow-rs#7524 (comment) Thank you @alamb. Based on the previous results, it will help Q14, Q24, Q30, and Q31, which are the major regressions in this PR's benchmark results, but it does not seem to help Q21/22.
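For anyone who wants to repeat the "regression > 20%" check on their own benchmark output, here is a minimal sketch of the arithmetic. The function name `regressions`, the query labels, and the timings are all hypothetical placeholders, not results from this PR, and the bot's real output format is not assumed.

```python
# Minimal sketch: flag queries whose runtime grew by more than a threshold.
# All names and numbers below are hypothetical, NOT results from this PR.

def regressions(baseline, candidate, threshold=0.20):
    """Return {query: relative_slowdown} for queries slower than
    the baseline by more than `threshold` (0.20 means 20%)."""
    flagged = {}
    for query, base_time in baseline.items():
        new_time = candidate.get(query)
        # Skip queries missing from the candidate run or with unusable baselines.
        if new_time is None or base_time <= 0:
            continue
        slowdown = (new_time - base_time) / base_time
        if slowdown > threshold:
            flagged[query] = slowdown
    return flagged


# Hypothetical timings in seconds for three placeholder queries.
baseline = {"QA": 1.00, "QB": 2.00, "QC": 0.50}
candidate = {"QA": 1.30, "QB": 2.10, "QC": 0.40}

flagged = regressions(baseline, candidate)
for query, slowdown in sorted(flagged.items()):
    print(f"{query}: {slowdown:+.0%}")
```

With these placeholder numbers only `QA` is flagged: `QB` is 5% slower, which is under the threshold, and `QC` got faster.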
This PR is for testing DataFusion with the code in the following PR
This is the second of 2 experiments:
pushdown_filters enabled? The first experiment is in