-
Notifications
You must be signed in to change notification settings - Fork 255
feat: pushdown filter for native_iceberg_compat #1566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: pushdown filter for native_iceberg_compat #1566
Conversation
|
Note that we actually expect performance to be worse with pushdown with the https://blog.xiangpeng.systems/posts/parquet-pushdown/ Once apache/arrow-rs#6921 merges and we pick up that Arrow-rs release we should see the performance benefit. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1566 +/- ##
============================================
+ Coverage 56.12% 58.64% +2.51%
- Complexity 976 1026 +50
============================================
Files 119 123 +4
Lines 11743 12418 +675
Branches 2251 2337 +86
============================================
+ Hits 6591 7282 +691
+ Misses 4012 3965 -47
- Partials 1140 1171 +31 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
4caad5d to
0a70e78
Compare
|
Thank you for taking a look at this! I'll take a pass through this. |
spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala
Outdated
Show resolved
Hide resolved
|
Looks good, do we still need to wait for arrow-rs based on #1566 (comment) ? |
We can merge it. I'd rather be testing the pushdown logic as the default, even if it's a little bit slower right now. |
parthchandra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great PR @wForget !
| } | ||
| } | ||
|
|
||
| test("test V1 parquet scan filter pushdown of primitive types uses native_iceberg_compat") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we enable this test for both native_datafusion and native_iceberg_compat?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I enabled the test for native_datafusion, but the current native_datafusion mode always push down fitlers, I added a comment:
datafusion-comet/spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala
Lines 1516 to 1519 in 8896f67
| if (scanMode == CometConf.SCAN_NATIVE_DATAFUSION && !pushDown) { | |
| // FIXME: native_datafusion always pushdown data filters | |
| break() | |
| } |
24ac82b to
8896f67
Compare
cc @andygrove if you are doing some benchmarks |
andygrove
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I agree with @mbutrovich that it is better to get test coverage for this feature even though we expect some slowdown until we pick up the changes from Arrow.
Also, we can always add a config to disable the pushdown if we need to.
Thank you @wForget.
Which issue does this PR close?
Closes #1562.
Rationale for this change
Improve performance of native_iceberg_compat scan
What changes are included in this PR?
How are these changes tested?
added unit test