feat(datafusion): Add Binary scalar value conversion for predicate pushdown #2048
Merged: liurenjie1024 merged 2 commits into apache:main on Jan 21, 2026
Add support for converting Binary and LargeBinary DataFusion ScalarValue types to Iceberg Datum, enabling binary predicates to be pushed down to the Iceberg storage layer.

This conversion allows SQL queries with binary hex literals (X'...') to push predicates down to Iceberg, improving query performance by filtering data at the storage level rather than in DataFusion.

The integration test verifies that binary predicates are successfully pushed down end-to-end:
- Without conversion: the predicate stays in FilterExec, with predicate:[]
- With conversion: the predicate is pushed to IcebergTableScan

Other scalar types (Boolean, Timestamp, Decimal) were investigated but excluded because they are not reachable through practical usage:
- Boolean: DataFusion aggressively optimizes comparisons (e.g., x=true becomes just x) before they reach the converter
- Timestamp/Decimal: SQL literals are converted to strings or other types before reaching the converter

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
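A minimal sketch of the conversion described above, assuming hypothetical, simplified stand-ins for DataFusion's `ScalarValue` and Iceberg's `Datum` (the real enums have many more variants, and the real converter's name and signature may differ). The `Int32` and `Utf8` arms are assumed placeholders for types the converter already handled; `scalar_to_datum`, `EqPredicate`, and `split_filters` are hypothetical names used only to show why an unconvertible literal keeps its predicate in `FilterExec` while a convertible one reaches `IcebergTableScan`.

```rust
/// Hypothetical, simplified stand-in for DataFusion's `ScalarValue`.
#[derive(Debug, Clone, PartialEq)]
pub enum ScalarValue {
    Boolean(Option<bool>),
    Int32(Option<i32>),
    Utf8(Option<String>),
    Binary(Option<Vec<u8>>),
    LargeBinary(Option<Vec<u8>>),
}

/// Hypothetical, simplified stand-in for Iceberg's `Datum`.
#[derive(Debug, Clone, PartialEq)]
pub enum Datum {
    Int(i32),
    String(String),
    Binary(Vec<u8>),
}

/// Convert a literal to an Iceberg datum. `None` means "cannot convert",
/// so the predicate containing the literal is not pushed down.
pub fn scalar_to_datum(value: &ScalarValue) -> Option<Datum> {
    match value {
        // Assumed pre-existing arms, shown for context.
        ScalarValue::Int32(Some(v)) => Some(Datum::Int(*v)),
        ScalarValue::Utf8(Some(s)) => Some(Datum::String(s.clone())),
        // The new arms: both Arrow binary layouts map to one Iceberg binary datum.
        ScalarValue::Binary(Some(b)) | ScalarValue::LargeBinary(Some(b)) => {
            Some(Datum::Binary(b.clone()))
        }
        // NULL literals and unsupported types (e.g. Boolean here) are not converted.
        _ => None,
    }
}

/// Hypothetical predicate of the form `column = literal`.
#[derive(Debug, Clone, PartialEq)]
pub struct EqPredicate {
    pub column: String,
    pub literal: ScalarValue,
}

/// Split filters into those handed to the Iceberg scan and those that
/// must stay behind in DataFusion's FilterExec.
pub fn split_filters(filters: &[EqPredicate]) -> (Vec<(String, Datum)>, Vec<EqPredicate>) {
    let mut pushed = Vec::new();
    let mut residual = Vec::new();
    for f in filters {
        match scalar_to_datum(&f.literal) {
            Some(d) => pushed.push((f.column.clone(), d)),
            None => residual.push(f.clone()),
        }
    }
    (pushed, residual)
}
```

In this sketch, Binary and LargeBinary share one match arm because they differ only in Arrow's offset width, not in the bytes they carry.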
Force-pushed from aeee2e8 to d8bd5bc.
liurenjie1024 (Contributor) left a comment:
Thanks @viirya for this, generally LGTM!
Review comment on test_binary_predicate_pushdown in integration_datafusion_test.rs:
```rust
}

#[tokio::test]
async fn test_binary_predicate_pushdown() -> Result<()> {
```
Contributor (review comment):
We have added sqllogictest support: https://github.com/liurenjie1024/iceberg-rust/blob/666a9fe1aaf1692583d6f44e4f7a1d52a688b217/crates/sqllogictest/testdata/schedules/df_test.toml#L19
Please move these tests there.
Move the binary predicate pushdown integration test from the Rust integration tests to the sqllogictest framework for better test organization and coverage.

Changes:
- Add binary_predicate_pushdown.slt test file
- Create test_binary_table in DataFusion engine setup
- Update show_tables.slt to include test_binary_table
- Add test to df_test.toml schedule
- Remove test_binary_predicate_pushdown from integration_datafusion_test.rs

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
liurenjie1024 approved these changes on Jan 21, 2026.
liurenjie1024 (Contributor) left a comment:
Thanks @viirya for this pr!
viirya (Member, Author) commented:
Thanks @liurenjie1024
Which issue does this PR close?
What changes are included in this PR?
Add support for converting Binary and LargeBinary DataFusion ScalarValue types to Iceberg Datum, enabling binary predicates to be pushed down to the Iceberg storage layer.
This conversion allows SQL queries with binary hex literals (X'...') to push predicates down to Iceberg, improving query performance by filtering data at the storage level rather than in DataFusion.
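For illustration, the bytes behind a hex literal such as `X'DEADBEEF'` can be decoded as below. This is a hypothetical helper (`parse_hex_literal` and `hex_val` are not DataFusion functions; DataFusion's SQL planner parses literals itself); the decoded bytes are what a Binary scalar literal would carry into the converter.

```rust
/// Map one ASCII hex digit to its value (hypothetical helper).
fn hex_val(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'a'..=b'f' => Some(c - b'a' + 10),
        b'A'..=b'F' => Some(c - b'A' + 10),
        _ => None,
    }
}

/// Decode a SQL binary hex literal such as `X'DEADBEEF'` into raw bytes.
/// Returns `None` for a missing `X'...'` wrapper, an odd digit count,
/// or a non-hex character. Illustrative only; not DataFusion's parser.
pub fn parse_hex_literal(lit: &str) -> Option<Vec<u8>> {
    let body = lit
        .strip_prefix("X'")
        .or_else(|| lit.strip_prefix("x'"))?
        .strip_suffix('\'')?;
    if body.len() % 2 != 0 {
        return None;
    }
    // Work on raw bytes so non-ASCII input cannot split a UTF-8 character.
    body.as_bytes()
        .chunks(2)
        .map(|pair| Some((hex_val(pair[0])? << 4) | hex_val(pair[1])?))
        .collect()
}
```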
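The integration test verifies that binary predicates are successfully pushed down end-to-end:
- Without conversion: the predicate stays in FilterExec, with predicate:[]
- With conversion: the predicate is pushed to IcebergTableScan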
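Other scalar types (Boolean, Timestamp, Decimal) were investigated but excluded because they are not reachable through practical usage:
- Boolean: DataFusion aggressively optimizes comparisons (e.g., x=true becomes just x) before they reach the converter
- Timestamp/Decimal: SQL literals are converted to strings or other types before reaching the converter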
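A toy illustration of the Boolean case: because DataFusion rewrites a comparison such as `x = true` to just `x` before pushdown, a literal-to-datum converter never receives a Boolean literal. The `Expr` type below is a hypothetical miniature, not DataFusion's `Expr`, and the `x = false` to `NOT x` arm is an assumed analogue of the same rewrite.

```rust
/// Hypothetical miniature expression tree (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    BoolLit(bool),
    Eq(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
}

/// Rewrite `col = true` to `col` and (assumed) `col = false` to `NOT col`,
/// in either operand order, recursing into other nodes.
pub fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Eq(l, r) => match (*l, *r) {
            (Expr::Column(c), Expr::BoolLit(true)) | (Expr::BoolLit(true), Expr::Column(c)) => {
                Expr::Column(c)
            }
            (Expr::Column(c), Expr::BoolLit(false)) | (Expr::BoolLit(false), Expr::Column(c)) => {
                Expr::Not(Box::new(Expr::Column(c)))
            }
            (l, r) => Expr::Eq(Box::new(simplify(l)), Box::new(simplify(r))),
        },
        Expr::Not(inner) => Expr::Not(Box::new(simplify(*inner))),
        other => other,
    }
}

/// True if a Boolean literal survives, i.e. a converter would still
/// be asked to convert one.
pub fn has_bool_literal(expr: &Expr) -> bool {
    match expr {
        Expr::BoolLit(_) => true,
        Expr::Column(_) => false,
        Expr::Eq(l, r) => has_bool_literal(l) || has_bool_literal(r),
        Expr::Not(e) => has_bool_literal(e),
    }
}
```

Both rewrites keep SQL three-valued semantics: when `x` is NULL, `x = true` and `x` are both NULL, as are `x = false` and `NOT x`.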
Are these changes tested?