Initial support for StringView, merge changes from string-view development branch #11402
* Pin to arrow main
* Fix clippy with latest arrow
* Uncomment test that needs new arrow-rs to work
* Update datafusion-cli Cargo.lock
* Update Cargo.lock
* tapelo

…10985)
* feat: Implement equality = and inequality <> support for StringView
* chore: Add tests for the StringView
* chore
* chore: Update tests for NULL
* fix: Used build_array_string!
* chore: Update string_coercion function to handle Utf8View type in binary.rs
* chore: add tests
* chore: ci

* Add more StringView comparison test coverage
* add reference
* Add another test showing casting on columns works correctly

…11004)
* feat: Implement equality = and inequality <> support for BinaryView
  Signed-off-by: Chojan Shang <psiace@apache.org>
* chore: make fmt happy
  Signed-off-by: Chojan Shang <psiace@apache.org>
---------
Signed-off-by: Chojan Shang <psiace@apache.org>

…BinaryView (#11034)
* implement large binary
* add tests for large string
* better comments for string coercion

* refactor: Improve type coercion logic in TypeCoercionRewriter
* refactor: Improve type coercion logic in TypeCoercionRewriter
* chore
* chore: Update test
* refactor: Improve type coercion logic in TypeCoercionRewriter
* refactor: Remove unused import and update code formatting in unwrap_cast_in_comparison.rs
LGTM!
@thinkharderdev or @avantgardnerio -- would either of you have a few moments to review this PR? We want to merge the work we have so far for StringView into main (that depends on arrow
lgtm
Thank you @thinkharderdev. cc @XiangpengHao: I will now make a string-view2 branch for your next sequence of PRs.
…velopment branch (apache#11402)

* Update `string-view` branch to arrow-rs main (apache#10966)
* Pin to arrow main
* Fix clippy with latest arrow
* Uncomment test that needs new arrow-rs to work
* Update datafusion-cli Cargo.lock
* Update Cargo.lock
* tapelo

* feat: Implement equality = and inequality <> support for StringView (apache#10985)
* feat: Implement equality = and inequality <> support for StringView
* chore: Add tests for the StringView
* chore
* chore: Update tests for NULL
* fix: Used build_array_string!
* chore: Update string_coercion function to handle Utf8View type in binary.rs
* chore: add tests
* chore: ci

* Add more StringView comparison test coverage (apache#10997)
* Add more StringView comparison test coverage
* add reference
* Add another test showing casting on columns works correctly

* feat: Implement equality = and inequality <> support for BinaryView (apache#11004)
* feat: Implement equality = and inequality <> support for BinaryView
  Signed-off-by: Chojan Shang <psiace@apache.org>
* chore: make fmt happy
  Signed-off-by: Chojan Shang <psiace@apache.org>
---------
Signed-off-by: Chojan Shang <psiace@apache.org>

* Implement support for LargeString and LargeBinary for StringView and BinaryView (apache#11034)
* implement large binary
* add tests for large string
* better comments for string coercion

* Improve filter predicates with `Utf8View` literals (apache#11043)
* refactor: Improve type coercion logic in TypeCoercionRewriter
* refactor: Improve type coercion logic in TypeCoercionRewriter
* chore
* chore: Update test
* refactor: Improve type coercion logic in TypeCoercionRewriter
* refactor: Remove unused import and update code formatting in unwrap_cast_in_comparison.rs

* Remove arrow-patch
---------
Signed-off-by: Chojan Shang <psiace@apache.org>
Co-authored-by: Alex Huang <huangweijun1001@gmail.com>
Co-authored-by: Chojan Shang <psiace@apache.org>
Co-authored-by: Xiangpeng Hao <haoxiangpeng123@gmail.com>
Which issue does this PR close?
Part of #10918
Rationale for this change
While we were initially developing StringView in DataFusion, the required features in parquet/arrow were not yet available, so we made a feature branch pinned to the required version of arrow: https://github.com/apache/datafusion/commits/string-view/
The required functionality for StringView has been released in arrow/parquet 51.1.0, and DataFusion now uses that version as of #11302. Thus there is no reason for a feature branch, so let's bring the changes to main.
What changes are included in this PR?
Merge in all the changes from @Weijun-H, @PsiACE, and @XiangpengHao on the `string-view` branch. Specifically:
* Improve filter predicates with `Utf8View` literals #11043
Are these changes tested?
Yes, there are unit tests
Are there any user-facing changes?
Not yet, but there will be