-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add initial support for Utf8View and BinaryView types #10925
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @XiangpengHao -- this looks great ❤️
Can you please add a test (which will error) to https://github.com/apache/datafusion/blob/cb9068ce56a6870b84c4878db0255048e7df9e97/datafusion/sqllogictest/test_files/arrow_typeof.slt so there is some test coverage
But we need to wait apache/arrow-rs#5894 to merge and propagate to DataFusion, otherwise we get error from arrow.
This is an excellent point in general - that if DataFusion needs additional changes in arrow we will either
- Have to wait
- Put stubs in DataFusion that have the appropriate functionaltiy until they are available in arrow
For example, in this case, we could implement a cast
function in DataFusion that handles stringview casting otherwise falls back to the arrow kernel
This is probably something we'll have to sort out 🤔
After some more thought about this, I think the best approach will be to create a "feature" branch into which we will merge the StringView features until a new version of arrow-rs is released. In order to avoid that branch diverging too much (and thus being hard to maintain) I think we should get this PR patched up and ready to merge (as it can go in without any more arrow-rs support and the risk of conflicts is high) Then I'll make a branch in the DataFusion repo to hold all the StringView related code |
I have this PR checked out locally so I'll make some small changes to get the tests passing |
I filed #10961 to track making a feature branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one is a good first step -- thank you @XiangpengHao
* add view types * Add slt tests * comment out failing test * update vendored code --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Which issue does this PR close?
Closes #10920 .
But we need to wait apache/arrow-rs#5894 to merge and propagate to DataFusion, otherwise we get error from arrow.
Different to the original issue, I think we mean
Utf8View
rather thanStringView
for the DataType name (consistent with those in Arrow DataTypes)Rationale for this change
Now you can cast utf8 to view types
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?