-
Notifications
You must be signed in to change notification settings - Fork 253
fix: handle cast to dictionary vector introduced by case when #2044
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2044 +/- ##
============================================
+ Coverage 56.12% 58.52% +2.40%
- Complexity 976 1282 +306
============================================
Files 119 143 +24
Lines 11743 13281 +1538
Branches 2251 2368 +117
============================================
+ Hits 6591 7773 +1182
- Misses 4012 4272 +260
- Partials 1140 1236 +96 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
TBH, I'm not sure this is the ideal fix because I'm not really sure why we are getting a plan to cast a scalar/primitive array to a dictionary array. |
| } | ||
|
|
||
| // copied from datafusion common scalar/mod.rs | ||
| fn dict_from_values<K: ArrowDictionaryKeyType>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we make it pub in DataFusion and reuse ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can also play with cast kernel, looks like such cast might be supported
from can_cast_types
(_, Dictionary(_, value_type)) => can_cast_types(from_type, value_type),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good suggestion. Let me try that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second that we rely on the cast kernel. Do we have a good idea of how this cast is even ending up in the plan? What does the native plan look like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into reusing the cast kernel. Looks like existing code already would have fallen thru to the catch all branch which calls the cast kernel. Even though can_cast_types appears to allow this, the cast actually fails with
Caused by: org.apache.comet.CometNativeException: InternalError: Native cast invoked for unsupported cast from Utf8 to Dictionary(Int32, Utf8).
The reason for the cast (which is implicit and does not show up in the plan) is apparently because of a case statement like
CASE WHEN (_condition_) THEN _dict_encoded_column_ ELSE '' END
In such a case, we are implicitly trying to cast the array containing the string value to a dict encoded array.
I'll open a follow up PR to make dict_from_values publich in DF and remove the duplication
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbutrovich @comphead can we go ahead with this PR? I've verified that using the cast kernel directly does not work for this case.
7c650a7 to
30791b6
Compare
comphead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @parthchandra for experimenting with this.
Are you planning to open PR or issue? |
|
I can log an issue. |
|
Issue: apache/datafusion#17366 |
Which issue does this PR close?
Closes #2038.
Rationale for this change
The query in the clickbench benchmark throws an exception
What changes are included in this PR?
Handles an array cast case where the to type is a dictionary
How are these changes tested?
New unit test