Skip to content

Conversation

@parthchandra
Copy link
Contributor

Which issue does this PR close?

Closes #2038.

Rationale for this change

The query in the clickbench benchmark throws an exception

What changes are included in this PR?

Handles an array cast case where the to type is a dictionary

How are these changes tested?

New unit test

@parthchandra parthchandra marked this pull request as draft July 29, 2025 00:32
@codecov-commenter
Copy link

codecov-commenter commented Jul 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.52%. Comparing base (f09f8af) to head (30791b6).
⚠️ Report is 415 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2044      +/-   ##
============================================
+ Coverage     56.12%   58.52%   +2.40%     
- Complexity      976     1282     +306     
============================================
  Files           119      143      +24     
  Lines         11743    13281    +1538     
  Branches       2251     2368     +117     
============================================
+ Hits           6591     7773    +1182     
- Misses         4012     4272     +260     
- Partials       1140     1236      +96     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@parthchandra parthchandra marked this pull request as ready for review July 29, 2025 16:42
@parthchandra
Copy link
Contributor Author

TBH, I'm not sure this is the ideal fix because I'm not really sure why we are getting a plan to cast a scalar/primitive array to a dictionary array.

}

// copied from datafusion common scalar/mod.rs
fn dict_from_values<K: ArrowDictionaryKeyType>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we make it pub in DataFusion and reuse ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can also play with cast kernel, looks like such cast might be supported

from can_cast_types

        (_, Dictionary(_, value_type)) => can_cast_types(from_type, value_type),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good suggestion. Let me try that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second that we rely on the cast kernel. Do we have a good idea of how this cast is even ending up in the plan? What does the native plan look like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into reusing the cast kernel. Looks like existing code already would have fallen thru to the catch all branch which calls the cast kernel. Even though can_cast_types appears to allow this, the cast actually fails with

Caused by: org.apache.comet.CometNativeException: InternalError: Native cast invoked for unsupported cast from Utf8 to Dictionary(Int32, Utf8).

The reason for the cast (which is implicit and does not show up in the plan) is apparently because of a case statement like

CASE WHEN (_condition_) THEN _dict_encoded_column_ ELSE '' END

In such a case, we are implicitly trying to cast the array containing the string value to a dict encoded array.

I'll open a follow up PR to make dict_from_values publich in DF and remove the duplication

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbutrovich @comphead can we go ahead with this PR? I've verified that using the cast kernel directly does not work for this case.

@parthchandra parthchandra self-assigned this Aug 8, 2025
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @parthchandra for experimenting with this.

@comphead
Copy link
Contributor

I'll open a follow up PR to make dict_from_values publich in DF and remove the duplication

Are you planning to open PR or issue?

@parthchandra
Copy link
Contributor Author

I can log an issue.

@parthchandra
Copy link
Contributor Author

@parthchandra parthchandra merged commit ba228e1 into apache:main Sep 2, 2025
159 of 161 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Comet fails to run clickbench query

4 participants