Cast Utf8 -> Utf8View (not the other way around) for binary operators #11881

Closed
@alamb

Is your feature request related to a problem or challenge?

In #11796 @dharanad added a rule for binary operators such that if Utf8View is on either side, we coerce to Utf8.

I think it would be better to coerce to Utf8View, as that coercion will often be faster (it is faster to cast Utf8 -> Utf8View than the other way around).

@XiangpengHao notes: #11796 (comment)

Agree, similar to this policy:

fn string_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {
    use arrow::datatypes::DataType::*;
    match (lhs_type, rhs_type) {
        // If Utf8View is in any side, we coerce to Utf8View.
        (Utf8View, Utf8View | Utf8 | LargeUtf8) | (Utf8 | LargeUtf8, Utf8View) => {
            Some(Utf8View)
        }
        // Then, if LargeUtf8 is in any side, we coerce to LargeUtf8.
        (LargeUtf8, Utf8 | LargeUtf8) | (Utf8, LargeUtf8) => Some(LargeUtf8),
        (Utf8, Utf8) => Some(Utf8),
        _ => None,
    }
}
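
To make the expected results of that policy easy to check without pulling in the arrow crate, here is a minimal sketch. The local `DataType` enum is a stand-in (an assumption for illustration only) for `arrow::datatypes::DataType`, reduced to the three string variants plus `Int32` as a non-string case. It is not the code merged in DataFusion.

```rust
// Hypothetical stand-in for arrow::datatypes::DataType, reduced to the
// variants needed to exercise the policy quoted above.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Int32,
}

// Same match arms as the quoted policy, written against the stand-in enum.
fn string_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {
    use DataType::*;
    match (lhs_type, rhs_type) {
        // Utf8View on either side wins.
        (Utf8View, Utf8View | Utf8 | LargeUtf8) | (Utf8 | LargeUtf8, Utf8View) => {
            Some(Utf8View)
        }
        // Otherwise LargeUtf8 on either side wins.
        (LargeUtf8, Utf8 | LargeUtf8) | (Utf8, LargeUtf8) => Some(LargeUtf8),
        (Utf8, Utf8) => Some(Utf8),
        // Any pair that is not two string types is left to other rules.
        _ => None,
    }
}
```

Under this sketch the rule is symmetric: `(Utf8, Utf8View)` and `(Utf8View, Utf8)` both yield `Some(Utf8View)`, `(Utf8, LargeUtf8)` yields `Some(LargeUtf8)`, and a pair such as `(Utf8View, Int32)` yields `None`.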

Describe the solution you'd like

Cast to Utf8View rather than Utf8 in the binary-operator coercion rule added in #11796.

Describe alternatives you've considered

No response

Additional context

No response

Labels

enhancement (New feature or request)