Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should can_coerce_from promote Utf8 to LargeUtf8? #13325

Open
gatesn opened this issue Nov 9, 2024 · 1 comment
Open

Should can_coerce_from promote Utf8 to LargeUtf8? #13325

gatesn opened this issue Nov 9, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@gatesn
Copy link

gatesn commented Nov 9, 2024

Describe the bug

The function can_coerce_from establishes the lossless conversions between data types.

It currently says that LargeUtf8 can be coerced into Utf8. I'm not sure this should be true?

// Any type can be coerced into strings
(Utf8 | LargeUtf8, _) => Some(type_into.clone()),

I suppose I could construct a single row that contains more than i32 bytes of string data. But it also seems generally desirable for DataFusion to promote to the large string/binary type?

To Reproduce

No response

Expected behavior

Utf8 promotes into LargeUt8, but not vice versa.

Additional context

No response

@gatesn gatesn added the bug Something isn't working label Nov 9, 2024
@jayzhan211
Copy link
Contributor

jayzhan211 commented Nov 9, 2024

You can try TypeSignature::String now if you want to coerce any type to string.

The current logic is which won't coerce Utf8 to LargeUtf8. (LargeUtf8View is currently not supported)

pub fn string_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {
    use arrow::datatypes::DataType::*;
    match (lhs_type, rhs_type) {
        // If Utf8View is in any side, we coerce to Utf8View.
        (Utf8View, Utf8View | Utf8 | LargeUtf8) | (Utf8 | LargeUtf8, Utf8View) => {
            Some(Utf8View)
        }
        // Then, if LargeUtf8 is in any side, we coerce to LargeUtf8.
        (LargeUtf8, Utf8 | LargeUtf8) | (Utf8, LargeUtf8) => Some(LargeUtf8),
        // Utf8 coerces to Utf8
        (Utf8, Utf8) => Some(Utf8),
        _ => None,
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants