Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DataType::Utf8View and DataType::BinaryView #5470

Merged
merged 5 commits into from
Mar 5, 2024

Conversation

XiangpengHao
Copy link
Contributor

@XiangpengHao XiangpengHao commented Mar 4, 2024

Which issue does this PR close?

Part of #5468 .

Closes #5468

Rationale for this change

Add DataType::Utf8View and DataType::BinaryView. This is the first step towards supporting StringViewArray in arrow.

What changes are included in this PR?

Only the type definition and the related comments. Most of the handling of the types are unimplemented!()

I'm new to arrow and not sure if this is the commit granularity we expected, comments/suggestions welcome!

Are there any user-facing changes?

@github-actions github-actions bot added parquet Changes to the parquet crate arrow Changes to the arrow crate labels Mar 4, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @XiangpengHao

I was going to suggest returning errors rather than panic but now I see that the apis aren't fallible (don't return Results).

Thus, even though using unimplemented! will panic, I think this is ok for now and is consistent with what we did with other array types that didn't have full support.

Another potential alternative is to make a feature flag and #[feature(array_view))] all the relevant call sites

However, if others think it is important not to panic in the API, I suggest we make a feature branch (I can do so) and we can merge PRs to that feature branch while in development and then merge the feature branch to master when it is ready

arrow-data/src/equal/mod.rs Outdated Show resolved Hide resolved
@@ -543,6 +543,7 @@ pub(crate) fn get_fb_field_type<'a>(
.as_union_value(),
children: Some(fbb.create_vector(&empty_fields[..])),
},
BinaryView | Utf8View => unimplemented!("Not implemented"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good reminder that we need to implement IPC support as well -- I added that as a task to

@alamb
Copy link
Contributor

alamb commented Mar 4, 2024

I am pleasantly surprised by how few places in the code needed to be changed.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me -- thank you @XiangpengHao

I left some suggestions on wording, but

I think we should leave this PR open for at least 24 hours to give other people a chance to comment

arrow-schema/src/datatype.rs Outdated Show resolved Hide resolved
arrow-schema/src/datatype.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Panicking is fine imo whilst we get this integrated. I would recommend changing the unimplemented! messages to state what isn't implemented

arrow-schema/src/datatype.rs Outdated Show resolved Hide resolved
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
@XiangpengHao
Copy link
Contributor Author

I am pleasantly surprised by how few places in the code needed to be changed.

I mostly rely on the compiler to find where to add unimplemented!(). Other cases like:

match data_type {
  DataType::some_thing => {}
  _ => {}
}

https://github.com/XiangpengHao/arrow-rs/blob/d861177a5d0171fab8d4f646d3b72dfabf83b14d/rust/arrow/src/datatypes/field.rs#L124

Might require a closer look to make sure they are safe to escape

@ariesdevil
Copy link
Contributor

If there's nothing wrong with it, let's go ahead and merge it and I'll do a follow-up based on that.

arrow-data/src/data.rs Outdated Show resolved Hide resolved
Co-authored-by: Daniël Heres <danielheres@gmail.com>
@alamb alamb merged commit 7eb866d into apache:master Mar 5, 2024
27 checks passed
@alamb
Copy link
Contributor

alamb commented Mar 5, 2024

THanks everyone -- it is so exciting to see this project begin

Also, kudos to @XiangpengHao for your first Arrow contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add DataType::Utf8View and DataType::BinaryView
6 participants