
Implement arrow_cast support for StringView and BinaryView #10920

Closed
@alamb

Description

Is your feature request related to a problem or challenge?

Part of #10918, [StringViewArray](https://docs.rs/arrow/latest/arrow/array/type.StringViewArray.html) support in DataFusion

[arrow_cast](https://datafusion.apache.org/user-guide/sql/scalar_functions.html#arrow-cast) is a function widely used in DataFusion tests to produce values of specific Arrow data types.

Under the covers, it simply calls the appropriate arrow-cast kernel.

Here is an example showing how this works:

> select arrow_cast('foo', 'Dictionary(Int32, Utf8)');
+---------------------------------------------------------+
| arrow_cast(Utf8("foo"),Utf8("Dictionary(Int32, Utf8)")) |
+---------------------------------------------------------+
| foo                                                     |
+---------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.006 seconds.

> select arrow_typeof(arrow_cast('foo', 'Dictionary(Int32, Utf8)'));
+-----------------------------------------------------------------------+
| arrow_typeof(arrow_cast(Utf8("foo"),Utf8("Dictionary(Int32, Utf8)"))) |
+-----------------------------------------------------------------------+
| Dictionary(Int32, Utf8)                                               |
+-----------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

Here is how to make a table with dictionary-encoded values:

> create table foo as values ('Andrew', 'Xiangpeng', 'Raphael');
0 row(s) fetched.
Elapsed 0.002 seconds.

> create table dict_table as select arrow_cast(column1, 'Dictionary(Int32, Utf8)') column1 from foo;
0 row(s) fetched.
Elapsed 0.008 seconds.

> select column1, arrow_typeof(column1) from dict_table;
+---------+----------------------------------+
| column1 | arrow_typeof(dict_table.column1) |
+---------+----------------------------------+
| Andrew  | Dictionary(Int32, Utf8)          |
+---------+----------------------------------+
1 row(s) fetched.
Elapsed 0.002 seconds.

Describe the solution you'd like

I would like to be able to use `arrow_cast` to create StringView and BinaryView arrays for testing.

This does not yet work:

> select arrow_cast('foo', 'StringView');
Error during planning: Unsupported type 'StringView'. Must be a supported arrow type name such as 'Int32' or 'Timestamp(Nanosecond, None)'. Error unrecognized word: StringView
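
For illustration, here is a sketch of what the desired usage might look like once support is added, mirroring the dictionary example above. It assumes the type strings follow Arrow's `DataType` variant names (`Utf8View` and `BinaryView`) rather than the array names; the spelling the parser ends up accepting is an assumption, and `view_table` is a hypothetical table name. Whether the cast kernel supports each source/target pair is part of what this request would need to verify, so no output is shown.

```sql
-- Assumed type names: Arrow's DataType variants are Utf8View and BinaryView
select arrow_typeof(arrow_cast('foo', 'Utf8View'));
select arrow_typeof(arrow_cast('foo', 'BinaryView'));

-- Hypothetical table, built the same way as dict_table above
create table view_table as select arrow_cast(column1, 'Utf8View') column1 from foo;
select column1, arrow_typeof(column1) from view_table;
```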

Describe alternatives you've considered

No response

Additional context

No response

Labels

enhancement (New feature or request)
