Description
Is your feature request related to a problem or challenge?
Part of #11752
As we work to complete StringView support in DataFusion, @2010YOUY01 noticed on #11752 (comment) that we don't currently support the regexp-like binary operators (https://datafusion.apache.org/user-guide/sql/operators.html#op-re-match) for string view.
Reproducer
CREATE TABLE t0(v0 DOUBLE, v1 DOUBLE, v2 BOOLEAN, v3 BOOLEAN, v4 BOOLEAN, v5 STRING);
INSERT INTO t0(v1, v5, v2) VALUES (0.7183242196192607, 'Tn', true);
CREATE TABLE t0_stringview AS SELECT v0, v1, v2, v3, v4, arrow_cast(v5, 'Utf8View') as v5 FROM t0;
> select v5 ~ 'foo' from t0_stringview;
Internal error: Data type Utf8View not supported for binary_string_array_flag_op_scalar operation 'regexp_is_match' on string array.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
> select regexp_match(v5, 'foo') from t0_stringview;
+--------------------------------------------+
| regexp_match(t0_stringview.v5,Utf8("foo")) |
+--------------------------------------------+
| |
+--------------------------------------------+
1 row(s) fetched.
Elapsed 0.034 seconds.
Describe the solution you'd like
StringView should be supported for these operators (i.e., the query should run without error).
Describe alternatives you've considered
Here are the relevant operator names:
| Operator::RegexMatch
| Operator::RegexIMatch
| Operator::RegexNotMatch
| Operator::RegexNotIMatch
| Operator::LikeMatch
| Operator::ILikeMatch
| Operator::NotLikeMatch
| Operator::NotILikeMatch
Here is the dispatch code:
datafusion/datafusion/physical-expr/src/expressions/binary.rs
Lines 621 to 632 in 0f96af5
It appears that the corresponding arrow-rs kernel does not yet support StringView:
https://docs.rs/arrow-string/52.2.0/src/arrow_string/regexp.rs.html#307-311
So what I would suggest is:
- Open a PR in DataFusion that coerces Utf8View to Utf8 (i.e., casts the arguments back to string)
- File an upstream ticket in arrow-rs for supporting string view in the regexp_like kernels, and leave a link to that ticket in the DataFusion code
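To make the first suggestion concrete, here is a rough, self-contained sketch of the coercion rule. It is only an illustration under stated assumptions: `StrType`, `coerce_regex_operand`, and `coerce_regex_args` are hypothetical names invented for this sketch, not DataFusion's or arrow-rs's actual types or functions, and the real change would live in DataFusion's type coercion code and operate on Arrow's own data types.

```rust
/// Hypothetical stand-in for the Arrow string types involved here.
/// This is NOT `arrow::datatypes::DataType`; it only models the three
/// string encodings relevant to this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StrType {
    Utf8,
    LargeUtf8,
    Utf8View,
}

/// Assumed rule: until the arrow-rs regexp kernel accepts string views,
/// a `Utf8View` operand is cast back to `Utf8`; other types pass through.
fn coerce_regex_operand(t: StrType) -> StrType {
    match t {
        // TODO: remove once the upstream arrow-rs ticket is resolved.
        StrType::Utf8View => StrType::Utf8,
        other => other,
    }
}

/// Apply the same rule to both sides of a regex operator such as `~`.
fn coerce_regex_args(lhs: StrType, rhs: StrType) -> (StrType, StrType) {
    (coerce_regex_operand(lhs), coerce_regex_operand(rhs))
}
```

With this rule, the reproducer's `v5 ~ 'foo'` (a Utf8View column against a Utf8 literal) would be planned as a Utf8 against Utf8 comparison, which is a combination the existing kernel already handles. The cost is an extra cast of the string view column, which is why the upstream ticket is still worth filing.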
Additional context
No response