
Support DictionaryString for Regex matching operators #12618

@goldmedal


Is your feature request related to a problem or challenge?

While working on #12415, I found that DictionaryString columns fail the following case in datafusion/sqllogictest/test_files/string/string_query.slt.part:

statement ok
create table test_basic_operator as
select
    arrow_cast(column1, 'Dictionary(Int32, Utf8)') as ascii_1,
    arrow_cast(column2, 'Dictionary(Int32, Utf8)') as ascii_2,
    arrow_cast(column3, 'Dictionary(Int32, Utf8)') as unicode_1,
    arrow_cast(column4, 'Dictionary(Int32, Utf8)') as unicode_2
from test_source;

query BB
SELECT
  ascii_1 ~* '^a.{3}e',
  unicode_1 ~* '^d.*Фу'
FROM test_basic_operator;
----
true false
false false
false true
NULL NULL

I got the error message:

External error: query failed: DataFusion error: Internal error: Data type Dictionary(Int32, Utf8) not supported for binary_string_array_flag_op_scalar operation 'regexp_is_match' on string array.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
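The error arises because, after arrow_cast(..., 'Dictionary(Int32, Utf8)'), the operator receives a dictionary-encoded array instead of a plain string array. In this layout each row stores an Int32 key into a separate array of distinct Utf8 values, and a NULL row has no key. The std-only Rust sketch below models that layout for illustration; DictStrings, encode and get are hypothetical names, not the arrow crate's DictionaryArray API.

```rust
/// Hypothetical, simplified model of a Dictionary(Int32, Utf8) column:
/// keys[i] indexes into `values` for row i; None marks a NULL row.
struct DictStrings {
    keys: Vec<Option<i32>>,
    values: Vec<String>,
}

impl DictStrings {
    /// Build a dictionary column from plain optional strings,
    /// storing each distinct string only once.
    fn encode(rows: &[Option<&str>]) -> Self {
        let mut values: Vec<String> = Vec::new();
        let mut keys = Vec::with_capacity(rows.len());
        for row in rows {
            let key = row.map(|s| match values.iter().position(|v| v.as_str() == s) {
                // Reuse the existing dictionary entry.
                Some(idx) => idx as i32,
                // First occurrence: append a new dictionary entry.
                None => {
                    values.push(s.to_string());
                    (values.len() - 1) as i32
                }
            });
            keys.push(key);
        }
        DictStrings { keys, values }
    }

    /// Decode row i by looking up its key in the values array.
    fn get(&self, i: usize) -> Option<&str> {
        self.keys[i].map(|k| self.values[k as usize].as_str())
    }
}

fn main() {
    // Hypothetical sample rows; "apple" appears twice but is stored once.
    let col = DictStrings::encode(&[Some("apple"), Some("banana"), Some("apple"), None]);
    println!("values = {:?}", col.values); // ["apple", "banana"]
    println!("keys   = {:?}", col.keys); // [Some(0), Some(1), Some(0), None]
    for i in 0..col.keys.len() {
        println!("row {}: {:?}", i, col.get(i));
    }
}
```

A kernel that only matches on Utf8, Utf8View and LargeUtf8 never sees the decoded strings, which is why the Dictionary type falls through to the internal error.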

Describe the solution you'd like

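Add DictionaryString support to binary_string_array_flag_op (excerpt below) and to its scalar counterpart binary_string_array_flag_op_scalar, which is the macro that raises the error above: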

macro_rules! binary_string_array_flag_op {
    ($LEFT:expr, $RIGHT:expr, $OP:ident, $NOT:expr, $FLAG:expr) => {{
        match $LEFT.data_type() {
            DataType::Utf8View | DataType::Utf8 => {
                compute_utf8_flag_op!($LEFT, $RIGHT, $OP, StringArray, $NOT, $FLAG)
            },
            DataType::LargeUtf8 => {
                compute_utf8_flag_op!($LEFT, $RIGHT, $OP, LargeStringArray, $NOT, $FLAG)
            },
            other => internal_err!(
                "Data type {:?} not supported for binary_string_array_flag_op operation '{}' on string array",
                // ...
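One possible shape for a Dictionary arm, sketched here under simplifying assumptions, is to evaluate the flag operation once per distinct dictionary value and then map each row's key to that precomputed result, keeping NULL rows NULL. The sketch is std-only Rust and does not use the arrow or regex crates: DictStrings and dict_flag_op are hypothetical names, and a case-insensitive "starts with a" check stands in for the real regexp_is_match kernel with the case-insensitive flag.

```rust
/// Hypothetical, simplified stand-in for a Dictionary(Int32, Utf8) column:
/// row i is NULL if keys[i] is None, else values[keys[i]].
struct DictStrings {
    keys: Vec<Option<i32>>,
    values: Vec<String>,
}

/// Hypothetical dictionary-aware flag op. `pred` stands in for the regex
/// match; `negate` models the negated operators (e.g. !~*).
fn dict_flag_op<F>(col: &DictStrings, pred: F, negate: bool) -> Vec<Option<bool>>
where
    F: Fn(&str) -> bool,
{
    // Evaluate the predicate once per dictionary entry, not once per row.
    let per_value: Vec<bool> = col
        .values
        .iter()
        .map(|v| pred(v.as_str()) != negate)
        .collect();
    // Gather per-row results through the keys; NULL keys stay NULL.
    col.keys
        .iter()
        .map(|key| key.map(|k| per_value[k as usize]))
        .collect()
}

fn main() {
    let col = DictStrings {
        keys: vec![Some(0), Some(1), Some(0), None],
        values: vec!["Apple".to_string(), "banana".to_string()],
    };
    // Stand-in for `col ~* '^a'`: case-insensitive "starts with a".
    let starts_with_a = |s: &str| s.to_lowercase().starts_with('a');
    let matched = dict_flag_op(&col, starts_with_a, false);
    println!("{:?}", matched); // [Some(true), Some(false), Some(true), None]
}
```

Evaluating on the dictionary values avoids re-running the regex for repeated strings; unpacking the dictionary to plain Utf8 first and reusing the existing string arm would be simpler, and this sketch does not settle which approach fits DataFusion better.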

Describe alternatives you've considered

No response

Additional context

No response

Labels

enhancement: New feature or request
