Casting from Binary --> Utf8 to evaluate `LIKE` slows down some ClickBench queries 

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

While working on enabling `StringView` by default in https://github.com/apache/datafusion/pull/12092 I noticed that some of the clickbench queries got 10% slower and looked into it. 

The plan looks like this:

```sql
DataFusion CLI v41.0.0
+---------------+----------------------------------------------------------------------------------------------------$
| plan_type     | plan                                                                                               $
+---------------+----------------------------------------------------------------------------------------------------$
| physical_plan | AggregateExec: mode=Final, gby=[], aggr=[count(*)]                                                 $
|               |   CoalescePartitionsExec                                                                           $
|               |     AggregateExec: mode=Partial, gby=[], aggr=[count(*)]                                           $
|               |       ProjectionExec: expr=[]                                                                      $
|               |         CoalesceBatchesExec: target_batch_size=8192                                                $
|               |           FilterExec: CAST(URL@0 AS Utf8View) LIKE %google%                                        $
|               |             ParquetExec: file_groups={16 groups: [[Users/andrewlamb/Software/datafusion/benchmarks/$
+---------------+----------------------------------------------------------------------------------------------------$
2 row(s) fetched.
Elapsed 0.065 seconds.
```

When looking at the flamegraphs, you can see the `CAST` spends a huge amount of time validating utf8 (more time than actually evaluating the `LIKE` predicate actually):
<img width="1585" alt="Screenshot 2024-09-17 at 11 06 34 AM" src="https://github.com/user-attachments/assets/3a97574f-da5b-4e5c-9de5-555ebadb79ef">


Here are the full flamegraphs for comparison:
[q20-flamegraph-main](https://github.com/user-attachments/assets/5cb0f83b-ce56-4a98-b0f7-c79c28d48b15)
[q20-flamegraph-stringview](https://github.com/user-attachments/assets/8ed966d5-be8e-4994-8809-e8e271e978f0)


I belive the issue is here:
```
|               |           FilterExec: CAST(URL@0 AS Utf8View) LIKE %google%
```

This filter first *CAST`s the URL column to Utf8View and then evaluates  `LIKE`

Converting `BinaryArray` --> `StringArray`as is done without StringView is relatively faster because it is done with a single large function call

However, converting `BinaryViewArrar` --> `StringViewArray` is not as it makes many small function calls. The parquet reader has a special optimization for this as descsribed in "Section 2.1: From binary to strings" of the [Using StringView / German Style Strings to Make Queries Faster: Part 1 - Reading Parquet](https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/) from  @XiangpengHao 

**Describe the solution you'd like**
I would like this query to go as fast / faster with Utf8View / BinaryView enabled.

Bonus points if it went faster even without Utf-8 enabled

**Describe alternatives you've considered**

### Option 1: `LIKE` for binary
One option is to skip validating  UTF8 entirely and evaluate `LIKE` directly on binary. This would mean if the column is read as binary we could cast the argument `'%google%'` to binary and then evaluate `LIKE` directly on the binary column. This would skip validaitng utf8 completely

Unfortunately, it appears that the `like` kernel is only implemented for `StringArray` and `StringViewArray` at the moment, not BinaryArray:  https://docs.rs/arrow-string/53.0.0/src/arrow_string/like.rs.html#110-149

Another related option would be to potentially special case the `LIKE` rewite in this case for just prefix / contians / suffix -- in this case rewrite `<binary> LIKE <const that starts and ends with '%'`> --> `<binary> CONTAINS <string>` 

### Option 2: resolve the column as `Utf8` rather than `Binary`

For some reason the schema of `hits.parquet` (the single file from ClickBench) has the `URL` column (and others) as `Utf8` (strings) but the `hits_partitioned` file resolves it as Binary. \

We could change the schema resolution logicic to resolve the column as a String instead. 

This option is probably slower than option 1 but I think it is more inline with what the intended semantics (these columns contain logical stirngs) and the parquet reader includes the fast read path for such strings and would be more general.

Filed https://github.com/apache/datafusion/issues/12510 to track this ideae

**Additional context**

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Casting from Binary --> Utf8 to evaluate `LIKE` slows down some ClickBench queries #12509

Option 1: `LIKE` for binary

Option 2: resolve the column as `Utf8` rather than `Binary`

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Casting from Binary --> Utf8 to evaluate LIKE slows down some ClickBench queries #12509

Description

Option 1: LIKE for binary

Option 2: resolve the column as Utf8 rather than Binary

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Casting from Binary --> Utf8 to evaluate `LIKE` slows down some ClickBench queries #12509

Option 1: `LIKE` for binary

Option 2: resolve the column as `Utf8` rather than `Binary`