Add 'clickbench_extended' benchmark

### Is your feature request related to a problem or challenge?

The [ClickBench](https://benchmark.clickhouse.com/) benchmark has excellent coverage for aggregate / grouping 

We have used the ClickBench benchmark, run via `bench.sh`, for important work improving aggregates such as https://github.com/apache/arrow-datafusion/issues/6988 and https://github.com/apache/arrow-datafusion/issues/7064. However, there are some important optimizations, like https://github.com/apache/arrow-datafusion/pull/8849 and https://github.com/apache/arrow-datafusion/issues/7191 from @avantgardnerio, whose use cases the ClickBench benchmark does not cover.

For example, @jayzhan211's change in https://github.com/apache/arrow-datafusion/pull/8849#issuecomment-1890482901 makes certain realistic queries 


<details><summary>Details on `bench.sh`</summary>
<p>

```shell
$ ./benchmarks/bench.sh --help

Orchestrates running benchmarks against DataFusion checkouts

Usage:
./benchmarks/bench.sh data [benchmark]
./benchmarks/bench.sh run [benchmark]
./benchmarks/bench.sh compare <branch1> <branch2>

**********
Examples:
**********
# Create the datasets for all benchmarks in /Users/andrewlamb/Software/arrow-datafusion/benchmarks/data
./bench.sh data

# Run the 'tpch' benchmark on the datafusion checkout in /source/arrow-datafusion
DATAFASION_DIR=/source/arrow-datafusion ./bench.sh run tpch

**********
* Commands
**********
data:         Generates data needed for benchmarking
run:          Runs the named benchmark
compare:      Comares results from benchmark runs

**********
* Benchmarks
**********
all(default): Data/Run/Compare for all benchmarks
tpch:                   TPCH inspired benchmark on Scale Factor (SF) 1 (~1GB), single parquet file per table
tpch_mem:               TPCH inspired benchmark on Scale Factor (SF) 1 (~1GB), query from memory
tpch10:                 TPCH inspired benchmark on Scale Factor (SF) 10 (~10GB), single parquet file per table
tpch10_mem:             TPCH inspired benchmark on Scale Factor (SF) 10 (~10GB), query from memory
parquet:                Benchmark of parquet reader's filtering speed
sort:                   Benchmark of sorting speed
clickbench_1:           ClickBench queries against a single parquet file
clickbench_partitioned: ClickBench queries against a partitioned (100 files) parquet

**********
* Supported Configuration (Environment Variables)
**********
DATA_DIR        directory to store datasets
CARGO_COMMAND   command that runs the benchmark binary
DATAFASION_DIR  directory to use (default /Users/andrewlamb/Software/arrow-datafusion/benchmarks/..)
```

</p>
</details> 

### Describe the solution you'd like

I would like to add a new benchmark to `bench.sh` that uses the same dataset as the existing ClickBench benchmarks but runs different queries:

```shell
$ ./benchmarks/bench.sh run clickbench_extended
```

The new queries should:

1. Be realistic (we can write an English sentence explaining the quantity they compute and how it might be used)
2. Reflect some query pattern that the existing ClickBench queries do not cover

Here is an example from https://github.com/apache/arrow-datafusion/pull/8849#issuecomment-1890482901


#### Query: Distinct counts

- Query Explanation: Data exploration: understand the qualities of the data in `hits.parquet`
- Query Properties: multiple count distinct aggregates on string datatypes

```sql
SELECT
  COUNT(DISTINCT "SearchPhrase"),
  COUNT(DISTINCT "MobilePhone"),
  COUNT(DISTINCT "MobilePhoneModel")
FROM 'hits.parquet';
```
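To make it concrete what this query computes, here is a minimal sketch that runs the same three `COUNT(DISTINCT ...)` aggregates over a tiny in-memory table. It uses Python's built-in `sqlite3` module rather than DataFusion, and the four rows and the column types of the toy `hits` table are made up for illustration; only the column names come from the query above.

```python
import sqlite3

# Made-up stand-in for hits.parquet (assumption: toy values and toy column
# types; the real dataset is far larger).
rows = [
    ("cheap flights", 0, ""),
    ("cheap flights", 1, "iPhone"),
    ("", 1, "iPhone"),
    ("weather", 1, "Galaxy"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE hits ('
    '"SearchPhrase" TEXT, "MobilePhone" INTEGER, "MobilePhoneModel" TEXT)'
)
conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", rows)

# Same shape as the proposed benchmark query: several independent
# COUNT(DISTINCT ...) aggregates in one SELECT with no GROUP BY.
distinct_counts = conn.execute(
    """
    SELECT
      COUNT(DISTINCT "SearchPhrase"),
      COUNT(DISTINCT "MobilePhone"),
      COUNT(DISTINCT "MobilePhoneModel")
    FROM hits
    """
).fetchone()

print(distinct_counts)  # (3, 2, 3)
conn.close()
```

Note that the empty string counts as a distinct value here; only SQL `NULL`s are skipped by `COUNT(DISTINCT ...)`.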


### Describe alternatives you've considered

_No response_

### Additional context

_No response_
