Closed
Labels: bug (Something isn't working)
Description
Describe the bug
DataFusion panics when running a query like
select count(distinct column1), count(distinct column2) from test group by column1;
where column1 and column2 are Dictionary columns.
To Reproduce
❯ create table test as values (1, arrow_cast('foo', 'Dictionary(Int32, Utf8)')), (2, arrow_cast('bar', 'Dictionary(Int32, Utf8)'));
0 rows in set. Query took 0.002 seconds.
❯ select * from test;
+---------+---------+
| column1 | column2 |
+---------+---------+
| 1 | foo |
| 2 | bar |
+---------+---------+
2 rows in set. Query took 0.001 seconds.
❯ select count(distinct column1), count(distinct column2) from test group by column1;
thread 'tokio-runtime-worker' panicked at /Users/alamb/Software/arrow-datafusion/datafusion/common/src/scalar.rs:1846:18:
Unsupported data type Dictionary(Int32, Utf8) for ScalarValue::list_to_array
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
External error: Join Error
caused by
External error: task 66 panicked
Expected behavior
The query should produce the same result as when the values are cast to string:
❯ select count(distinct column1::varchar), count(distinct column2::varchar) from test group by column1;
+------------------------------+------------------------------+
| COUNT(DISTINCT test.column1) | COUNT(DISTINCT test.column2) |
+------------------------------+------------------------------+
| 1 | 1 |
| 1 | 1 |
+------------------------------+------------------------------+
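To make the expected semantics concrete, here is a minimal sketch in plain Rust (standard library only) of what COUNT(DISTINCT ...) with GROUP BY should compute when one column is dictionary-encoded. It is not DataFusion's implementation: `DictColumn` and `count_distinct_by_group` are hypothetical names invented for illustration. The assumption is that a dictionary column is logically just its decoded values, so the counts should match the cast-to-string result above.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical stand-in for a Dictionary(Int32, Utf8) column:
/// `keys[i]` is an index into `values`. Not an Arrow or DataFusion type.
struct DictColumn {
    keys: Vec<usize>,
    values: Vec<String>,
}

impl DictColumn {
    /// Decode row `i` to its logical string value.
    fn value(&self, i: usize) -> &str {
        &self.values[self.keys[i]]
    }
}

/// Models `SELECT count(distinct group_col), count(distinct dict_col)
/// ... GROUP BY group_col`, with rows returned in ascending group order.
fn count_distinct_by_group(group_col: &[i64], dict_col: &DictColumn) -> Vec<(usize, usize)> {
    // One pair of distinct-value sets per group key.
    let mut groups: BTreeMap<i64, (HashSet<i64>, HashSet<&str>)> = BTreeMap::new();
    for (i, &g) in group_col.iter().enumerate() {
        let entry = groups.entry(g).or_default();
        entry.0.insert(g);
        // Distinctness is decided on the decoded value, not the dictionary key.
        entry.1.insert(dict_col.value(i));
    }
    groups.values().map(|(a, b)| (a.len(), b.len())).collect()
}

fn main() {
    // Same data as the `test` table in the reproduction.
    let column1 = vec![1_i64, 2];
    let column2 = DictColumn {
        keys: vec![0, 1],
        values: vec!["foo".to_string(), "bar".to_string()],
    };
    // prints [(1, 1), (1, 1)]
    println!("{:?}", count_distinct_by_group(&column1, &column2));
}
```

Each group contains exactly one row, so both distinct counts are 1 for both groups, which is the table shown above.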
Additional context
I believe this is a regression introduced in #7629 (not yet released)
We found this downstream in IOx