Closed
Description
Describe the bug
The make_array
function doesn't appear to properly support nulls for string arrays
To Reproduce
When I run this script
create table foo as values ('foo', null), ('bar', null), (null, 'baz');
select * from foo;
select
make_array(column1, column2)[1],
make_array(column1, column2)[1] IS NULL
from foo;
select
make_array(column1, column2)[2],
make_array(column1, column2)[2] IS NULL
from foo;
I get the following output (note that the output of make_array that had a null input does not have a null output):
0 rows in set. Query took 0.004 seconds.
+---------+---------+
| column1 | column2 |
+---------+---------+
| foo | |
| bar | |
| | baz |
+---------+---------+
3 rows in set. Query took 0.002 seconds.
+----------------------------------------+------------------------------------------------+
| make_array(foo.column1,foo.column2)[1] | make_array(foo.column1,foo.column2)[1] IS NULL |
+----------------------------------------+------------------------------------------------+
| foo | false |
| bar | false |
| | false |
+----------------------------------------+------------------------------------------------+
3 rows in set. Query took 0.007 seconds.
+----------------------------------------+------------------------------------------------+
| make_array(foo.column1,foo.column2)[2] | make_array(foo.column1,foo.column2)[2] IS NULL |
+----------------------------------------+------------------------------------------------+
| | false |
| | false |
| baz | false |
+----------------------------------------+------------------------------------------------+
3 rows in set. Query took 0.007 seconds.
Expected behavior
This is what used to happen (note the true/false):
(arrow_dev) alamb@MacBook-Pro-8:~/Software/arrow-datafusion2/benchmarks$ datafusion-cli -f /tmp/repro.sql
DataFusion CLI v27.0.0
0 rows in set. Query took 0.002 seconds.
+---------+---------+
| column1 | column2 |
+---------+---------+
| foo | |
| bar | |
| | baz |
+---------+---------+
3 rows in set. Query took 0.001 seconds.
+----------------------------------------+------------------------------------------------+
| make_array(foo.column1,foo.column2)[1] | make_array(foo.column1,foo.column2)[1] IS NULL |
+----------------------------------------+------------------------------------------------+
| foo | false |
| bar | false |
| | true |
+----------------------------------------+------------------------------------------------+
3 rows in set. Query took 0.002 seconds.
+----------------------------------------+------------------------------------------------+
| make_array(foo.column1,foo.column2)[2] | make_array(foo.column1,foo.column2)[2] IS NULL |
+----------------------------------------+------------------------------------------------+
| | true |
| | true |
| baz | false |
+----------------------------------------+------------------------------------------------+
3 rows in set. Query took 0.001 seconds.
Additional context
I found this while trying to update IOx to use the latest datafusion: https://github.com/influxdata/influxdb_iox/pull/8127