Skip to content

array_contains returning unexpected values with column parameter #6972

Closed
@maxburke

Description

@maxburke

Describe the bug

The array_contains code seems to be overly-flattening input lists and in doing so is generating incorrect data when one of the parameters is a column of List-type.

To Reproduce

I've attached a parquet table containing a column with type List(String).

When use array_contains on this data, I get this result set:

❯ create external table t0 stored as parquet location '/Users/max/tmp/array_contains.parquet';
0 rows in set. Query took 0.017 seconds.
❯ select bid_node_ids from t0 where array_contains(bid_node_ids, ['z+CPVybgUuCXlAE3A3jqyg==']);
+----------------------------+
| bid_node_ids               |
+----------------------------+
| [okwzcOFM3yjUzNFbc/BYBQ==] |
| [DbNysJTF560NzR/HLbAa/Q==] |
| [ivO3+Z+WMRqwhivy85d6KA==] |
+----------------------------+
3 rows in set. Query took 0.076 seconds.
❯

Note that none of the resulting bid_node_ids values contain the queried-for value of z+CPVybgUuCXlAE3A3jqyg==

array_contains.parquet.zip

Expected behavior

I was expecting that there are 861 matching results in the result set, all of which contain the value z+CPVybgUuCXlAE3A3jqyg==

❯ select bid_node_ids from t0 where array_contains(bid_node_ids, ['z+CPVybgUuCXlAE3A3jqyg==']);
+--------------------------------------------------------------------------------+
| bid_node_ids                                                                   |
+--------------------------------------------------------------------------------+
| [wFEkOS2AFYxekv7SzPrkiQ==, z+CPVybgUuCXlAE3A3jqyg==]                           |
[....snip...]
| [O3GAOhhCbfxgXcZEwLI7aQ==, z+CPVybgUuCXlAE3A3jqyg==]                           |
| [O3GAOhhCbfxgXcZEwLI7aQ==, z+CPVybgUuCXlAE3A3jqyg==]                           |
| [z+CPVybgUuCXlAE3A3jqyg==]                                                     |
| [iTd7HyShRr0PqSKyqKT0+A==, z+CPVybgUuCXlAE3A3jqyg==]                           |
| [edSh3ZpG53UB+JMV875ipg==, z+CPVybgUuCXlAE3A3jqyg==]                           |
| [O3GAOhhCbfxgXcZEwLI7aQ==, z+CPVybgUuCXlAE3A3jqyg==]                           |
| [edSh3ZpG53UB+JMV875ipg==, z+CPVybgUuCXlAE3A3jqyg==]                           |
| [O3GAOhhCbfxgXcZEwLI7aQ==, z+CPVybgUuCXlAE3A3jqyg==]                           |
| [O3GAOhhCbfxgXcZEwLI7aQ==, z+CPVybgUuCXlAE3A3jqyg==]                           |
| [edSh3ZpG53UB+JMV875ipg==, z+CPVybgUuCXlAE3A3jqyg==]                           |
| [edSh3ZpG53UB+JMV875ipg==, z+CPVybgUuCXlAE3A3jqyg==]                           |
+--------------------------------------------------------------------------------+
861 rows in set. Query took 1.069 seconds.

Additional context

I've hacked together a change on our branch that gives us the changes we are expecting: urbanlogiq@a381f10 but I'm not sure if this fix is what is intended by the original author.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions