Skip to content

Fix comparison of dictionary arrays#1606

Merged
alamb merged 2 commits intoapache:masterfrom
alamb:alamb/compare_dict_arrays
Jan 19, 2022
Merged

Fix comparison of dictionary arrays#1606
alamb merged 2 commits intoapache:masterfrom
alamb:alamb/compare_dict_arrays

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Jan 18, 2022

Which issue does this PR close?

Closes #1605

See ticket for more details

assert_batches_eq!(expected, &actual);

// comparison with another dictionary column
let sql = "SELECT d1 FROM test WHERE d1 = d2";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test fails without the change

@alamb alamb force-pushed the alamb/compare_dict_arrays branch from 98e574d to c4db8f3 Compare January 18, 2022 15:13

fn comparison_eq_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {
if lhs_type == rhs_type {
if lhs_type == rhs_type && !is_dictionary(lhs_type) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the arrow eq kernels don't support DictionaryArray <==> DictionaryArray comparisons yet, so we need to handle this case specially. I will also file a ticket to add native DictionaryArray comparison support.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
}

pub(crate) fn is_dictionary(t: &DataType) -> bool {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duplication of code between this module and binary_rule.rs is not good -- I am going to try and consolidate it as a follow on PR cc @liukun4515

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1607 <-- doesn't remove duplication but it at least consolidates the logic into a single module


fn comparison_eq_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {
if lhs_type == rhs_type {
if lhs_type == rhs_type && !is_dictionary(lhs_type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add TODO or some comments for this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do so -- it is a good point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in e27110f

rhs_type: &DataType,
) -> Option<DataType> {
if lhs_type == rhs_type {
if lhs_type == rhs_type && !is_dictionary(lhs_type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's same with above.

}
}

pub(crate) fn is_dictionary(t: &DataType) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another point I want to point out is that the arrow-rs has some codes to check data type, and the logic is also in datafusion, for example, https://docs.rs/arrow/7.0.0/arrow/datatypes/enum.DataType.html#method.is_numeric, Maybe we should refactor them and make the codebase clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed as #1613

Copy link
Contributor

@liukun4515 liukun4515 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Thanks @alamb

@alamb alamb merged commit ad392fd into apache:master Jan 19, 2022
@alamb alamb deleted the alamb/compare_dict_arrays branch January 19, 2022 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Data type Dictionary(Int32, Utf8) not supported for binary operation 'eq' on dyn arrays

3 participants