Support hashing List columns #7616
Merged (7 commits) on Sep 22, 2023
jonmmease (Contributor)

Which issue does this PR close?

Closes #7473

Rationale for this change

I'd like to be able to GROUP BY a List column.

What changes are included in this PR?

Follows the suggestion by @tustvold to hash List arrays by first hashing the inner values array and then using the list offsets to combine the hashes within each list element.
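To make the two-step approach concrete, here is a minimal Rust sketch. It is not the code merged in this PR. It assumes a hypothetical, simplified `ListArray` struct (a flat `values` vector plus an `offsets` vector, modeled on Arrow's list layout but not the arrow-rs type), uses std's `DefaultHasher` for per-value hashes, uses an illustrative order-sensitive `combine_hashes` function, and ignores nulls.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified stand-in for an Arrow list array of i64
/// (not the arrow-rs type): row i covers values[offsets[i]..offsets[i + 1]].
struct ListArray {
    values: Vec<i64>,
    offsets: Vec<usize>,
}

/// Hash one child value (assumption: std's DefaultHasher, not DataFusion's hasher).
fn hash_value(v: i64) -> u64 {
    let mut h = DefaultHasher::new();
    v.hash(&mut h);
    h.finish()
}

/// Illustrative order-sensitive combiner, so [1, 2] and [2, 1] hash differently.
fn combine_hashes(l: u64, r: u64) -> u64 {
    l.wrapping_mul(37).wrapping_add(r)
}

/// Step 1: hash the flat values array once.
/// Step 2: walk consecutive offset pairs and fold each row's child hashes.
fn hash_list_array(list: &ListArray) -> Vec<u64> {
    let value_hashes: Vec<u64> = list.values.iter().map(|&v| hash_value(v)).collect();
    list.offsets
        .windows(2)
        .map(|w| {
            value_hashes[w[0]..w[1]]
                .iter()
                .fold(0u64, |acc, &h| combine_hashes(acc, h))
        })
        .collect()
}

fn main() {
    // Rows: [1, 2], [3], [1, 2], []
    let list = ListArray {
        values: vec![1, 2, 3, 1, 2],
        offsets: vec![0, 2, 3, 5, 5],
    };
    let hashes = hash_list_array(&list);
    // Equal lists produce equal hashes, so a hash-based GROUP BY buckets them together.
    assert_eq!(hashes[0], hashes[2]);
    println!("{:?}", hashes);
}
```

The design benefit is that the child values are hashed in a single pass over the flat values buffer, and the per-row work is just a fold over a slice selected by the offsets. An empty list (equal consecutive offsets) falls out naturally as the fold's initial value.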

Are these changes tested?

Yes. There is a unit test for the hashing logic itself and a sqllogictest for the desired GROUP BY behavior.

Are there any user-facing changes?

GROUP BY on a List column no longer errors.

github-actions bot added the labels logical-expr (Logical plan and expressions), physical-expr (Physical Expressions), and sqllogictest (SQL Logic Tests (.slt)) on Sep 21, 2023
alamb (Contributor) left a comment:

Very nice @jonmmease -- thank you very much 👏

I reviewed the code and tests and it looks good to me

xudong963 merged commit 2c83b02 into apache:main on Sep 22, 2023
21 checks passed
Ted-Jiang pushed a commit to Ted-Jiang/arrow-datafusion that referenced this pull request on Oct 7, 2023:
* Hash ListArray

* Implement hash join for list arrays

* add sqllogic test for grouping by list column

* reset parquet-testing

* reset testing

* clippy
andygrove added the enhancement (New feature or request) label on Oct 7, 2023
Linked issue closed by this pull request: Unsupported data type in hasher: List
4 participants