Skip to content

Conversation

@adriangb
Copy link
Contributor

@adriangb adriangb commented Nov 2, 2025

Background

This PR is part of an EPIC to push down hash table references from HashJoinExec into scans. The EPIC is tracked in #17171.

A "target state" is tracked in #18393.
There is a series of PRs to get us to this target state in smaller more reviewable changes that are still valuable on their own:

Changes in this PR

Change create_hashes and related functions to work with &dyn Array references instead of requiring ArrayRef (Arc-wrapped arrays). This avoids unnecessary Arc::clone() calls and enables calls that only have an &dyn Array to use the hashing utilities.

  • Add create_hashes_from_arrays(&[&dyn Array]) function
  • Refactor hash_dictionary, hash_list_array, hash_fixed_list_array to use references instead of cloning
  • Extract hash_single_array() helper for common logic

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates common Related to common crate physical-plan Changes to the physical-plan crate labels Nov 2, 2025
Change create_hashes and related functions to work with &dyn Array
references instead of requiring ArrayRef (Arc-wrapped arrays).
This avoids unnecessary Arc::clone() calls and enables calls that
only have an &dyn Array to use the hashing utilities.

Changes:
- Add create_hashes_from_arrays(&[&dyn Array]) function
- Refactor hash_dictionary, hash_list_array, hash_fixed_list_array
  to use references instead of cloning
- Extract hash_single_array() helper for common logic
@2010YOUY01
Copy link
Contributor

What is this series of PRs implementing? Is it for query anti-pattern in #18393 (comment), stat pruning is not working, so we're pushing down build-side dynamic filter like key in [1,5,23] to probe side parquet scan, and use parquet builtin bloom filter to effectively skip prunable units?

By the way I suggest to open an umbrella issue to describe the high-level ideas, now the discussion seems to be scattered around, and the work is hard to follow.

@adriangb
Copy link
Contributor Author

adriangb commented Nov 3, 2025

What is this series of PRs implementing? Is it for query anti-pattern in #18393 (comment), stat pruning is not working, so we're pushing down build-side dynamic filter like key in [1,5,23] to probe side parquet scan, and use parquet builtin bloom filter to effectively skip prunable units?

Yes that is precisely it. I'm not sure what you mean by query anti-pattern though.

I'll put a summary and links to the relevant PRs in #17171 (comment). That said #17171 is already a pretty busy issue so it's buried at the bottom... I'm not sure what more I can do about that.

@2010YOUY01
Copy link
Contributor

What is this series of PRs implementing? Is it for query anti-pattern in #18393 (comment), stat pruning is not working, so we're pushing down build-side dynamic filter like key in [1,5,23] to probe side parquet scan, and use parquet builtin bloom filter to effectively skip prunable units?

Yes that is precisely it. I'm not sure what you mean by query anti-pattern though.

Maybe query anti-pattern is not accurate, but I'm just describing in that query, stat pruning is not working due to the given parquet file is not well clustered on the pruning column.

I'll put a summary and links to the relevant PRs in #17171 (comment). That said #17171 is already a pretty busy issue so it's buried at the bottom... I'm not sure what more I can do about that.

Perhaps edit the issue top description, or open a new one? 🤔

@adriangb
Copy link
Contributor Author

adriangb commented Nov 3, 2025

Maybe query anti-pattern is not accurate, but I'm just describing in that query, stat pruning is not working due to the given parquet file is not well clustered on the pruning column.

Okay yes agreed. Anti-pattern was only confusing to me because I thought it might be suggesting that pattern of query is wrong / that the query itself is wrong and had me thinking maybe my example was a bad one 😔

I'll put a summary and links to the relevant PRs in #17171 (comment). That said #17171 is already a pretty busy issue so it's buried at the bottom... I'm not sure what more I can do about that.

Perhaps edit the issue top description, or open a new one? 🤔

Hmm there's others working on the same issue I don't want to make my work sounds like the only solution. I'll add a more extensive description and cross references to the PRs tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate physical-expr Changes to the physical-expr crates physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants