Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[Spark] Skip non-deterministic filters in Data Skipping to prevent incorrect file pruning in Delta queries (#4141)

#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

This is a follow-up to the previous attempt (#4095) to handle double-filtering of non-deterministic conditions (e.g. `rand() < 0.25`). That approach prevented non-deterministic filters from appearing in `unusedFilters` in the `ScanReport` unless we added special plumbing to pass them from `PrepareDeltaScan` to `filesForScan`. It was also inconsistent with how we skip `subqueryFilters`. We now treat `filesForScan` as the narrow waist for skipping any such filters.

## How was this patch tested?

Unit tests.

## Does this PR introduce _any_ user-facing changes?

No.
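The "narrow waist" idea can be illustrated with a small, self-contained sketch. It assumes a simplified, hypothetical `Filter` model and a hypothetical `planScan` function. It is not Delta's or Spark's API, and the real `filesForScan` does much more than this. The sketch shows only the single decision point: filters that are deterministic and subquery-free may be used for file pruning. Everything else, including non-deterministic conditions such as `rand() < 0.25` and subquery filters, is skipped and reported as unused. Skipping non-deterministic filters matters because evaluating them against file-level statistics could prune files incorrectly.

```scala
// Hypothetical, simplified filter model for illustration only (not Spark's Expression API).
final case class Filter(sql: String, deterministic: Boolean, hasSubquery: Boolean)

// Hypothetical result type: filters used for data skipping vs. filters skipped
// (the latter loosely modeling what would be surfaced as `unusedFilters`).
final case class ScanPlan(skippingFilters: Seq[Filter], unusedFilters: Seq[Filter])

object FilesForScanSketch {
  // One place ("narrow waist") decides which filters may prune files.
  // Non-deterministic and subquery filters are treated the same way: skipped.
  def planScan(filters: Seq[Filter]): ScanPlan = {
    val (usable, skipped) = filters.partition(f => f.deterministic && !f.hasSubquery)
    ScanPlan(usable, skipped)
  }

  def main(args: Array[String]): Unit = {
    val plan = planScan(Seq(
      Filter("id > 10", deterministic = true, hasSubquery = false),
      Filter("rand() < 0.25", deterministic = false, hasSubquery = false),
      Filter("id IN (SELECT id FROM t)", deterministic = true, hasSubquery = true)
    ))
    println(plan.skippingFilters.map(_.sql)) // List(id > 10)
    println(plan.unusedFilters.map(_.sql))   // List(rand() < 0.25, id IN (SELECT id FROM t))
  }
}
```

Making the decision in one function, rather than stripping some filters earlier (for example in a planning rule), means every skipped filter, whatever the reason, still reaches the same point. That point can report all of them consistently.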
Showing 3 changed files with 80 additions and 14 deletions.