Skip to content

Conversation

adriangb
Copy link
Contributor

@adriangb adriangb commented Jul 1, 2025

@github-actions github-actions bot added the datasource Changes to the datasource crate label Jul 1, 2025
@adriangb
Copy link
Contributor Author

adriangb commented Jul 1, 2025

I'm trying to understand why this isn't caught by datafusion/sqllogictest/test_files/parquet_filter_pushdown.slt. Might have to wait until tomorrow.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jul 1, 2025
@adriangb
Copy link
Contributor Author

adriangb commented Jul 1, 2025

I can confirm this new test fails without the change:

❯ cargo test -p datafusion-sqllogictest --test sqllogictests -- parquet_filter_pushdown
   Compiling datafusion-datasource-parquet v48.0.0 (/Users/adriangb/GitHub/datafusion-clone/datafusion/datasource-parquet)
   Compiling datafusion v48.0.0 (/Users/adriangb/GitHub/datafusion-clone/datafusion/core)
   Compiling datafusion-substrait v48.0.0 (/Users/adriangb/GitHub/datafusion-clone/datafusion/substrait)
   Compiling datafusion-sqllogictest v48.0.0 (/Users/adriangb/GitHub/datafusion-clone/datafusion/sqllogictest)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 4.69s
     Running bin/sqllogictests.rs (target/debug/deps/sqllogictests-5b94bf3a17202ad3)
Completed 1 test files in 0 seconds                                                                                                                                                                                                                                      External error: 1 errors in file /Users/adriangb/GitHub/datafusion-clone/datafusion/sqllogictest/test_files/parquet_filter_pushdown.slt

1. query result mismatch:
[SQL] select a from t where b > 2 ORDER BY a;
[Diff] (-expected|+actual)
+   bar
    baz
    foo
+   foo
    NULL
    NULL
    NULL
at /Users/adriangb/GitHub/datafusion-clone/datafusion/sqllogictest/test_files/parquet_filter_pushdown.slt:120



Error: Execution("1 failures")
error: test failed, to rerun pass `-p datafusion-sqllogictest --test sqllogictests`

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @adriangb 🙏

I don't understand how this fixes wrong results. I am out of time today but will check on this first thing tomorrow

/// Defaults to false
///
/// [`Expr`]: datafusion_expr::Expr
/// Defaults to false.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did this result in wrong results?

Is it because the planner assumed filters would be pushed down but they weren't?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this bit is just a comment 😛

But this produced wrong results because we'd return Supported(filter) up to the parents -> FilterExec removed from the plan but ParquetOpener.pushdown_filters = false -> pruning did not actually happen during the scan.

None => conjunction(allowed_filters.iter().cloned()),
};
source.predicate = Some(predicate);
source = source.with_pushdown_filters(pushdown_filters);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb this is the relevant LOC that fixes things

Copy link
Contributor

@ianthetechie ianthetechie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm as well that it works when pointing at this branch using my original code. Thanks!

// If both are disabled, we will not push down the filters.
// By default they are both disabled.
// Regardless of pushdown, we will update the predicate to include the filters
// because even if scan pushdown is disabled we can still use the filters for stats pruning.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had wondered about this; thanks for updating this comment!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @adriangb and @ianthetechie

With his and several other issues, I think we should make a 48.0.1 release:

I will begin work on that later today

@alamb alamb changed the title respect parquet filter pushdown config in scan Fix parquet filter_pushdown: respect parquet filter pushdown config in scan Jul 2, 2025
@alamb alamb mentioned this pull request Jul 2, 2025
@alamb alamb merged commit f03a8fd into apache:main Jul 2, 2025
29 of 30 checks passed
@alamb
Copy link
Contributor

alamb commented Jul 2, 2025

Thank you @adriangb -- I plan to backport this to the branch-48and include it in a patch release

alamb pushed a commit to alamb/datafusion that referenced this pull request Jul 2, 2025
…n scan (apache#16646)

* respect parquet filter pushdown config in scan

* Add test
@alamb
Copy link
Contributor

alamb commented Jul 2, 2025

alamb added a commit that referenced this pull request Jul 3, 2025
…n scan (#16646) (#16656)

* respect parquet filter pushdown config in scan

* Add test

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasource Changes to the datasource crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong results when pushdown_filters is enabled (starting in 48.0.0)
3 participants