Summary
When the build side of a hash join contains only NULL join keys and the join is configured with
datafusion_common::NullEquality::NullEqualsNothing, the dynamic filter generated for the probe side
consists of range comparisons against NULL (e.g. a >= NULL AND a <= NULL). The current
optimizer/test code treats this as a no-op (a tautology) rather than as unsatisfiable or as an
explicit "no matches" condition.
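To make the mechanism concrete, here is a minimal std-only Rust sketch. It assumes the dynamic filter is built from per-column min/max bounds of the build-side keys with NULLs skipped, and that a missing bound prints as NULL. The helpers key_bounds, literal and render_bounds_filter are hypothetical illustrations, not DataFusion APIs; they only show why an all-NULL build side produces the >= NULL / <= NULL shape.

```rust
/// Hypothetical helper: min/max of one build-side key column, skipping NULLs
/// (modelled as None). Returns None when the column is empty or all NULL.
fn key_bounds(values: &[Option<i64>]) -> Option<(i64, i64)> {
    let mut non_null = values.iter().flatten().copied();
    let first = non_null.next()?;
    Some(non_null.fold((first, first), |(lo, hi), v| (lo.min(v), hi.max(v))))
}

/// Render a bound as a literal; a missing bound becomes the text NULL.
fn literal(v: Option<i64>) -> String {
    match v {
        Some(x) => x.to_string(),
        None => "NULL".to_string(),
    }
}

/// Hypothetical helper: build a textual range filter over each probe column,
/// in the same shape as the plan output observed in the test.
fn render_bounds_filter(columns: &[(&str, usize, &[Option<i64>])]) -> String {
    columns
        .iter()
        .map(|(name, idx, values)| {
            let bounds = key_bounds(values);
            let (min, max) = (bounds.map(|b| b.0), bounds.map(|b| b.1));
            format!(
                "{name}@{idx} >= {} AND {name}@{idx} <= {}",
                literal(min),
                literal(max)
            )
        })
        .collect::<Vec<_>>()
        .join(" AND ")
}

fn main() {
    // Build side where every join key is NULL.
    let a: &[Option<i64>] = &[None, None];
    let b: &[Option<i64>] = &[None, None];
    println!("{}", render_bounds_filter(&[("a", 0, a), ("b", 1, b)]));
}
```

With all-NULL keys the sketch prints the same predicate text that appears in the test output, because there is no non-NULL value from which to take a minimum or maximum.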
Why this matters
- This is a surprising corner case and may hide regressions if semantics or simplification rules change.
- If NullEquality semantics change (or the filter simplifier is updated), behavior and test expectations could silently diverge.
- We should monitor this and decide whether the filter should be:
  - left as a tautology (no-op),
  - treated as unsatisfiable (prune everything),
  - or canonicalized/annotated to make its intent explicit.
Where to look / repro
File: datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs
Test: test_hashjoin_dynamic_filter_pushdown_null_keys, in #17090
Reproduction steps (test harness)
- Run that single async test (or all of the optimizer tests); the test name is passed to cargo test as a positional filter:
  cargo test -p datafusion test_hashjoin_dynamic_filter_pushdown_null_keys
- Observe that the plan printed by format_plan_for_test(&plan) contains:
  DynamicFilterPhysicalExpr [ a@0 >= NULL AND a@0 <= NULL AND b@1 >= NULL AND b@1 <= NULL ]
Observed behavior
The optimizer generates a dynamic filter with min/max bounds set to NULL; the code interprets this as
a tautology (no filtering). The current test documents this and asserts the presence of the >= NULL / <= NULL pattern.
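Whether this text acts as a no-op or as "drop everything" depends on how an unknown result is consumed. The std-only sketch below evaluates a >= NULL AND a <= NULL under SQL three-valued logic (any comparison with NULL is unknown, and unknown AND unknown is unknown), then applies two policies: a WHERE-style filter that keeps a row only when the predicate is true, and a conservative "may match" policy that keeps anything not provably false. The names Tri, ge, le, range_filter, keep_strict and keep_may_match are hypothetical; this does not claim which policy DataFusion applies at any given point.

```rust
/// SQL three-valued logic result.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tri {
    True,
    False,
    Unknown,
}

/// lhs >= rhs, where None models SQL NULL: any NULL operand yields Unknown.
fn ge(lhs: Option<i64>, rhs: Option<i64>) -> Tri {
    match (lhs, rhs) {
        (Some(l), Some(r)) => if l >= r { Tri::True } else { Tri::False },
        _ => Tri::Unknown,
    }
}

/// lhs <= rhs is the same as rhs >= lhs.
fn le(lhs: Option<i64>, rhs: Option<i64>) -> Tri {
    ge(rhs, lhs)
}

/// SQL AND: False dominates, then Unknown; True only if both are True.
fn and(x: Tri, y: Tri) -> Tri {
    match (x, y) {
        (Tri::False, _) | (_, Tri::False) => Tri::False,
        (Tri::True, Tri::True) => Tri::True,
        _ => Tri::Unknown,
    }
}

/// One column of the observed filter shape: a >= min AND a <= max.
fn range_filter(a: Option<i64>, min: Option<i64>, max: Option<i64>) -> Tri {
    and(ge(a, min), le(a, max))
}

/// WHERE-style policy: keep a row only if the predicate is True.
fn keep_strict(t: Tri) -> bool {
    t == Tri::True
}

/// Conservative policy: keep a row unless the predicate is provably False.
fn keep_may_match(t: Tri) -> bool {
    t != Tri::False
}

fn main() {
    // NULL bounds, as produced from an all-NULL build side.
    for a in [Some(1), Some(42), None] {
        let t = range_filter(a, None, None);
        println!("{a:?}: {t:?} strict={} may_match={}", keep_strict(t), keep_may_match(t));
    }
}
```

Every probe value evaluates to Unknown here, so the strict policy drops all rows while the conservative one keeps all rows; the same filter text therefore reads as unsatisfiable under one interpretation and as a tautology under the other.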
Expected / options
We need a decision on intended semantics. Options:
- Keep current behavior (tautology/no-op). Document it clearly in code/tests.
- Treat a NULL-only build side as an unsatisfiable filter (drop all probe rows).
- Change filter generation to avoid producing >= NULL / <= NULL and instead produce an explicit marker (e.g., unsatisfiable) so that optimizer simplification can handle it deterministically.
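The explicit-marker option could look roughly like the std-only sketch below: bound derivation returns a value that downstream code matches on instead of NULL literals. DynamicBounds, derive_bounds, may_match and NullEqualityMode (a stand-in for datafusion_common::NullEquality) are hypothetical names chosen for illustration, not existing DataFusion APIs, and the sketch assumes a join where pruning probe rows with no possible match is safe.

```rust
/// Hypothetical stand-in for datafusion_common::NullEquality.
#[derive(Clone, Copy, Debug, PartialEq)]
enum NullEqualityMode {
    NullEqualsNothing,
    NullEqualsNull,
}

/// Hypothetical explicit result of dynamic-filter bound derivation.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DynamicBounds {
    /// At least one non-NULL build key: restrict probe keys to [min, max].
    Range { min: i64, max: i64 },
    /// No build key can ever equal a probe key: prune every probe row.
    Unsatisfiable,
    /// No useful bound: keep every probe row.
    NoFilter,
}

fn derive_bounds(build: &[Option<i64>], mode: NullEqualityMode) -> DynamicBounds {
    let mut non_null = build.iter().flatten().copied();
    match non_null.next() {
        Some(first) => {
            let (min, max) =
                non_null.fold((first, first), |(lo, hi), v| (lo.min(v), hi.max(v)));
            DynamicBounds::Range { min, max }
        }
        // Empty or all-NULL build side with NullEqualsNothing: nothing can match.
        None if mode == NullEqualityMode::NullEqualsNothing => DynamicBounds::Unsatisfiable,
        // NullEqualsNull: NULL probe keys may still match NULL build keys,
        // so stay conservative and do not filter.
        None => DynamicBounds::NoFilter,
    }
}

/// Whether a probe key may survive the filter (conservative for NULL probes;
/// the join itself still applies the NULL-equality rule).
fn may_match(bounds: DynamicBounds, probe: Option<i64>) -> bool {
    match (bounds, probe) {
        (DynamicBounds::NoFilter, _) => true,
        (DynamicBounds::Unsatisfiable, _) => false,
        (DynamicBounds::Range { min, max }, Some(v)) => min <= v && v <= max,
        (DynamicBounds::Range { .. }, None) => true,
    }
}

fn main() {
    let build: [Option<i64>; 2] = [None, None];
    let bounds = derive_bounds(&build, NullEqualityMode::NullEqualsNothing);
    println!("{bounds:?}");
    println!("probe 7 kept: {}", may_match(bounds, Some(7)));
}
```

Because the all-NULL case becomes a named variant rather than a comparison against NULL, the outcome no longer depends on how a simplifier or evaluator treats unknown results.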
References
File: datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs
Test: test_hashjoin_dynamic_filter_pushdown_null_keys