
Dynamic filter no-op for NULL-only build keys (NullEqualsNothing) #17206

@kosiew

Summary

When the build side of a hash join contains only NULL join keys and the join is configured with
datafusion_common::NullEquality::NullEqualsNothing, the dynamic filter pushed to the probe side
consists of range comparisons against NULL (e.g. a >= NULL AND a <= NULL). The current
optimizer/test code treats this filter as a no-op (a tautology) rather than as unsatisfiable
or as an explicit "no matches" condition.
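
For comparison, here is a minimal sketch of how a >= NULL AND a <= NULL evaluates under plain SQL three-valued logic. It is a standalone model in plain Rust, not DataFusion code; the names Sql3, ge, le, and3, range_filter and keeps_row are made up for illustration. Under this model the predicate is NULL for every row, and a filter that keeps only TRUE rows would drop everything, which is why the no-op behavior is surprising.

```rust
/// SQL boolean under three-valued logic (illustrative type, not a DataFusion API):
/// Some(true) = TRUE, Some(false) = FALSE, None = NULL.
type Sql3 = Option<bool>;

/// `lhs >= rhs`; any NULL operand makes the result NULL.
fn ge(lhs: Option<i64>, rhs: Option<i64>) -> Sql3 {
    match (lhs, rhs) {
        (Some(l), Some(r)) => Some(l >= r),
        _ => None,
    }
}

/// `lhs <= rhs`; any NULL operand makes the result NULL.
fn le(lhs: Option<i64>, rhs: Option<i64>) -> Sql3 {
    match (lhs, rhs) {
        (Some(l), Some(r)) => Some(l <= r),
        _ => None,
    }
}

/// SQL AND: FALSE dominates; otherwise NULL if either side is NULL.
fn and3(a: Sql3, b: Sql3) -> Sql3 {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// The min/max range predicate: a >= min AND a <= max.
fn range_filter(a: Option<i64>, min: Option<i64>, max: Option<i64>) -> Sql3 {
    and3(ge(a, min), le(a, max))
}

/// A WHERE-style filter keeps a row only when the predicate is TRUE.
fn keeps_row(predicate: Sql3) -> bool {
    predicate == Some(true)
}
```

With NULL bounds, range_filter(Some(5), None, None) is NULL and keeps_row rejects the row; with real bounds such as Some(1) and Some(10) the same row is kept.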

Why this matters

  • This is a surprising corner case and may hide regressions if semantics or simplification rules change.
  • If NullEquality semantics change (or the filter simplifier is updated), behavior and test expectations
    could silently diverge.
  • We should monitor and decide whether this should be:
    • left as a tautology (no-op),
    • treated as unsatisfiable (prune everything),
    • or canonicalized/annotated to make intent explicit.

Where to look / repro

Test: datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs
Specifically, the test test_hashjoin_dynamic_filter_pushdown_null_keys in #17090

Reproduction steps (test harness)

  1. Run that single async test (or the optimizer tests):
    • cargo test -p datafusion test_hashjoin_dynamic_filter_pushdown_null_keys
  2. Observe that the plan printed by format_plan_for_test(&plan) contains:
    DynamicFilterPhysicalExpr [ a@0 >= NULL AND a@0 <= NULL AND b@1 >= NULL AND b@1 <= NULL ]

Observed behavior

The optimizer generates a dynamic filter with min/max bounds set to NULL; the code interprets this as
a tautology (no filtering). The current test documents this and asserts the presence of the >= NULL / <= NULL pattern.

Expected / options

We need a decision on intended semantics. Options:

  • Keep current behavior (tautology/no-op). Document it clearly in code/tests.
  • Treat a NULL-only build side as an unsatisfiable filter (drop all probe rows).
  • Change filter generation so that it does not produce >= NULL / <= NULL and instead emits an explicit marker (e.g., unsatisfiable), so that optimizer simplification can handle it deterministically.
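
As a rough illustration of the third option, the sketch below separates "no non-NULL keys seen" from a real min/max range before any filter expression is built. Everything here is hypothetical: BuildBounds, ProbeFilter, collect_bounds and probe_filter are not DataFusion types or functions, and the keys are simplified to Option<i64>.

```rust
/// Hypothetical summary of one build-side join key column.
#[derive(Debug, PartialEq)]
enum BuildBounds {
    /// At least one non-NULL key was seen.
    Range { min: i64, max: i64 },
    /// Every key was NULL, or the build side was empty.
    NoNonNullKeys,
}

/// Hypothetical probe-side filter derived from the bounds.
#[derive(Debug, PartialEq)]
enum ProbeFilter {
    /// Keep rows whose key lies in [min, max].
    Between { min: i64, max: i64 },
    /// Explicit marker: no probe row can match.
    Unsatisfiable,
}

/// Compute min/max over the non-NULL keys, or report that there are none.
fn collect_bounds(keys: &[Option<i64>]) -> BuildBounds {
    let mut non_null = keys.iter().flatten().copied();
    match non_null.next() {
        None => BuildBounds::NoNonNullKeys,
        Some(first) => {
            let (min, max) =
                non_null.fold((first, first), |(lo, hi), k| (lo.min(k), hi.max(k)));
            BuildBounds::Range { min, max }
        }
    }
}

/// Map bounds to a filter without ever emitting a comparison against NULL.
fn probe_filter(bounds: &BuildBounds) -> ProbeFilter {
    match bounds {
        BuildBounds::Range { min, max } => ProbeFilter::Between { min: *min, max: *max },
        BuildBounds::NoNonNullKeys => ProbeFilter::Unsatisfiable,
    }
}
```

Whether Unsatisfiable is the right mapping for NoNonNullKeys is exactly the open decision here; the same match arm could instead return a hypothetical "no filter" variant to keep the current no-op behavior, and join types that preserve unmatched probe rows may need that.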

References

File: datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs
Test: test_hashjoin_dynamic_filter_pushdown_null_keys
