[Incompatibility] Document arrays_overlap null handling differences #3364
Summary
This PR addresses issue #3175 by documenting the specific null handling incompatibility between Spark and Comet for the `arrays_overlap` function.
Changes Made
1. Code Documentation (`arrays.scala`)
Updated `CometArraysOverlap.getSupportLevel()` to return `Incompatible(Some(...))` with detailed explanation and concrete example.
Before:
After:
2. Test Coverage (`CometArrayExpressionSuite.scala`)
Added comprehensive test `arrays_overlap - null handling behavior verification` with 6 test cases, with expected results including:
- `true`
- `false`
- Spark: `null`, Comet: `false` (documented incompatibility)
- `true`
3. User Documentation (`expressions.md`)
Updated the Array Expressions table with specific explanation and example showing the three-valued logic difference.
Root Cause Analysis
Spark Behavior (Three-Valued Logic)
Spark follows SQL's three-valued logic (true, false, null):
- `true` if common elements found
- `false` if no common elements AND no nulls
- `null` if no common elements BUT nulls exist (indeterminate)
Comet Behavior
Comet uses DataFusion's `array_has_any` function:
- `true` if common elements found
- `false` in all other cases (no three-valued logic support)
Example Demonstrating Incompatibility
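The two behaviors described above can be modeled in a few lines of plain Python. This is only a sketch of the semantics, not the Spark or DataFusion implementation: the function names `spark_arrays_overlap` and `comet_arrays_overlap` are hypothetical, `None` stands in for SQL `NULL`, the "both arrays non-empty" condition comes from Spark's function documentation rather than from this PR, and ignoring nulls when searching for common elements on the Comet side is an assumption.

```python
from typing import Optional, Sequence

Element = Optional[int]  # None stands in for SQL NULL


def _non_null(values: Sequence[Element]) -> set:
    """Collect the non-null elements of an array."""
    return {v for v in values if v is not None}


def spark_arrays_overlap(a: Sequence[Element], b: Sequence[Element]) -> Optional[bool]:
    """Three-valued result: True, False, or None (NULL)."""
    if _non_null(a) & _non_null(b):
        return True  # a common non-null element exists
    has_null = any(v is None for v in a) or any(v is None for v in b)
    # Per Spark's function docs, NULL is returned only when both arrays are non-empty.
    if has_null and len(a) > 0 and len(b) > 0:
        return None  # indeterminate: a null might have matched
    return False


def comet_arrays_overlap(a: Sequence[Element], b: Sequence[Element]) -> bool:
    """Two-valued result as described in this PR: True on a common element, else False."""
    return bool(_non_null(a) & _non_null(b))


left, right = [1, None], [2, 3]
print(spark_arrays_overlap(left, right))  # None (SQL NULL)
print(comet_arrays_overlap(left, right))  # False
```

With no common element and a null on one side, the first model yields `None` while the second yields `False`, which is the divergence this PR documents.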
Spark: `null`
Comet: `false`
Why This Matters
Users who enable `arrays_overlap` with `spark.comet.expression.ArraysOverlap.allowIncompatible=true` need to understand:
Testing Notes
Local test execution ran into Java version compatibility issues in the local environment (unrelated to the code changes). The test code is syntactically correct and follows existing patterns, and the CI environment should run the tests successfully with the proper Java configuration.
Files Modified
Closes
Closes #3175
Checklist