Minor: Clarify Boolean `Interval` handling and verify it with a test #7885

alamb · 2023-10-20T15:19:15Z

Which issue does this PR close?

Related to #7883

Rationale for this change

I was very confused about what was happening and how open/closed boolean intervals were handled while working on #7883

What changes are included in this PR?

Update documentation with some additional rationale
Add a test that shows how Boolean intervals are handled with the different combinations of open/closed bounds

Are these changes tested?

Yes

Are there any user-facing changes?

No functional change is intended

alamb · 2023-10-20T15:20:47Z

datafusion/physical-expr/src/intervals/interval_aritmetic.rs

+            // Not: closed/closed is the same as lower/upper
+        }
+
+        let cases = vec![


I was really confused about what the expected Intervals were, so I added test cases. It would be great if someone could double-check these.

I think they look good according to the rules added above. But unbounded rules seem not tested?

Also, an open false upper bound and an open true lower bound are not in the rules. From the test cases, it looks like they are not mapped?

> But unbounded rules seem not tested?

This is a good point, I will add tests for them

alamb · 2023-10-20T15:21:36Z

datafusion/physical-expr/src/intervals/interval_aritmetic.rs

@@ -235,18 +247,16 @@ impl Display for Interval {
 impl Interval {
    /// Creates a new interval object using the given bounds.
    ///
-    /// # Boolean intervals need special handling
+    /// As explained in [`Interval]` boolean `Interval`s are special and this


Given that there are only three valid boolean intervals, and Interval::new() normalizes any provided interval into one of those, it might make sense to make the fields of Interval non-pub so that it is not possible to construct an invalid Interval

What do you think @metesynnada / @berkaysynnada / @ozankabak ?

In fact, that's exactly what I'm working on. I plan to make the fields private and allow intervals to be created only through try_new. If the given parameters describe a valid interval, it creates that interval. If they cannot form a valid interval (for example lower: true, upper: false for a boolean interval), it returns an error.
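As a rough illustration of the idea above, here is a minimal sketch of a type with private fields and a fallible constructor. `BoolInterval` and its methods are hypothetical names invented for this example; this is not DataFusion's `Interval` and not the API of the planned PR, only one way the described behavior could look.

```rust
/// Hypothetical stand-in for a boolean interval; NOT DataFusion's `Interval`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BoolInterval {
    // Fields are private, so `try_new` is the only way to build a value.
    lower: bool,
    upper: bool,
}

impl BoolInterval {
    /// Builds the closed interval `[lower, upper]`.
    /// Returns an error for `[true, false]`, the only invalid combination.
    pub fn try_new(lower: bool, upper: bool) -> Result<Self, String> {
        // `bool` is ordered with `false < true`, so `lower > upper`
        // is true exactly when lower is `true` and upper is `false`.
        if lower > upper {
            return Err(format!("invalid boolean interval: [{lower}, {upper}]"));
        }
        Ok(Self { lower, upper })
    }

    /// Read-only access to the lower bound.
    pub fn lower(&self) -> bool {
        self.lower
    }

    /// Read-only access to the upper bound.
    pub fn upper(&self) -> bool {
        self.upper
    }
}
```

With this shape, callers outside the module cannot write `BoolInterval { lower: true, upper: false }` directly, so every value that exists has passed the check in `try_new`.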

alamb · 2023-10-20T15:22:20Z

datafusion/physical-expr/src/intervals/interval_aritmetic.rs

+///
+/// Given there are only two  boolean values, and they are ordered such that
+/// `false` is less than `true`, there are only three possible valid intervals
+/// for a boolean `[false, false]`, `[false, true]` or `[true, true]`, all with


The core of my confusion is that boolean Intervals NEVER have open bounds -- and if you try to make one with open bounds, Interval::new will remap them
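The remapping can be sketched as follows. This is a simplified model, assuming the two rules referred to in this thread are "an open `false` lower bound becomes a closed `true`" and "an open `true` upper bound becomes a closed `false`"; `Bound` and `normalize` are hypothetical names, unbounded endpoints are not modelled, and this is not the actual body of `Interval::new`.

```rust
/// One endpoint of a boolean interval: a value plus an open/closed flag.
/// Hypothetical helper type for this sketch only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Bound {
    value: bool,
    open: bool,
}

/// Remaps open bounds to closed ones and returns the resulting
/// closed `(lower, upper)` pair. The result may still be the
/// invalid pair `(true, false)`; this sketch does not reject it.
fn normalize(lower: Bound, upper: Bound) -> (bool, bool) {
    // Rule 1 (assumed): an open `false` lower bound excludes `false`,
    // so the smallest remaining value is `true`.
    let lower = if lower.open && !lower.value { true } else { lower.value };
    // Rule 2 (assumed): an open `true` upper bound excludes `true`,
    // so the largest remaining value is `false`.
    let upper = if upper.open && upper.value { false } else { upper.value };
    // An open `true` lower bound and an open `false` upper bound
    // are left unmapped, matching what the test cases show.
    (lower, upper)
}
```

Under these assumptions `(false, false]` normalizes to the pair `(true, false)`, which is the invalid interval discussed in the other review comments.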

viirya · 2023-10-20T17:11:25Z

datafusion/physical-expr/src/intervals/interval_aritmetic.rs

+            TestCase {
+                lower: false,
+                upper: false,
+                expected_open_open: (true, false), // whole range


An open false lower bound is mapped according to rule 1.

An open false upper bound is not mapped, right?

I believe this is what the tests show, and I did not change the behavior. However, I don't know if this is correct or not. Perhaps @berkaysynnada / @metesynnada have some insight

The second output (expected_open_closed) of this TestCase, (false, false], is the same as [true, false], which is an invalid interval. As I mentioned there, I think these cases should give an error. Maybe we can wait until I propose the PR that I am working on and plan to submit in a few days. It will explain these initialization issues neatly.

viirya · 2023-10-20T17:13:20Z

datafusion/physical-expr/src/intervals/interval_aritmetic.rs

+            TestCase {
+                lower: true,
+                upper: true,
+                expected_open_open: (true, false), // whole range


An open true upper bound is mapped according to rule 2.

An open true lower bound is not mapped, right?

In fact when I look at this test now it doesn't make sense as (true, false) isn't a valid Interval according to my understanding -- the interval should be (false, true) to represent the entire range 🤔

A lower bound cannot be an open true, and conversely an upper bound cannot be an open false. A boolean interval representing the entire range can only be [false, true].
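To make the set of valid intervals concrete, the small sketch below enumerates every closed pair over `bool` and keeps those with `lower <= upper`. The function name is made up for this example and nothing here comes from the DataFusion code base.

```rust
/// Returns every closed interval `[lower, upper]` over `bool`
/// that satisfies `lower <= upper` (with `false < true`).
fn valid_bool_intervals() -> Vec<(bool, bool)> {
    let values = [false, true];
    let mut out = Vec::new();
    for lower in values {
        for upper in values {
            // Only `[true, false]` fails this check.
            if lower <= upper {
                out.push((lower, upper));
            }
        }
    }
    out
}
```

Of the four possible pairs, only `[true, false]` is filtered out, leaving `[false, false]`, `[false, true]` and `[true, true]`.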

Thank you @berkaysynnada -- this was my understanding as well. Given that this test passes with the current implementation, that suggests to me it is a bug. What do you suggest we do?

I wait for your refactoring PR

Fix the code to not create invalid intervals?

Something else?

I will be clarifying these issues and introducing a solid API in my PR. Would it be possible for you to wait a while? Otherwise these changes will all cause conflicts for me.

I'll handle any conflicts after your new PR lands -- no worries!

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

alamb · 2023-10-20T20:45:11Z

datafusion/physical-expr/src/intervals/interval_aritmetic.rs

@@ -1079,6 +1094,116 @@ mod tests {
        Interval::make(lower, upper, (false, false))
    }

+    #[test]
+    fn boolean_interval_test() -> Result<()> {


Note I have subsequently found https://github.com/apache/arrow-datafusion/blob/1f4acbb4f0b7fde10b12684c7270069b86c386dc/datafusion/physical-expr/src/intervals/interval_aritmetic.rs#L1721-L1762 which appears to test the same things, but maybe not exhaustively 🤔

alamb · 2023-10-23T13:19:17Z

Marking as a draft until @berkaysynnada 's refactor PR has landed

alamb · 2023-11-20T20:52:51Z

Superseded by #8276

Minor: Clarify and write some more tests for boolean interval handling

7fc223a

alamb changed the title ~~Minor: Clarify and write some more tests for boolean interval handling~~ Minor: Clarify Boolean Interval handling and verify it with a test Oct 20, 2023

More comments

90e525f

github-actions bot added the physical-expr Physical Expressions label Oct 20, 2023

more better

285990d

alamb mentioned this pull request Oct 20, 2023

Support Interval analysis for OR expressions #7884

Closed

fmt

4e2efad

This was referenced Oct 20, 2023

Minor: Fix bug in AND interval analysis tests (not code), and add more coverage #7886

Closed

[EPIC] A collection of Interval arithmetic (not Intervals of time) improvements #7882

Open

viirya reviewed Oct 20, 2023

alamb and others added 2 commits October 20, 2023 16:21

Apply suggestions from code review

f3c3094

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

Apply suggestions from code review

1f4acbb

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

alamb marked this pull request as draft October 23, 2023 13:19

alamb closed this Nov 20, 2023
