-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor: Clarify Boolean Interval
handling and verify it with a test
#7885
Conversation
Interval
handling and verify it with a test
// Not: closed/closed is the same as lower/upper | ||
} | ||
|
||
let cases = vec![ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was really confused about what the expected Intervals were, so I added test cases. it would be great if someone could double check these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they look good according to the rules added above. But unbounded rules seem not tested?
Also an open false
upper bound and an open true
lower bound are not in the rules. From the test cases, looks like they are not mapped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But unbounded rules seem not tested?
This is a good point, I will add tests for them
@@ -235,18 +247,16 @@ impl Display for Interval { | |||
impl Interval { | |||
/// Creates a new interval object using the given bounds. | |||
/// | |||
/// # Boolean intervals need special handling | |||
/// As explained in [`Interval]` boolean `Interval`s are special and this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that there are only three valid boolean intervals, and Interval::new()
normalizes any provided into into one of those, it might make sense to make the fields of Interval
non pub
so that it is not possible to construct an invalid Interval
What do you think @metesynnada / @berkaysynnada / @ozankabak ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, that's exactly what I'm working on. I have planned to set the fields to private and create intervals with only try_new
. If an interval can be created with the given parameters, it creates that interval. If the parameters cannot construct a valid interval (like lower: [true
- upper: false]
for a boolean interval), it returns an error.
/// | ||
/// Given there are only two boolean values, and they are ordered such that | ||
/// `false` is less than `true`, there are only three possible valid intervals | ||
/// for a boolean `[false, false]`, `[false, true]` or `[true, true]`, all with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The core of my confusion is that boolean Intervals
NEVER have open bounds -- and if you try to make one with open bounds, Interval::new
will remap them
TestCase { | ||
lower: false, | ||
upper: false, | ||
expected_open_open: (true, false), // whole range |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An open false
lower bound is mapped according to rule 1.
An open false
upper bound is not mapped, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is what the tests show, and I did not change the behavior. However, I don't know if this is correct or not. Perhaps @berkaysynnada / @metesynnada have some insight
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second output (expected_open_closed) of this TestCase is (false, false] is the same as [true, false] which is an invalid interval. As I mentioned there, I think these cases should give an error. Maybe we can wait until I propose the PR which I am working on and planning to submit in a few days. It will explain these initialization issues neatly.
TestCase { | ||
lower: true, | ||
upper: true, | ||
expected_open_open: (true, false), // whole range |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An open true
upper bound is mapped according to rule 2.
An open true
lower bound is not mapped, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact when I look at this test now it doesn't make sense as (true, false)
isn't a valid Interval
according to my understanding -- the interval should be (false, true)
to represent the entire range 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lower bound cannot be a true - open, and vice versa an upper bound cannot be a false - open. A boolean interval representing the entire range only can be [false, true].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @berkaysynnada -- this was my understanding as well. So given that this test passes with the current implementation, that suggestes to me it is a bugwhat do you suggest we do?
- I wait for your refactoring PR
- Fix the code to not create invalid intervals?
- Something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will be clarifying these issues and introducing a solid API in my PR. Is it possible to keep you waiting for a while, because they will all cause conflict for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll handle any conflicts after your new PR lands -- no worries!
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
@@ -1079,6 +1094,116 @@ mod tests { | |||
Interval::make(lower, upper, (false, false)) | |||
} | |||
|
|||
#[test] | |||
fn boolean_interval_test() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note I have subsequently found https://github.com/apache/arrow-datafusion/blob/1f4acbb4f0b7fde10b12684c7270069b86c386dc/datafusion/physical-expr/src/intervals/interval_aritmetic.rs#L1721-L1762 which appears to test the same things, but maybe not exhaustively 🤔
Marking as a draft until @berkaysynnada 's refactor PR has landed |
Superseded by #8276 |
Which issue does this PR close?
Related to #7883
Rationale for this change
I was very confused about what was happening and how open/closed boolean intervals were handled while working on #7883
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
No functional change is intended