-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor: Clarify Boolean Interval
handling and verify it with a test
#7885
Changes from all commits
7fc223a
90e525f
285990d
4e2efad
f3c3094
1f4acbb
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -194,6 +194,23 @@ impl Display for IntervalBound { | |
/// called *unbounded* endpoint and represented using a `NULL` and written using | ||
/// `∞`. | ||
/// | ||
/// # Boolean Handling | ||
/// | ||
/// Boolean values require special handling. Boolean Intervals NEVER have open | ||
/// bounds. If you try to and construct one with open bounds, | ||
/// [`Interval::new`] will remap them to one of the three valid values. | ||
/// | ||
/// Given there are only two boolean values, and they are ordered such that | ||
/// `false` is less than `true`, there are only three possible valid intervals | ||
/// for a boolean: | ||
/// | ||
/// 1. `[false, false]`: definitely `false` | ||
/// 2. `[false, true]`: either `false` or `true` | ||
/// 3. `[true, true]`: definitely `true` | ||
/// | ||
/// [`Interval::new`] takes this into account and adjusts any arguments as | ||
/// needed. | ||
/// | ||
/// # Examples | ||
/// | ||
/// A `Int64` `Interval` of `[10, 20)` represents the values `10, 11, ... 18, | ||
|
@@ -235,18 +252,16 @@ impl Display for Interval { | |
impl Interval { | ||
/// Creates a new interval object using the given bounds. | ||
/// | ||
/// # Boolean intervals need special handling | ||
/// As explained in [`Interval]` boolean `Interval`s are special and this | ||
/// function ensures they are one of the three valid values: | ||
/// | ||
/// For boolean intervals, having an open false lower bound is equivalent to | ||
/// having a true closed lower bound. Similarly, open true upper bound is | ||
/// equivalent to having a false closed upper bound. Also for boolean | ||
/// intervals, having an unbounded left endpoint is equivalent to having a | ||
/// false closed lower bound, while having an unbounded right endpoint is | ||
/// equivalent to having a true closed upper bound. Therefore; input | ||
/// parameters to construct an Interval can have different types, but they | ||
/// all result in `[false, false]`, `[false, true]` or `[true, true]`. | ||
/// 1. An open `false` lower bound is mapped to a `true` closed lower bound. | ||
/// 2. An open `true` upper bound is mapped to `false` closed | ||
/// upper bound. | ||
/// 3. An unbounded lower endpoints is mapped to a `false` closed lower | ||
/// bound | ||
/// 4. An unbounded upper endpoint is mapped to a `true` closed upper bound. | ||
pub fn new(lower: IntervalBound, upper: IntervalBound) -> Interval { | ||
// Boolean intervals need a special handling. | ||
if let ScalarValue::Boolean(_) = lower.value { | ||
let standardized_lower = match lower.value { | ||
ScalarValue::Boolean(None) if lower.open => { | ||
|
@@ -1079,6 +1094,116 @@ mod tests { | |
Interval::make(lower, upper, (false, false)) | ||
} | ||
|
||
#[test] | ||
fn boolean_interval_test() -> Result<()> { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note I have subsequently found https://github.com/apache/arrow-datafusion/blob/1f4acbb4f0b7fde10b12684c7270069b86c386dc/datafusion/physical-expr/src/intervals/interval_aritmetic.rs#L1721-L1762 which appears to test the same things, but maybe not exhaustively 🤔 |
||
// Demonstrate / document how boolean Intervals work | ||
#[derive(Debug)] | ||
struct TestCase { | ||
lower: bool, | ||
upper: bool, | ||
/// expected bounds when lower is open, upper is open | ||
expected_open_open: (bool, bool), | ||
/// expected bounds when lower is open, upper is closed | ||
expected_open_closed: (bool, bool), | ||
/// expected bounds when lower is closed, upper is open | ||
expected_closed_open: (bool, bool), | ||
// Not: closed/closed is the same as lower/upper | ||
} | ||
|
||
let cases = vec![ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was really confused about what the expected Intervals were, so I added test cases. it would be great if someone could double check these There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think they look good according to the rules added above. But unbounded rules seem not tested? Also an open There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This is a good point, I will add tests for them |
||
TestCase { | ||
lower: false, | ||
upper: false, | ||
expected_open_open: (true, false), // whole range | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. An open An open There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe this is what the tests show, and I did not change the behavior. However, I don't know if this is correct or not. Perhaps @berkaysynnada / @metesynnada have some insight There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The second output (expected_open_closed) of this TestCase is (false, false] is the same as [true, false] which is an invalid interval. As I mentioned there, I think these cases should give an error. Maybe we can wait until I propose the PR which I am working on and planning to submit in a few days. It will explain these initialization issues neatly. |
||
expected_open_closed: (true, false), | ||
expected_closed_open: (false, false), | ||
}, | ||
TestCase { | ||
lower: false, | ||
upper: true, | ||
expected_open_open: (true, false), // whole range | ||
expected_open_closed: (true, true), | ||
expected_closed_open: (false, false), | ||
}, | ||
TestCase { | ||
lower: true, | ||
upper: false, | ||
expected_open_open: (true, false), // whole range | ||
expected_open_closed: (true, false), | ||
expected_closed_open: (true, false), | ||
}, | ||
TestCase { | ||
lower: true, | ||
upper: true, | ||
expected_open_open: (true, false), // whole range | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. An open An open There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In fact when I look at this test now it doesn't make sense as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A lower bound cannot be a true - open, and vice versa an upper bound cannot be a false - open. A boolean interval representing the entire range only can be [false, true]. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you @berkaysynnada -- this was my understanding as well. So given that this test passes with the current implementation, that suggestes to me it is a bugwhat do you suggest we do?
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will be clarifying these issues and introducing a solid API in my PR. Is it possible to keep you waiting for a while, because they will all cause conflict for me. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll handle any conflicts after your new PR lands -- no worries! |
||
expected_open_closed: (true, true), | ||
expected_closed_open: (true, false), | ||
}, | ||
]; | ||
|
||
// Asserts that the given interval has the expected (closed) bounds | ||
fn assert_bool_interval( | ||
name: &str, | ||
interval: Interval, | ||
expected_lower: bool, | ||
expected_upper: bool, | ||
) { | ||
let open = false; | ||
let expected_interval = Interval { | ||
// boolean bounds are always closed | ||
lower: IntervalBound { | ||
value: ScalarValue::Boolean(Some(expected_lower)), | ||
open, | ||
}, | ||
upper: IntervalBound { | ||
value: ScalarValue::Boolean(Some(expected_upper)), | ||
open, | ||
}, | ||
}; | ||
assert_eq!( | ||
interval, expected_interval, | ||
"{name} failure\n\nActual: {interval:#?}\n\nExpected: {expected_interval:#?}" | ||
); | ||
} | ||
|
||
for case in cases { | ||
let TestCase { | ||
lower, | ||
upper, | ||
expected_open_open, | ||
expected_open_closed, | ||
expected_closed_open, | ||
} = case; | ||
|
||
assert_bool_interval( | ||
"open_open", | ||
open_open(Some(lower), Some(upper)), | ||
expected_open_open.0, | ||
expected_open_open.1, | ||
); | ||
assert_bool_interval( | ||
"open_closed", | ||
open_closed(Some(lower), Some(upper)), | ||
expected_open_closed.0, | ||
expected_open_closed.1, | ||
); | ||
assert_bool_interval( | ||
"closed_open", | ||
closed_open(Some(lower), Some(upper)), | ||
expected_closed_open.0, | ||
expected_closed_open.1, | ||
); | ||
assert_bool_interval( | ||
"closed_closed", | ||
closed_closed(Some(lower), Some(upper)), | ||
lower, | ||
upper, | ||
); | ||
} | ||
|
||
Ok(()) | ||
} | ||
|
||
#[test] | ||
fn intersect_test() -> Result<()> { | ||
let possible_cases = vec![ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that there are only three valid boolean intervals, and
Interval::new()
normalizes any provided into into one of those, it might make sense to make the fields ofInterval
nonpub
so that it is not possible to construct an invalid IntervalWhat do you think @metesynnada / @berkaysynnada / @ozankabak ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, that's exactly what I'm working on. I have planned to set the fields to private and create intervals with only
try_new
. If an interval can be created with the given parameters, it creates that interval. If the parameters cannot construct a valid interval (like lower:[true
- upper:false]
for a boolean interval), it returns an error.