Fix underflow/overflow during binary encoding/decoding of Intervals #10589
Conversation
In Materialize, Intervals internally store the number of months in the interval and the number of nanoseconds. The pg_wire binary format for Intervals is 8 bytes for the number of microseconds, followed by 4 bytes for the number of days, followed by 4 bytes for the number of months. When converting an Interval to the pg_wire binary encoding, Materialize extracts as many whole days from the nanoseconds as will fit in 4 bytes (an i32 in Rust). During this it checks whether the number of days is larger than i32::MAX, protecting from overflow. However, it never checked whether the number of days is less than i32::MIN, making it susceptible to underflow. This commit fixes that issue.
src/pgrepr/src/value/interval.rs
Outdated
```diff
@@ -46,7 +46,10 @@ impl ToSql for Interval {
         // Our intervals are guaranteed to fit within SQL's min/max intervals,
         // so this is compression is guaranteed to be lossless. For details, see
         // `repr::scalar::datetime::compute_interval`.
-        let days = std::cmp::min(self.0.days() as i128, i32::MAX as i128);
+        let days = std::cmp::max(
+            std::cmp::min(self.0.days() as i128, i32::MAX as i128),
```
Here's an alternate take: we shouldn't have been doing min in the first place because it's changing the user's data, and should instead be returning an error. I think this current PR is strictly better than what we have now, but I don't think the underlying problem is fixed, just improved. @sploiselle do you have thoughts about this proposal, or more context about the use of min?
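For concreteness, a minimal sketch of what the error-returning alternative could look like; the `encode_days` helper and its error message are hypothetical, not code from this PR:

```rust
use std::convert::TryFrom;

// Hypothetical helper: reject out-of-range day counts instead of clamping,
// mirroring Postgres' out-of-range error rather than altering the value.
fn encode_days(days: i64) -> Result<i32, String> {
    i32::try_from(days)
        .map_err(|_| format!("interval field value out of range: {} days", days))
}
```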
Ah, yeah, this prior behavior was bad! Our min/max intervals are way bigger than that:
https://materialize.com/docs/sql/types/interval/
Seems like this just wants to error on overflow. Does PG limit you to i32::MIN/MAX days?
For reference, `i32::MAX` is 2147483647.

So, assuming that we currently consider 24 hours equal to 1 day, I don't think it's necessarily changing the user's data. Any amount of days that cannot fit into an `i32` is put into the microsecond field, which is an `i64`. So `2147483648 days` won't get truncated into `2147483647 days`; it will be converted into `2147483647 days 24 hrs`.

The biggest Interval that we (and Postgres) will accept is `2147483647 days 2147483647 hours 59 minutes 59.999999 seconds`. Anything bigger should result in an out-of-range error. `2147483647 hours 59 minutes 59.999999 seconds` is slightly less than 7730941132800000000 microseconds.

So that means we will never have to store more than 2147483647 in the day bytes, and we will never have to store more than 7730941132800000000 in the microsecond bytes. Since `i64::MAX` is 9223372036854775807, I believe that means we won't ever lose any precision or change the user's data here; we may just convert days to microseconds or vice versa. Assuming that all my math here is correct.
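A small sketch of the rollover arithmetic being described, assuming the internal representation is a single nanosecond count (the function and constant names are illustrative, not the PR's code):

```rust
use std::convert::TryFrom;

// Split a nanosecond duration into a clamped i32 day count plus the
// remainder expressed as i64 microseconds, as the wire format requires.
fn split_duration(total_ns: i128) -> (i32, i64) {
    const NS_PER_DAY: i128 = 86_400 * 1_000_000_000;
    // Clamp the whole-day count to the i32 range in both directions.
    let days = (total_ns / NS_PER_DAY).clamp(i128::from(i32::MIN), i128::from(i32::MAX));
    // Days that did not fit into `days` stay in the microsecond field.
    let remaining_us = (total_ns - days * NS_PER_DAY) / 1_000;
    (
        i32::try_from(days).expect("clamped to the i32 range"),
        i64::try_from(remaining_us).expect("bounded by the max accepted interval"),
    )
}
```

With this split, `2147483648 days` comes out as `(2147483647, 86_400_000_000)`, i.e. the extra day survives as 24 hours' worth of microseconds.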
Oh, I didn't see your comment @sploiselle or the docs online. Postgres does limit you to i32::MIN/MAX days:

```
postgres=# SELECT INTERVAL '2147483648 days';
ERROR:  interval field value out of range: "2147483648 days"
LINE 1: SELECT INTERVAL '2147483648 days';
                        ^
postgres=# SELECT INTERVAL '-2147483649 days';
ERROR:  interval field value out of range: "-2147483649 days"
LINE 1: SELECT INTERVAL '-2147483649 days';
```
Actually, it looks like I was mistaken about what we accept. We diverge from Postgres because we do not limit to i32::MIN/MAX days:

```
materialize=> SELECT INTERVAL '2147483648 days';
    interval
-----------------
 2147483648 days
(1 row)

materialize=> SELECT INTERVAL '-2147483649 days';
     interval
------------------
 -2147483649 days
(1 row)
```

So it might be possible that the microseconds eventually overflow; I have to think about what the largest Interval we accept would get converted into.

Is there a reason that we accept Intervals outside of the range that Postgres does?
Ok, I'll include it in this PR since it's not a large change.
It's just that the number is wrong: 193_273_528_233_599_999_999_000 nanoseconds does not equal `2147483647 hours 59 minutes 59.999999 seconds`, but it does equal `2147483647 days 2147483647 hours 59 minutes 59.999999 seconds`.

Anyhow, sorry folks, I wrote this code a long time ago and had forgotten my thinking/motivation, and it definitely should have gotten more inline comments to detail the behavior.

Part of this is the "fallout" from our underlying representation differing from PG, which treats days as separate from H:M:S, whereas we bundle them together.

In PG:

```
sean=# SELECT interval '1 day 2147483647 hours';
        interval
------------------------
 1 day 2147483647:00:00
(1 row)

sean=# SELECT interval '1 day 2147483648 hours';
ERROR:  interval field value out of range: "1 day 2147483648 hours"
LINE 1: SELECT interval '1 day 2147483648 hours';
```

This limitation/behavior seemed pretty arbitrary to me, so I let users express Postgres' rolled-up maximum (still obeying the month-date boundary), which turns out to be 2236962132 days 07:59:59.999999. This code then takes the maximum number of days possible (i.e. saturating the `i32`), and then rolls over any remaining day-like time units into hours.

Given that this is working as intended, I think @jkosh44's code is right and I was wrong to say it was wrong.

This all being said, I am a little too tired tonight to reason through any issues with the values' signedness, but that's worth thinking through because I obviously failed to in my initial implementation.
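As a sanity check on the rolled-up maximum quoted above, a throwaway snippet (not PR code):

```rust
fn main() {
    // 2147483647 hours rolls up into whole days plus a 7-hour remainder.
    let extra_hours: i64 = 2_147_483_647;
    let days_from_hours = extra_hours / 24; // 89_478_485 whole days
    let leftover_hours = extra_hours % 24;  // 7 hours
    // Added to the i32::MAX day field, that gives the quoted maximum.
    assert_eq!(2_147_483_647 + days_from_hours, 2_236_962_132);
    assert_eq!(leftover_hours, 7);
}
```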
I see what you're saying; you're right, I didn't account for the days portion in the current bounds check. I think the current bounds check is then mostly correct. The only issue is that the min is off by 1 day and 1 hour.
I still don't understand the need to have min and max here especially with a comment that says "this is compression is guaranteed to be lossless".
So the `days` variable is computed using the `self.0.duration` field. I've inlined the `days()` method below:

```rust
pub fn days(&self) -> i64 {
    self.dur_as_secs() / (60 * 60 * 24)
}

pub fn dur_as_secs(&self) -> i64 {
    (self.duration / 1_000_000_000) as i64
}
```

If you take a look at line 53, the `ns` variable is also being initialized with the `self.0.duration` field minus the `days` variable. That means any amount of days that didn't fit into the `days` variable (32 bits) will be included in the `ns` variable (64 bits) and still be lossless.

Due to the bounds of the min/max allowable Interval, we know that `self.0.duration` will never be big/small enough to over/under-flow both `days` and `ns`.
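To make the losslessness claim concrete, a throwaway snippet (not PR code) that replays the split-and-recombine on a value one day past `i32::MAX`:

```rust
fn main() {
    const NS_PER_DAY: i128 = 86_400 * 1_000_000_000;
    // One day more than fits in the i32 day field.
    let duration: i128 = 2_147_483_648 * NS_PER_DAY;

    // Saturate the day count, as the encoding does.
    let days = (duration / NS_PER_DAY).min(i128::from(i32::MAX));
    let ns = duration - days * NS_PER_DAY;

    // The leftover day lands in `ns`, so recombining is exact.
    assert_eq!(days * NS_PER_DAY + ns, duration);
    assert_eq!(ns, 86_400_000_000_000); // exactly 24 hours of nanoseconds
}
```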
Force-pushed …erval-binary-underflow from ed22698 to 45074a9 (merge conflicts resolved in test/pgtest/binary.pt).
I created #10598 to fix the bounds issue. I made it a separate PR because I thought it was a related but separate issue and wanted to keep the commit messages isolated. If either of you think I should combine the PRs, I'm happy to do that as well.

@sploiselle @mjibson Hold off on continuing to review this for now. I've noticed a couple other things in testing that make me want to give this a second look.

@sploiselle @mjibson Ok, it's ready for review again. I added stuff to fix an overflow in the binary encoding as well.
```diff
@@ -55,10 +55,10 @@ impl Interval {
     /// Constructs a new `Interval` with the specified units of time.
     ///
     /// `nanos` in excess of `999_999_999` are carried over into seconds.
-    pub fn new(months: i32, seconds: i64, nanos: i64) -> Result<Interval, anyhow::Error> {
+    pub fn new(months: i32, seconds: i64, nanos: i128) -> Result<Interval, anyhow::Error> {
```
We would prefer not to use Rust's `as` operator because it truncates when downcasting and is a no-op for same-size integers, leading to unexpected results (see "numeric cast" in https://doc.rust-lang.org/reference/expressions/operator-expr.html#type-cast-expressions). Although it doesn't seem to be happening here, we think it's better to write things in a way where that can't happen. For example, if one of these were a u128 -> i128 conversion, using `as` would obscure the fact that the result could end up negative. I think there are two solutions:

1. Call `x.into()` instead of `x as i128`. `Into` is guaranteed to be lossless.
2. Change the signature of `new` to be `new<I: Into<i128>>(months: i32, seconds: i64, nanos: I)` and then use `nanos.into()`. This is the same as 1 but without pushing the requirement to cast onto the caller.

I'm not sure which one is better (I'd probably just use 1), but they are both better than using `as`.
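A self-contained sketch of option 2; the `Interval` struct here is a simplified stand-in with a single duration field, not the real type from `src/repr`:

```rust
// Stand-in type for illustration only.
struct Interval {
    months: i32,
    duration: i128, // nanoseconds
}

impl Interval {
    // `Into<i128>` is implemented for every smaller integer type, so the
    // conversion is lossless by construction and callers never write `as`.
    fn new<I: Into<i128>>(months: i32, seconds: i64, nanos: I) -> Interval {
        let duration = i128::from(seconds) * 1_000_000_000 + nanos.into();
        Interval { months, duration }
    }
}

fn main() {
    // Callers can pass i64, i32, etc. without any cast.
    let iv = Interval::new(1, 2, 500_000_000i64);
    assert_eq!(iv.months, 1);
    assert_eq!(iv.duration, 2_500_000_000);
}
```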
Just removed `as` from as many places as possible. There were two places where I needed to use `as` because it was converting from an `f64` to an `i128`, but there are checks right before the conversion. In a couple of places I had to use `from` just because there was some arithmetic happening in the function call.
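For the f64 cases, a guarded cast of the kind being described might look like this sketch (the function name and bounds are assumptions, not the PR's actual checks):

```rust
// Return None for NaN, infinities, and values outside i128's range;
// otherwise the truncating `as` cast is known to be in range.
fn f64_to_i128_checked(x: f64) -> Option<i128> {
    // 2^127 is exactly representable as an f64, so these bounds are exact.
    if x.is_finite() && x >= -(2f64.powi(127)) && x < 2f64.powi(127) {
        Some(x as i128) // in range, so the cast cannot wrap or saturate
    } else {
        None
    }
}
```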
LGTM modulo changing the `as` conversions to either `from` or `try_from...expect`; I realize you are not the original perpetrator of many of these (I think I was), but we were previously less strict about this and it's something we'd like to entirely remove from the codebase. Feel free to submit a follow-on PR to address this, if you like, or lmk and I'm glad to do it.
Ah, in the case of `self.0.days() as i128`, you should use `try_from().expect("[I know this works because...]")`.

For `i32::MIN as i128` and the like, you should use `i128::from(i32::MIN)`.
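Spelled out as a sketch (the values are illustrative), the two idioms look like:

```rust
use std::convert::TryFrom;

fn main() {
    let days: i64 = 12_345;

    // Fallible narrowing: the assumption is stated and panics loudly
    // if it is ever violated, instead of silently wrapping.
    let d = i32::try_from(days).expect("days bounded by interval limits");
    assert_eq!(d, 12_345);

    // Infallible widening: From is implemented, so no check is needed.
    let min = i128::from(i32::MIN);
    assert_eq!(min, -2_147_483_648);
}
```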
I would still like an explanation about why we use min/max if we also say "this is compression is guaranteed to be lossless"!
@mjibson Compression is probably the wrong word to use here, or at least it's confusing. Maybe a way to think of it is that we store …
Ok, thanks.
In Materialize, Intervals internally store the number of months in the interval and the number of nanoseconds. The pg_wire binary format for Intervals is 8 bytes for the number of microseconds, followed by 4 bytes for the number of days, followed by 4 bytes for the number of months. When converting an Interval to the pg_wire binary encoding, Materialize extracts as many whole days from the nanoseconds as will fit in 4 bytes (an i32 in Rust). During this it checks whether the number of days is larger than i32::MAX, protecting from overflow. However, it never checked whether the number of days is less than i32::MIN, making it susceptible to underflow. This commit fixes that issue. Additionally, when converting from the pg_wire binary format to our internal Interval representation, we convert the microseconds (which are stored in an i64) to nanoseconds and then convert the result to an i128. If the microseconds are large enough, the conversion to nanoseconds can overflow the i64. This commit first converts the microseconds to an i128 before converting them to nanoseconds, to avoid the overflow.
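The decoding fix can be summarized with a sketch like this (names are illustrative): widen to i128 before multiplying, so the intermediate product can never overflow an i64.

```rust
// Convert i64 microseconds to i128 nanoseconds.
fn micros_to_nanos(micros: i64) -> i128 {
    // Overflow-prone order: computing (micros * 1_000) as i128 performs
    // the multiplication in i64 and can wrap for large `micros`.
    // Safe order: widen to i128 first, then multiply.
    i128::from(micros) * 1_000
}

fn main() {
    // Near i64::MAX microseconds, the i64 multiply would have wrapped.
    assert_eq!(micros_to_nanos(i64::MAX), 9_223_372_036_854_775_807i128 * 1_000);
}
```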
Motivation

This PR fixes a previously unreported bug. As described above, the pg_wire binary encoding for Interval types never checked for underflow, causing the binary encoding to lose the precision/correctness of the underlying Interval.

The easiest way to see this is with the query `SELECT INTERVAL '-2147483648 days -48 hrs';` (note that `i32::MIN` is -2147483648). Materialize tried to convert this to -2147483650 days and encode it in an `i32`, which results in underflow. The resulting binary encoding is [0, 0, 0, 0, 0, 0, 0, 0, 127, 255, 255, 254, 0, 0, 0, 0]. As described above, the pg_wire binary format for Intervals is [Microseconds (8 bytes), Days (4 bytes), Months (4 bytes)]. So the encoding that Materialize comes up with is 0 microseconds, 0111 1111 1111 1111 1111 1111 1111 1110 days, and 0 months. 0111 1111 1111 1111 1111 1111 1111 1110 is equal to 2147483646 in decimal. So what's happening is that the -2147483650 days ends up underflowing into a positive number of days.
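A throwaway snippet (not PR code) reproducing the wrap-around described above:

```rust
fn main() {
    let days: i64 = -2_147_483_650;
    // A truncating cast keeps only the low 32 bits, flipping the sign here.
    assert_eq!(days as i32, 2_147_483_646);
    // The same four bytes that land in the wire encoding's day field.
    assert_eq!((days as i32).to_be_bytes(), [0x7f, 0xff, 0xff, 0xfe]);
}
```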
Checklist