
Fix underflow/overflow during binary encoding/decoding of Intervals #10589

Merged
merged 8 commits on Feb 14, 2022

Conversation

Contributor
@jkosh44 jkosh44 commented Feb 10, 2022

In Materialize, Intervals internally store the number of months in the
interval and the number of nanoseconds. The pg_wire binary format for
Intervals is 8 bytes for the number of microseconds, followed by 4 bytes
for the number of days, followed by 4 bytes for the number of months.
When converting an Interval to the pg_wire binary encoding, Materialize
extracts as many whole days from the nanoseconds as will fit in 4 bytes
(an i32 in Rust). During this it checks whether the number of days is
larger than i32::MAX, protecting against overflow. However, it never
checked whether the number of days is less than i32::MIN, making it
susceptible to underflow. This commit fixes that issue.

Motivation

This PR fixes a previously unreported bug. As described above, the pg_wire binary encoding for Interval types never checked for underflow, causing the binary encoding to lose precision/correctness of the underlying Interval.

The easiest way to see this is with the query SELECT INTERVAL '-2147483648 days -48 hrs'; (note that i32::MIN is -2147483648). Materialize tries to convert this to -2147483650 days and encode it in an i32, which results in underflow. The resulting binary encoding is [0, 0, 0, 0, 0, 0, 0, 0, 127, 255, 255, 254, 0, 0, 0, 0]. As described above, the pg_wire binary format for Intervals is [Microseconds (8 bytes), Days (4 bytes), Months (4 bytes)]. So the encoding that Materialize comes up with is
0 microseconds, 0111 1111 1111 1111 1111 1111 1111 1110 days, and 0 months. 0111 1111 1111 1111 1111 1111 1111 1110 is equal to 2147483646 in decimal. So what's happening is that the -2147483650 days underflows into a positive number of days.
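The wrap-around described above can be reproduced in isolation. This is a minimal sketch of the truncating cast (not the actual Materialize code):

```rust
fn main() {
    // -2147483648 days minus 48 hours is 2 more days than i32 can hold.
    let days: i64 = -2_147_483_650;

    // A truncating `as` cast wraps modulo 2^32, yielding a large positive
    // value: exactly the 0x7FFFFFFE (2147483646) seen in the binary dump.
    let wrapped = days as i32;
    assert_eq!(wrapped, 2_147_483_646);

    // A checked conversion reports the problem instead of silently wrapping.
    assert!(i32::try_from(days).is_err());

    println!("{days} wraps to {wrapped}");
}
```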

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR adds a release note for any user-facing behavior changes.

@@ -46,7 +46,10 @@ impl ToSql for Interval {
         // Our intervals are guaranteed to fit within SQL's min/max intervals,
         // so this is compression is guaranteed to be lossless. For details, see
         // `repr::scalar::datetime::compute_interval`.
-        let days = std::cmp::min(self.0.days() as i128, i32::MAX as i128);
+        let days = std::cmp::max(
+            std::cmp::min(self.0.days() as i128, i32::MAX as i128),
Contributor

Here's an alternate take: we shouldn't have been doing min in the first place because it's changing the user's data, and should instead be returning an error. I think this current PR is strictly better than what we have now, but I don't think the underlying problem is fixed, just improved. @sploiselle do you have thoughts about this proposal, or more context about the use of min?

Contributor

Ah, yeah, this prior behavior was bad! Our min/max intervals are way bigger than that:
https://materialize.com/docs/sql/types/interval/

Seems like this just wants to error on overflow. Does PG limit you to i32::MIN/MAX days?

Contributor Author

@jkosh44 jkosh44 Feb 10, 2022

For reference, i32::MAX is 2147483647.

So, assuming that we are currently considering 24 hours equal to 1 day, I don't think it's necessarily changing the user's data. Any amount of days that cannot fit into an i32 is put into the microsecond field, which is an i64. So 2147483648 days won't get truncated to 2147483647 days; it will be converted into 2147483647 days and 24 hrs.

The biggest Interval that we (and Postgres) will accept is 2147483647 days 2147483647 hours 59 minutes 59.999999 seconds. Anything bigger should result in an out of range error. 2147483647 hours 59 minutes 59.999999 seconds is slightly less than 7730941132800000000 microseconds.

So that means that we will never have to store more than 2147483647 in the day bytes and we will never have to store more than 7730941132800000000 in the microsecond bytes. Since i64::max is 9223372036854775807, I believe that means that we won't ever lose any precision or change the user's data here. We may just convert days to microseconds or vice versa. Assuming that all my math here is correct.
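The arithmetic in the comment above can be checked mechanically. This is a hypothetical verification sketch, not project code:

```rust
fn main() {
    // 2147483647 hours + 59 minutes + 59.999999 seconds, in microseconds.
    let hours_us: i128 = 2_147_483_647i128 * 3_600 * 1_000_000;
    let rest_us: i128 = 59 * 60 * 1_000_000 + 59_999_999;
    let max_us = hours_us + rest_us;

    // "Slightly less than 7730941132800000000 microseconds", as stated.
    assert_eq!(max_us, 7_730_941_132_799_999_999);

    // Comfortably within i64 (i64::MAX is 9223372036854775807), so the
    // microsecond field cannot overflow for any accepted Interval.
    assert!(max_us < i128::from(i64::MAX));
}
```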

Contributor Author

Oh I didn't see your comment @sploiselle or the docs online. Postgres does limit you to i32::MIN/MAX days:

postgres=# SELECT INTERVAL '2147483648 days';
ERROR:  interval field value out of range: "2147483648 days"
LINE 1: SELECT INTERVAL '2147483648 days';
                        ^
postgres=# SELECT INTERVAL '-2147483649 days';
ERROR:  interval field value out of range: "-2147483649 days"
LINE 1: SELECT INTERVAL '-2147483649 days';

Contributor Author

Actually looks like I was mistaken about what we accept. We diverge from Postgres because we do not limit to i32::MIN/MAX days.

materialize=>  SELECT INTERVAL '2147483648 days';
    interval     
-----------------
 2147483648 days
(1 row)

materialize=> SELECT INTERVAL '-2147483649 days';
     interval     
------------------
 -2147483649 days
(1 row)

So it might be possible that the microseconds eventually overflow; I have to think about what the largest Interval we accept would get converted into.

Is there a reason that we accept Intervals outside of the range that Postgres does?

Contributor Author

Ok, I'll include it in this PR since it's not a large change.

Contributor

It's just that the number is wrong. 193_273_528_233_599_999_999_000 nanoseconds does not equal 2147483647 hours 59 minutes 59.999999 seconds

But it does equal 2147483647 days 2147483647 hours 59 minutes 59.999999 seconds.

Anyhow, sorry folks: I wrote this code a long time ago and had forgotten my thinking/motivation, and it definitely should have gotten more inline comments detailing the behavior.

Part of this is the "fallout" from our underlying representation differing from PG, which treats days as separate from H:M:S, whereas we bundle them together:

In PG:

sean=# SELECT interval '1 day 2147483647 hours';
        interval
------------------------
 1 day 2147483647:00:00
(1 row)

sean=# SELECT interval '1 day 2147483648 hours';
ERROR:  interval field value out of range: "1 day 2147483648 hours"
LINE 1: SELECT interval '1 day 2147483648 hours';

This limitation/behavior seemed pretty arbitrary to me, so I let users express Postgres' rolled-up maximum (still obeying the month-date boundary), which turns out to be 2236962132 days 07:59:59.999999. This code then takes the maximum number of days possible (i.e. saturating the i32), and then rolls over any remaining day-like time units into hours.

Given that this is working as intended, I think @jkosh44's code is right and I was wrong to say it was wrong.

This all being said, I am a little too tired tonight to reason through any issues with the values' signed-ness, but that's worth thinking through because I obviously failed to in my initial implementation.

Contributor Author

I see what you're saying; you're right, I didn't account for the days portion in the current bounds check. I think the current bounds check is mostly correct, then. The only issue is that the min is off by 1 day and 1 hour.

Contributor

I still don't understand the need to have min and max here especially with a comment that says "this is compression is guaranteed to be lossless".

Contributor Author

So the days variable is computed using the self.0.duration field. I've inlined the days() method below.

    pub fn days(&self) -> i64 {
        self.dur_as_secs() / (60 * 60 * 24)
    }

    pub fn dur_as_secs(&self) -> i64 {
        (self.duration / 1_000_000_000) as i64
    }

If you take a look at line 53, the ns variable is also initialized from the self.0.duration field minus the days variable. That means that any amount of days that didn't fit into the days variable (32 bits) will be included in the ns variable (64 bits) and still be lossless.

Due to the bounds of the min/max allowable Interval, we know that self.0.duration will never be big/small enough to over/under-flow both days and ns.
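The clamp-and-remainder scheme described above can be sketched as follows. The split_days helper is hypothetical, a stand-in for the actual Materialize encoding code:

```rust
const NS_PER_DAY: i128 = 86_400 * 1_000_000_000;

/// Split a nanosecond duration into (days clamped to i32, leftover ns).
fn split_days(duration_ns: i128) -> (i32, i128) {
    let whole_days = duration_ns / NS_PER_DAY;
    // Clamp to what fits in the 4-byte day field...
    let days = whole_days.clamp(i128::from(i32::MIN), i128::from(i32::MAX)) as i32;
    // ...and keep everything that didn't fit in the (wider) remainder.
    let ns = duration_ns - i128::from(days) * NS_PER_DAY;
    (days, ns)
}

fn main() {
    // One day past i32::MAX days: the extra day survives in the remainder.
    let dur = (i128::from(i32::MAX) + 1) * NS_PER_DAY;
    let (days, ns) = split_days(dur);
    assert_eq!(days, i32::MAX);
    assert_eq!(ns, NS_PER_DAY);

    // No information lost: the pieces recombine to the original duration.
    assert_eq!(i128::from(days) * NS_PER_DAY + ns, dur);
}
```

The remainder field is 64 bits in the wire format, so this is only lossless because the accepted Interval bounds keep the leftover within i64, as discussed above.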

…erval-binary-underflow

# Conflicts:
#	test/pgtest/binary.pt
Contributor Author

jkosh44 commented Feb 10, 2022

I created #10598 to fix the bounds issue. I decided to make a separate PR because I thought it was related but a separate issue and wanted to keep the commit messages isolated. If either of you think I should combine the PRs then I'm happy to do that as well.

@jkosh44 jkosh44 changed the title Fix underflow during binary encoding of Intervals [WIP] Fix underflow during binary encoding of Intervals Feb 11, 2022
Contributor Author

jkosh44 commented Feb 11, 2022

@sploiselle @mjibson Hold off on continuing to review this for now. I've noticed a couple other things in testing that make me want to give this a second look.

@jkosh44 jkosh44 changed the title [WIP] Fix underflow during binary encoding of Intervals Fix underflow during binary encoding of Intervals Feb 11, 2022
@jkosh44 jkosh44 changed the title Fix underflow during binary encoding of Intervals Fix underflow/overflow during binary encoding/decoding of Intervals Feb 11, 2022
Contributor Author

jkosh44 commented Feb 11, 2022

@sploiselle @mjibson Ok, it's ready for review again. I added stuff to fix an overflow in the binary encoding as well.

@@ -55,10 +55,10 @@ impl Interval {
     /// Constructs a new `Interval` with the specified units of time.
     ///
     /// `nanos` in excess of `999_999_999` are carried over into seconds.
-    pub fn new(months: i32, seconds: i64, nanos: i64) -> Result<Interval, anyhow::Error> {
+    pub fn new(months: i32, seconds: i64, nanos: i128) -> Result<Interval, anyhow::Error> {
Contributor

We would prefer not to use Rust's as operator because it truncates when downcasting and is a no-op for same-size integers, leading to unexpected results (see "numeric cast" in https://doc.rust-lang.org/reference/expressions/operator-expr.html#type-cast-expressions). Although it doesn't seem to be happening here, we think it's better to write things in a way that it can't happen. For example, if one of these were a u128 -> i128, using as would obscure the fact that the result could end up negative. I think there are two solutions:

  1. Call x.into() instead of x as i128. Into is guaranteed to be lossless.
  2. Change the signature of new to be new<I: Into<i128>>(months: i32, seconds: i64, nanos: I) then use nanos.into(). This is the same as 1 but without pushing the requirement to cast on the caller.

I'm not sure which one is better (I'd probably just use 1), but they are both better than using as.
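The two options can be sketched as follows; new_interval is a stand-in for the real constructor, not Materialize's actual signature:

```rust
fn main() {
    // Option 1: `Into`/`From` only exists for lossless widenings,
    // so this line could not compile if the cast were lossy.
    let nanos: i64 = -1;
    let widened: i128 = nanos.into();
    assert_eq!(widened, -1);

    // `as`, by contrast, also compiles for lossy or sign-changing casts
    // and silently reinterprets the bits.
    let big: u128 = u128::MAX;
    assert_eq!(big as i128, -1); // surprising wrap-around

    // Option 2: push the conversion into the signature, so callers
    // never need to cast at all.
    fn new_interval<I: Into<i128>>(nanos: I) -> i128 {
        nanos.into()
    }
    assert_eq!(new_interval(5i64), 5);
}
```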

Contributor Author

Just removed as from as many places as possible. There were two places where I needed to use as because it was converting from an f64 to i128, but there are checks right before the conversion. In a couple of places I had to use from just because there was some arithmetic happening in the function call.

@@ -46,7 +46,10 @@ impl ToSql for Interval {
         // Our intervals are guaranteed to fit within SQL's min/max intervals,
         // so this is compression is guaranteed to be lossless. For details, see
         // `repr::scalar::datetime::compute_interval`.
-        let days = std::cmp::min(self.0.days() as i128, i32::MAX as i128);
+        let days = std::cmp::max(
+            std::cmp::min(self.0.days() as i128, i32::MAX as i128),
Contributor

I still don't understand the need to have min and max here especially with a comment that says "this is compression is guaranteed to be lossless".

Contributor

@sploiselle sploiselle left a comment

LGTM modulo changing the as conversions to either from or try_from(...).expect(...); I realize you are not the original perpetrator of many of these (I think I was), but we were previously less strict about this, and it's something we'd like to entirely remove from the codebase. Feel free to submit a follow-on PR to address this, if you like, or lmk and I'm glad to do it.

@@ -46,7 +46,10 @@ impl ToSql for Interval {
         // Our intervals are guaranteed to fit within SQL's min/max intervals,
         // so this is compression is guaranteed to be lossless. For details, see
         // `repr::scalar::datetime::compute_interval`.
-        let days = std::cmp::min(self.0.days() as i128, i32::MAX as i128);
+        let days = std::cmp::max(
+            std::cmp::min(self.0.days() as i128, i32::MAX as i128),
Contributor

Ah, in the case of self.0.days() as i128, you should use try_from().expect("[I know this works because...]").

For i32::MIN as i128 and the like, you should use i128::from(i32::MIN).
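The two suggested patterns look like this in isolation; the expect message is a placeholder, as in the comment above:

```rust
fn main() {
    // Infallible widening: `From<i32>` is implemented for i128,
    // so no runtime check is needed.
    let min = i128::from(i32::MIN);
    assert_eq!(min, -2_147_483_648);

    // Possibly-lossy narrowing: `try_from` plus an `expect` that
    // documents why the value is known to be in range.
    let days: i64 = 12;
    let d = i32::try_from(days).expect("days already clamped to i32 range");
    assert_eq!(d, 12);
}
```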

@jkosh44 jkosh44 merged commit 19ff5f8 into MaterializeInc:main Feb 14, 2022
@jkosh44 jkosh44 deleted the interval-binary-underflow branch February 14, 2022 17:48
@maddyblue (Contributor)

I would still like an explanation about why we use min/max if we also say "this is compression is guaranteed to be lossless"!

@sploiselle (Contributor)

@mjibson Compression is probably the wrong word to use here, or at least it's confusing. Maybe a way to think of it is that we store days as an uncompressed value, intermingled with the H:M:S set of durations. We extract as many days (sign independent) from the duration as we can, and then pack the remaining duration into the H:M:S field. Because we fastidiously check the bounds on the duration, we guarantee that this min/max operation and subsequent subtraction will not lose any of the users' data.

@maddyblue (Contributor)

Ok, thanks.

wangandi pushed a commit to wangandi/materialize that referenced this pull request Apr 11, 2022
In Materialize, Intervals internally store the number of months in the
interval and the number of nanoseconds. The pg_wire binary format for
Intervals is 8 bytes for the number of microseconds, followed by 4 bytes
for the number of days, followed by 4 bytes for the number of months.
When converting an Interval to the pg_wire binary encoding, Materialize
extracts as many whole days from the nanoseconds as will fit in 4 bytes
(an i32 in Rust). During this it checks whether the number of days is
larger than i32::MAX, protecting against overflow. However, it never
checked whether the number of days is less than i32::MIN, making it
susceptible to underflow. This commit fixes that issue.

Additionally when converting from the pg_wire binary format to our
internal Interval representation, we convert the microseconds (which is
stored in an i64) to nanoseconds and then convert the result to an 
i128. If the microseconds are large enough, the conversion to
nanoseconds can overflow the i64. This commit first converts the
microseconds to an i128 before converting to nanoseconds, to avoid an
overflow.
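The decoding fix described in the second paragraph can be sketched as follows; this is an illustrative reproduction of the overflow, not the actual decoding code:

```rust
fn main() {
    // A microsecond count large enough that multiplying by 1000 in i64
    // would exceed i64::MAX (about 9.2e18).
    let micros: i64 = i64::MAX / 2;

    // Multiplying in i64 overflows: checked_mul detects it.
    assert!(micros.checked_mul(1_000).is_none());

    // Widening to i128 first makes the same multiplication safe.
    let nanos = i128::from(micros) * 1_000;
    assert_eq!(nanos, i128::from(micros) * 1_000);
    assert!(nanos > i128::from(i64::MAX));
}
```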