
Fix underflow/overflow during binary encoding/decoding of Intervals #10589

Merged
merged 8 commits on Feb 14, 2022

Conversation

Contributor
@jkosh44 jkosh44 commented Feb 10, 2022

In Materialize, Intervals internally store the number of months in the
interval and the number of nanoseconds. The pg_wire binary format for
Intervals is 8 bytes for the number of microseconds, followed by 4 bytes
for the number of days, followed by 4 bytes for the number of months.
When converting an Interval to the pg_wire binary encoding, Materialize
extracts as many whole days from the nanoseconds as will fit in 4 bytes
(an i32 in Rust). During this it checks whether the number of days is
larger than i32::MAX, protecting against overflow. However, it never
checked whether the number of days is less than i32::MIN, making it
susceptible to underflow. This commit fixes that issue.

Motivation

This PR fixes a previously unreported bug. As described above, the pg_wire binary encoding for Interval types never checked for underflow, causing the binary encoding to lose precision/correctness of the underlying Interval.

The easiest way to see this is with the query SELECT INTERVAL '-2147483648 days -48 hrs'; (note that i32::MIN is -2147483648). Materialize tries to convert this to -2147483650 days and encode it in an i32, which results in underflow. The resulting binary encoding is [0, 0, 0, 0, 0, 0, 0, 0, 127, 255, 255, 254, 0, 0, 0, 0]. As described above, the pg_wire binary format for Intervals is [Microseconds (8 bytes), Days (4 bytes), Months (4 bytes)]. So the encoding that Materialize comes up with is
0 microseconds, 0111 1111 1111 1111 1111 1111 1111 1110 days, and 0 months. 0111 1111 1111 1111 1111 1111 1111 1110 is equal to 2147483646 in decimal. So what's happening is that the -2147483650 days underflows into a positive number of days.
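The wrap-around described above can be reproduced in isolation. This is a minimal sketch of the truncating cast (not the actual Materialize code):

```rust
fn main() {
    // -2147483648 days minus 48 hours is 2 more days than i32 can hold.
    let days: i64 = -2_147_483_650;

    // A truncating `as` cast wraps modulo 2^32, yielding a large positive
    // value: exactly the 0x7FFFFFFE (2147483646) seen in the binary dump.
    let wrapped = days as i32;
    assert_eq!(wrapped, 2_147_483_646);

    // A checked conversion reports the problem instead of silently wrapping.
    assert!(i32::try_from(days).is_err());

    println!("{days} wraps to {wrapped}");
}
```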

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR adds a release note for any user-facing behavior changes.

@@ -46,7 +46,10 @@ impl ToSql for Interval {
         // Our intervals are guaranteed to fit within SQL's min/max intervals,
         // so this is compression is guaranteed to be lossless. For details, see
         // `repr::scalar::datetime::compute_interval`.
-        let days = std::cmp::min(self.0.days() as i128, i32::MAX as i128);
+        let days = std::cmp::max(
+            std::cmp::min(self.0.days() as i128, i32::MAX as i128),
Contributor

Here's an alternate take: we shouldn't have been doing min in the first place because it's changing the user's data, and should instead be returning an error. I think this current PR is strictly better than what we have now, but I don't think the underlying problem is fixed, just improved. @sploiselle do you have thoughts about this proposal, or more context about the use of min?

Contributor

Ah, yeah, this prior behavior was bad! Our min/max intervals are way bigger than that:
https://materialize.com/docs/sql/types/interval/

Seems like this just wants to error on overflow. Does PG limit you to i32::MIN/MAX days?

Contributor Author

@jkosh44 jkosh44 Feb 10, 2022

For reference, i32::MAX is 2147483647.

So, assuming that we are currently considering 24 hours equal to 1 day, I don't think it's necessarily changing the user's data. Any amount of days that cannot fit into an i32 is put into the microsecond field, which is an i64. So 2147483648 days won't get truncated to 2147483647 days; it will be converted into 2147483647 days and 24 hrs.

The biggest Interval that we (and Postgres) will accept is 2147483647 days 2147483647 hours 59 minutes 59.999999 seconds. Anything bigger should result in an out of range error. 2147483647 hours 59 minutes 59.999999 seconds is slightly less than 7730941132800000000 microseconds.

So that means that we will never have to store more than 2147483647 in the day bytes and we will never have to store more than 7730941132800000000 in the microsecond bytes. Since i64::max is 9223372036854775807, I believe that means that we won't ever lose any precision or change the user's data here. We may just convert days to microseconds or vice versa. Assuming that all my math here is correct.
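The arithmetic in the comment above can be checked mechanically. This is a hypothetical verification sketch, not project code:

```rust
fn main() {
    // 2147483647 hours + 59 minutes + 59.999999 seconds, in microseconds.
    let hours_us: i128 = 2_147_483_647i128 * 3_600 * 1_000_000;
    let rest_us: i128 = 59 * 60 * 1_000_000 + 59_999_999;
    let max_us = hours_us + rest_us;

    // "Slightly less than 7730941132800000000 microseconds", as stated.
    assert_eq!(max_us, 7_730_941_132_799_999_999);

    // Comfortably within i64 (i64::MAX is 9223372036854775807), so the
    // microsecond field cannot overflow for any accepted Interval.
    assert!(max_us < i128::from(i64::MAX));
}
```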

Contributor Author

Oh I didn't see your comment @sploiselle or the docs online. Postgres does limit you to i32::MIN/MAX days:

postgres=# SELECT INTERVAL '2147483648 days';
ERROR:  interval field value out of range: "2147483648 days"
LINE 1: SELECT INTERVAL '2147483648 days';
                        ^
postgres=# SELECT INTERVAL '-2147483649 days';
ERROR:  interval field value out of range: "-2147483649 days"
LINE 1: SELECT INTERVAL '-2147483649 days';

Contributor Author

Actually looks like I was mistaken about what we accept. We diverge from Postgres because we do not limit to i32::MIN/MAX days.

materialize=>  SELECT INTERVAL '2147483648 days';
    interval     
-----------------
 2147483648 days
(1 row)

materialize=> SELECT INTERVAL '-2147483649 days';
     interval     
------------------
 -2147483649 days
(1 row)

So it might be possible that the microseconds eventually overflow; I have to think about what the largest Interval we accept would get converted into.

Is there a reason that we accept Intervals outside of the range that Postgres does?

Contributor Author

Ok, I'll include it in this PR since it's not a large change.

Contributor

It's just that the number is wrong. 193_273_528_233_599_999_999_000 nanoseconds does not equal 2147483647 hours 59 minutes 59.999999 seconds

But it does equal 2147483647 days 2147483647 hours 59 minutes 59.999999 seconds.

Anyhow, sorry folks: I wrote this code a long time ago and had forgotten my thinking/motivation, and it definitely should have gotten more inline comments detailing the behavior.

Part of this is the "fallout" from our underlying representation differing from PG, which treats days as separate from H:M:S, whereas we bundle them together:

In PG:

sean=# SELECT interval '1 day 2147483647 hours';
        interval
------------------------
 1 day 2147483647:00:00
(1 row)

sean=# SELECT interval '1 day 2147483648 hours';
ERROR:  interval field value out of range: "1 day 2147483648 hours"
LINE 1: SELECT interval '1 day 2147483648 hours';

This limitation/behavior seemed pretty arbitrary to me, so I let users express Postgres' rolled-up maximum (still obeying the month-date boundary), which turns out to be 2236962132 days 07:59:59.999999. This code then takes the maximum number of days possible (i.e. saturating the i32), and then rolls over any remaining day-like time units into hours.

Given that this is working as intended, I think @jkosh44's code is right and I was wrong to say it was wrong.

This all being said, I am a little too tired tonight to reason through any issues with the values' signed-ness, but that's worth thinking through because I obviously failed to in my initial implementation.

Contributor Author

I see what you're saying; you're right, I didn't account for the days portion in the current bounds check. I think the current bounds check is mostly correct, then. The only issue is that the min is off by 1 day and 1 hour.

Contributor

I still don't understand the need to have min and max here especially with a comment that says "this is compression is guaranteed to be lossless".

Contributor Author

So the days variable is computed using the self.0.duration field. I've inlined the days() method below.

    pub fn days(&self) -> i64 {
        self.dur_as_secs() / (60 * 60 * 24)
    }

    pub fn dur_as_secs(&self) -> i64 {
        (self.duration / 1_000_000_000) as i64
    }

If you take a look at line 53, the ns variable is also initialized from the self.0.duration field minus the days variable. That means that any amount of days that didn't fit into the days variable (32 bits) will be included in the ns variable (64 bits) and still be lossless.

Due to the bounds of the min/max allowable Interval, we know that self.0.duration will never be big/small enough to over/under-flow both days and ns.
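The clamp-and-remainder scheme described above can be sketched as follows. The split_days helper is hypothetical, a stand-in for the actual Materialize encoding code:

```rust
const NS_PER_DAY: i128 = 86_400 * 1_000_000_000;

/// Split a nanosecond duration into (days clamped to i32, leftover ns).
fn split_days(duration_ns: i128) -> (i32, i128) {
    let whole_days = duration_ns / NS_PER_DAY;
    // Clamp to what fits in the 4-byte day field...
    let days = whole_days.clamp(i128::from(i32::MIN), i128::from(i32::MAX)) as i32;
    // ...and keep everything that didn't fit in the (wider) remainder.
    let ns = duration_ns - i128::from(days) * NS_PER_DAY;
    (days, ns)
}

fn main() {
    // One day past i32::MAX days: the extra day survives in the remainder.
    let dur = (i128::from(i32::MAX) + 1) * NS_PER_DAY;
    let (days, ns) = split_days(dur);
    assert_eq!(days, i32::MAX);
    assert_eq!(ns, NS_PER_DAY);

    // No information lost: the pieces recombine to the original duration.
    assert_eq!(i128::from(days) * NS_PER_DAY + ns, dur);
}
```

The remainder field is 64 bits in the wire format, so this is only lossless because the accepted Interval bounds keep the leftover within i64, as discussed above.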

…erval-binary-underflow

# Conflicts:
#	test/pgtest/binary.pt
Contributor Author

jkosh44 commented Feb 10, 2022

I created #10598 to fix the bounds issue. I decided to make a separate PR because I thought it was related but a separate issue and wanted to keep the commit messages isolated. If either of you think I should combine the PRs then I'm happy to do that as well.

@jkosh44 jkosh44 changed the title Fix underflow during binary encoding of Intervals [WIP] Fix underflow during binary encoding of Intervals Feb 11, 2022
Contributor Author

jkosh44 commented Feb 11, 2022

@sploiselle @mjibson Hold off on continuing to review this for now. I've noticed a couple other things in testing that make me want to give this a second look.

@jkosh44 jkosh44 changed the title [WIP] Fix underflow during binary encoding of Intervals Fix underflow during binary encoding of Intervals Feb 11, 2022
@jkosh44 jkosh44 changed the title Fix underflow during binary encoding of Intervals Fix underflow/overflow during binary encoding/decoding of Intervals Feb 11, 2022
Contributor Author

jkosh44 commented Feb 11, 2022

@sploiselle @mjibson Ok, it's ready for review again. I added stuff to fix an overflow in the binary encoding as well.

@@ -55,10 +55,10 @@ impl Interval {
     /// Constructs a new `Interval` with the specified units of time.
     ///
     /// `nanos` in excess of `999_999_999` are carried over into seconds.
-    pub fn new(months: i32, seconds: i64, nanos: i64) -> Result<Interval, anyhow::Error> {
+    pub fn new(months: i32, seconds: i64, nanos: i128) -> Result<Interval, anyhow::Error> {
Contributor

We would prefer not to use Rust's as operator because it truncates when downcasting and is a no-op for same-size integers, leading to unexpected results (see "numeric cast" in https://doc.rust-lang.org/reference/expressions/operator-expr.html#type-cast-expressions). Although it doesn't seem to be happening here, we think it's better to write things in a way that it can't happen. For example, if one of these were a u128 -> i128, using as would obscure the fact that the result could end up negative. I think there are two solutions:

  1. Call x.into() instead of x as i128. Into is guaranteed to be lossless.
  2. Change the signature of new to be new<I: Into<i128>>(months: i32, seconds: i64, nanos: I) then use nanos.into(). This is the same as 1 but without pushing the requirement to cast on the caller.

I'm not sure which one is better (I'd probably just use 1), but they are both better than using as.
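The two options can be sketched as follows; new_interval is a stand-in for the real constructor, not Materialize's actual signature:

```rust
fn main() {
    // Option 1: `Into`/`From` only exists for lossless widenings,
    // so this line could not compile if the cast were lossy.
    let nanos: i64 = -1;
    let widened: i128 = nanos.into();
    assert_eq!(widened, -1);

    // `as`, by contrast, also compiles for lossy or sign-changing casts
    // and silently reinterprets the bits.
    let big: u128 = u128::MAX;
    assert_eq!(big as i128, -1); // surprising wrap-around

    // Option 2: push the conversion into the signature, so callers
    // never need to cast at all.
    fn new_interval<I: Into<i128>>(nanos: I) -> i128 {
        nanos.into()
    }
    assert_eq!(new_interval(5i64), 5);
}
```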

Contributor Author

Just removed as from as many places as possible. There were two places where I needed to use as because it was converting from an f64 to i128, but there are checks right before the conversion. In a couple of places I had to use from just because there was some arithmetic happening in the function call.

@@ -46,7 +46,10 @@ impl ToSql for Interval {
         // Our intervals are guaranteed to fit within SQL's min/max intervals,
         // so this is compression is guaranteed to be lossless. For details, see
         // `repr::scalar::datetime::compute_interval`.
-        let days = std::cmp::min(self.0.days() as i128, i32::MAX as i128);
+        let days = std::cmp::max(
+            std::cmp::min(self.0.days() as i128, i32::MAX as i128),
Contributor

I still don't understand the need to have min and max here especially with a comment that says "this is compression is guaranteed to be lossless".

Contributor

@sploiselle sploiselle left a comment

LGTM modulo changing the as conversions to either from or try_from(...).expect(...); I realize you are not the original perpetrator of many of these (I think I was), but we were previously less strict about this, and it's something we'd like to entirely remove from the codebase. Feel free to submit a follow-on PR to address this, if you like, or lmk and I'm glad to do it.

@@ -46,7 +46,10 @@ impl ToSql for Interval {
         // Our intervals are guaranteed to fit within SQL's min/max intervals,
         // so this is compression is guaranteed to be lossless. For details, see
         // `repr::scalar::datetime::compute_interval`.
-        let days = std::cmp::min(self.0.days() as i128, i32::MAX as i128);
+        let days = std::cmp::max(
+            std::cmp::min(self.0.days() as i128, i32::MAX as i128),
Contributor

Ah, in the case of self.0.days() as i128, you should use try_from().expect("[I know this works because...]").

For i32::MIN as i128 and the like, you should use i128::from(i32::MIN).
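The two suggested patterns look like this in isolation; the expect message is a placeholder, as in the comment above:

```rust
fn main() {
    // Infallible widening: `From<i32>` is implemented for i128,
    // so no runtime check is needed.
    let min = i128::from(i32::MIN);
    assert_eq!(min, -2_147_483_648);

    // Possibly-lossy narrowing: `try_from` plus an `expect` that
    // documents why the value is known to be in range.
    let days: i64 = 12;
    let d = i32::try_from(days).expect("days already clamped to i32 range");
    assert_eq!(d, 12);
}
```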

@jkosh44 jkosh44 merged commit 19ff5f8 into MaterializeInc:main Feb 14, 2022
@jkosh44 jkosh44 deleted the interval-binary-underflow branch February 14, 2022 17:48
@maddyblue (Contributor)

I would still like an explanation about why we use min/max if we also say "this is compression is guaranteed to be lossless"!

@sploiselle (Contributor)

@mjibson Compression is probably the wrong word to use here, or at least it's confusing. Maybe a way to think of it is that we store days as an uncompressed value, intermingled with the H:M:S set of durations. We extract as many days (sign independent) from the duration as we can, and then pack the remaining duration into the H:M:S field. Because we fastidiously check the bounds on the duration, we guarantee that this min/max operation and subsequent subtraction will not lose any of the users' data.

@maddyblue (Contributor)

Ok, thanks.

wangandi pushed a commit to wangandi/materialize that referenced this pull request Apr 11, 2022
In Materialize, Intervals internally store the number of months in the
interval and the number of nanoseconds. The pg_wire binary format for
Intervals is 8 bytes for the number of microseconds, followed by 4 bytes
for the number of days, followed by 4 bytes for the number of months.
When converting an Interval to the pg_wire binary encoding, Materialize
extracts as many whole days from the nanoseconds as will fit in 4 bytes
(an i32 in Rust). During this it checks whether the number of days is
larger than i32::MAX, protecting against overflow. However, it never
checked whether the number of days is less than i32::MIN, making it
susceptible to underflow. This commit fixes that issue.

Additionally when converting from the pg_wire binary format to our
internal Interval representation, we convert the microseconds (which is
stored in an i64) to nanoseconds and then convert the result to an 
i128. If the microseconds are large enough, the conversion to
nanoseconds can overflow the i64. This commit first converts the
microseconds to an i128 before converting to nanoseconds, to avoid an
overflow.
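The decoding fix described in the second paragraph can be sketched as follows; this is an illustrative reproduction of the overflow, not the actual decoding code:

```rust
fn main() {
    // A microsecond count large enough that multiplying by 1000 in i64
    // would exceed i64::MAX (about 9.2e18).
    let micros: i64 = i64::MAX / 2;

    // Multiplying in i64 overflows: checked_mul detects it.
    assert!(micros.checked_mul(1_000).is_none());

    // Widening to i128 first makes the same multiplication safe.
    let nanos = i128::from(micros) * 1_000;
    assert_eq!(nanos, i128::from(micros) * 1_000);
    assert!(nanos > i128::from(i64::MAX));
}
```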