|
+++
title = "Float like Excel"
date = 2024-11-27
+++

Microsoft Excel stores numbers in a binary floating-point format. Specifically, [the
documentation](https://learn.microsoft.com/en-us/office/troubleshoot/excel/floating-point-arithmetic-inaccurate-result)
tells us that Excel "was designed around" [IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) and
uses a version of the binary64 type specified in that standard. In the course of building [the
world's fastest spreadsheet](https://rowzero.io/home), I've had occasion to look into some of the
nuances of how Excel handles numbers. In this post, I will explain one way in which Excel's
behavior surprised me: Excel discards more numeric precision than is strictly necessary for the
binary64 format (more commonly known as double-precision floating-point numbers, or "doubles" for
short).

(If you're already an expert on the binary64 format, feel free to skip the next two sections.)

## Quick primer on binary64

The binary64 format encodes numbers in 64 bits. It uses 1 bit for the *sign*, like any signed
number format. 11 bits are used to encode the *exponent*, and 52 bits for the *significand*,
sometimes also called the *mantissa*.

As an 11-bit unsigned integer, the exponent can range from 0 to 2047. But we want to support
negative exponents, so the intended value is recovered by subtracting 1023, also known as the
*bias*.

The 52 bits of the significand are used as the fractional part of a number with a leading 1 (except
in the subnormal case, which we'll ignore; Excel explicitly doesn't support it anyway). We can get
away with using 52 bits for a 53-bit number because the most significant digit of any (normalized)
number is guaranteed to be non-zero, which in binary means it must be 1. So let's disambiguate:
we'll call the (unsigned) integer represented by the 52 bits the *fraction*, and the *significand*
is always a 53-bit number, recovered by prepending the leading 1, i.e., the significand is
1.*fraction*. Note that this is a binary fraction, so multiplying by 2<sup>*n*</sup> just shifts
the binary point *n* places to the right, like multiplying by 10<sup>*n*</sup> does in decimal.

Then we can recover the number encoded in the bits as:

> (-1)<sup>*sign*</sup> × *significand* × 2<sup>*exponent*-1023</sup>

Let's also define a *representable* number as one which can be recovered, using this formula, from
a binary64 number.
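
To make the encoding concrete, here's a minimal sketch in Rust (my own illustration, not anything
Excel does) that pulls a double apart into its three fields with `f64::to_bits` and then
reconstructs the value from the formula above:

```rust
fn main() {
    let x = -1.5f64;
    let bits = x.to_bits();
    let sign = bits >> 63;                    // 1 bit
    let exponent = (bits >> 52) & 0x7ff;      // 11 bits, biased by 1023
    let fraction = bits & ((1u64 << 52) - 1); // 52 bits

    // significand = 1.fraction, i.e. 1 + fraction / 2^52
    let significand = 1.0 + fraction as f64 / (1u64 << 52) as f64;
    let value = (-1f64).powi(sign as i32)
        * significand
        * 2f64.powi(exponent as i32 - 1023);
    assert_eq!(value, -1.5); // sign = 1, exponent = 1023, fraction = 2^51
}
```
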
A few facts that follow from these definitions:

1. Not every number is representable. For example, 1234567890.123456789: it's not possible to
   encode this many significant decimal digits in 53 binary digits.
2. The largest representable number is 1.7976931348623157e308 (where "e308" is scientific notation
   meaning "×10<sup>308</sup>").
3. The smallest (most negative) representable number is -1.7976931348623157e308.
4. The smallest representable positive number (again ignoring subnormal numbers) is
   2.2250738585072014e-308. (All three of these bounds are checked in the snippet below.)
5. In general, a decimal number with 15 significant digits or fewer, and which is neither too
   large nor too small, survives the round trip through binary64: parse it, then print it back
   with 15 significant digits, and you recover exactly the digits you started with. Meanwhile,
   some numbers with as many as 17 significant decimal digits are exactly representable.
6. An example of the latter is 2<sup>53</sup>, which is equivalent to 9,007,199,254,740,992 (16
   significant digits). This number is just barely too big to fit in the 53 bits of the
   significand. But since it's exactly 2<sup>52</sup> × 2<sup>1</sup>, it can be represented that
   way.
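
These bounds are easy to confirm in Rust, which exposes them as constants on `f64`:

```rust
fn main() {
    // Facts 2-4: the largest, most negative, and smallest positive
    // (normal) representable numbers.
    assert_eq!(f64::MAX, 1.7976931348623157e308);
    assert_eq!(f64::MIN, -1.7976931348623157e308);
    assert_eq!(f64::MIN_POSITIVE, 2.2250738585072014e-308);

    // Fact 6: 2^53, a 16-significant-digit integer, is exactly representable.
    assert_eq!(2f64.powi(53), 9_007_199_254_740_992.0);
}
```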

So what happens when you have a number that isn't representable, and you need to store it in this
format?

## Rounding rules

Well, you have no choice but to round. But round to what? The nearest representable number is the
option that introduces the least rounding error: the smallest delta between the number you want
and the number you get. So that's what the IEEE 754 specification says to do. And then you also
need a rule for breaking ties, for when the number is equidistant from two representable numbers.
The typical, but not necessarily required, rule is "round ties to even", which is more colloquially
known as "banker's rounding". In binary, the only even digit is 0, so this means that when you're
equidistant from two representable numbers, you pick the one whose significand ends in a 0 bit.

Here's an example. As noted above, 2<sup>53</sup> is representable. We saw that this was because it
can be represented as a 53-bit number multiplied by 2. This is true for *every* even integer in the
range 2<sup>53</sup> to 2<sup>54</sup> - 1. But the odd integers in that range are not
representable.

So if we try to parse 2<sup>53</sup> + 1, we have to round it either to 2<sup>53</sup> or
2<sup>53</sup> + 2. The former is representable as 2<sup>52</sup> × 2; the latter is
representable as (2<sup>52</sup> + 1) × 2. For the former, the binary significand is a one
followed by 52 zeros. The latter has a one followed by 51 zeros and a trailing one. So the rule
says to choose the former. And this is what we see in, for example, Rust's `f64` type. The
following program executes successfully:

```rust
fn main() {
    let a = 2f64.powi(53);
    let b = a + 1.0; // 2^53 + 1 rounds back down to 2^53 (ties to even)
    let c = a + 2.0; // 2^53 + 2 is exactly representable

    assert_eq!(a, b);
    assert_ne!(a, c);
}
```
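
The same tie-breaking applies when parsing decimal text. This quick check (my own illustration, in
the same spirit as the program above) shows Rust's parser landing on the even neighbor:

```rust
fn main() {
    // 2^53 + 1 is exactly halfway between 2^53 and 2^53 + 2;
    // round-ties-to-even picks 2^53, whose significand ends in a 0 bit.
    let parsed: f64 = "9007199254740993".parse().unwrap();
    assert_eq!(parsed, 9007199254740992.0);
}
```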

## What does Excel do?

So we're now in a position to state how Excel diverges from what I'd expect from an implementation
of IEEE 754 binary64 numbers. Excel does not round to the nearest representable number. Instead, it
truncates to 15 significant decimal digits, which, as we noted above, is guaranteed to survive the
round trip into binary64 and back. And it does this even if a number with more than 15 significant
decimal digits is representable without rounding!

For example, if you type 9,007,199,254,740,992 (2<sup>53</sup>) into a cell in Excel, what you get
back is 9,007,199,254,740,990. Note the final digit. You get the same result if you enter any
number in the range 9,007,199,254,740,990 to 9,007,199,254,740,999.

This is true even though 9,007,199,254,741,000 is accepted as-is by Excel, and is closer to
9,007,199,254,740,999 than the value it actually rounds to. So this is not rounding: it's
truncation to 15 significant digits.
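
To pin down the observed behavior, here's a small Rust sketch of the effect for plain integer
input. This is my own model of what Excel appears to do, not its actual algorithm: keep the first
15 significant digits, replace the rest with zeros, and parse the result.

```rust
// Model of the observed behavior for plain integer strings: digits past
// the 15th significant digit are replaced with zeros (no rounding).
fn excel_like_parse(s: &str) -> f64 {
    let truncated: String = s
        .chars()
        .scan(0u32, |seen, c| {
            if c.is_ascii_digit() {
                *seen += 1;
                if *seen > 15 {
                    return Some('0');
                }
            }
            Some(c)
        })
        .collect();
    truncated.parse().unwrap()
}

fn main() {
    // Everything from ...990 through ...999 collapses to ...990.
    assert_eq!(excel_like_parse("9007199254740992"), 9007199254740990.0);
    assert_eq!(excel_like_parse("9007199254740999"), 9007199254740990.0);
}
```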

## Pros and cons of Excel's behavior

A consequence of what Excel does is that it introduces a larger overall rounding error. In Rust, if
you subtract 2<sup>53</sup> from 2<sup>53</sup> + 2, you get 2, which is the precise, correct
result. In Excel, you get 0. If you introduce many such errors, and then do, say, a sum over a
bunch of numbers with individual small rounding errors, the total error can add up to be quite
large. This is the reason for rounding to the nearest representable number: to reduce error
introduced by rounding. It's also the [reason to use banker's
rounding](https://stackoverflow.com/questions/45223778/is-bankers-rounding-really-more-numerically-stable)
rather than the more familiar rule we learn in school (round ties away from zero).

So why do what Excel does? Excel's behavior gives you the following property: no number it displays
will contain a significant digit that's different from one you typed. By limiting input to 15
significant decimal digits, it can guarantee that the truncated number survives the round trip
through binary64. And by truncating instead of rounding, it can guarantee that the significant
digits that remain are exactly the same as the input.
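
Here's that round-trip property in Rust (my own illustration): even 0.1, which is not exactly
representable, prints back with 15 significant digits as exactly what was typed.

```rust
fn main() {
    // 0.1 is stored as 0.1000000000000000055511151231257827...,
    // but printed to 15 significant digits it comes back as typed.
    let parsed: f64 = "0.1".parse().unwrap();
    assert_eq!(format!("{:.14e}", parsed), "1.00000000000000e-1");

    // Integers up to 15 digits are exactly representable, so they
    // round-trip trivially.
    let n: f64 = "123456789012345".parse().unwrap();
    assert_eq!(format!("{}", n), "123456789012345");
}
```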

I imagine this is the property that they wanted. An Excel user might feel, if they typed
9,007,199,254,740,993 and got back 9,007,199,254,740,992, that this was a bug. Or, potentially
worse, they wouldn't even realize it had changed, and would later wrongly infer that the number
they entered was 9,007,199,254,740,992.

Of course, the actual behavior might also appear to be a bug, but I imagine it is easier to explain
"we only support 15 significant digits of precision" than it is to explain binary64 in all its
complex glory. Is it less surprising for a 2 to become a 0 than a 4? I guess, maybe.

It's worth noting that Google Sheets emulates Excel exactly here. Is that just extreme dedication
to Excel-compatibility? Or is it because they agree that this behavior is desirable?

The cost of this is that Excel sacrifices more precision than is strictly necessary. Personally,
I'm not sure that trade-off is worth it.