
Default float formatting is too narrow-minded #24556

Closed
@hanna-kruppe

Currently, {} and {:?} always print at most 6 digits after the decimal point and never use scientific (exponential) notation such as 1.23e6, regardless of how large the number is.

This has several undesirable consequences:

  • It introduces significant error when serializing numbers and reading them back (even ignoring #24557, "Float printing and/or parsing is inaccurate"), making it the wrong default for machine-readable output.
  • It can produce extremely long strings of digits (up to three hundred), which hampers its usefulness for quick listings to be scanned by humans.
  • Newcomers are not confronted with the fact that floating point numbers are not real/rational/decimal numbers. 0.1 + 0.1 + 0.1 is not the same float as 0.3, but the default option does not reflect that.
  • For Debug specifically, it hides important differences (like the above 0.1 + 0.1 + 0.1 vs 0.3), which can mislead even experienced programmers when debugging, or at least force them to use unsightly formatting codes like {:.17} (whose output is harder to read).
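
To make the last two points concrete, here is a minimal sketch. The helpers `fixed6` and `fixed17` are hypothetical names introduced only for this illustration; `fixed6` requests six decimal places explicitly, standing in for the six-digit default described above.

```rust
/// Hypothetical helper: six digits after the decimal point,
/// standing in for the default behaviour described in this issue.
fn fixed6(x: f64) -> String {
    format!("{:.6}", x)
}

/// Hypothetical helper: the `{:.17}` workaround mentioned above.
fn fixed17(x: f64) -> String {
    format!("{:.17}", x)
}

fn main() {
    let sum = 0.1_f64 + 0.1 + 0.1;
    let lit = 0.3_f64;

    // The two values are different floats...
    println!("sum == lit: {}", sum == lit);
    // ...but six decimal digits make them look identical.
    println!("{} vs {}", fixed6(sum), fixed6(lit));
    // Seventeen decimal digits expose the difference, at the cost of noise.
    println!("{} vs {}", fixed17(sum), fixed17(lit));
}
```

With six digits both values come out as the same string, so the difference is only visible with the longer format.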

There are smarter algorithms that do better on all counts (and also fix the accuracy issues reported in #24557). See e.g. Python 3.1+ (last bullet point, also back-ported to 2.7). It would be a great boon for people whose debugging consists of staring at the results of float calculations if Rust did the same thing.

These algorithms search for, roughly, the shortest string that reproduces the number exactly (bit for bit identical) when read back with an accurate parser. They also use scientific exponential notation when appropriate. We probably don't have accurate parsing either (again, see #24557) but this issue is not about that.
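
The round-trip property can be sketched by brute force. This is not Grisu or Gay's algorithm, only a naive search that tries more and more digits until the string parses back to the same bits. The function name `shortest_round_trip` is hypothetical, and the sketch assumes a formatter and parser that are both correctly rounded, which is exactly what #24557 is about. It may also miss the true shortest string in rare cases near power-of-two boundaries.

```rust
/// Hypothetical brute-force search: return the exponential-notation string
/// with the fewest digits that parses back to exactly the same f64.
fn shortest_round_trip(x: f64) -> String {
    if !x.is_finite() {
        // NaN and infinities have no digits to search over.
        return x.to_string();
    }
    // 17 significant digits (1 before the point + 16 after) always suffice for f64.
    for digits_after_point in 0..=16 {
        let s = format!("{:.*e}", digits_after_point, x);
        let round_trips = s
            .parse::<f64>()
            .map(|y| y.to_bits() == x.to_bits())
            .unwrap_or(false);
        if round_trips {
            return s;
        }
    }
    // Fallback, not expected to be reached with an accurate formatter and parser.
    format!("{:.16e}", x)
}

fn main() {
    println!("{}", shortest_round_trip(0.3));
    println!("{}", shortest_round_trip(0.1 + 0.1 + 0.1));
}
```

A real implementation would generate the digits directly instead of formatting and parsing repeatedly, but the acceptance criterion is the same.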

I propose adopting such an algorithm. This would lead to the following differences:

  • It will sometimes include more than six decimal digits. However, it will not include more digits than necessary, so it's better than {:.17} (this makes round trip-safe outputs easier to eyeball).
  • It will include fewer, or no, decimal digits for very large numbers.
  • It will use scientific notation for very small and very large numbers, rather than attempting to give them with ludicrous precision.
  • This is not strictly a property of these algorithms, but other languages do it and I think Rust will want it too: It always includes a decimal point, even for floats that happen to have no fractional part.
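
As a rough illustration of the four points above, here is a sketch of the output rules layered on top of whatever digits the standard formatter produces in current Rust. The name `proposed_format` and the thresholds `1e-4` and `1e16` are assumptions made for this example; the issue does not specify where the switch to scientific notation should happen.

```rust
/// Hypothetical sketch of the proposed output rules.
/// The cut-offs 1e-4 and 1e16 are illustrative assumptions, not part of the proposal.
fn proposed_format(x: f64) -> String {
    if !x.is_finite() {
        return x.to_string();
    }
    let abs = x.abs();
    if abs != 0.0 && (abs < 1e-4 || abs >= 1e16) {
        // Very small or very large: scientific notation, no padding digits.
        format!("{:e}", x)
    } else {
        // Otherwise plain decimal, but always with a decimal point.
        let s = x.to_string();
        if s.contains('.') {
            s
        } else {
            format!("{}.0", s)
        }
    }
}

fn main() {
    for x in [1.0, 0.3, 0.1 + 0.1 + 0.1, 1.5e300, 1e-7] {
        println!("{}", proposed_format(x));
    }
}
```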

The combination of these changes strikes a good balance between accuracy and not overly burdening readers. Debug and Display can still differ on minor details like whether negative zero is printed with a minus sign (#20596), but the changes listed above should apply to both. Scientific notation is by no means exclusive to programmers: many calculators (hardware and software) use it, for example. Printing a hundred digits is hardly user-friendly either.

Implementation

Python uses David Gay's algorithm (and his C implementation of the same), which by all accounts is incredibly complex, so porting it won't be fun. It also needs memory allocation, which disqualifies it from core (I'm sure there is an upper bound on how many bits it needs, but identifying that bound would be yet another porting hurdle).

Florian Loitsch's Grisu algorithm(s) should be doable (with the caveat that I'm only halfway through the paper myself). Grisu3 "gives up" on about 0.5% of all possible floats, but IIUC Grisu and Grisu2 can handle those; they just don't guarantee finding the shortest possible string, an acceptable trade-off IMHO. I'm not sure whether using only Grisu2 gives the same result as Grisu3 on the floats the latter does handle, but if so, that would simplify the implementation.

There is an existing implementation (that uses only core) of Grisu3 in rust-strconv by @lifthrasiir --- anyone interested in working on this should get in touch with the author. I'm not sure if we'd want to import that implementation wholesale (even assuming the author's cooperation) though: It falls back to Dragon4 for the numbers Grisu3 can't handle, so it's rather more code (and more complicated) than the Grisu2-only option.

There are some open questions, though: how should formatting options like .N, LowerExp, and UpperExp be handled? Can Grisu handle these, or do we need to build on top of it? I don't care that much about perfect accuracy for these (since they round anyway), so this may be easier than we expect.
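
For reference, these are the options in question as they are spelled in format strings. The helper `render_all` is a hypothetical name used only to collect the results; the format specifiers themselves are the standard ones.

```rust
/// Hypothetical helper: format one value with each option under discussion.
fn render_all(x: f64) -> Vec<String> {
    vec![
        format!("{:.3}", x),  // .N: fixed number of digits after the decimal point
        format!("{:e}", x),   // LowerExp: scientific notation with a lowercase 'e'
        format!("{:E}", x),   // UpperExp: scientific notation with an uppercase 'E'
        format!("{:.2e}", x), // .N combined with LowerExp
    ]
}

fn main() {
    for s in render_all(1234.5678) {
        println!("{}", s);
    }
    // prints:
    // 1234.568
    // 1.2345678e3
    // 1.2345678E3
    // 1.23e3
}
```

The first and last forms round to the requested precision, which is why perfect shortest-digit accuracy matters less for them.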

cc @rprichard

Labels: C-enhancement (Category: An issue proposing an enhancement or a PR with one.)