Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Issue 20288 - std.format with separator and double causes RangeError #7222

Merged
merged 1 commit into from
Oct 23, 2019

Conversation

burner
Copy link
Member

@burner burner commented Oct 10, 2019

std.format with separator and double causes RangeError

https://issues.dlang.org/show_bug.cgi?id=20288

@burner burner requested a review from andralex as a code owner October 10, 2019 14:28
@dlang-bot
Copy link
Contributor

Thanks for your pull request, @burner!

Bugzilla references

Auto-close Bugzilla Severity Description
20288 enhancement std.format double with NaN fails with range violation on comma

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + phobos#7222"

@CyberShadow
Copy link
Member

Please put the issue title in the commit/PR title, it's impossible to know what "Fix Issue 20288" means without further action.

@CyberShadow CyberShadow changed the title Fix Issue 20288 Fix Issue 20288 - std.format with separator and double causes RangeError Oct 10, 2019
@burner burner force-pushed the Issue20288_format_comma_NaN branch from 35e4b91 to 4bf038e Compare October 11, 2019 09:17
@burner burner requested a review from JackStouffer as a code owner October 11, 2019 13:56
@wilzbach
Copy link
Member

Robert Schadek: I need a win32_64 box

This should help -> #7226 (Azure 10 parallel boxes and thus drastically faster start times).

@burner burner force-pushed the Issue20288_format_comma_NaN branch 3 times, most recently from e58956d to bba4b0d Compare October 16, 2019 15:47
@ghost
Copy link

ghost commented Oct 20, 2019

Shouldn't "-nan" not exist at all? I'm aware, that there are several ways to express nans in the IEEE standard, but I always thought, that D does not distinguish between them and therefore should still output "nan" instead of "-nan" when the sign bit of a nan is set.

I'd like to point to issue 19878 which questions the result "-nan" when using writeln.

Is there anything official about using -nan or not? I was not able to find anything useful.

@burner
Copy link
Member Author

burner commented Oct 20, 2019

@berni44 I looked at the standard. Section 6.3 of 754 2008 states that the standard does not interpret the sign bit of NaNs, but some operation specify it. IMHO phobos should therefore display the sign.

@ghost
Copy link

ghost commented Oct 22, 2019

rikki cattermole wrote in the Forum in a related thread, that the behaviour of printf is the baseline. According to this -nan is correct and issue 19878 should be marked "invalid".

@burner
Copy link
Member Author

burner commented Oct 22, 2019

printf on which platfrom, win32 win64, glibc, bsd? they all do things differently.

@atilaneves
Copy link
Contributor

printf on which platfrom, win32 win64, glibc, bsd? they all do things differently.

Shouldn't that be covered by the C standard?

@burner
Copy link
Member Author

burner commented Oct 22, 2019

I don't have the latest standard at hand, but in the latest publicly available one
http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1570.pdf

on page 312 it states that

Adouble argument representing a NaN is converted in one of the styles
[-]nan or [-]nan(n-char-sequence) — which style, and the meaning of
any n-char-sequence, is implementation-defined.

The phrase implementation-defined appears quite a lot in the paragraph.
real values are not mentioned at all.

edit: windows does some really strange things. "1.#INF" is a thing

More interesting is footnote 277

When applied to infinite and NaN values, the -, +, and space flag characters have their
usual meaning

I therefore think that the implemented behavior of this PR is "correct" in that it makes all platforms act as linux 64bit.
Which I turn IMHO is the sanest of them all (sign is preserved)

@ghost
Copy link

ghost commented Oct 22, 2019

Interesting. So printf and the C documentation is not really a reliable source...

Maybe we (= D community) have to define ourselves, what format() should do and clearly document that. Looks like lot of work...

@burner
Copy link
Member Author

burner commented Oct 22, 2019

eventually yes, but not in this PR. This PR fixes a crash and makes a minor thing behave equal across all platforms.

Doing the thing you mentioned, would firstly require phobos to have its own float formatting. Currently we call snprintf.

But again, the this PR does a tiny narrow thing. And shouldn't grow to do all that.

@ghost
Copy link

ghost commented Oct 22, 2019

Yes, I agree. In my oppinion, this PR can be merged.

Beyond this PR, for me the question is, what's better: Do more of such corrections or provide a complete rewrite of formattedWrite and Co. using regex for parsing the format string, providing our own float formatting and so on. When looking into bugzilla, there are lots of bugs in here and correcting them might be more work than a sound rewrite.

@burner
Copy link
Member Author

burner commented Oct 22, 2019

I think doing our own formatImpl from float/double/real to string would be pretty good already.

edit:
the formatspec parsing is already good. IMHO regex would make it only worse.
IMO most bug reports come from an under-specified printf leading to different opinions.

@ghost
Copy link

ghost commented Oct 22, 2019

Having had a closer look at the files, I don't understand how the changes in std.conv are related to this bug.

@burner
Copy link
Member Author

burner commented Oct 22, 2019

windows 32 is doing some really crazy stuff with floats, these changes just add some sanity checks.

@burner
Copy link
Member Author

burner commented Oct 22, 2019

and std.conv.to!string( some_float ) uses std.format

std.format with separator and double causes RangeError

trying to find the win32 bug

windows nan -nan fix

harder workaround

name clash fix

moving some stuff around

making it shorter

oh win32_64 what are you doing to me

and another try

some windows special case

and another

and again

removed some duplicated code

make it compile again

some debug output

I need a win32_64 box

and again

better infos

finding the failing test

something strange is going on

getting closer

I got it maybe

less output

I think I understand now

tighter code

making the commit nicer

removed an import

undoing some debug changes

removed an unneeded test
@burner burner force-pushed the Issue20288_format_comma_NaN branch from bba4b0d to 2b2ed40 Compare October 22, 2019 11:18
wstring ws = to!wstring(a);
assert(ws == "42"w, to!string(ws));
dstring ds = to!dstring(a);
assert(ds == "42"d, to!string(ds));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change necessary?

Copy link
Member Author

@burner burner Oct 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say yes, while working on std.format this broke and I couldn't see the assert inputs. So this will help the next one person who gets gray hair looking at std.format!double.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying, I actually missed that you added the inputs as second argument to the asserts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no worries, thanks for taking an interest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe checkaction=context should be enabled for DRuntime (and other repos) unittest builds

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, please

@dejlek
Copy link

dejlek commented Oct 30, 2019

We have discussed yesterday performance issue with float --> string conversion. The default implementation in libC is painfully slow. PyPy is 2x faster (!!) because they have their own implementation. Something like this in Phobos would be welcome: https://www.reddit.com/r/ProgrammingLanguages/comments/930pch/ryu_a_new_algorithm_to_quickly_convert_floating/

@burner
Copy link
Member Author

burner commented Oct 30, 2019

@dejlek not only that, but also the usage of libC is making format fundamentally not pure.
Which stops std.json to(Pretty)String being pure and a whole lot of other stuff as well.

@ghost
Copy link

ghost commented Oct 30, 2019

@dejlek Thank for pointing to the ryu-algorithm. I'll go though it and have a look if I can use/implement it.

@UplinkCoder
Copy link
Member

I have done a port of a grisu2 implementation which is quite fast and ctfeable.

Search for fpconv_ctfe.d

@marler8997
Copy link
Contributor

I ported the ryu algorithm to D in about an hour, source here: https://github.com/dragon-lang/mar/blob/master/src/mar/ryu.d

@ghost
Copy link

ghost commented Oct 30, 2019

Meanwhile I read through the paper on ryu. I may be wrong, but I think this algorithm is not useful for this task, for two reasons:

a) The goal is slightly different to that of printf. While ryu tries to print as few digits as possible, printf prints as much as the format string defines. If that's larger than what ryu computes, the remaining digits will be wrong.
b) Ryu uses a lookuptable which is IMHO far too large for a standard library. For 128bit reals it's about 300K of data, that needs to be stored.

@ghost
Copy link

ghost commented Oct 30, 2019

@UplinkCoder I'll have a look at the grisu implementations to see if it can be used for this task. But due to my time schedule this will probably have to wait one or two weeks.

@moon-chilled
Copy link

moon-chilled commented Oct 31, 2019

@marler8997 I'm working on a more robust (than the original) translation that has everything you need to replace the current implementation format() uses. Already support not printing out all the digits; main task now is not to print out numbers with small exponents as scientific notation (that is, 123.45 instead of 1.2345e+2). Then trailing/leading zeroes should be easy enough.

Benchmarks suggest a not quite 2x bump in speed; maybe 1.8x.

@marler8997
Copy link
Contributor

@moon-chilled great. You'll have to share it. I only ported the 32-bit version of ryu. For the 64-bit version I wasn't sure if it should be completely separate like the original did, or if I should try to combine them and re-use common code. I look forward to seeing your port.

@moon-chilled
Copy link

Here. (Lookup tables end at line 380--I really need to ctfe those, because they kill compile times.)

I was planning on mostly re-using code, once I figured out a good architecture for the 64-bit version.

Oh, also - I forgot; it'll probably also get a decent speedboost once I get the 128-bit multiplies in hardware.

@burner
Copy link
Member Author

burner commented Oct 31, 2019

@MoonlightSentinel
Copy link
Contributor

This caused a regression: https://issues.dlang.org/show_bug.cgi?id=20623

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.