Skip to content

base: fix parsing Float32 literals with f/F in place of e/E. #44910

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

thinking-tower
Copy link

@thinking-tower thinking-tower commented Apr 8, 2022

Description

julia> float32("1f0")
ERROR: ArgumentError("float32(String): invalid number format")
 in float32 at string.jl:1567

This PR should fix #5690.

Discussion

In the previous PR, there was a comment by @StefanKarpinski in #37961 (comment):

Since we already ship an implementation of strtod it seems like we could just unconditionally use that (instead of only using it on platforms that don't have a native implementation) and then modify this one line to accept f and F as well as e and E:

if (*p == 'e' || *p == 'E')

That would have the further benefit of making us less dependent on different behaviors on different systems.

However we are still dependent on the native implementation for strtod, which doesn't recognize f/F for the exponent.

julia/src/support/strtod.c

Lines 222 to 260 in 6d78404

if (decimal_point_pos) {
char *copy, *c;
/* Create a copy of the input, with the '.' converted to the
locale-specific decimal point */
copy = (char *)malloc(end - digits_pos + 1 + decimal_point_len);
if (copy == NULL) {
*endptr = (char *)nptr;
errno = ENOMEM;
return -1.0;
}
c = copy;
memcpy(c, digits_pos, decimal_point_pos - digits_pos);
c += decimal_point_pos - digits_pos;
memcpy(c, decimal_point, decimal_point_len);
c += decimal_point_len;
memcpy(c, decimal_point_pos + 1,
end - (decimal_point_pos + 1));
c += end - (decimal_point_pos + 1);
*c = 0;
val = strtod(copy, &fail_pos);
if (fail_pos)
{
if (fail_pos > decimal_point_pos)
fail_pos = (char *)digits_pos +
(fail_pos - copy) -
(decimal_point_len - 1);
else
fail_pos = (char *)digits_pos +
(fail_pos - copy);
}
free(copy);
}
else {
val = strtod(digits_pos, &fail_pos);
}

An alternate option is to change the API from

JL_DLLEXPORT double jl_strtod_c(const char *nptr, char **endptr);
JL_DLLEXPORT float jl_strtof_c(const char *nptr, char **endptr);

to

JL_DLLEXPORT double jl_strtod_c(char *nptr, char **endptr);
JL_DLLEXPORT float jl_strtof_c(char *nptr, char **endptr);

Then we can modify the first instance of f/F to e there.

References

@thinking-tower thinking-tower force-pushed the bugfix/parse-julia-float32-format branch from cb1982a to 2438b82 Compare April 8, 2022 11:20
@KristofferC
Copy link
Member

KristofferC commented Apr 8, 2022

I think this is a bit strange to have since the f/F notation is Julia syntax and that is already parsed by Meta.parse. The way Float32s are printed (for example to files) is not with f/F. So you would only run into this when you are reading a Julia source file but then Base.parse is the wrong tool anyway.

@thinking-tower thinking-tower force-pushed the bugfix/parse-julia-float32-format branch from 2438b82 to 6b920e5 Compare April 9, 2022 09:09
@thinking-tower
Copy link
Author

thinking-tower commented Apr 9, 2022

I think this is a bit strange to have since the f/F notation is Julia syntax and that is already parsed by Meta.parse. The way Float32s are printed (for example to files) is not with f/F. So you would only run into this when you are reading a Julia source file but then Base.parse is the wrong tool anyway.

@KristofferC I think that this is good in terms of language consistency but I don't have a strong preference for either. Note that Parsers.jl supports it too.

@Keno's bumped into this issue a few times, perhaps he can give some insight?

@thinking-tower thinking-tower force-pushed the bugfix/parse-julia-float32-format branch from 6b920e5 to 43970ea Compare April 10, 2022 11:19
@oscardssmith
Copy link
Member

To me this looks like an improvement, but I think it is worth considering taking the Parsers.jl implementation. It would be relatively complicated, but Parsers.jl is several times faster than strod, which can matter a bunch for data processing.

@thinking-tower
Copy link
Author

To me this looks like an improvement, but I think it is worth considering taking the Parsers.jl implementation. It would be relatively complicated, but Parsers.jl is several times faster than strod, which can matter a bunch for data processing.

@oscardssmith Thanks for the review, I am interested in porting over Parsers.jl's implementation! However that is a much bigger change than what I am proposing now, having more pairs of eyes on this PR would be better before undertaking that task.

@thinking-tower
Copy link
Author

test x86_64-apple-darwin segmentation faults might be related to #44562.

@stevengj
Copy link
Member

stevengj commented Apr 26, 2022

In general, parse(T, string) isn't related to Julia syntax. parse(Int, "1_000") doesn't work either, even though 1_000 is a valid Julia integer literal, whereas parse(DateTime, "2022-04-26T15:15") works even though 2022-04-26T15:15 is not Julia syntax. What we accept in parse is more about what we're likely to find in external data sources (e.g. CSV files).

Until/unless we replace our usage of strtod entirely, it makes sense to me to stick to strotod formats.

@oscardssmith
Copy link
Member

OTOH, I am very much in favor of replacing strod with Parsers.jl. It's much faster and more flexible.

@brenhinkeller brenhinkeller added strings "Strings!" feature Indicates new feature / enhancement requests labels Nov 16, 2022
@stevengj
Copy link
Member

stevengj commented Sep 4, 2024

That being said, this feature was already discussed and accepted in #5690

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Indicates new feature / enhancement requests strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

parse can't parse our own Float32 output format
6 participants