base: fix parsing Float32 literals with f/F in place of e/E. #44910

thinking-tower · 2022-04-08T11:14:03Z

Description

julia> float32("1f0")
ERROR: ArgumentError("float32(String): invalid number format")
 in float32 at string.jl:1567

This PR should fix #5690.

Discussion

In the previous PR, there was a comment by @StefanKarpinski in #37961 (comment):

Since we already ship an implementation of strtod it seems like we could just unconditionally use that (instead of only using it on platforms that don't have a native implementation) and then modify this one line to accept f and F as well as e and E:

julia/src/support/strtod.c

Line 208 in c31d126

if (*p == 'e' || *p == 'E')

That would have the further benefit of making us less dependent on different behaviors on different systems.

However we are still dependent on the native implementation for strtod, which doesn't recognize f/F for the exponent.

julia/src/support/strtod.c

Lines 222 to 260 in 6d78404

    
               if (decimal_point_pos) { 
        
                   char *copy, *c; 
        
                   /* Create a copy of the input, with the '.' converted to the 
        
                      locale-specific decimal point */ 
        
                   copy = (char *)malloc(end - digits_pos + 1 + decimal_point_len); 
        
                   if (copy == NULL) { 
        
                       *endptr = (char *)nptr; 
        
                       errno = ENOMEM; 
        
                       return -1.0; 
        
                   } 
        
                   c = copy; 
        
                   memcpy(c, digits_pos, decimal_point_pos - digits_pos); 
        
                   c += decimal_point_pos - digits_pos; 
        
                   memcpy(c, decimal_point, decimal_point_len); 
        
                   c += decimal_point_len; 
        
                   memcpy(c, decimal_point_pos + 1, 
        
                          end - (decimal_point_pos + 1)); 
        
                   c += end - (decimal_point_pos + 1); 
        
                   *c = 0; 
        
                   val = strtod(copy, &fail_pos); 
        
                   if (fail_pos) 
        
                   { 
        
                       if (fail_pos > decimal_point_pos) 
        
                           fail_pos = (char *)digits_pos + 
        
                               (fail_pos - copy) - 
        
                               (decimal_point_len - 1); 
        
                       else 
        
                           fail_pos = (char *)digits_pos + 
        
                               (fail_pos - copy); 
        
                   } 
        
                   free(copy); 
        
               } 
        
               else { 
        
                   val = strtod(digits_pos, &fail_pos); 
        
               }

An alternate option is to change the API from

julia/src/support/strtod.h

Lines 10 to 11 in 6d78404

    
           JL_DLLEXPORT double jl_strtod_c(const char *nptr, char **endptr); 
        
           JL_DLLEXPORT float jl_strtof_c(const char *nptr, char **endptr);

to

JL_DLLEXPORT double jl_strtod_c(char *nptr, char **endptr);
JL_DLLEXPORT float jl_strtof_c(char *nptr, char **endptr);

Then we can modify the first instance of f/F to e there.

References

Fix #5690: Julia to parse own float format "1f0" #37961

KristofferC · 2022-04-08T15:12:37Z

I think this is a bit strange to have since the f/F notation is Julia syntax and that is already parsed by Meta.parse. The way Float32s are printed (for example to files) is not with f/F. So you would only run into this when you are reading a Julia source file but then Base.parse is the wrong tool anyway.

thinking-tower · 2022-04-09T09:10:38Z

I think this is a bit strange to have since the f/F notation is Julia syntax and that is already parsed by Meta.parse. The way Float32s are printed (for example to files) is not with f/F. So you would only run into this when you are reading a Julia source file but then Base.parse is the wrong tool anyway.

@KristofferC I think that this is good in terms of language consistency but I don't have a strong preference for either. Note that Parsers.jl supports it too.

@Keno's bumped into this issue a few times, perhaps he can give some insight?

oscardssmith · 2022-04-10T14:42:26Z

To me this looks like an improvement, but I think it is worth considering taking the Parsers.jl implementation. It would be relatively complicated, but Parsers.jl is several times faster than strod, which can matter a bunch for data processing.

thinking-tower · 2022-04-11T08:21:56Z

To me this looks like an improvement, but I think it is worth considering taking the Parsers.jl implementation. It would be relatively complicated, but Parsers.jl is several times faster than strod, which can matter a bunch for data processing.

@oscardssmith Thanks for the review, I am interested in porting over Parsers.jl's implementation! However that is a much bigger change than what I am proposing now, having more pairs of eyes on this PR would be better before undertaking that task.

thinking-tower · 2022-04-11T08:24:43Z

test x86_64-apple-darwin segmentation faults might be related to #44562.

stevengj · 2022-04-26T19:18:03Z

In general, parse(T, string) isn't related to Julia syntax. parse(Int, "1_000") doesn't work either, even though 1_000 is a valid Julia integer literal, whereas parse(DateTime, "2022-04-26T15:15") works even though 2022-04-26T15:15 is not Julia syntax. What we accept in parse is more about what we're likely to find in external data sources (e.g. CSV files).

Until/unless we replace our usage of strtod entirely, it makes sense to me to stick to strotod formats.

oscardssmith · 2022-04-26T19:57:44Z

OTOH, I am very much in favor of replacing strod with Parsers.jl. It's much faster and more flexible.

stevengj · 2024-09-04T12:56:07Z

That being said, this feature was already discussed and accepted in #5690

thinking-tower force-pushed the bugfix/parse-julia-float32-format branch from cb1982a to 2438b82 Compare April 8, 2022 11:20

thinking-tower force-pushed the bugfix/parse-julia-float32-format branch from 2438b82 to 6b920e5 Compare April 9, 2022 09:09

base: fix parsing Float32 literals with f/F in place of e/E.

43970ea

thinking-tower force-pushed the bugfix/parse-julia-float32-format branch from 6b920e5 to 43970ea Compare April 10, 2022 11:19

Merge branch 'master' into bugfix/parse-julia-float32-format

5e42dc6

brenhinkeller added strings "Strings!" feature Indicates new feature / enhancement requests labels Nov 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

base: fix parsing Float32 literals with f/F in place of e/E. #44910

base: fix parsing Float32 literals with f/F in place of e/E. #44910

Uh oh!

thinking-tower commented Apr 8, 2022 •

edited by stevengj

Loading

Uh oh!

KristofferC commented Apr 8, 2022 •

edited

Loading

Uh oh!

thinking-tower commented Apr 9, 2022 •

edited

Loading

Uh oh!

oscardssmith commented Apr 10, 2022

Uh oh!

thinking-tower commented Apr 11, 2022

Uh oh!

thinking-tower commented Apr 11, 2022

Uh oh!

stevengj commented Apr 26, 2022 •

edited

Loading

Uh oh!

oscardssmith commented Apr 26, 2022

Uh oh!

stevengj commented Sep 4, 2024 •

edited

Loading

Uh oh!

Uh oh!

	if (decimal_point_pos) {
	char copy, c;
	/* Create a copy of the input, with the '.' converted to the
	locale-specific decimal point */
	copy = (char *)malloc(end - digits_pos + 1 + decimal_point_len);
	if (copy == NULL) {
	endptr = (char )nptr;
	errno = ENOMEM;
	return -1.0;
	}

	c = copy;
	memcpy(c, digits_pos, decimal_point_pos - digits_pos);
	c += decimal_point_pos - digits_pos;
	memcpy(c, decimal_point, decimal_point_len);
	c += decimal_point_len;
	memcpy(c, decimal_point_pos + 1,
	end - (decimal_point_pos + 1));
	c += end - (decimal_point_pos + 1);
	*c = 0;

	val = strtod(copy, &fail_pos);

	if (fail_pos)
	{
	if (fail_pos > decimal_point_pos)
	fail_pos = (char *)digits_pos +
	(fail_pos - copy) -
	(decimal_point_len - 1);
	else
	fail_pos = (char *)digits_pos +
	(fail_pos - copy);
	}

	free(copy);
	}
	else {
	val = strtod(digits_pos, &fail_pos);
	}

	JL_DLLEXPORT double jl_strtod_c(const char nptr, char *endptr);
	JL_DLLEXPORT float jl_strtof_c(const char nptr, char *endptr);

Uh oh!

base: fix parsing Float32 literals with f/F in place of e/E. #44910

Are you sure you want to change the base?

base: fix parsing Float32 literals with f/F in place of e/E. #44910

Uh oh!

Conversation

thinking-tower commented Apr 8, 2022 • edited by stevengj Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Discussion

References

Uh oh!

KristofferC commented Apr 8, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

thinking-tower commented Apr 9, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oscardssmith commented Apr 10, 2022

Uh oh!

thinking-tower commented Apr 11, 2022

Uh oh!

thinking-tower commented Apr 11, 2022

Uh oh!

stevengj commented Apr 26, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oscardssmith commented Apr 26, 2022

Uh oh!

stevengj commented Sep 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

thinking-tower commented Apr 8, 2022 •

edited by stevengj

Loading

KristofferC commented Apr 8, 2022 •

edited

Loading

thinking-tower commented Apr 9, 2022 •

edited

Loading

stevengj commented Apr 26, 2022 •

edited

Loading

stevengj commented Sep 4, 2024 •

edited

Loading