Allow underscores in numbers. Closes #1677 by Balajiganapathi · Pull Request #3120 · argotorg/solidity

Balajiganapathi · 2017-10-20T19:40:49Z

Closes #1677.

Follows https://www.python.org/dev/peps/pep-0515/

axic · 2017-10-23T09:25:13Z

libsolidity/parsing/Scanner.cpp

This doesn't prevent the leading underscore which seems to be weird in hex (0x_1234)

I added that because Python allows it:
(an example from https://www.python.org/dev/peps/pep-0515/)
flags = 0b_0011_1111_0100_1110

axic · 2017-10-23T09:30:28Z

Thanks! Can you please add tests for the following (assuming we strictly follow PEP-0515):

trailing underscores are not allowed (in decimal, hex, and exponential literals)
underscores within the exponent part are allowed (1e3_00_0)
leading underscores in decimals are not allowed (disallowed by the PEP)
consecutive underscores are not allowed in any of the places where underscores are allowed

Deviating from the PEP I'd argue that leading underscores in hex are a nuisance.

Also please update the documentation (search for "rational and integer literals").

axic · 2017-10-23T09:41:52Z

Also there's this rule in the PEP:

For the b, x and o format specifiers, _ will be allowed and group by 4 digits.

e.g. 0x1_2_3 is not valid, but 0x1234_5678 is

Balajiganapathi · 2017-10-23T10:02:04Z

I agree with you that 0x_abc looks weird and is inconsistent with disallowing a leading underscore in decimal numbers.

> Also there's this rule in the PEP:
> For the b, x and o format specifiers, _ will be allowed and group by 4 digits.
> e.g. 0x1_2_3 is not valid, but 0x1234_5678 is

My understanding is that this is for the output format specifier, when converting from number to string. This is not a constraint on the syntax of a numeric literal. The grammar given does not specify this constraint.

I am working on adding the tests and disallowing leading underscore. Is this additional grouping constraint really needed?

Balajiganapathi · 2017-10-23T10:34:26Z

@axic I have added more tests and docs.

I also tried this in Python 3.6, and it does allow arbitrary grouping of digits.

e.g. x = 0xab_cd_e_f is valid
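
The difference between literal syntax and the output format specifier can be checked directly in Python 3.6 or later. The following is a minimal sketch, assuming `int(text, 0)` is an acceptable stand-in for how the interpreter parses an integer literal; the `parses` helper is hypothetical and exists only for this illustration.

```python
def parses(text: str) -> bool:
    """Return True if Python accepts `text` as an integer literal (base taken from the prefix)."""
    try:
        int(text, 0)
        return True
    except ValueError:
        return False


# Literal syntax: any grouping is accepted, as long as underscores are not doubled or trailing.
arbitrary_grouping_ok = parses("0xab_cd_e_f")
doubled_rejected = not parses("0xab__cd")
trailing_rejected = not parses("0xabcd_")

# Output formatting: the "_x" format specifier is the part of the PEP that groups hex digits by four.
formatted = format(0x12345678, "_x")
```

So the "group by 4 digits" wording in the PEP describes what `format` produces, while the literal grammar itself places no constraint on group size.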

pirapira · 2017-10-24T09:31:11Z

@axic do you still want precisely sized groupings?

axic · 2017-10-24T11:32:18Z

Even the PEP's introductory section suggests that. I think it is definitely a better idea to enforce it (we can always relax the rules later, but it is harder to tighten them once a lax version is out).

Balajiganapathi · 2017-10-24T11:45:13Z

@axic Ok will do this then.

Just to be clear the following will be invalid:

0x_abcd_cdef (leading underscore)
0xab_cd (not grouped by 4)

What about the following?
0x12345678_9abc - one group of 4 has underscore but other doesn't.
0x123_4567 - The total number of digits is not divisible by 4.

axic · 2017-10-24T11:49:01Z

Good questions!

> 0x12345678_9abc - one group of 4 has underscore but other doesn't.

I guess this is fine too (I would personally not allow it, but I guess it could be used to show where the fixed point would be after conversion? Tiny use case though). Perhaps this is the point where lax rules gain ground.

> 0x123_4567 - The total number of digits is not divisible by 4.

I think this is fine.

Balajiganapathi · 2017-10-24T11:53:09Z

@axic Thanks. One more question :)

Since we are allowing 0x123_4567, should we allow 0x1234_567 too? I think allowing only one of these 2 makes sense (and will make implementation slightly easier).

axic · 2017-10-24T11:56:20Z

I wouldn't allow 0x1234_567.

Balajiganapathi · 2017-10-24T12:22:05Z

Hi, I am thinking about the best way to implement this. I think this rule will cover all the cases. Or do you want more restrictive ones?

The number of digits to the right of each underscore should be a multiple of 4.
And of course, no leading, trailing or double underscores

This will take care of all the above cases. But it will also allow such cases:
0x12341234_123456781234_abcd
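
The rule proposed in this comment can be sketched as follows. This is only an illustration in Python, not the code in this PR; the function name `right_digits_multiple_of_four` is hypothetical, and the sketch assumes the scanner has already confirmed that the literal starts with `0x` and contains only hex digits and underscores.

```python
def right_digits_multiple_of_four(literal: str) -> bool:
    """Check that the digit count to the right of every underscore is a multiple of 4."""
    body = literal[2:]  # assumption: the "0x" prefix was already verified by the caller
    if body.startswith("_") or body.endswith("_") or "__" in body:
        return False  # leading, trailing, or doubled underscore
    groups = body.split("_")
    digits_to_right = 0
    # Walk from the last group towards the first, accumulating the number of
    # digits that sit to the right of each underscore.
    for group in reversed(groups[1:]):
        digits_to_right += len(group)
        if digits_to_right % 4 != 0:
            return False
    return True
```

Under this sketch `0x123_4567` passes and `0x1234_567` does not, and the long example `0x12341234_123456781234_abcd` is accepted as well, which is the permissive case the comment points out.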

chriseth · 2017-10-24T12:47:31Z

Due to our ambiguous handling of literals, we should allow both 0x123_4567 and 0x1234_567. The first makes sense if used in a number context, the second if used in a bytes32 context.

Ok, perhaps to complicate things even further:

If the literal ends in 4 hex digits without separator, the length of the first element is irrelevant. If it starts with four hex digits without separator, the last element has to have an even length. So 0x1234_567 would be disallowed, but 0x1234_5670 would not be.

axic · 2017-10-24T12:54:39Z

I'd argue that in a number context one could say leaving the leading 0 nibble out is acceptable (as it is a number), while in a bytes32 context it isn't since that refers to actual bytes.

0x1234_567 is very misleading, because even in a bytes32 context it will be assigned as 0x01234567...0.

chriseth · 2017-10-24T12:57:48Z

Exactly! Although the "context" should not influence whether the literal is valid or not, it should be visible from looking at the literal itself.

axic · 2017-10-24T12:59:30Z

I can't follow you, you say 0x1234_567 should be allowed because it makes sense for bytes32. I do not think it makes sense for it at all.

Looking at bytes32 x = 0x1234_567 I'd get the impression that it will be 0x12345670...0, but it will be 0x01234567...0.
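
To make the mismatch concrete, here is a small Python model of the conversion described in this comment. It is a sketch of the behaviour as stated here (an odd-length literal gains a leading zero nibble, then the bytes are left-aligned), not the compiler's code; `bytes32_from_hex_digits` is a hypothetical name and the input is assumed to hold at most 64 hex digits with underscores already removed.

```python
def bytes32_from_hex_digits(digits: str) -> str:
    """Model of the bytes32 assignment described above; not the compiler's implementation."""
    if len(digits) % 2 == 1:
        # Odd length: a leading zero nibble completes the first byte.
        digits = "0" + digits
    # Left-aligned value: pad with zero nibbles on the right up to 32 bytes (64 nibbles).
    return "0x" + digits.ljust(64, "0")


value = bytes32_from_hex_digits("1234567")  # digits of 0x1234_567
```

The result starts with `0x01234567`, not with the `0x12345670` that the grouping `1234_567` visually suggests.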

chriseth · 2017-10-24T13:05:57Z

Ok, sorry, let me clarify: 0x1234_567 should not be allowed, but 0x123_4567 should be.

axic · 2017-10-24T13:06:40Z

Also I'd say the parser should not enforce any rules (perhaps apart from stray leading, trailing underscores), but rather have it in the SyntaxChecker.

Reason:

better error reporting
perhaps less code (could use a regular expression there).
and easier to describe it in grammar.txt

Balajiganapathi · 2017-10-24T13:47:42Z

@chriseth @axic, is the following rule fine?

If there is an underscore in a hex numeric literal, then it should be before every 4th digit from the right.

> but rather have it in the SyntaxChecker.

How about in RationalNumberType::isValidLiteral in Types.cpp file?

> and easier to describe it in grammar.txt

I could not find this file in this repo :(

Balajiganapathi · 2017-10-24T13:48:42Z

Also, I am assuming this does not apply to base-10 numerals, as the conventions for grouping digits differ between countries.

chriseth · 2017-10-24T14:01:49Z

@axic great idea about syntaxchecker! This would also automatically allow the parsing to continue in case of an error.

@Balajiganapathi I would prefer the following rule:

If there is no separator at all, it is fine.
If there is a separator, the number of hex digits between separators has to be exactly 4 and special rules about the part before the first separator and after the last one are:

If the literal ends in 4 hex digits without separator, the length of the first element can be arbitrary. If it starts with four hex digits without separator, the last element has to have an even length. If neither the first nor the last has 4 hex digits without separator, the literal is invalid.
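
The rule stated in this comment can be sketched like this. It is a hedged Python illustration rather than the SyntaxChecker code from this PR, and the name `is_valid_hex_grouping` is hypothetical. It reads the rule literally: when the first group has four digits, any even length is accepted for the last group, which is an assumption about the intended meaning.

```python
import re

HEX_GROUP = re.compile(r"[0-9a-fA-F]+")


def is_valid_hex_grouping(literal: str) -> bool:
    """Literal reading of the grouping rule above for 0x-prefixed literals."""
    if not literal.startswith("0x"):
        raise ValueError("expected a literal starting with 0x")
    groups = literal[2:].split("_")
    # An empty group means a leading, trailing, or doubled underscore (or no digits at all).
    if any(HEX_GROUP.fullmatch(group) is None for group in groups):
        return False
    if len(groups) == 1:
        return True  # no separator at all is fine
    first, *inner, last = groups
    if any(len(group) != 4 for group in inner):
        return False  # groups between separators must have exactly 4 digits
    if len(last) == 4:
        return True  # ends in 4 digits: the first group may have any length
    if len(first) == 4:
        return len(last) % 2 == 0  # starts with 4 digits: last group must be even
    return False  # neither the first nor the last group has 4 digits
```

With this reading, `0x123_4567` and `0x1234_5670` are accepted, while `0x1234_567`, `0xab_cd` and `0x_abcd_cdef` are rejected, matching the examples discussed earlier in the thread.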

Balajiganapathi · 2017-10-25T09:30:37Z

I am not able to find out why the CI build is failing. It passed on the first commit. In the next two commits I made only cosmetic changes, yet CI shows the build as failed. Looking at the build logs, some jobs seem to fail at random.

axic · 2017-10-26T10:06:47Z

Downloading version soljson-v0.4.18+commit.9cf6e910.js
Hash mismatch: 0x0478b43de978b1af1d6d6d8c09e84cdb2cc8ed76218d38f17b841b6e539742f0 vs 0xcecc3e8b4d1a9bb6ceadae7f681def7982755cbe97a4d28c043d3136ccbe7df1

This failure is down to Travis; there is not much you can do about it.

Balajiganapathi · 2017-10-26T10:57:21Z

@axic / @chriseth can you please review the changes.

chriseth · 2017-10-26T11:33:59Z

libsolidity/analysis/SyntaxChecker.cpp

Please add an assert that the first two characters are 0x.

chriseth · 2017-10-26T11:35:42Z

libsolidity/analysis/SyntaxChecker.cpp

Could you make the error message a little more specific? Something like
Invalid use of underscores in hex literal. Found inner part of length 2 (has to be 4 characters).

chriseth · 2017-10-26T11:37:29Z

libsolidity/analysis/SyntaxChecker.cpp

Please also make this error message more specific.

chriseth · 2017-10-26T11:41:14Z

libsolidity/parsing/Scanner.cpp

I think control flow would be clearer if you move the rollback(1) before the if statement (and of course save m_char in a local variable).

Ok, reading that again - why do you advance if you rollback later anyway?

axic · 2017-11-22T04:25:43Z

test/libsolidity/SolidityNameAndTypeResolution.cpp

The messages use both "length" and "digits". I think they should use just one term, and "digits" seems fine. This applies to the other messages too.

pirapira · 2017-12-08T09:57:00Z

Please rebase.

Balajiganapathi · 2017-12-09T04:49:13Z

@pirapira Rebased.

chriseth · 2017-12-11T11:05:29Z

@axic please merge if you are fine with it.

pirapira · 2017-12-12T11:37:05Z

Changelog.md

Please move this entry to 0.4.20.

chriseth · 2018-02-28T16:38:57Z

Rebased.

chriseth · 2018-03-06T15:42:56Z

Moved to 0.5.0

chriseth · 2018-03-09T13:27:46Z

@ekpyron @bit-shift does one of you want to continue this? There are still some unaddressed comments by @axic.

Balajiganapathi · 2018-03-09T13:44:34Z

@chriseth I will fix those this weekend.

axic · 2018-04-11T20:12:10Z

Rebased.

christianparpart · 2018-08-02T10:05:39Z

not sure how I closed this PR?

Balajiganapathi mentioned this pull request Oct 20, 2017

Underscores in number literals #1677

Closed

axic reviewed Oct 23, 2017

pirapira assigned axic Oct 24, 2017

chriseth reviewed Oct 26, 2017

chriseth reviewed Oct 26, 2017

chriseth reviewed Oct 26, 2017

axic reviewed Nov 22, 2017

Balajiganapathi force-pushed the develop branch from 5f03a1d to 99be022 Compare December 9, 2017 03:00

pirapira suggested changes Dec 12, 2017

Balajiganapathi force-pushed the develop branch from 99be022 to 58832d9 Compare December 22, 2017 14:51

axic added the nextrelease label Feb 14, 2018

chriseth force-pushed the develop branch from 58832d9 to f65ecc4 Compare February 19, 2018 14:27

chriseth mentioned this pull request Feb 22, 2018

Disallow combination of hex literals and units like wei or hours #3574

Closed

chriseth force-pushed the develop branch from f65ecc4 to 1c8f21c Compare February 28, 2018 16:38

chriseth force-pushed the develop branch from 1c8f21c to 09d68cc Compare March 6, 2018 14:55

pirapira previously approved these changes Mar 6, 2018

Balajiganapathi force-pushed the develop branch from 09d68cc to ab9b6dd Compare April 2, 2018 16:14

axic removed the nextrelease label Apr 5, 2018

axic dismissed stale reviews from pirapira and chriseth via 891fbac April 11, 2018 20:12

axic force-pushed the develop branch from ab9b6dd to 891fbac Compare April 11, 2018 20:12

axic force-pushed the develop branch from 891fbac to df71ea9 Compare April 11, 2018 20:17

christianparpart self-assigned this Aug 1, 2018

christianparpart closed this Aug 2, 2018

christianparpart force-pushed the develop branch from df71ea9 to 9ec3fd1 Compare August 2, 2018 09:57

This was referenced Aug 2, 2018

[DON'T REVIEW] Allow underscores in numbers #4655

Closed

[BREAKING] Underscores in numeric literals #4684

Merged
