-
Notifications
You must be signed in to change notification settings - Fork 26
Fix parsing of integer literals with base prefix #106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Fix parsing of integer literals with base prefix #106
Conversation
9452423
to
23f8ab4
Compare
After merging #107 the tests now pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifying base 0 was not a solution, as this resulted in parsing behaviour different from GNU as.
I guess the simplest fix here would be to just replace int(x)
with int(x, 0)
. That should restore the existing behaviour. But it looks like you want to improve things further, which is great!
Hey @wnienhaus , tested with the ulp programs on my s2, and had no issues 👍 |
Yes, That said, since def parse_int(literal):
if len(literal) > 2:
prefix_start = 1 if literal[0] == '-' else 0 # skip over negative sign if present
if literal[prefix_start] == '0' and literal[prefix_start+1] in '123456789':
return int(literal, 8)
return int(literal, 0) and all tests still pass. So it's really just the octal case that's different (and theoretically we should disallow python style octal ( Now I am starting the overthink this:
I think I'll keep the current approach, as it's very explicit about what we support (including explicitly supporting python style octal). I'll just remove the comment about legacy octal format, because from the GNU as perspective, it's the currently valid and only possible octal format. (Happy to get feedback on my chosen approach) |
Ok. Fixes pushed. Will squash-merge this once approved. |
e4a7e33
to
f7dfddc
Compare
remove duplication
f7dfddc
to
5c84d08
Compare
IMO that's quite a bit simpler and easier to read/understand. I would vote for this approach. Or a little simpler still: def parse_int(literal):
if len(literal) >= 2 and literal.startswith(("0", "-0")) and literal.lstrip("-0").isdigt():
return int(literal, 8)
return int(literal, 0) Note: I think you need the |
Simply treat the special octal case and let Python handle the rest. Also add extra tests, to show additional edge cases that are handled correctly.
Thanks for that feedback. Shorter is nice, so let's go with that. MicroPython's The I added some extra tests to show more cases that (should) work as expected, e.g. Anyway, your proposed code handles all cases correctly, so I used it now. And now there are tests to verify they all work as intended. Thanks. |
Ah, it will in the next release! But in the interest of backwards compatibility, best not to rely on that yet.
Very good! |
MicroPython 1.25.0 introduced a breaking change, aligning the behaviour of the
int()
function closer to the behaviour of CPython (something along the lines of: strings are assumed to represent a decimal number, unless a base is specified. if a base of 0 is specified, is the base is inferred from the string)This broke our parsing logic, which relied on the previous behaviour of the
int()
function to automatically determine the base of the string literal, based on a base prefix present in the string. Specifying base 0 was not a solution, as this resulted in parsing behaviour different from GNU as.Additionally, we never actually parsed octal in the format
0100
correctly - even before this PR; that number would have been interpreted as 100 rather than 64.So, to fix this, and to ensure our parsing matches the GNU assembler, this PR implements a custom
parse_int()
function, using the base prefix in a string to determine the correct base to pass toint()
. The following are supported:The
parse_int
method also supports the negative prefix operator for all of the above cases.This change also ensures
.int
,.long
,.word
directives correctly handle the above mentioned formats. This fixes the issue described in #104.Note: GNU as does not actually accept the octal prefix
0o...
, but we accept it as a convenience, as this is accepted in Python code. This means however, that our assembler accepts code which GNU as does not accept. But the other way around, we still accept all code that GNU as accepts, which was one of our goals.