Add more integer notations for parsing #897

Merged
1 commit merged on Sep 16, 2024


MrDaiki (Collaborator) commented Sep 11, 2024

Issue

Integer parsing in the Jasmin compiler is quite limited at the moment. We can easily add more syntaxes by modifying only a few lines in the lexer:

reg u64 x, y, z;
x = 1_000_000_0000_0; // underscores as digit separators, for readability
y = 0b1110110;        // binary
z = 0o13671;          // octal, already supported by Z.of_string, so it costs nothing to add
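For illustration only: OCaml's own standard int_of_string already understands the 0b/0o/0x prefixes and ignores underscores between digits, so the example literals above can be evaluated directly. This is a stand-in using 63-bit native ints; the Jasmin front end works with Zarith's arbitrary-precision Z.t instead.

```ocaml
(* Evaluate the literals from the PR description with OCaml's standard
   int_of_string, which accepts 0b/0o/0x prefixes and ignores '_'
   separators between digits.
   Native ints are 63-bit here; Jasmin itself works with Zarith's Z.t. *)
let examples = [ "1_000_000_0000_0"; "0b1110110"; "0o13671" ]

let () =
  List.iter
    (fun lit -> Printf.printf "%s = %d\n" lit (int_of_string lit))
    examples
```

Run as a script, this prints 1_000_000_0000_0 = 1000000000000, 0b1110110 = 118 and 0o13671 = 6073.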

Changelog

  • Modify the lexer to support these syntaxes

- | ((*'-'?*) digit+) as s
- {INT s}
+ | ((*'-'?*) digit+('_'digit+)*) as s
+ {INT Str.(global_replace (regexp "_") "" s)}
Member:

You need to keep the exact representation.

Collaborator (author):

Changed in the last commit: the function that parses the int_representation type now deletes the '_' characters before converting to Z.t.
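A minimal sketch of the design the author describes, in plain OCaml: the literal text is stored verbatim and the underscores are dropped only at conversion time. The helper name strip_underscores is hypothetical, it uses only the standard library (no Str), and int_of_string stands in for Z.of_string, so values are limited to 63-bit ints.

```ocaml
(* Sketch: the literal text is kept verbatim (int_representation = string)
   and the '_' separators are dropped only when converting to a number.
   Assumptions: int_of_string stands in for Zarith's Z.of_string, and
   strip_underscores is a hypothetical helper (stdlib only, no Str). *)
type int_representation = string

(* Keep every character except '_'. *)
let strip_underscores (s : string) : string =
  String.to_seq s |> Seq.filter (fun c -> c <> '_') |> String.of_seq

let parse_int (i : int_representation) : int =
  int_of_string (strip_underscores i)

let () =
  let lit : int_representation = "0xAA_BB" in
  (* The original spelling is still available, e.g. for pretty-printing... *)
  print_endline lit;
  (* ...while the value is computed from the underscore-free string. *)
  Printf.printf "%d\n" (parse_int lit)
```

Keeping the string as the token payload is what lets a pretty-printer reproduce 0xAA_BB exactly, rather than a normalized 43707.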


- | ('0' ['x' 'X'] hexdigit+) as s
+ | ('0' ['x' 'X' 'b' 'B' 'o' 'O'] hexdigit+) as s
Member:

Are we sure we want to accept 0babcd and 0o99 as valid literals?

Collaborator (author):

I corrected it in the last commit; I didn't see the error at the time.
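The fix amounts to choosing the digit class from the prefix instead of always accepting hexdigit. A small sketch of that idea, with hypothetical names and underscores deliberately left out, showing why 0babcd and 0o99 are then rejected:

```ocaml
(* Sketch: pick the digit class from the prefix, so letters are accepted
   only after 0x/0X and 8/9 are rejected after 0o/0O. This mirrors the
   bindigit/octdigit/hexdigit split in the corrected rules; the names below
   are hypothetical and underscores are not handled here. *)
let is_bin c = c = '0' || c = '1'
let is_oct c = c >= '0' && c <= '7'
let is_dec c = c >= '0' && c <= '9'
let is_hex c = is_dec c || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')

(* True when s is 0b/0o/0x (any case) followed by at least one digit of
   the matching base and nothing else. *)
let valid_prefixed (s : string) : bool =
  let n = String.length s in
  if n < 3 || s.[0] <> '0' then false
  else
    let digit_class =
      match s.[1] with
      | 'b' | 'B' -> Some is_bin
      | 'o' | 'O' -> Some is_oct
      | 'x' | 'X' -> Some is_hex
      | _ -> None
    in
    match digit_class with
    | None -> false
    | Some is_digit ->
        let rec all_from i = i >= n || (is_digit s.[i] && all_from (i + 1)) in
        all_from 2
```

With this check, valid_prefixed "0b1011" and valid_prefixed "0XABcd" hold, while "0babcd" and "0o99" are refused.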

| ('0' ['b' 'B'] bindigit+('_'bindigit+)*) as s
{INT s}

| ('0' ['o' 'O'] octdigit+('_'octdigit+)*) as s
Member:

Why forbid consecutive underscores (as in 0xAA__BB)?

Collaborator (author):

There was no reason; I just didn't think of it at the time. It is possible now.
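Allowing runs of underscores turns the per-base pattern into digit+ ('_'+ digit+)*. A hand-written matcher for that shape, as a sketch with a hypothetical name (digits_ok) and the digit class passed in so it works for any base:

```ocaml
(* Sketch of the relaxed grammar  digit+ ('_'+ digit+)* : runs of
   underscores may separate digits, but the literal must start and end
   with a digit. digits_ok is a hypothetical name; is_digit selects the base. *)
let is_dec c = c >= '0' && c <= '9'
let is_hex c = is_dec c || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')

let digits_ok (is_digit : char -> bool) (s : string) : bool =
  let n = String.length s in
  (* last_was_digit tracks whether the previous character was a digit. *)
  let rec go i last_was_digit =
    if i = n then last_was_digit            (* reject a trailing '_' *)
    else if is_digit s.[i] then go (i + 1) true
    else if s.[i] = '_' then go (i + 1) false
    else false                              (* any other character *)
  in
  n > 0 && is_digit s.[0] && go 1 true      (* must start with a digit *)
```

For example, digits_ok is_hex "AA__BB" (the body of 0xAA__BB) is accepted, while "_1234" and "12_" are not.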

@@ -24,7 +24,9 @@ type castop1 = CSS of sowsize | CVS of svsize
type castop = castop1 L.located option

type int_representation = string
- let parse_int = Z.of_string
+ let parse_int (i:int_representation):Z.t =
+   let s = Str.(global_replace (regexp "_") "" i) in
Member:

Please don’t use Str. If you open Utils, the module String will contain replace_chars and filter.

Collaborator (author):

Corrected. (I didn't know it was necessary to open Utils to use filter.)

CHANGELOG.md (outdated)
@@ -31,6 +31,8 @@
- Preserve formatting of integer literals in the lexer and when pretty-printing to LATEX
([PR #886](https://github.com/jasmin-lang/jasmin/pull/886)).

- Adding support for new integer notations, similar to OCaml and Rust integer literals
([PR #897](https://github.com/jasmin-lang/jasmin/pull/897)).
Contributor:

Can you add more description: 0b..., 0o..., 0x..., and _?

Comment on lines 138 to 139
(* Why this is needed *)
- | ((*'-'?*) digit+) as s
+ | ((*'-'?*) digit+(('_')+ digit+)*) as s
Member:

It might be worth removing this legacy comment.

vbgl self-assigned this Sep 16, 2024
At the source level, integer literals can now be written in:

  - binary with a 0b or 0B prefix: 0b10001111
  - octal with a 0o or 0O prefix: 0o7722
  - decimal without any explicit prefix: 42
  - hexadecimal with a 0x or 0X prefix: 0xabcd

All these notations may contain underscores (except at the beginning),
e.g., 0b1111_0000 or 10_854_736. Caveat: _1234 is an identifier, not a
number.

Co-authored-by: Vincent Laporte <Vincent.Laporte@inria.fr>
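As a rough summary of the rules in this commit message, here is a minimal OCaml sketch that decides whether a word is an integer literal and computes its value. It reads "except at the beginning" literally, so an underscore is accepted anywhere after the first digit (including at the end); this may be more permissive than the merged lexer. The function names are hypothetical, and 63-bit native ints stand in for Z.t, so overflow is not checked.

```ocaml
(* Sketch of the literal syntax described in the commit message:
     0b/0B binary, 0o/0O octal, 0x/0X hexadecimal, otherwise decimal,
   with '_' allowed anywhere except before the first digit. A word that
   starts with '_' is not a number (it lexes as an identifier).
   Assumption: 63-bit ints stand in for Z.t; overflow is not checked. *)
let digit_value c =
  match c with
  | '0' .. '9' -> Some (Char.code c - Char.code '0')
  | 'a' .. 'f' -> Some (Char.code c - Char.code 'a' + 10)
  | 'A' .. 'F' -> Some (Char.code c - Char.code 'A' + 10)
  | _ -> None

(* Split off an optional base prefix: returns (base, index of first digit). *)
let base_and_start s =
  if String.length s >= 2 && s.[0] = '0' then
    match s.[1] with
    | 'b' | 'B' -> (2, 2)
    | 'o' | 'O' -> (8, 2)
    | 'x' | 'X' -> (16, 2)
    | _ -> (10, 0)
  else (10, 0)

(* Value of an integer literal, or None if s is not a valid literal. *)
let literal_value (s : string) : int option =
  let n = String.length s in
  let base, start = base_and_start s in
  let rec go i acc seen_digit =
    if i = n then (if seen_digit then Some acc else None)
    else if s.[i] = '_' then
      (* An underscore is only allowed once a digit has been read. *)
      (if seen_digit then go (i + 1) acc true else None)
    else
      match digit_value s.[i] with
      | Some d when d < base -> go (i + 1) (acc * base + d) true
      | _ -> None
  in
  go start 0 false

let () =
  List.iter
    (fun s ->
      match literal_value s with
      | Some v -> Printf.printf "%s -> %d\n" s v
      | None -> Printf.printf "%s -> not an integer literal\n" s)
    [ "0b10001111"; "0o7722"; "42"; "0xabcd"; "10_854_736"; "_1234" ]
```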
vbgl added this to the 2024.07.1 milestone Sep 16, 2024
vbgl merged commit 5603a32 into jasmin-lang:main Sep 16, 2024
1 check passed
vbgl deleted the jasmin-integer-notation branch September 16, 2024 14:08
3 participants