-
Notifications
You must be signed in to change notification settings - Fork 1
Parse fixes #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parse fixes #1
Conversation
Opcode mnemonics are compared using String.prototype.toLocaleLowerCase, which has locale-aware case folding. Thus, in Turkish (LANG=tr) and Azeri (LANG=az) locales, İ (U+0130, LATIN CAPITAL LETTER I WITH DOT ABOVE) compares equal to i (U+0069, LATIN SMALL LETTER I). In all locales, K (U+212A, KELVIN SIGN) compares equal to k (U+006B, LATIN SMALL LETTER K), but no mnemonics use k. The characters used are [a-eg-jl-pr-z]. Use String.prototype.toLowerCase, since it has the correct ASCII-only folding for the characters used.
Currently, the parsing of arguments is left up to each instruction, making the behavior inconsistent. Required arguments can be omitted and are substituted with 0. Opcodes that don't take arguments ignore any arguments that are given. `storestr` mangles a char argument, because it converts it to the decimal string representation of its code unit. => Scan arguments into tokens and check their types before invoking the function for the opcode. Represent this within the TypeScript type system using discriminated unions, so it is strongly typed. Throw on malformed input. It relies heavily on splitting lines by exactly one space, making it brittle to other whitespace characters like tab or repeated spaces. At the start and end of the line, Unicode whitespace may be used due to `String.prototype.trim`. => Consistently use /\s/ while parsing and create a "-quoted space token, so that strings with spaces can still be represented. Single-word strings are still allowed.
With the current state of variable semantics, it doesn't make sense to treat a leading underscore in label or import position as a variable.
Some other things not addressed in this PR:
|
ceb0de5
to
2f3e595
Compare
Do not add internal labels to the label map, so a user-supplied label with the same name as an internal label will receive a different ID. In such cases, the disassembly in the debugger will still show the same name for both, but that only affects display.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I took this long to review it - I wanted to take a deeper look since it's a substantial refactor :D
I like it, it's massively better than what I had before, with a proper tokenization step. Thank you so much!
I've licensed the repo under MIT, I hope that's alright for your contribution.
Mhh I'm not sure where the UnhandledPromiseRejection happens. In theory I didn't add promises for concurrency, but to not constraint the compiling process into a synchronous operation. I know NodeJS has sync APIs to read files, but by abstracting it to the
Good point. I think the canonical should be I'll probably also change the "debugExtensions" into something which is not just "debug", as that also includes bitwise opcodes. Or maybe split into different flags.
Yes, I see it was from very early on while I was still implementing the initial version. |
Thanks for your review! I see I missed the email notification from two weeks ago; sorry about that. |
This addresses all the parsing bugs I found when I surveyed your assembler. The largest change is to move to strongly typed argument handling by scanning arguments as consistent tokens, before invoking each opcode's function. I have written more details in the commit messages and each commit is atomic.
I found that any exceptions thrown within the promises cannot be caught by the enclosing block and have context for the line added, since they are not awaited there, thus most validation errors are thrown as
UnhandledPromiseRejection
. I was tempted to duplicate thetry
/catch
block within both promises, but that would be quite lengthy. Why did you choose the promise approach? The problem is sequential and each promise waits for the result of the previous, so there is no parallelism. I was tempted to refactor that, but thought you might want it.