Parse fixes #1

thaliaarchi · 2024-08-19T08:58:52Z

This addresses all the parsing bugs I found when I surveyed your assembler. The largest change is to move to strongly typed argument handling by scanning arguments as consistent tokens, before invoking each opcode's function. I have written more details in the commit messages and each commit is atomic.

I found that any exceptions thrown within the promises cannot be caught by the enclosing block and have context for the line added, since they are not awaited there, thus most validation errors are thrown as UnhandledPromiseRejection. I was tempted to duplicate the try/catch block within both promises, but that would be quite lengthy. Why did you choose the promise approach? The problem is sequential and each promise waits for the result of the previous, so there is no parallelism. I was tempted to refactor that, but thought you might want it.

Opcode mnemonics are compared using String.prototype.toLocaleLowerCase, which has locale-aware case folding. Thus, in Turkish (LANG=tr) and Azeri (LANG=az) locales, İ (U+0130, LATIN CAPITAL LETTER I WITH DOT ABOVE) compares equal to i (U+0069, LATIN SMALL LETTER I). In all locales, K (U+212A, KELVIN SIGN) compares equal to k (U+006B, LATIN SMALL LETTER K), but no mnemonics use k. The characters used are [a-eg-jl-pr-z]. Use String.prototype.toLowerCase, since it has the correct ASCII-only folding for the characters used.

Currently, the parsing of arguments is left up to each instruction, making the behavior inconsistent. Required arguments can be omitted and are substituted with 0. Opcodes that don't take arguments ignore any arguments that are given. `storestr` mangles a char argument, because it converts it to the decimal string representation of its code unit. => Scan arguments into tokens and check their types before invoking the function for the opcode. Represent this within the TypeScript type system using discriminated unions, so it is strongly typed. Throw on malformed input. It relies heavily on splitting lines by exactly one space, making it brittle to other whitespace characters like tab or repeated spaces. At the start and end of the line, Unicode whitespace may be used due to `String.prototype.trim`. => Consistently use /\s/ while parsing and create a "-quoted space token, so that strings with spaces can still be represented. Single-word strings are still allowed.

With the current state of variable semantics, it doesn't make sense to treat a leading underscore in label or import position as a variable.

thaliaarchi · 2024-08-19T09:03:12Z

Some other things not addressed in this PR:

Your breakpoint extension instruction is written debugger for your assembler and printed as dbg by your debugger. Which do you consider canonical? This difference makes the output of the disassembler not able to be fed into the assembler.
wsa-tests/count.wsa uses test, presumably from Burghard, but it is not supported by your assembler.

Do not add internal labels to the label map, so a user-supplied label with the same name as an internal label will receive a different ID. In such cases, the disassembly in the debugger will still show the same name for both, but that only affects display.

voliva

Sorry I took this long to review it - I wanted to take a deeper look since it's a substantial refactor :D

I like it, it's massively better than what I had before, with a proper tokenization step. Thank you so much!

I've licensed the repo under MIT, I hope that's alright for your contribution.

voliva · 2024-08-29T08:21:04Z

I found that any exceptions thrown within the promises cannot be caught by the enclosing block and have context for the line added, since they are not awaited there, thus most validation errors are thrown as UnhandledPromiseRejection. I was tempted to duplicate the try/catch block within both promises, but that would be quite lengthy. Why did you choose the promise approach? The problem is sequential and each promise waits for the result of the previous, so there is no parallelism. I was tempted to refactor that, but thought you might want it.

Mhh I'm not sure where the UnhandledPromiseRejection happens. In theory Promise.resolve().then(() => { throw "foo" }) should return a promise that rejects (so it can be handled from outside). All of the promises are added into the results array and then awaited through Promise.all, which also handles it correctly. The promise returned by the compile function should be capturing all of the errors.

I didn't add promises for concurrency, but to not constraint the compiling process into a synchronous operation. I know NodeJS has sync APIs to read files, but by abstracting it to the getIncludedStream: (filename: string) => LineStream makes it asynchronous, and in mind this could help later on when building with a web UI. The Promises then are just a way of wrapping and working around this.

Your breakpoint extension instruction is written debugger for your assembler and printed as dbg by your debugger. Which do you consider canonical? This difference makes the output of the disassembler not able to be fed into the assembler.

Good point. I think the canonical should be dbg - Feels a bit more consistent with the compact nature of the others.

I'll probably also change the "debugExtensions" into something which is not just "debug", as that also includes bitwise opcodes. Or maybe split into different flags.

wsa-tests/count.wsa uses test, presumably from Burghard, but it is not supported by your assembler.

Yes, I see it was from very early on while I was still implementing the initial version.

thaliaarchi · 2024-09-09T06:34:35Z

Thanks for your review! I see I missed the email notification from two weeks ago; sorry about that.

thaliaarchi added 6 commits August 7, 2024 23:24

Fix typos

cb459be

Add debug build with no minification

11f733a

Support an optional argument for and and or

920eb86

Allow labels and imports starting with underscore

2f3e595

With the current state of variable semantics, it doesn't make sense to treat a leading underscore in label or import position as a variable.

thaliaarchi force-pushed the parse-fixes branch from ceb0de5 to 2f3e595 Compare August 19, 2024 09:41

voliva approved these changes Aug 29, 2024

View reviewed changes

voliva merged commit 0039336 into voliva:main Sep 9, 2024

thaliaarchi deleted the parse-fixes branch December 3, 2024 04:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parse fixes #1

Parse fixes #1

thaliaarchi commented Aug 19, 2024

thaliaarchi commented Aug 19, 2024

voliva left a comment •

edited

Loading

voliva commented Aug 29, 2024 •

edited

Loading

thaliaarchi commented Sep 9, 2024

Parse fixes #1

Parse fixes #1

Conversation

thaliaarchi commented Aug 19, 2024

thaliaarchi commented Aug 19, 2024

voliva left a comment • edited Loading

Choose a reason for hiding this comment

voliva commented Aug 29, 2024 • edited Loading

thaliaarchi commented Sep 9, 2024

voliva left a comment •

edited

Loading

voliva commented Aug 29, 2024 •

edited

Loading