New Mindcode syntax #148

Open
cardillan opened this issue Sep 24, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

@cardillan
Owner

cardillan commented Sep 24, 2024

Some of the planned changes have already been implemented as of release 2.4.0:

  • Block comments: /* this is a block comment that can span several lines */
  • Enhanced comments (///) for generating remarks.
  • Program parametrization through the param keyword. Global variables are no longer suitable for program parametrization.
  • Support for omitting any optional argument from a function call: getBlock(x, y, , floor) - the third parameter is optional and the argument may be omitted. Output-only parameters are optional.
  • Support for output parameters in user defined functions.
  • Support for void functions (functions not returning any value) -- new void keyword.
  • Support for function varargs (only for inline functions).
  • All names of Mindcode built-ins (e.g. @coal) will always be specified including the @ sign, e.g. vault1.@coal. The non-prefixed variant will be deprecated (see Kebab-case identifiers and the @ sign in mlog identifiers #160).

Currently, there are two versions of the syntax: strict and relaxed. The relaxed syntax is mostly compatible with Mindcode version 2.2.0 and earlier. Furthermore, there are many deprecated language features. These deprecations, as well as the differences between the strict and relaxed syntax, are described here.

The deprecated features and the relaxed version of the syntax will eventually be removed, probably no sooner than January 2025.

The strict/relaxed syntax modes will probably be supported even in the future, to make adapting to the changes in the syntax easier and/or to support a lighter variant of the syntax better suited to short, simple scripts.

Additional planned changes

  • Empty bodies of ifs, loops, functions and other constructs will be allowed.
  • It will be possible to invoke properties on expressions, e.g. getlink(n).enabled = false.
  • Increment and decrement operators (++, --) will be supported in both prefix and postfix forms.
  • It will not be possible to declare a function within another function or code block.
  • Names of all variables (probably except program parameters) will be mangled.
    • Variables which aren't mangled today will be mangled in some simple way, say var --> :var
    • Names of linked blocks won't be mangled (obviously)
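
The planned mangling rule can be modelled in a few lines. This is only an illustrative Python sketch, not the compiler's implementation: the function name `mangle` and the `unmangled` set (names of linked blocks and, probably, program parameters) are assumptions made for the example.

```python
def mangle(name: str, unmangled: set) -> str:
    """Return the mlog-level name for a Mindcode identifier (illustrative only)."""
    if name in unmangled:
        return name        # linked blocks (and probably params) keep their names
    return ":" + name      # everything else gets the ':' prefix, e.g. var --> :var


# Hypothetical program: vault1 and message1 are linked blocks, LINK_ID is a parameter.
unmangled = {"vault1", "message1", "LINK_ID"}

print(mangle("vault1", unmangled))   # vault1
print(mangle("var", unmangled))      # :var
print(mangle("wave1", unmangled))    # :wave1
```

Because a mangled name always starts with a colon, a regular variable called wave1 can never be confused with a linked block of the same name.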

Strict syntax

  • All variables (local/main, global, external, linked blocks) will have to be explicitly declared.
    • Variables declared outside functions/code blocks are global. Variables declared inside functions/code blocks are local.
    • Constants, program parameters, external variables and linked blocks need to be declared at the global level only.
  • The $ prefix in external variable names will not be supported.
  • All code needs to be enclosed in function definitions or main code block. There needs to be exactly one main code block (begin ... end;).
    • Identifiers that are forbidden today because they coincide with linked block names (e.g. wave1) will be allowed for regular variables, as they'll be mangled to :wave1 and so won't collide with block names.

Relaxed syntax

  • Global variable names will be recognized by upper-case identifiers. Additionally, it will be possible to declare global variables with lower-case names.
  • External variables will be identified by the $ prefix. Additionally, it will be possible to declare external variables without the prefix.
  • Enclosing the main body of the program in a code block will be optional.

Example of the planned syntax

An example of code rewritten to the strict variant of the new syntax:

param LINK_ID = 1;                      // Program parameter
const QUERY_BASE = 99999900;            // Constant
var QUERY_FLAG, ANSWER_FLAG;            // Global variables
var SERVICED = 0;                       // Global variable with initialization

begin
    var link_id = max(min(LINK_ID, 99), 0);             // Local variable, for sanitizing the program parameter
    QUERY_FLAG = QUERY_BASE + link_id;
    var position = 100 * (vault1.x + @mapw * vault1.y);
    ANSWER_FLAG = position + link_id;

    while true do
        var start = @time;
        for var unit in @mono, @poly, @mega, @quad, @oct, @flare, @horizon, @zenith, @antumbra, @eclipse do
            procesUnit(unit);
        end;
        print($"Remote vault [gold]#$link_id[]\n");
        print($"Queries serviced: [green]$SERVICED[]\n");
        print($"[lightgray]Loop time: $ ms", floor(@time - start));
        printflush(message1);
    end;
end;

inline void procesUnit(unit)
    ubind(unit);
    if @unit.flag == QUERY_FLAG then
        flag(ANSWER_FLAG);
        SERVICED++;
    end;
end;
@cardillan cardillan added the enhancement New feature or request label Sep 24, 2024
@cardillan
Owner Author

I planned to make semicolons compulsory to make the grammar more rigid, in the hope that it would improve error messages. However, the real cause of the misleading error messages turned out to be something different (see #156). According to some of my experiments, optional semicolons don't seem to make the error messages less focused.

Nevertheless, I'm still going to make semicolons compulsory to remove harmful ambiguities in the syntax:

print "hi" compiles when semicolons are optional, but produces no code. This is horrible, because the nature of the error (missing parentheses around "hi") isn't at all apparent.

Another issue is this loop:

for i in 1 .. SIZE - 1
    print(i)
end

can be (and was) parsed identically to

for i in 1 .. SIZE do
    -1;        // Does nothing, but is legal
    print(i);
end

That was unexpected, to say the least. 😳
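
To see where the second reading comes from, here is a minimal Python sketch of the ambiguity. It uses a toy expression grammar (names, numbers, binary operators and unary minus), not the real Mindcode grammar, and the helper names are made up for the example. With do and semicolons optional, the tokens after `..` can be cut in more than one place into "range end" and "first statement of the body".

```python
BINARY_OPS = {"+", "-", "*", "/"}


def parse_unary(tokens, pos):
    """Return the position after one operand, or None if no operand starts at pos."""
    if pos < len(tokens) and tokens[pos] == "-":
        return parse_unary(tokens, pos + 1)          # unary minus, e.g. "-1"
    if pos < len(tokens) and (tokens[pos].isidentifier() or tokens[pos].isdigit()):
        return pos + 1
    return None


def is_expression(tokens):
    """True if the whole token list is exactly one toy expression."""
    pos = parse_unary(tokens, 0)
    while pos is not None and pos < len(tokens) and tokens[pos] in BINARY_OPS:
        pos = parse_unary(tokens, pos + 1)
    return pos == len(tokens)


def possible_splits(tokens):
    """All ways to read tokens as <range end> followed by an optional statement."""
    splits = []
    for cut in range(1, len(tokens) + 1):
        head, rest = tokens[:cut], tokens[cut:]
        if is_expression(head) and (not rest or is_expression(rest)):
            splits.append((" ".join(head), " ".join(rest)))
    return splits


print(possible_splits(["SIZE", "-", "1"]))
# [('SIZE', '- 1'), ('SIZE - 1', '')]
```

Both cuts are grammatically valid in the toy grammar, so the parser has to pick one. A mandatory do marks the end of the range expression explicitly and leaves only a single reading.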

Making semicolons compulsory, as well as the do and then keywords, would handle this. In some future version, maybe the next one, the semicolons and do/then keywords will be compulsory by default and there will be an option in both the web app and command-line compiler to make them optional, for some transitional period of time.

@limonovthesecond2
Contributor

Suggestion: After mandatory do in loops, I think it's possible to change the loop syntax from do-loop-while; to do-while;. This syntax is a little shorter and more memorable since it is the only place where the loop keyword is used.

@cardillan
Owner Author

cardillan commented Oct 10, 2024

You're right about loop, I'll do this and remove the loop keyword eventually.

Edit: unfortunately, removing the loop from loop while messes up the error reporting in a big way again. My hunch is that without the compulsory loop there are a lot of ambiguities, which cause the parser to get lost. So I'm not going to remove the loop from loop while until I'm able to solve this issue, which might not be anytime soon.

I plan to rewrite the grammar from scratch, which might help avoid some pitfalls in the current grammar, especially as I'd love to remove all the ambiguities from the grammar. Maybe I'll be able to avoid the loop keyword then. I'm not an expert on grammar design, though, so it's far from certain I'll be able to meet these objectives.

@cardillan
Owner Author

For the next release, I plan to push the syntax towards compulsory semicolons and do/then keywords in some way. There's a problem with existing code not compiling under the new syntax. Let's consider this code as an example:

const RADIUS_WITHIN     = 8
const RADIUS_APPROACH   = 6

const SUPPLY_INTERVAL   = 50 - 3        // Some comment

const UNIT_CHECK_TIME   = 5000          // Some comment

DOME = dome1
while DOME == null
    print("[gold]Waiting for an overdrive dome to be connected...")
    printflush(message1)
    DOME = dome1
end

When compiled using the new syntax, it produces the following list of errors (note the error reporting got way better thanks to fixing #156):

a.mnd:2:1 ERROR: missing ';' at 'const'
a.mnd:4:1 ERROR: missing ';' at 'const'
a.mnd:6:1 ERROR: 'const': no viable alternative at input '-3const'
a.mnd:6:1 ERROR: missing ';' at 'const'
a.mnd:8:1 ERROR: missing ';' at 'DOME'
a.mnd:9:1 ERROR: missing ';' at 'while'
a.mnd:10:5 ERROR: missing 'do' at 'print'
a.mnd:11:5 ERROR: missing ';' at 'printflush'
a.mnd:12:5 ERROR: missing ';' at 'DOME'
a.mnd:13:1 ERROR: missing ';' at 'end'
a.mnd:14:1 ERROR: missing ';' at '<EOF>'

The missing semicolons are reported at the beginning of the next statement. That's unfortunate and the third error report looks very strange - there's no -3const in the source file, but ANTLR lists the sequence of tokens (which excludes whitespace and comments). It gives the proper position of the error in the file, but it doesn't help much in the web application.
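
The odd '-3const' text can be reproduced with a toy tokenizer. The following Python sketch is not ANTLR's code; it only assumes, as described above, that whitespace and comments never reach the token sequence, so that joining the remaining tokens glues together text from different lines.

```python
import re

# Toy token pattern: identifiers, integers, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def visible_tokens(source):
    """Drop // comments and whitespace, return the remaining token texts."""
    without_comments = re.sub(r"//[^\n]*", "", source)
    return TOKEN.findall(without_comments)


# End of the SUPPLY_INTERVAL line and start of the next declaration.
snippet = "- 3        // Some comment\n\nconst UNIT_CHECK_TIME"
tokens = visible_tokens(snippet)

print(tokens)               # ['-', '3', 'const', 'UNIT_CHECK_TIME']
print("".join(tokens[:3]))  # -3const
```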

To make the transition easier, I can support two versions of the syntax for a while, which I'm calling strict and relaxed for the moment. Under the relaxed syntax, the above code would compile with no problem. Unfortunately, parametrizing the grammar affects the error reporting negatively and I don't know how to avoid that:

a.mnd:2:1 ERROR: 'const': no viable alternative at input 'const'
a.mnd:4:1 ERROR: 'const': no viable alternative at input 'const'
a.mnd:6:1 ERROR: 'const': no viable alternative at input 'const'
a.mnd:8:1 ERROR: 'DOME': no viable alternative at input 'DOME'
a.mnd:9:1 ERROR: 'while': no viable alternative at input 'while'
a.mnd:10:5 ERROR: 'print': no viable alternative at input 'print'
a.mnd:11:5 ERROR: 'printflush': no viable alternative at input 'printflush'
a.mnd:12:5 ERROR: 'DOME': no viable alternative at input 'DOME'
a.mnd:13:1 ERROR: 'end': no viable alternative at input 'end'
a.mnd:14:1 ERROR: '<EOF>': no viable alternative at input '<EOF>'

The errors are reported at the same positions, but the error messages aren't very descriptive. I could intercept them and reword them into some other message - maybe "Cannot parse source file, check for missing ';' or keyword in the previous expression", but I'm not sure it is optimal.
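
Intercepting and rewording could look roughly like the Python sketch below. It is only an illustration of the idea: the function name `reword` is hypothetical, the regular expression is derived from the messages listed above, and appending the offending token to the hint is an addition made for this example.

```python
import re

# Matches messages such as: 'const': no viable alternative at input 'const'
NO_VIABLE = re.compile(r"^'(?P<token>.*)': no viable alternative at input '.*'$")

HINT = "Cannot parse source file, check for missing ';' or keyword in the previous expression"


def reword(message):
    """Replace the generic parser message with a more descriptive hint."""
    match = NO_VIABLE.match(message)
    if match is None:
        return message                       # leave other messages untouched
    return f"{HINT} (near '{match.group('token')}')"


print(reword("'const': no viable alternative at input 'const'"))
print(reword("missing ';' at 'const'"))
```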

I was, however, able to add explicit handling for missing semicolons, which produces the following output:

a.mnd:1:28 ERROR: Missing a semicolon
a.mnd:2:28 ERROR: Missing a semicolon
a.mnd:4:33 ERROR: Missing a semicolon
a.mnd:6:31 ERROR: Missing a semicolon
a.mnd:8:13 ERROR: Missing a semicolon
a.mnd:10:5 ERROR: 'print': no viable alternative at input 'print'
a.mnd:10:68 ERROR: Missing a semicolon
a.mnd:11:25 ERROR: Missing a semicolon
a.mnd:12:17 ERROR: Missing a semicolon
a.mnd:13:4 ERROR: Missing a semicolon

It has two problems, though: firstly, it makes parsing slower. It appears we'll just get to the times that were usual prior to the #156 fix, so maybe it wouldn't be a problem.

Secondly, the missing semicolon will be reported absolutely everywhere where adding a semicolon would allow the parser to resolve the last token encountered:

8abc;
fluffy!bunny;
print(");

b.mnd:1:2 ERROR: Missing a semicolon
b.mnd:2:7 ERROR: Missing a semicolon
b.mnd:3:7 ERROR: token recognition error at: '");\r'
b.mnd:3:6 ERROR: Missing a semicolon
b.mnd:5:1 ERROR: '<EOF>': no viable alternative at input '('

Despite the disadvantages, I'd probably go with the last solution, as it seems to help most in making the necessary changes to existing Mindcode - it is now possible to integrate the compiler output with an IDE and obtain clickable links from the error output.
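
For the IDE integration, the file:line:column prefix is the part a tool needs to extract. A minimal Python sketch of such a parser follows; the `Diagnostic` type and `parse_diagnostic` function are hypothetical names, and the pattern is inferred only from the sample output shown in this thread.

```python
import re
from typing import NamedTuple, Optional


class Diagnostic(NamedTuple):
    file: str
    line: int
    column: int
    level: str
    message: str


# Format inferred from lines like: a.mnd:10:5 ERROR: missing 'do' at 'print'
PATTERN = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?P<column>\d+) (?P<level>[A-Z]+): (?P<message>.*)$"
)


def parse_diagnostic(text: str) -> Optional[Diagnostic]:
    """Split one line of compiler output into its parts, or return None."""
    match = PATTERN.match(text)
    if match is None:
        return None
    return Diagnostic(
        match.group("file"),
        int(match.group("line")),
        int(match.group("column")),
        match.group("level"),
        match.group("message"),
    )


print(parse_diagnostic("a.mnd:10:5 ERROR: missing 'do' at 'print'"))
```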

In the future, support for the relaxed syntax will certainly be removed, and I'll see whether keeping the special semicolon handling is still worth the downsides.

I'd love to hear what y'all think.

@limonovthesecond2
Contributor

> I plan to push the syntax towards compulsory semicolons and do/then keywords... The code block is enclosed between begin and end keywords

This makes Mindcode more like Pascal, which I think is funny :)

> It will not be possible to declare a function within another function or code block

Will it be mandatory to declare functions before the main code block?

> Secondly, the missing semicolon will be reported absolutely everywhere where adding a semicolon would allow the parser to resolve the last token encountered

It looks a little weird, but not terrible. Anyway, the error report is much better now.

> it is now possible to integrate the compiler output with an IDE and obtain clickable links from the error output

This is amazing! To think that a project with one error message for all situations has grown so much now. This is very inspiring, especially remembering your words 9 months ago: "These syntax error messages are not very intuitive, but I have no idea how to improve them".

@cardillan
Owner Author

Pascal was one of the first programming languages I learned, so perhaps I'm unconsciously pushing Mindcode in that direction...

Seriously, though, Mindcode now really needs one-liner if statements. Currently we have

if condition then statement; end;

and that sucks, because the end is easy to forget and a pain to type everywhere.

I'm considering two (well, three) possibilities:

  1. The Ruby way - condition modifiers: statement if condition;. Very unintuitive if you're not a Rubyist. Also in Ruby it all needs to be on a single line. In Mindcode, we ignore line endings, which means this could be split on two lines, making the damned thing utterly confusing. This is probably not a good idea.
  2. Butchering the ternary operator: condition ? statement;. It will take some time getting used to. Probably the best.
  3. The Pascal way. Pro: Pascal syntax is well designed and time-tested. Con: I'm not sure I want to evolve Mindcode into Pascal. Plus this would mean another rather significant change to existing sources.

> Will it be mandatory to declare functions before the main code block?

Function declarations will still be allowed in any order, before or after the main block (or both).

> "These syntax error messages are not very intuitive, but I have no idea how to improve them".

That's still true, except that I stumbled upon a strange construct in the grammar definition and simplifying it fixed the issue. Relatively small changes to the grammar definition can alter error resolution in ways I don't understand. I plan to rewrite the grammar from the start, which should bring new insights, but I need to prepare some test suite before that to see that I'm not breaking the error reporting on the go.

By the way, the web app will have clickable error messages too, which should help a lot.

@limonovthesecond2
Contributor

> The Pascal way

A little reminder of how this is implemented in Pascal. Here is an example of one-line if-else expressions:

if x > 10 then write('True');
if x > 10 then write('True') else write('False');

Since there are no begin-end; blocks after the then keyword, only one operation will be executed. To add an else branch, the semicolon before the else keyword must be omitted. Otherwise, the else branch will be separated from the if statement, which will result in an error.
Here's what it looks like in a multi-line expression:

if x > 10 then
  write('True')
else
  write('False');

This is what it looks like if there is more than one operation to perform:

if x > 10 then
  begin
    write('True');
    write('True');
  end
else
  begin
    write('False');
    write('False');
  end;

What can be seen here? The operations are inside begin-end; blocks, and the semicolon before else is omitted. Now compare with Mindcode syntax:

if x > 10 then
    print('True');
    print('True');
else
    print('False');
    print('False');
end;
if x > 10 then print('True'); else print('False'); end;

The single line expression is not as pretty as in Pascal, but the multiline one looks a little better to me.

> Butchering the ternary operator

Adding a ternary operator would definitely be a nice addition

(x > 10) ? print("True") : print("False");
result = (x > 10) ? "True" : "False";

@cardillan
Owner Author

That's right about Pascal. As far as I recall, I used to format the source code like this:

if x > 10 then begin
    write('True');
    write('True');
end else begin
    write('False');
    write('False');
end;

I do like the current Mindcode version better - it eliminates typing quite a few begin and end keywords.

> Adding a ternary operator would definitely be a nice addition

Mindcode already has the ternary operator and both of your examples compile. You don't even need the parentheses around the condition. They work just like if, except they need exactly one expression/statement in each branch.

What I propose is supporting the ternary operator without the false branch (well, it wouldn't be ternary anymore), e.g.

x > 10 ? print("More");

to use as a conditional one-liner. Rewriting a snippet of my code using this I get:

        if TARGET > 0 then
            print($"\nUsing [green]$active/$TARGET[] units ($UNIT) [gold]+$items_in_transit");
            CHANGE > 0 ? print("\n[][salmon]Cannot acquire additional units![]");
        end;
        print($"\n[]Local items: [gold]$container_items");
        EFF_LOCAL_LIMIT < 100 ? print($"[] (limit [orange]$LOCAL_MARGIN[])");
        if SHOW_REMOTE_LEVEL then
            remote_level = CORE.sensor(ITEM);
            print($"\n[]Remote items: [gold]$remote_level");
            EFF_REMOTE_LIMIT < 100 ? print($"[] (limit [orange]$REMOTE_MARGIN[])");
        end;

I don't like it much. ? is not a keyword and doesn't really stand out in the code, unlike if. (Of course, even if we support this, no one is required to use it.)

What about iif having just the true branch with a single statement?

        if TARGET > 0 then
            print($"\nUsing [green]$active/$TARGET[] units ($UNIT) [gold]+$items_in_transit");
            iif CHANGE > 0 then print("\n[][salmon]Cannot acquire additional units![]");
        end;
        print($"\n[]Local items: [gold]$container_items");
        iif EFF_LOCAL_LIMIT < 100 then print($"[] (limit [orange]$LOCAL_MARGIN[])");
        if SHOW_REMOTE_LEVEL then
            remote_level = CORE.sensor(ITEM);
            print($"\n[]Remote items: [gold]$remote_level");
            iif EFF_REMOTE_LIMIT < 100 then print($"[] (limit [orange]$REMOTE_MARGIN[])");
        end;

It feels confusing and quirky.

Sometimes I wish we had C/C#/Java code blocks - { }, but then the keywords (do, then) would feel inappropriate and I don't want to make a change this large anyway - even the Pascal syntax would be closer to the current Mindcode.

Looks like the current syntax for conditions may be better than any of the alternatives.

@cardillan
Owner Author

I've updated the main comment to reflect features that were already implemented, plus advances in planning the new syntax.


2 participants