Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[breaking change] Stop supporting script tag based execution of part files #52575

Closed
eernstg opened this issue May 31, 2023 · 16 comments
Closed
Assignees
Labels
area-meta Cross-cutting, high-level issues (for tracking many other implementation issues, ...). breaking-change-approved breaking-change-request This tracks requests for feedback on breaking changes

Comments

@eernstg
Copy link
Member

eernstg commented May 31, 2023

Intended change in behavior

Currently, dart allows a part file to be treated as a script (where a 'script' is spec-ese meaning "a library that determines an executable program", aka "an entry point"), and it allows a script tag to occur as the first line of the part file. For example:

Here is the part file 'n016part.dart':

#!/usr/bin/env dart
part of 'n016.dart';
void run() => print("Running!");

Here is the library 'n016.dart':

part 'n016part.dart';
void main() => run();

This issue is a request for a breaking bug fix: The CFE should report part files like 'n016part.dart' as a syntax error (or as any other compile-time error, if that is more convenient), because the grammar rules deriving the contents of a part file does not include script tags.

The analyzer already reports a directive_after_declaration error on 'n016part.dart', which could be taken as evidence that the breakage is minimal. However, the analyzer does not report any errors on 'n016.dart', in spite of the fact that 'n016.dart' cannot be analyzed without loading 'n016part.dart'.

Justification for making this change

First, it seems very likely that the change is only minimally breaking, because of the error reported by the analyzer.

Next, the ability to run a part file as a script might be useful in the case where it could be used to run many different variants of the associated entry point. However, this mechanism does not work unless the part file uses a URI in its part-of directive, which means that the part file uniquely identifies the entry point, and the entry point again uniquely identifies the part file. In other words, we can not use this feature to support convenient execution of many different variants of the given program. All in all, this (accidental) feature does not seem to bring any value.

The migration advice would be: "Put the script tag in the entry point rather than in the part file, and then execute the entry point."

Expected impact of this change

The expected impact is minimal, given that the analyzer already reports an error when the part file with a script tag is analyzed.

The migration advice illustrates that it is rather easy to adapt current scripts using this accidental feature. Also, it is not easy to find part files in existing source code where this accidental feature is used today. There are probably very few of them.

Mitigating the change

For an existing part file with a script tag, move the script tag to the corresponding entry point. Executions of the part file as an OS level executable program file should then be replaced by execution of the corresponding entry point. Permissions may need to be adjusted such that the part file can no longer be executed (optionally), and the entry point can be executed (necessarily, otherwise the script tag will be ignored).

If the changes to the execution commands are inconvenient, it may be useful to rename the entry point and part file such that the execution commands do not need to be updated.

@eernstg eernstg added the breaking-change-request This tracks requests for feedback on breaking changes label May 31, 2023
@itsjustkevin itsjustkevin changed the title Stop supporting script tag based execution of part files [breaking change] Stop supporting script tag based execution of part files May 31, 2023
@itsjustkevin itsjustkevin self-assigned this May 31, 2023
@itsjustkevin
Copy link
Contributor

@vsmenon @grouma and @Hixie another breaking change.

@Hixie
Copy link
Contributor

Hixie commented Jun 1, 2023

fine by me

@modulovalue
Copy link
Contributor

For what it's worth, I still think that shebangs on part files should not be a syntax error. I think that they should be "ignored" in a similar way how comments are meant to be ignored by Dart compilers, and only act as convenient syntax for embedding shell/terminal-related information into Dart programs that is ignored by Dart-related tooling.

This would allow for a simpler mental model of Dart syntax, where:

  • the outermost (syntactical) language is UTF-related and characterized by an optional leading FEFF [that contains a]
  • shell-related language that is characterized by an optional leading shebang [that contains a]
  • dart-version-related language that is characterized by an optional leading language version comment [that contains]
  • the actual dart program

Conditionally supporting the "shell-related language" only if the innermost "actual dart program" does not have a part header seems unnecessarily restrictive and I really don't understand why this is considered to be a bug. (Or is the ability to execute part files the actual bug?)

In other words, I think that a script tag at the top of a part file should not cause its library to fail compilation (if the library was executed properly) and I agree that there's no good reason to support executing part files.

@eernstg
Copy link
Member Author

eernstg commented Jun 1, 2023

@modulovalue wrote:

I really don't understand why this is considered to be a bug.

That's very simple: The specification contains a grammar, that grammar doesn't derive the part file contents whose first line is a script tag, and hence a would-be part file that starts with a script tag is not a part file, it's a syntax error. In contrast, a <libraryDeclaration> does derive a library file contents starting with a script tag.

This could be a bug in the parser, or it could be a bug in the specification (if such script tags are useful, and they were omitted from the grammar rule by mistake). During discussions about this topic I haven't heard any arguments in favor of adding support for these script tags to the specification, until now — your comment is exactly such an argument:

This would allow for a simpler mental model of Dart syntax, where:

  • the outermost (syntactical) language is UTF-related and characterized by an optional leading FEFF [that contains a]
  • shell-related language that is characterized by an optional leading shebang [that contains a]
  • dart-version-related language that is characterized by an optional leading language version comment [that contains]
  • the actual dart program

That is indeed a nice, complete perspective on Dart source code artifacts.

I think you could say that this model of Dart source code artifacts is similar to a model that we frequently use with packets on a network (linking to figure on https://www.cs.ait.ac.th/~on/O/oreilly/tcpip/firewall/ch06_03.htm):

However, I'm still not convinced that this apparently general and all-covering perspective brings anything particularly useful to Dart.

First, as you mentioned, the leading 0xFEFF (byte-order mark) is specifically part of the encoding. You could argue that it should be a built-in property of the lexical analyzer that it reads this byte-order mark and processes it as "not part of the encoded document, only part of the encoding". It's there because of the history, but we might want to fix it (such that the grammar doesn't mention the byte-order mark any more).

Next, the script tag: This is indeed very similar to the network packet structure, where the script-tag-processor ignores the rest of the file, passing it on as "data" to the next tool. However, I don't see much value in allowing script tags in part files, especially if they won't ever work (as soon as we fix this issue, of course ;-).

The magic language version comments (like // @dart = 2.19) can occur "near the beginning" of the library/part. We do have some test libraries where it occurs on line 10. This means that we would have to include support for parsing whitespace plus all kinds of comments (if not more) in this "version header", and we would artificially consider the "version language" to be different from Dart, even though it will have to duplicate all those grammar rules.

We could use that kind of layered model of the Dart source code artifacts, but it is not obvious to me that the effort that we'd put into an implementation of this model would bring any clear benefits.

a script tag at the top of a part file should not cause its library to fail compilation

Alas, but why not? 😀

@modulovalue
Copy link
Contributor

That's very simple

Thank you for the explanation and I agree with your view that this is a bug. I was too imprecise in my wording and meant to rather imply that I disagree with making the presence of a script tag in a part file a compile-time error.


until now — your comment is exactly such an argument:

I left a similar comment here in the last section of the issue description. (It might not have been as clear as the one that I left here.)


On the script tag:

However, I don't see much value in allowing script tags in part files, especially if they won't ever work

Have you considered that the script tag can contain strings that are unrelated to dart tooling? Consider this example:

(* bar.dart *)
part "foo.dart";

void main() {}
(* foo.dart *)
#! echo hello there

part of 'bar.dart';

void baz() {}
> chmod +x foo.dart
> ./foo.dart

outputs

hello there ./foo.dart

this is an example of something that "would work" even if part files themselves could not be ran the way that they can be today. We are not limited to providing the script tag with the path to a dart binary, but we can specify any shell script.


On the language version string:

and we would artificially consider the "version language" to be different from Dart

By doing that we would fix a minor inconsistency that I've pointed out in the following issue: dart-lang/language#2945.


(Consider the following for exposition purposes:)

Start ::= 
        // UTF-header
	'\uFEFF'? 
	// Shebang-header
	('#!' (~('\r' | '\n'))* ('\r' | '\n' | '\r\n'))? 
	// Version-header
	(Ws? ('//' Ws? '@' ' '* 'dart' ' '* '=' ' '* 0-9+ '.' 0-9+ ('\r' | '\n' | '\r\n'))?
	// Program
	DartProgram
Ws ::= (' ' | '\t' | '\r' | '\n')+
DartProgram ::= DartHeader DartBody
DartHeader ::= LibraryHeader | PartHeader
DartBody ::= ...
LibraryHeader ::= ...
PartHeader ::= ...

brings anything particularly useful to Dart.

I think that by not allowing DartHeader to have influence over the "terminal-sublanguage" we can enforce the principle of least privilege in a very clean and simple way:

  • There might be a UTF-header that can be ignored by Dart-related tooling (it exists for historical reasons).
  • There might be a Shebang-header that can be ignored by Dart-related tooling (it exists to provide compatibility with Shebang-related tooling).
  • There might be a Version-header that is used to select a version that DartProgram must be considered under. (It's not a comment, it just looks like one)
  • And then there's the actual DartProgram.

@grouma
Copy link
Member

grouma commented Jun 1, 2023

I don't anticipate this having any impact to ACX.

@eernstg
Copy link
Member Author

eernstg commented Jun 7, 2023

@modulovalue, you're fighting hard for the ability to have a script tag in a part file. ;-) Or, as a different perspective, to structure the language in layers, somewhat similar to network packets, and use the same over structure for libraries and part files.

However, I don't think the idea has much support. So I'd recommend that we go ahead with this breaking change and make it a syntax error (as specified) for a part file to have a script tag.

The analyzer already reports a compile-time error when it encounters a script tag in a part file, but the CFE would need to be modified such that the error is reported (and the code that actually allows it to run can be deleted).

@dart-lang/language-team, @johnniwinther, @itsjustkevin, do you wish to have further steps in the breaking change process, or can we simply go ahead and do it? I believe this can be a CFE-only modification, and I suspect that the actual breakage is tiny.

@mit-mit mit-mit added the area-meta Cross-cutting, high-level issues (for tracking many other implementation issues, ...). label Jun 7, 2023
@modulovalue
Copy link
Contributor

you're fighting hard for the ability to have a script tag in a part file. ;-)

I agree that it doesn't seem like a high impact decision, but this exception looks to me like a cut in the figurative Death by a Thousand Cuts.

And the fact that network packets employ a similar design seems like strong evidence to me that that would be way to go.

@itsjustkevin
Copy link
Contributor

Pinging @vsmenon for a final look here.

@eernstg my only recommendation here is that we have some sort of timeline for this breaking change.

@munificent
Copy link
Member

I can't imagine any positive user value coming from allowing shebang lines in part files which aren't runnable Dart scripts anyway.

If you want a file that:

  1. Starts with a shebang line.
  2. Isn't a valid Dart library that can be run.

Might I suggest giving it the file extension .pl, .py, .rb, or .sh? :)

@modulovalue
Copy link
Contributor

modulovalue commented Jun 9, 2023

For the rest of eternity the analyzer (and compiler) will have to check whether a file is a part file with a script tag only to emit an error (that likely nobody will ever see, I agree!), and everybody will have to pay a price for breaking a 30+ year old convention.

Or they could just ignore script tags if they are the first token (or the second one if the first token is FEFF), like they already do with non-part files and keep things simpler and backwards compatible for everybody.

There are people who seem to rely on hashbangs for detecting filetypes example a example b example c. There are over two million dart developers. Why risk causing churn and making people unhappy while breaking a well-supported convention?

Again, to be clear, I'm not talking about executing part files, but about removing the syntax for hashbangs/shebangs/script tags in part files.

Am I guilty of bikeshedding here? Maybe, but I don't think that the arguments that I've provided so far are wrong.

@vsmenon
Copy link
Member

vsmenon commented Jun 9, 2023

lgtm

@lrhn
Copy link
Member

lrhn commented Jun 9, 2023

I don't believe it's a 30 year old convention that every file format must allow an initial line starting with #!.
A lot of file format don't. A lot of those who do usually use # as comment character to begin with.

An example of adding support for #! is that Java allowed #! at the beginning of source files from version 11, so it does happen.

Anyway, even if we allow #! at the start of every Dart file, it still shouldn't allow #! dart to use a part file as entry point, only script files are allowed as entry points, and those must be libraries, so you'd need some extra twiddling to make it actually do something.

@eernstg
Copy link
Member Author

eernstg commented Jul 14, 2023

@johnniwinther, this breaking change has been approved. Can you find a slot in the pipeline to do it?

copybara-service bot pushed a commit that referenced this issue Jul 21, 2023
In response to #52575

Change-Id: I452575517c378dda3000b8fcbf4f66274b1583a6
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/315420
Reviewed-by: Chloe Stefantsova <cstefantsova@google.com>
Commit-Queue: Johnni Winther <johnniwinther@google.com>
@johnniwinther
Copy link
Member

@eernstg
Copy link
Member Author

eernstg commented Jul 21, 2023

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-meta Cross-cutting, high-level issues (for tracking many other implementation issues, ...). breaking-change-approved breaking-change-request This tracks requests for feedback on breaking changes
Projects
Status: Complete
Development

No branches or pull requests

10 participants