Unclosed strings at EOF sometimes tokenized as T_WHITESPACE by the JS tokenizer #1718
Comments
Also see #1719 |
I suspect there are many JS parser bugs, and I know of some repos with data for brute-forcing them; I wish I had dedicated time to fix all the important issues, though... It might be hard to set up the tools correctly, as the JS community stopped maintaining some tools (while pushing through incompatible changes) and others started creating their own incomplete tools. (Also, the tools would have to be adapted for our test cases, but since these repos have batches of test data, it might be worth writing the tool long term.) The JS grammar (a little outdated compared to the live spec) can be found here, although it might be a little overwhelming (click on the "lexical grammar" link to see only the lexical part): |
Looks like this only happens if there is no newline char at the end of the file. Otherwise, the tokeniser sees the newline char and bails out. |
Yep, looked like an exception needed to be made for this specific case. Thanks for reporting it. |
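For readers following along, here is a minimal Python sketch of the end-of-file case described above. It is not the project's actual tokenizer code: the function `scan_string`, the token name `T_UNCLOSED_STRING`, and the sample input are all made up for illustration. The idea is simply that a string scanner needs an explicit branch for "ran out of input before the closing quote", separate from the "hit a newline" branch.

```python
def scan_string(source, start):
    """Scan a quoted string that begins at source[start].

    Returns (token_type, text, next_index). The token names are
    illustrative only and are not taken from the real tokenizer.
    """
    quote = source[start]
    i = start + 1
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            i += 2  # skip the backslash and the escaped character
            continue
        if ch == quote:
            # Properly closed string: include the closing quote.
            return ("T_CONSTANT_ENCAPSED_STRING", source[start:i + 1], i + 1)
        if ch == "\n":
            # Unclosed string: bail out at the newline and leave the
            # newline itself for the caller to tokenize.
            return ("T_UNCLOSED_STRING", source[start:i], i)
        i += 1
    # End of input with no closing quote and no newline. Without this
    # branch, the leftover text has no token type assigned to it.
    return ("T_UNCLOSED_STRING", source[start:], len(source))


# Hypothetical input with a missing closing quote and no trailing newline.
token = scan_string("alert('hi);", 6)
```

With this sketch, the unclosed text gets the same token type whether or not the file ends with a newline, which is the behaviour the report asks for. Whether the real fix should use a string token or an unknown token is a separate design choice.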
A JS file with the following code (note the missing closing ' !)
is being tokenized as follows:
I believe that the fourth token, 'hi);, is incorrectly tagged as T_WHITESPACE. While this is caused by a parse error in the original JS code, this is clearly not whitespace.
Maybe the catch-all T_STRING or even a T_UNKNOWN would be more appropriate?
Full tokenizer processing log: