-
-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reminder: revisit yield from
tokenization
#529
Comments
Loosely related to: squizlabs/PHP_CodeSniffer#3808 |
Okay, so I've had a look at this and as I suspected there are some issues which should be fixed. I'm currently undecided on how to fix these. I'm fully aware that the issues found are edge cases and will rarely, if ever, be found in real life codebases. Having said that, that's all the more reason to fix these tokenizer issues as PHPCS kicking in and flagging things correctly is especially important for those edge cases. Issue squizlabs/PHP_CodeSniffer#3808 is a case in point for this. Tokenizer basic principlesGuiding principles:
Additionally:
FindingsFor "plain" Things get interesting once we add a comment/ignore annotation between the Test set upI've used the following code sample to examine the tokenization: <?php
function generator()
{
yield from gen2();
yield /* comment */ from gen2();
yield
from
gen2();
yield // comment
from gen2();
yield
/* comment */
from
gen2();
} The above code sample is valid in PHP 8.3, but shows parse errors in PHP < 8.3 for those code samples with a comment between the PHP native tokenizationTaking the PHP native tokenization as a basis, there are four (five) different tokenizations to take into account.
Details about the differences in PHP native tokenizationThe PHP native tokenization is as follows: yield from gen2();
# PHP 5.4:
# Line 4: T_STRING ('yield')
# Line 4: T_WHITESPACE (' ')
# Line 4: T_STRING ('from')
#
# PHP 5.5 - 5.6:
# Line 4: T_YIELD ('yield')
# Line 4: T_WHITESPACE (' ')
# Line 4: T_STRING ('from')
#
# PHP 7.0 - 8.2:
# Line 4: T_YIELD_FROM ('yield from')
#
# PHP 8.3:
# Line 4: T_YIELD_FROM ('yield from')
yield /* comment */ from gen2();
# PHP 5.4:
# Line 6: T_STRING ('yield')
# Line 6: T_WHITESPACE (' ')
# Line 6: T_COMMENT ('/* comment */')
# Line 6: T_WHITESPACE (' ')
# Line 6: T_STRING ('from')
#
# PHP 5.5 - 5.6:
# Line 6: T_YIELD ('yield')
# Line 6: T_WHITESPACE (' ')
# Line 6: T_COMMENT ('/* comment */')
# Line 6: T_WHITESPACE (' ')
# Line 6: T_STRING ('from')
#
# PHP 7.0 - 8.2:
# Line 6: T_YIELD ('yield')
# Line 6: T_WHITESPACE (' ')
# Line 6: T_COMMENT ('/* comment */')
# Line 6: T_WHITESPACE (' ')
# Line 6: T_STRING ('from')
#
# PHP 8.3:
# Line 6: T_YIELD_FROM ('yield /* comment */ from')
yield
from
gen2();
# PHP 5.4:
# Line 8: T_STRING ('yield')
# Line 8: T_WHITESPACE ('
# ')
# Line 9: T_STRING ('from')
#
# PHP 5.5 - 5.6:
# Line 8: T_YIELD ('yield')
# Line 8: T_WHITESPACE ('
# ')
# Line 9: T_STRING ('from')
#
# PHP 7.0 - 8.2:
# Line 8: T_YIELD_FROM ('yield
# from')
#
# PHP 8.3:
# Line 8: T_YIELD_FROM ('yield
# from')
yield // comment
from gen2();
# PHP 5.4:
# Line 12: T_STRING ('yield')
# Line 12: T_WHITESPACE (' ')
# Line 12: T_COMMENT ('// comment
# ')
# Line 13: T_WHITESPACE (' ')
# Line 13: T_STRING ('from')
#
# PHP 5.5 - 5.6:
# Line 12: T_YIELD ('yield')
# Line 12: T_WHITESPACE (' ')
# Line 12: T_COMMENT ('// comment
# ')
# Line 13: T_WHITESPACE (' ')
# Line 13: T_STRING ('from')
#
# PHP 7.0 - 8.2:
# Line 12: T_YIELD ('yield')
# Line 12: T_WHITESPACE (' ')
# Line 12: T_COMMENT ('// comment
# ')
# Line 13: T_WHITESPACE (' ')
# Line 13: T_STRING ('from')
#
# PHP 8.3:
# Line 12: T_YIELD_FROM ('yield // comment
# from')
yield
/* comment */
from
gen2();
# PHP 5.4:
# Line 15: T_STRING ('yield')
# Line 15: T_WHITESPACE ('
# ')
# Line 16: T_COMMENT ('/* comment */')
# Line 16: T_WHITESPACE ('
# ')
# Line 17: T_STRING ('from')
#
# PHP 5.5 - 5.6:
# Line 15: T_YIELD ('yield')
# Line 15: T_WHITESPACE ('
# ')
# Line 16: T_COMMENT ('/* comment */')
# Line 16: T_WHITESPACE ('
# ')
# Line 17: T_STRING ('from')
#
# PHP 7.0 - 8.2:
# Line 15: T_YIELD ('yield')
# Line 15: T_WHITESPACE ('
# ')
# Line 16: T_COMMENT ('/* comment */')
# Line 16: T_WHITESPACE ('
# ')
# Line 17: T_STRING ('from')
#
# PHP 8.3:
# Line 15: T_YIELD_FROM ('yield
# /* comment */
# from') To summarize: it looks like PHP natively did not support a comment between the PHPCS tokenizationAs things are, the PHPCS tokenization is consistent for PHP 5.4 - 8.2 and in line with a comment between the keywords being a parse error. The PHPCS tokenization is as follows: yield from gen2();
# PHP 5.4 - 8.2:
# 10 | L04 | C 5 | CC 1 | ( 0) | T_YIELD_FROM | [ 10]: yield from
#
# PHP 8.3:
# 10 | L04 | C 5 | CC 1 | ( 0) | T_YIELD_FROM | [ 10]: yield from
yield /* comment */ from gen2();
# PHP 5.4 - 8.2:
# 19 | L06 | C 5 | CC 1 | ( 0) | T_YIELD | [ 5]: yield
# 20 | L06 | C 10 | CC 1 | ( 0) | T_WHITESPACE | [ 1]:
# 21 | L06 | C 11 | CC 1 | ( 0) | T_COMMENT | [ 13]: /* comment */
# 22 | L06 | C 24 | CC 1 | ( 0) | T_WHITESPACE | [ 1]:
# 23 | L06 | C 25 | CC 1 | ( 0) | T_STRING | [ 4]: from
#
# PHP 8.3:
# 19 | L06 | C 5 | CC 1 | ( 0) | T_YIELD_FROM | [ 24]: yield /* comment */ from
yield
from
gen2();
# PHP 5.4 - 8.2:
# 32 | L08 | C 5 | CC 1 | ( 0) | T_YIELD_FROM | [ 5]: yield
#
# 33 | L09 | C 1 | CC 1 | ( 0) | T_YIELD_FROM | [ 8]: from
#
# PHP 8.3:
# 28 | L08 | C 5 | CC 1 | ( 0) | T_YIELD_FROM | [ 5]: yield
#
# 29 | L09 | C 1 | CC 1 | ( 0) | T_YIELD_FROM | [ 8]: from
yield // comment
from gen2();
# PHP 5.4 - 8.2:
# 43 | L12 | C 5 | CC 1 | ( 0) | T_YIELD | [ 5]: yield
# 44 | L12 | C 10 | CC 1 | ( 0) | T_WHITESPACE | [ 1]:
# 45 | L12 | C 11 | CC 1 | ( 0) | T_COMMENT | [ 10]: // comment
#
# 46 | L13 | C 1 | CC 1 | ( 0) | T_WHITESPACE | [ 1]:
# 47 | L13 | C 2 | CC 1 | ( 0) | T_STRING | [ 4]: from
#
# PHP 8.3:
# 39 | L12 | C 5 | CC 1 | ( 0) | T_YIELD_FROM | [ 16]: yield // comment
#
# 40 | L13 | C 1 | CC 1 | ( 0) | T_YIELD_FROM | [ 5]: from
yield
/* comment */
from
gen2();
# PHP 5.4 - 8.2:
# 56 | L15 | C 5 | CC 1 | ( 0) | T_YIELD | [ 5]: yield
# 57 | L15 | C 10 | CC 1 | ( 0) | T_WHITESPACE | [ 0]:
#
# 58 | L16 | C 1 | CC 1 | ( 0) | T_WHITESPACE | [ 4]:
# 59 | L16 | C 5 | CC 1 | ( 0) | T_COMMENT | [ 13]: /* comment */
# 60 | L16 | C 18 | CC 1 | ( 0) | T_WHITESPACE | [ 0]:
#
# 61 | L17 | C 1 | CC 1 | ( 0) | T_WHITESPACE | [ 4]:
# 62 | L17 | C 5 | CC 1 | ( 0) | T_STRING | [ 4]: from
#
# PHP 8.3:
# 49 | L15 | C 5 | CC 1 | ( 0) | T_YIELD_FROM | [ 5]: yield
#
# 50 | L16 | C 1 | CC 1 | ( 0) | T_YIELD_FROM | [ 17]: /* comment */
#
# 51 | L17 | C 1 | CC 1 | ( 0) | T_YIELD_FROM | [ 8]: from To summarize:
ConclusionsAs things currently are, there are the following issues in the PHPCS polyfill/re-tokenization:
Possible mitigationsHow to consistently tokenize There are three options I can see:
Independently of the chosen solution, tab replacement should be enabled for the Aside from the above, it will probably be a good idea to add a sniff to PHPCompatibility to check for comments between the OpinionsI'd love to hear some opinions on the above and preferences for the solution directions (or even potential alternative I'm personally leaning towards solution 2️⃣ , but am still not 100% happy with this. /cc @asispts @dingo-d @fredden @GaryJones @greg0ire @kukulich @michalbundyra @Ocramius @sirbrillig @stronk7 @weierophinney @wimg |
Another option not listed:
We can link the two Proposed:
|
@fredden Good input. Thanks for that. I'll have to think it over. Main reason why I didn't include that option in the original list is because if makes the gap between the PHP native tokenization and the PHPCS tokenization larger/makes it more inconsistent. As for the comparison with function declarations: that's not a comparable syntax as PHP natively does not tokenize the The only multi-character tokens which AFAICS come close are cast tokens which can contain arbitrary whitespace on the inside of the parentheses, though those still can't contain new lines or comments. |
I've opened an issue in PHP Core to ask whether the PHP 8.3 change was intentional or should be regarded as a bug. The answer to that should help inform the solution direction for this issue. |
The issue in PHP Core has now escalated to PHP Internals: https://externals.io/message/124462 |
There is a proposal open to change the tokenization (yet) again now in PHP 8.4, without reverting the previous change: php/php-src#15041 This is just complicating things even more as it will cause an even larger BC break (removal of the We'll have to wait and see what happens, but if that gets merged, the problem will be even larger as that will break every sniff which looks for |
Update: I've now gone ahead and implemented option 2. I'll be pulling the related changes over the next few days. The reasons to implement option 2 (rather than @fredden's option 4 or the other two listed options) are as follows:
Note: I'm not adverse to potentially consider a larger change, like option 4, in the future, but such a change would need to be made in a major PHPCS release as the BC-break would be significantly larger than the currently made change. It would be great if by the time such larger change is discussed, there is more clarity about the direction PHP itself will take too, but we'll have to see about that. |
PR #647 is now open if anyone wants to test the implementation before I merge it (will leave it open for ~week). |
Just a reminder to self to revisit the
yield from
tokenization and add more extensive tests for this.Things to have a critical look at whether they are tokenized correctly/how they are tokenized:
yield
andfrom
with only whitespace between the words, but the words are on different linesyield
andfrom
with a comment between the wordsAlso needs a check on how PHP natively handles this.
The text was updated successfully, but these errors were encountered: