You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As things are, the tokenization of PHP open tag tokens is taken from PHP itself, without changes, but the tokenization in PHP itself is inconsistent in regards to how whitespace following the actual tag is tokenized.
Open tag / followed by
<?php
<? (with short_open_tags=on)
<?=
New line
new line included in T_OPEN_TAG
T_OPEN_TAG + T_WHITESPACE
T_OPEN_TAG_WITH_ECHO + T_WHITESPACE
Trailing spaces + new line
one space included, followed by T_WHITESPACE
T_OPEN_TAG + T_WHITESPACE
T_OPEN_TAG_WITH_ECHO + T_WHITESPACE
No space
**
T_OPEN_TAG
T_OPEN_TAG_WITH_ECHO
One space
space included in T_OPEN_TAG
T_OPEN_TAG + T_WHITESPACE
T_OPEN_TAG_WITH_ECHO + T_WHITESPACE
Multiple spaces
one space included, followed by T_WHITESPACE
T_OPEN_TAG + T_WHITESPACE
T_OPEN_TAG_WITH_ECHO + T_WHITESPACE
** How the <?php tag without space after it is tokenized depends on whether short_open_tags=on.
With short_open_tags=off: tokenized as T_INLINE_HTML.
With short_open_tags=on: tokenized as T_OPEN_TAG<? + T_STRING for php + whatever came after.
☝️ In my opinion, the above is correct and will not be changed as <?php without any space after it, should be considered a parse error anyway.
The Problem
The short of it, is that short open tags - <? - and short open echo tags - <?= - are tokenized consistently as just the plain tag without any whitespace inclusion.
But the token for long open tags - <?php - may include either a new line or a single space after the tag itself, but only that. Any additional whitespace after that is tokenized as a separate T_WHITESPACE token.
This inconsistency means that sniff writers need to remember this unintuitive peculiarity and have to take it into account whenever a sniff needs to deal with PHP open tags.
This also makes the code in these type of sniffs more complicated and increases the cognitive load.
I suspect there will be numerous sniffs out there (including in PHPCS itself) which don't take the difference in tokenization between the long and short open tags into account correctly. This will, in most cases, only be problematic for files using short open tags, as most sniffs default to expecting long open tags.
However, this also means that those sniffs will be buggy if they are used to scan files with short open tags (with short_open_tags=on).
Proposal
I'd like to propose to streamline the tokenization of PHP open tags to never include whitespace.
In practice, only the tokenization of long open tags <?php is affected by this change proposal, as those would no longer include a new line/single space.
I expect the impact of this change on existing code to be relatively low as there are not that many sniffs which specifically deal with PHP open tags.
I also expect that the majority of the affected sniffs will be PHPCS itself (as external standards rarely have the need for extra sniffs for open tags as PHPCS itself has most of the things which could be examined covered via the existing sniffs).
Planning
As this would be a breaking change for sniffs which examine the T_OPEN_TAG token, this change would go into PHPCS 4.0.
The change going into PHPCS 4.0 also means we don't have to consider the impact of this change on CSS and JS tokenization and sniffs anymore.
Opinions
Please let me know if you have any concerns about this proposal or any suggestions for alternative approaches.
Looks like the PHP close tag also may include whitespace. As there is only one flavour of PHP close tags, I'm not sure anything should change for close tags though.
Current Situation
Inspired by #588
As things are, the tokenization of PHP open tag tokens is taken from PHP itself, without changes, but the tokenization in PHP itself is inconsistent in regards to how whitespace following the actual tag is tokenized.
<?php
<?
(withshort_open_tags=on
)<?=
T_OPEN_TAG
T_OPEN_TAG
+T_WHITESPACE
T_OPEN_TAG_WITH_ECHO
+T_WHITESPACE
T_WHITESPACE
T_OPEN_TAG
+T_WHITESPACE
T_OPEN_TAG_WITH_ECHO
+T_WHITESPACE
T_OPEN_TAG
T_OPEN_TAG_WITH_ECHO
T_OPEN_TAG
T_OPEN_TAG
+T_WHITESPACE
T_OPEN_TAG_WITH_ECHO
+T_WHITESPACE
T_OPEN_TAG
+T_WHITESPACE
T_OPEN_TAG_WITH_ECHO
+T_WHITESPACE
** How the
<?php
tag without space after it is tokenized depends on whethershort_open_tags=on
.With
short_open_tags=off
: tokenized asT_INLINE_HTML
.With
short_open_tags=on
: tokenized asT_OPEN_TAG
<?
+T_STRING
forphp
+ whatever came after.☝️ In my opinion, the above is correct and will not be changed as
<?php
without any space after it, should be considered a parse error anyway.The Problem
The short of it, is that short open tags -
<?
- and short open echo tags -<?=
- are tokenized consistently as just the plain tag without any whitespace inclusion.But the token for long open tags -
<?php
- may include either a new line or a single space after the tag itself, but only that. Any additional whitespace after that is tokenized as a separateT_WHITESPACE
token.This inconsistency means that sniff writers need to remember this unintuitive peculiarity and have to take it into account whenever a sniff needs to deal with PHP open tags.
This also makes the code in these type of sniffs more complicated and increases the cognitive load.
I suspect there will be numerous sniffs out there (including in PHPCS itself) which don't take the difference in tokenization between the long and short open tags into account correctly. This will, in most cases, only be problematic for files using short open tags, as most sniffs default to expecting long open tags.
However, this also means that those sniffs will be buggy if they are used to scan files with short open tags (with
short_open_tags=on
).Proposal
I'd like to propose to streamline the tokenization of PHP open tags to never include whitespace.
In practice, only the tokenization of long open tags
<?php
is affected by this change proposal, as those would no longer include a new line/single space.I expect the impact of this change on existing code to be relatively low as there are not that many sniffs which specifically deal with PHP open tags.
I also expect that the majority of the affected sniffs will be PHPCS itself (as external standards rarely have the need for extra sniffs for open tags as PHPCS itself has most of the things which could be examined covered via the existing sniffs).
Planning
As this would be a breaking change for sniffs which examine the
T_OPEN_TAG
token, this change would go into PHPCS 4.0.The change going into PHPCS 4.0 also means we don't have to consider the impact of this change on CSS and JS tokenization and sniffs anymore.
Opinions
Please let me know if you have any concerns about this proposal or any suggestions for alternative approaches.
/cc @asispts @dingo-d @fredden @GaryJones @greg0ire @kukulich @michalbundyra @Ocramius @sirbrillig @stronk7 @weierophinney @wimg
The text was updated successfully, but these errors were encountered: