Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tokenizer/PHP: add extra hardening to the (DNF) type handling + efficiency improvement #508

Merged
merged 4 commits into from
May 22, 2024

Conversation

jrfnl
Copy link
Member

@jrfnl jrfnl commented May 21, 2024

Description

Follow up on PR #507.

Tokenizer/PHP: efficiency improvement for DNF type handling

The PHP::processAdditional() method walks back from the end of the file to the beginning.

With that in mind, and knowing that a type can never end on an open parenthesis, and an open parenthesis can never be seen in a type before the close parenthesis has been seen (at least for valid/non-parse error types), it makes no sense to trigger the type handling logic for open parentheses.

This should make the tokenizer slightly more efficient as (open) parentheses are used a lot in code ;-)

It also prevents the type handling layer from acting on these type of invalid/parse error types, while it previously would.

Includes tests safeguarding the behaviour of the type handling layer for this type of invalid/parse error types.

Of these tests, the type for the OO constant and for the property were previously not handled consistently/correctly.
The parameter type + the return type were fine.

Tokenizer/PHP: don't retokenize tokens in a broken type DNF declaration

A DNF type always needs to have matching open and close parentheses. If the parentheses are unbalanced, we can't be sure this is a DNF type and run the risk of retokenizing non-type tokens.

Even if the code under scan was intended as a DNF type, it will be invalid code/contain a parse error if the parentheses are unbalanced.
In that case, retokenizing to the type specific tokens is likely to make the effect of this parse error on sniffs worse.

With that in mind, I'm adding extra hardening to the type handling layer in the PHP tokenizer, which will ensure that the retokenization to type specific tokens only happens when there are balanced pairs of DNF parentheses.

Includes tests safeguarding the behaviour of the type handling layer for this type of invalid/parse error types.

Safe for two, all these tests would previously fail on containing at least some type specific tokens.

Tokenizer/PHP: efficiency fix

Reminder: the PHP::processAdditional()` method walks back from the end of the file to the beginning.

The type handling retokenization layer is triggered for each &, | and ) the tokenizer encounters.

When something is recognized as a valid type declaration, the relevant tokens will all be retokenized in one go the first time the type handling layer is triggered, which means that - as the type tokens will have been retokenized already -, the type handling layer will not be triggered again for any of the other type related tokens in the type.

However, if the type is not recognized as a valid type, the type handling layer will keep getting retriggered and will (correctly) keep concluding this is not a valid type.

The change in this PR, prevents the type handling layer from doing any work when it is retriggered on a token which was previously already seen and concluded to be, either not part of a type or part of an invalid type.

This should make the tokenizer marginally faster for complex types containing an error, like (A&B)|(C&D|(E&F).

Tokenizer/PHP: minor doc fix in type handling layer

Suggested changelog entry

  • Efficiency improvement to the type handling layer in the PHP tokenizer.
  • Tokenizer/PHP: extra hardening against handling parse errors in the type handling layer

Related issues/external references

Loosely related to #504, #505, #507

jrfnl added 4 commits May 22, 2024 00:51
The PHP::processAdditional()` method walks _back_ from the end of the file to the beginning.

With that in mind, and knowing that a type can never end on an open parenthesis, and an open parenthesis can never be seen in a type before the close parenthesis has been seen (at least for valid/non-parse error types), it makes no sense to trigger the type handling logic for open parentheses.

This should make the tokenizer slightly more efficient as (open) parentheses are used a lot in code ;-)

It also prevents the type handling layer from acting on these type of invalid/parse error types, while it previously would.

Includes tests safeguarding the behaviour of the type handling layer for this type of invalid/parse error types.

Of these tests, the type for the OO constant and for the property were previously not handled consistently/correctly.
The parameter type + the return type were fine.
A DNF type always needs to have matching open and close parentheses. If the parentheses are unbalanced, we can't be sure this is a DNF type and run the risk of retokenizing non-type tokens.

Even if the code under scan was intended as a DNF type, it will be invalid code/contain a parse error if the parentheses are unbalanced.
In that case, retokenizing to the type specific tokens is likely to make the effect of this parse error on sniffs _worse_.

With that in mind, I'm adding extra hardening to the type handling layer in the PHP tokenizer, which will ensure that the retokenization to type specific tokens _only_ happens when there are balanced pairs of DNF parentheses.

Includes tests safeguarding the behaviour of the type handling layer for this type of invalid/parse error types.

Safe for two, all these tests would previously fail on containing at least _some_ type specific tokens.
Reminder: the PHP::processAdditional()` method walks _back_ from the end of the file to the beginning.

The type handling retokenization layer is triggered for each `&`, `|` and `)` the tokenizer encounters.

When something is recognized as a valid type declaration, the relevant tokens will all be retokenized in one go the first time the type handling layer is triggered, which means that - as the type tokens will have been retokenized already -, the type handling layer will not be triggered again for any of the other type related tokens in the type.

However, if the type is *not* recognized as a valid type, the type handling layer will keep getting retriggered and will (correctly) keep concluding this is not a valid type.

The change in this PR, prevents the type handling layer from doing any work when it is retriggered on a token which was previously already seen and concluded to be, either not part of a type or part of an invalid type.

This should make the tokenizer marginally faster for complex types containing an error, like `(A&B|(C&D)|(E&F)`.
@jrfnl jrfnl added this to the 3.10.1 milestone May 21, 2024
@jrfnl jrfnl merged commit fb351b3 into master May 22, 2024
50 checks passed
@jrfnl jrfnl deleted the feature/tokenizer-php-harden-the-dnf-layer-some-more branch May 22, 2024 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant