Skip to content

Conversation

@jrfnl
Copy link
Contributor

@jrfnl jrfnl commented Apr 2, 2017

I noticed two bugs in the DisallowSpaceIndent sniff when the inital indentation whitespace on a line had a mix of tabs and spaces.

  • If the line started with spaces, but had a tab after it and spaces after that, only the spaces at the start would be fixed.
  • If the line started with tabs, but had spaces after that, no error was thrown at all.

This PR fixes that.

Notes:

  • A new Line indent: mixed metric will be recorded for this sniff. This is inline with the same in the DisallowTabIndent sniff.
  • Moved the throwing of the error down to prevent code duplication as the same logic & error can be used for both situations.
  • Includes minor documentation fix.
  • Includes unit tests for the bugs and adjusted fix file for existing unit tests which were previously not fixed due to this bug.

}

// If tabs are being converted to spaces, the original content
// If spaces are being converted to tabs, the original content
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is referring to PHPCS converting tabs to spaces during tokenizing, so the original comment is correct in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reverted the word order, but adjusted the comment slightly for clarification of what was meant here.

@gsherwood
Copy link
Member

This behaviour is intentional. Some indentation can only be achieved by using spaces after the normal tab indent, so this sniff does not assume that everything is perfectly aligned to a tab stop. The scope indent sniff makes the same assumption, which is why it never checks for exact indentation unless you tell it to. And when you do, it almost always reports false positives.

DisallowTabIndents is much easier because a tab is always banned due to it having no use in a space-based indent rule.

Changing the behaviour of this sniff would be a BC break, as would changing the definitions of the metric.

To make it a better sniff, I think it needs to continue to allow the space exception, but probably only if those spaces indent the code to a non tab stop. So if a tab could be used instead, the spaces would throw an error and be replaced. But if putting a tab there would cause the code to begin at a different column, the spaces should not be reported or replaced. That's the job of the ScopeIndent sniff.

@jrfnl
Copy link
Contributor Author

jrfnl commented Apr 3, 2017

Hi @gsherwood, thanks for the feedback.

This behaviour is intentional. Some indentation can only be achieved by using spaces after the normal tab indent, so this sniff does not assume that everything is perfectly aligned to a tab stop.

Looking at the code, IMHO the existing behaviour of the sniff is inconsistent with the above statement.

When a line starts with a space and is not a file/class level docblock and there are less spaces than the tabWidth, it still replaces the space(s) with a tab. This goes against what you suggest above.

Similarly, when the line starts with a tab, but also has spaces, no replacement is ever done, even when there are sets of 4 (tabWidth) spaces which could be replaced by a tab. (part of the original report above).

Studying the code in even more detail, I find more inconsistencies as the replacement seems to be ordered wrong.

Example (using . for space and -> for tab):

// Original line:
.......->...// code <- 7 spaces, 1 tab, 3 spaces

// Would be replaced by:
->->..->//code <- 2 tabs, 2 spaces, 1 tab

// While I would have expected:
->->->..//code <- 3 tabs, 2 spaces

Changing the behaviour of this sniff would be a BC break

What I would like to suggest is to align the behaviour of the sniff with your statement and with what I understand to be the intention of the sniff: Disallowing spaces for alignment, but allowing precision alignment using spaces.

In practice this would mean:

  • If there are more spaces > tabWidth, spaces would be replaced by tabs.
  • Any remaining spaces will be placed at the end of the whitespace, so the "whitespace order" will always be tabs first, spaces at the end.

I would not consider that a BC break, but a bug fix as the sniff currently does not behave in line with the intention of the sniff.

If you agree with that, the only question is what to do with the current behavioral inconsistency: should a set of less than 4 spaces in a line beginning with a space still be replaced with a tab ? (when not T_DOC_COMMENT_WHITESPACE)

as would changing the definitions of the metric.

As for the metrics, am I correct that any precision spacing should be ignored, but other than that, the metrics should be recorded ?

  • spaces for only spaces and more spaces > tabWidth (i.e. not precision spacing)
  • tabs for only tabs
  • mixed for a mixture of tabs and spaces where the amount of spaces > tabWidth

Or should the tabs metric also be recorded when the whitespace is tabs with only precision spacing ?
If so, continue ignoring file/class comments for the metric ?

To make it a better sniff, I think it needs to continue to allow the space exception, but probably only if those spaces indent the code to a non tab stop. So if a tab could be used instead, the spaces would throw an error and be replaced. But if putting a tab there would cause the code to begin at a different column, the spaces should not be reported or replaced.

I've made the necessary code changes for this in line with the above. If you could clarify your stance on the points I've described here, I can finish it off and push a replacement commit.

@gsherwood
Copy link
Member

There seems to be a mix of misunderstanding and inconsistent testing (I cannot reproduce your results). The sniff currently reports on leading spaces in indentation, and replaces those leading spaces with tabs. In my testing, it is still working like this.

For example, you're 7 spaces, 1 tab, 3 spaces example is fixed like this for me:

E: [Line 8] Tabs must be used to indent lines; spaces are not allowed (Generic.WhiteSpace.DisallowSpaceIndent.SpacesUsed)
	Generic_Sniffs_WhiteSpace_DisallowSpaceIndentSniff (line 128) replaced token 39 (T_WHITESPACE) "·······\t···echo" => "\t···\t···echo"

That is, 1 tab, 3 spaces, 1 tab, 3 spaces. It replaced as much of the leading whitespace as it could and then left the rest of the indentation as-is.

But this doesn't really matter because you came to the same conclusion as me, although wrote it up more succinctly:

In practice this would mean:

If there are more spaces > tabWidth, spaces would be replaced by tabs.
Any remaining spaces will be placed at the end of the whitespace, so the "whitespace order" will always be tabs first, spaces at the end.

I complexly agree with this. It is what I was suggesting in my final comment, which would make the sniff more useful.

As for your questions:

should a set of less than 4 spaces in a line beginning with a space still be replaced with a tab ? (when not T_DOC_COMMENT_WHITESPACE)

If we assume the rule is "take the indentation and rewrite it to use tabs" than the answer would be that if the indentation is less than a tab width, it would remain as spaces. If it is greater, as many spaces will be removed as possible. The --tab-width setting would need to be used to figure this out.

As for the metrics, am I correct that any precision spacing should be ignored, but other than that, the metrics should be recorded ?

I think the rules are:

  • the indent contains only spaces = spaces
    • e.g., ............echo 'test';
  • the indent contains only tabs = tabs
    • e.g., ->->->echo 'test';
  • the indent contains tabs then spaces < tab-width = tabs
    • e.g., ->->->..echo 'test';
  • the indent contains spaces, then a tab somewhere = mixed
    • e.g., ....->....echo 'test';
    • e.g., ..->->->echo 'test';
  • the indent contains tabs, then spaces >= tab-width = mixed
    • e.g., ->->....echo 'test';
    • e.g., ->........echo 'test';

@jrfnl
Copy link
Contributor Author

jrfnl commented Apr 4, 2017

Ok, verifying my implementation with your comments now.

If we assume the rule is "take the indentation and rewrite it to use tabs" than the answer would be that if the indentation is less than a tab width, it would remain as spaces. If it is greater, as many spaces will be removed as possible. The --tab-width setting would need to be used to figure this out.

This would change the behaviour for the unit tests on line 18 and 19 where the indentation would no longer be fixed. Is that ok ?

https://github.com/squizlabs/PHP_CodeSniffer/blob/master/CodeSniffer/Standards/Generic/Tests/WhiteSpace/DisallowSpaceIndentUnitTest.inc#L18-L19
https://github.com/squizlabs/PHP_CodeSniffer/blob/master/CodeSniffer/Standards/Generic/Tests/WhiteSpace/DisallowSpaceIndentUnitTest.inc.fixed#L18-L19

Edit: Also line 22 and 24 though the whitespace order for those lines would be changed.

@jrfnl
Copy link
Contributor Author

jrfnl commented Apr 4, 2017

One more question: in the existing implementation, file/class docblocks are ignored for the metrics.
According to the below, they should be recorded as tabs (provided they are otherwise indented with tabs).
Continue ignoring the file/class docblock lines or let them follow the same rules as all the other lines ?

the indent contains tabs then spaces < tab-width = tabs
e.g., ->->->..echo 'test';

@gsherwood
Copy link
Member

Yeah, I think it is ok for lines 18,19,22,24 to have different test results now. Especially lines 18 and 19, which were changing the indent level of the code.

Continue ignoring the file/class docblock lines or let them follow the same rules as all the other lines ?

Yes, they need to be ignored or else tab-indented code would end up with space-indented metrics for all top-level docblocks, like file and class comments.

…e indentations.

Details:
* Now also detects & fixes spaces if used after tabs and not used for precision indentation.
* Now also corrects the whitespace order if needed. Order should be tabs first, precision spaces second. For this a new error message and error code has been added.
* Adds the `Line endings: mixed` metric to this sniff in line with how it's used in the `DisallowTabIndent` sniff
* Adds `.fixed` files for the CSS and JS tests.
* Minor documentation clarifications.
@jrfnl jrfnl force-pushed the feature/bug-fix-space-indent branch from de30476 to bf763be Compare April 4, 2017 15:38
@jrfnl
Copy link
Contributor Author

jrfnl commented Apr 4, 2017

Hi @gsherwood, thanks for those responses.

I've replaced the original commit with a new one which takes everything we've discussed into account.

The logic of the original sniff has essentially been rewritten to allow for this.

Details for the fixed sniff:

  • Now also detects & fixes spaces if used after tabs and not used for precision indentation.
  • Now also corrects the whitespace order if needed. Order should be tabs first, precision spaces second. For this a new error message and error code has been added.
  • Adds the Line endings: mixed metric to this sniff in line with how it's used in the DisallowTabIndent sniff
  • Adds .fixed files for the CSS and JS tests.
  • Minor documentation clarifications.

Please let me know if you need any further changes.

@gsherwood gsherwood merged commit bf763be into squizlabs:master Apr 5, 2017
gsherwood added a commit that referenced this pull request Apr 5, 2017
@gsherwood
Copy link
Member

Thanks a lot for doing that.

I did end up removing the extra error message as I don't think there is a need to provide a distinction there. I'd prefer to keep the error message simple and consistent with previous versions.

@jrfnl jrfnl deleted the feature/bug-fix-space-indent branch April 5, 2017 01:03
@jrfnl
Copy link
Contributor Author

jrfnl commented Apr 5, 2017

You're welcome. Thanks for the merge.

}
} else {
if ($numTabs === 0) {
// Precision indentation.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic looks very suspicious to me (compare #547 (comment)). Consider something like this:

		if (! ($env === 'silent'
		       || ($env === 'production'
		       && ($level >= Minz_Log::NOTICE)))) {

Greatly increasing the tab width doesn't suffice as a workaround because then you'll get nonsensical errors about line length.

Frenzie added a commit to Frenzie/FreshRSS that referenced this pull request Apr 1, 2019
This also requires an update of phpcs, since the old version won't run on PHP 7.3.

By setting setting the tab-width to 40 it works around the behavior introduced in squizlabs/PHP_CodeSniffer#1404 which erroneously interprets positioning spaces as indentation. ("If the line started with tabs, but had spaces after that, no error was thrown at all.")

That makes any line lengths checks ineffective, but I think line length checks aren't very useful anyway. They're basically just a (very!) rough indication that you might want to consider some refactoring.
Alkarex pushed a commit to FreshRSS/FreshRSS that referenced this pull request Apr 1, 2019
This also requires an update of phpcs, since the old version won't run on PHP 7.3.

By setting setting the tab-width to 40 it works around the behavior introduced in squizlabs/PHP_CodeSniffer#1404 which erroneously interprets positioning spaces as indentation. ("If the line started with tabs, but had spaces after that, no error was thrown at all.")

That makes any line lengths checks ineffective, but I think line length checks aren't very useful anyway. They're basically just a (very!) rough indication that you might want to consider some refactoring.
mdemoss pushed a commit to mdemoss/FreshRSS that referenced this pull request Mar 25, 2021
This also requires an update of phpcs, since the old version won't run on PHP 7.3.

By setting setting the tab-width to 40 it works around the behavior introduced in squizlabs/PHP_CodeSniffer#1404 which erroneously interprets positioning spaces as indentation. ("If the line started with tabs, but had spaces after that, no error was thrown at all.")

That makes any line lengths checks ineffective, but I think line length checks aren't very useful anyway. They're basically just a (very!) rough indication that you might want to consider some refactoring.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants