Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUGFIX] Remove all ‘unprocessable’ tags. #650

Merged
merged 1 commit into from
Feb 11, 2019
Merged

Conversation

JakeQZ
Copy link
Contributor

@JakeQZ JakeQZ commented Feb 3, 2019

Not all ‘unprocessable’ tags (<wbr> by default) were being removed, because a
‘live’ NodeList was being iterated. This was a regression as a result of #627.

Also modified related tests to include a <body> element in the input HTML.
Prior to #627, removeUnprocessableTags() did not behave as documented – if
elements had content it would still remove the tags (though not the content).
The tests did not pick this up because DOMDocument::loadHTML() creates a
<body> element if there isn’t one, and if there is text content (at the start)
not inside any element, wraps it in a <p> element
(https://secure.php.net/manual/en/domdocument.loadhtml.php#88864). (It does not
create a <p> element if there is already a <body> element.)

Not all ‘unprocessable’ tags (`<wbr>` by default) were being removed, because a
‘live’ NodeList was being iterated.  This was a regression as a result of #627.

Also modified related tests to include a `<body>` element in the input HTML.
Prior to #627, `removeUnprocessableTags()` did not behave as documented – if
elements had content it would still remove the tags (though not the content).
The tests did not pick this up because `DOMDocument::loadHTML()` creates a
`<body>` element if there isn’t one, and if there is text content (at the start)
not inside any element, wraps it in a `<p>` element
(https://secure.php.net/manual/en/domdocument.loadhtml.php#88864).  (It does not
create a `<p>` element if there is already a `<body>` element.)
@JakeQZ JakeQZ self-assigned this Feb 3, 2019
@JakeQZ JakeQZ added the bug label Feb 3, 2019
@JakeQZ JakeQZ added this to the 2.2.0 milestone Feb 3, 2019
@JakeQZ
Copy link
Contributor Author

JakeQZ commented Feb 3, 2019

An observation arises. The <wbr> tag causes issues because DOMDocument does not know it is self-closing (void). Several other elements also suffer the same fate: <command>, <embed>, <keygen>, <source>, <track> (from https://bugs.php.net/bug.php?id=73175). Should these all by default be removed from the input? Or should we perhaps support them by simply stripping the erroneous closing tags from the output (which I believe is all that is required)?

@oliverklee, @zoliszabo, what do you think?

@oliverklee
Copy link
Contributor

Or should we perhaps support them by simply stripping the erroneous closing tags from the output (which I believe is all that is required)?

I'm for this approach.

Copy link
Contributor

@oliverklee oliverklee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@oliverklee oliverklee merged commit b83cea6 into master Feb 11, 2019
@oliverklee oliverklee deleted the bugfix/sibling-wbr branch February 11, 2019 22:48
JakeQZ added a commit that referenced this pull request Feb 13, 2019
Added support to `AbstractHtmlProcessor` for HTML5 self-closing tags not
recognized as such by PHP’s `DOMDocument` implementation.  In effect this is a
workaround for the issue reported in https://bugs.php.net/bug.php?id=73175.

Affected tags require a self-closing slash in the HTML input to `DOMDocument`
(e.g. `<wbr/>` rather than `<wbr>`), and their invalid corresponding closing tag
(e.g. `</wbr>`) removing from its HTML output.

Follows from discussion in #650.
JakeQZ added a commit that referenced this pull request Feb 13, 2019
Added support to `AbstractHtmlProcessor` for HTML5 self-closing tags not
recognized as such by PHP’s `DOMDocument` implementation.  In effect this is a
workaround for the issue reported in https://bugs.php.net/bug.php?id=73175.

Affected tags require a self-closing slash in the HTML input to `DOMDocument`
e.g. `<wbr/>` rather than `<wbr>`, and their invalid corresponding closing tag
(e.g. `</wbr>`) removing from its HTML output.

Follows from discussion in #650.
JakeQZ added a commit that referenced this pull request Feb 13, 2019
Added support to `AbstractHtmlProcessor` for HTML5 self-closing tags not
recognized as such by PHP’s `DOMDocument` implementation.  In effect this is a
workaround for the issue reported in https://bugs.php.net/bug.php?id=73175.

Affected tags require a self-closing slash in the HTML input to `DOMDocument`
(e.g. `<wbr/>` rather than `<wbr>`), and their invalid corresponding closing tag
(e.g. `</wbr>`) removing from its HTML output.

Follows from discussion in #650.
JakeQZ added a commit that referenced this pull request Feb 13, 2019
Added support to `AbstractHtmlProcessor` for HTML5 self-closing tags not
recognized as such by PHP’s `DOMDocument` implementation.  In effect this is a
workaround for the issue reported in https://bugs.php.net/bug.php?id=73175.

Affected tags require a self-closing slash in the HTML input to `DOMDocument`
(e.g. `<wbr/>` rather than `<wbr>`), and their invalid corresponding closing tag
(e.g. `</wbr>`) removing from its HTML output.

Follows from discussion in #650.
@JakeQZ
Copy link
Contributor Author

JakeQZ commented Feb 13, 2019

Or should we perhaps support them by simply stripping the erroneous closing tags from the output (which I believe is all that is required)?

I'm for this approach.

Also required is those tags being in XML self-closing format in the input (i.e. <wbr/> rather than <wbr>). See PR #651

oliverklee pushed a commit that referenced this pull request Mar 13, 2019
Add support to `AbstractHtmlProcessor` for HTML5 self-closing tags not
recognized as such by PHP’s `DOMDocument` implementation.  In effect this is a
workaround for the issue reported in https://bugs.php.net/bug.php?id=73175.

Affected tags require a self-closing slash in the HTML input to `DOMDocument`
(e.g. `<wbr/>` rather than `<wbr>`), and their invalid corresponding closing tag
(e.g. `</wbr>`) removing from its HTML output.

Follows from discussion in #650.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants