[BUGFIX] Remove all ‘unprocessable’ tags. #650
Not all ‘unprocessable’ tags (`<wbr>` by default) were being removed, because a ‘live’ NodeList was being iterated. This was a regression as a result of #627. Also modified related tests to include a `<body>` element in the input HTML. Prior to #627, `removeUnprocessableTags()` did not behave as documented – if elements had content it would still remove the tags (though not the content). The tests did not pick this up because `DOMDocument::loadHTML()` creates a `<body>` element if there isn’t one, and if there is text content (at the start) not inside any element, wraps it in a `<p>` element (https://secure.php.net/manual/en/domdocument.loadhtml.php#88864). (It does not create a `<p>` element if there is already a `<body>` element.)
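The live-list problem described above can be sketched as follows. This is only an illustration, not the code from this PR: `removeElementsByTagName()` is a hypothetical helper name, and the snapshot via `iterator_to_array()` is one possible way to avoid iterating a list that shrinks while nodes are being removed.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): removes every element with
 * the given tag name and returns how many are left afterwards.
 */
function removeElementsByTagName(\DOMDocument $document, string $tagName): int
{
    // getElementsByTagName() returns a live DOMNodeList: it is updated as the
    // document changes, so removing nodes while iterating it directly can
    // cause some nodes to be skipped.
    $liveList = $document->getElementsByTagName($tagName);

    // Copy the nodes into a plain array first, so the collection being
    // iterated no longer changes when nodes are removed.
    $snapshot = \iterator_to_array($liveList);

    foreach ($snapshot as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    return $document->getElementsByTagName($tagName)->length;
}

// Suppress libxml warnings about tags it does not recognize (such as <wbr>).
\libxml_use_internal_errors(true);

$document = new \DOMDocument();
$document->loadHTML('<html><body><p>a<wbr/>b<wbr/>c<wbr/>d</p></body></html>');

$remaining = removeElementsByTagName($document, 'wbr');
```

Iterating the list backwards by index, or looping while the list still has a first item, would be alternative ways to cope with the live behaviour.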
An observation arises. @oliverklee, @zoliszabo, what do you think?
I'm for this approach.
Looks good!
Added support to `AbstractHtmlProcessor` for HTML5 self-closing tags not recognized as such by PHP’s `DOMDocument` implementation. In effect this is a workaround for the issue reported in https://bugs.php.net/bug.php?id=73175. Affected tags need a self-closing slash in the HTML input to `DOMDocument` (e.g. `<wbr/>` rather than `<wbr>`), and their invalid corresponding closing tag (e.g. `</wbr>`) needs to be removed from its HTML output. Follows from discussion in #650.
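As a rough sketch of the two steps in that workaround, assuming simple regular-expression pre- and post-processing: the function names `addSelfClosingSlashes()` and `removeInvalidClosingTags()` are hypothetical and are not claimed to match what `AbstractHtmlProcessor` actually does.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: builds a regex alternation such as "wbr|foo" from tag names.
 *
 * @param string[] $tagNames
 */
function buildTagAlternation(array $tagNames): string
{
    return \implode('|', \array_map(
        static function (string $tagName): string {
            return \preg_quote($tagName, '%');
        },
        $tagNames
    ));
}

/**
 * Hypothetical helper: rewrites e.g. `<wbr>` to `<wbr/>` before the HTML is
 * passed to DOMDocument. Tags that already have the slash are normalized.
 *
 * @param string[] $tagNames
 */
function addSelfClosingSlashes(string $html, array $tagNames): string
{
    $pattern = '%<(' . buildTagAlternation($tagNames) . ')\b([^>]*?)\s*/?>%i';

    return \preg_replace($pattern, '<$1$2/>', $html);
}

/**
 * Hypothetical helper: strips invalid closing tags such as `</wbr>` from the
 * HTML that DOMDocument produced.
 *
 * @param string[] $tagNames
 */
function removeInvalidClosingTags(string $html, array $tagNames): string
{
    $pattern = '%</(' . buildTagAlternation($tagNames) . ')\s*>%i';

    return \preg_replace($pattern, '', $html);
}

$input = addSelfClosingSlashes('a<wbr>b<wbr/>c', ['wbr']);
$output = removeInvalidClosingTags('a<wbr></wbr>b', ['wbr']);
```

A regex-based approach like this is deliberately simple and would not handle every edge case (for example, `>` inside attribute values).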
Also required is those tags being in XML self-closing format in the input (i.e. |