[TASK] Normalize the DOCTYPE declaration #866

Merged
oliverklee merged 1 commit into master from feature/normalize-doctype
Apr 23, 2020

@JakeQZ JakeQZ commented Apr 23, 2020

Ensure that the DOCTYPE declaration consists of uppercase DOCTYPE and
lowercase root element name (html).

This is done when the DOMDocument is created from an HTML source. Once the
DOMDocument has been created, the DOMDocumentType cannot be changed, so the
document type declaration must be manipulated (if necessary) in the HTML
beforehand. (Since only HTML documents are supported, the declaration is only
normalized when the root element name is HTML, in whatever case - the precise
specification for any element name involves lists of various Unicode character
ranges which it would be superfluous to allow for and try to match. PHP's
DOMDocument/libxml itself will output the DOCTYPE keyword in uppercase in
any case.)
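
The pre-processing step described above can be sketched roughly as follows. This is an illustration in Python rather than the project's PHP code, and the names `DOCTYPE_PATTERN` and `normalize_doctype` are hypothetical; the actual implementation in this pull request may use a different pattern. It assumes the declaration contains no `>` inside quoted identifiers.

```python
import re

# Matches a document type declaration whose root element name is "html" in
# any letter case. The lookahead requires whitespace or ">" after the name,
# so names that merely start with "html" are left alone.
# Group 1 captures anything after the root element name (e.g. a PUBLIC identifier).
DOCTYPE_PATTERN = re.compile(r"<!DOCTYPE\s+html(?=[\s>])([^>]*)>", re.IGNORECASE)


def normalize_doctype(html: str) -> str:
    """Rewrite the first matching declaration as uppercase DOCTYPE, lowercase html."""
    return DOCTYPE_PATTERN.sub(
        lambda match: "<!DOCTYPE html" + match.group(1) + ">",
        html,
        count=1,
    )


if __name__ == "__main__":
    print(normalize_doctype("<!doctype HTML><html></html>"))
```

Declarations with any other root element name are returned unchanged, which mirrors the decision to normalize only when the root element name is HTML.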

This normalization is consistent with the relevant part of the
polyglot markup specification
(https://dev.w3.org/html5/html-polyglot/html-polyglot.html#doctype).
While polyglot markup is primarily intended for serialization of HTML as XML
(we don't actually support outputting as XHTML), it is also recommended for
maximum interoperability and robustness when rendering HTML.

This also makes the output consistent with that of Masterminds/html5-php and
would eliminate the need to change associated tests specifically for #831.

Closes #858.

@JakeQZ JakeQZ added this to the 4.0.0 milestone Apr 23, 2020
@JakeQZ JakeQZ self-assigned this Apr 23, 2020

@oliverklee oliverklee left a comment

LGTM.

@oliverklee oliverklee merged commit eae40ec into master Apr 23, 2020
@oliverklee oliverklee deleted the feature/normalize-doctype branch April 23, 2020 22:49
Successfully merging this pull request may close these issues.

Normalize DOCTYPE?