Fix discarding html[lang]
#104
Merged
`DOMDocument::loadHTML` will parse HTML documents as ISO-8859-1 if there is no `meta[charset]` tag. This means that UTF-8-encoded HTML fragments, such as those coming from the JSON-LD `articleBody` field, would be parsed with an incorrect encoding.

In f14428e, we tried to resolve it by putting a `meta[charset]` tag at the start of the HTML fragment. Unfortunately, it turns out that this causes the parser to auto-insert an `html` element, losing the attributes of the original `html` tag.

Let’s try to insert the `meta[charset]` tag into the proper place in the HTML document.

We do not need to use the same trick with `JSLikeHTMLElement::__set` since that expects smaller HTML fragments, not `html` documents, so creating `html` and `head` elements will not be a problem.

Also include some unrelated test cleanups I noticed along the way.
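
For illustration, here is a minimal sketch of what "inserting the `meta[charset]` tag into the proper place" could look like. It is not the code from this PR: the helper name `insertCharsetMeta` and the regex-based approach are assumptions made for the example, and the patterns will not cope with every edge case (for instance, a `>` inside an attribute value or a `<head>` inside a comment).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, for illustration only: put a meta[charset] tag
 * inside the document's head instead of in front of the whole document.
 */
function insertCharsetMeta(string $html, string $charset = 'utf-8'): string
{
    $meta = '<meta charset="' . $charset . '">';

    // Preferred spot: right after the opening <head> tag.
    // The lookahead stops <header> from matching.
    $result = preg_replace('/<head(?=[\s>])[^>]*>/i', '$0' . $meta, $html, 1, $count);
    if ($result !== null && $count > 0) {
        return $result;
    }

    // No <head>: create one right after the opening <html> tag,
    // so the original tag and its attributes stay where they are.
    $result = preg_replace('/<html(?=[\s>])[^>]*>/i', '$0<head>' . $meta . '</head>', $html, 1, $count);
    if ($result !== null && $count > 0) {
        return $result;
    }

    // Plain fragment without an <html> tag: prepending is harmless here,
    // because there are no html attributes to lose.
    return $meta . $html;
}

// Usage: the source string is assumed to be UTF-8 encoded.
$html = '<html lang="en"><head><title>Menu</title></head>'
      . '<body><p>Crème brûlée</p></body></html>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML(insertCharsetMeta($html));

echo $dom->documentElement->getAttribute('lang'), "\n";
echo $dom->getElementsByTagName('p')->item(0)->textContent, "\n";
```

Because the tag is added inside the existing `head`, the parser has no reason to synthesize a second `html` element, so `html[lang]` should survive. Exact parser behaviour can differ between libxml2 versions, so treat this as a sketch under those assumptions.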