[bug] [regression] Namespace not output if DOCTYPE present and remove_namespaces! is used #2266

Closed
gioele opened this issue Jun 13, 2021 · 1 comment

gioele commented Jun 13, 2021

Please describe the bug

After using remove_namespaces!, to_xhtml does not add the XHTML xmlns to a document, but only if a DOCTYPE is present. Without a DOCTYPE, the XHTML xmlns is added (as expected).

Without DOCTYPE: "<html xmlns=\"http://www.w3.org/1999/xhtml\"></html>\n"
With DOCTYPE: "<!DOCTYPE html>\n<html></html>\n"

NOTE: This is a regression: version 1.11.3 does not exhibit this behavior; versions >= 1.11.4 have this bug.

Help us reproduce what you're seeing

Reproduction script:

#!/usr/bin/env ruby

require "nokogiri"

p Nokogiri::VERSION

h1 = "<html xmlns=\"http://www.w3.org/1999/xhtml\"></html>\n"
doc1 = Nokogiri::XML(h1)
doc1.remove_namespaces!
p h1
p doc1.to_xhtml

raise "different serialization" if h1 != doc1.to_xhtml

h2 = "<!DOCTYPE html>\n<html xmlns=\"http://www.w3.org/1999/xhtml\"></html>\n"
doc2 = Nokogiri::XML(h2)
doc2.remove_namespaces! # <<< the bug disappears if this line is commented out
p h2
p doc2.to_xhtml

raise "different serialization" if h2 != doc2.to_xhtml

Expected behavior / Actual behavior

Regardless of whether a DOCTYPE is present or remove_namespaces! has been used, to_xhtml should always produce conformant XHTML files with the required xmlns.

This is the output produced by the reproduction script with Nokogiri 1.11.3 (expected):

"1.11.3"
"<html xmlns=\"http://www.w3.org/1999/xhtml\"></html>\n"
"<html xmlns=\"http://www.w3.org/1999/xhtml\"></html>\n"
"<!DOCTYPE html>\n<html xmlns=\"http://www.w3.org/1999/xhtml\"></html>\n"
"<!DOCTYPE html>\n<html xmlns=\"http://www.w3.org/1999/xhtml\"></html>\n"

This is the output produced with Nokogiri 1.11.4 and 1.11.7 (broken):

"1.11.4"
"<html xmlns=\"http://www.w3.org/1999/xhtml\"></html>\n"
"<html xmlns=\"http://www.w3.org/1999/xhtml\"></html>\n"
"<!DOCTYPE html>\n<html xmlns=\"http://www.w3.org/1999/xhtml\"></html>\n"
"<!DOCTYPE html>\n<html></html>\n"
RuntimeError: different serialization
  ./test.rb:21:in `<top (required)>'

Environment

$ nokogiri -v
# Nokogiri (1.11.7)
    ---
    warnings: []
    nokogiri:
      version: 1.11.7
      cppflags:
      - "-I/[RBENV]/versions/2.6.2/lib/ruby/gems/2.6.0/gems/nokogiri-1.11.7-x86_64-linux/ext/nokogiri"
      - "-I/[RBENV]/versions/2.6.2/lib/ruby/gems/2.6.0/gems/nokogiri-1.11.7-x86_64-linux/ext/nokogiri/include"
      - "-I/[RBENV]/versions/2.6.2/lib/ruby/gems/2.6.0/gems/nokogiri-1.11.7-x86_64-linux/ext/nokogiri/include/libxml2"
      ldflags: []
    ruby:
      version: 2.6.2
      platform: x86_64-linux
      gem_platform: x86_64-linux
      description: ruby 2.6.2p47 (2019-03-13 revision 67232) [x86_64-linux]
      engine: ruby
    libxml:
      source: packaged
      precompiled: true
      patches:
      - 0001-Remove-script-macro-support.patch
      - 0002-Update-entities-to-remove-handling-of-ssi.patch
      - 0003-libxml2.la-is-in-top_builddir.patch
      - 0004-use-glibc-strlen.patch
      - 0005-avoid-isnan-isinf.patch
      - 0006-update-automake-files-for-arm64.patch
      - 0007-Fix-XPath-recursion-limit.patch
      libxml2_path: "/[RBENV]/versions/2.6.2/lib/ruby/gems/2.6.0/gems/nokogiri-1.11.7-x86_64-linux/ext/nokogiri"
      memory_management: ruby
      iconv_enabled: true
      compiled: 2.9.12
      loaded: 2.9.12
    libxslt:
      source: packaged
      precompiled: true
      patches:
      - 0001-update-automake-files-for-arm64.patch
      - 0002-Fix-xml2-config-check-in-configure-script.patch
      compiled: 1.1.34
      loaded: 1.1.34
    other_libraries:
      zlib: 1.2.11
@gioele gioele added the state/needs-triage Inbox for non-installation-related bug reports or help requests label Jun 13, 2021
@flavorjones
Member

Hi, thanks for opening this issue.

I'm going to give an answer similar to the one I gave at #2265, which is that remove_namespaces! shouldn't be used if you want standards-compliant behavior.

In particular we disagree about this statement:

Regardless of whether ... remove_namespaces! has been used, to_xhtml should always produce conformant XHTML files with the required xmlns.

This is likely happening in Nokogiri v1.11.4 and later because that is the version that upgraded libxml2 to v2.9.12, which made many changes to namespace handling, particularly in HTML/XHTML contexts.

Can I ask why you're removing namespaces from the document if you expect namespaces to be handled correctly?

@flavorjones flavorjones added topic/namespaces meta/user-help and removed state/needs-triage Inbox for non-installation-related bug reports or help requests labels Jun 14, 2021