Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid xml:id permitted in XML documents #2364

Closed
jcoyne opened this issue Nov 8, 2021 · 4 comments
Closed

Invalid xml:id permitted in XML documents #2364

jcoyne opened this issue Nov 8, 2021 · 4 comments

Comments

@jcoyne
Copy link

jcoyne commented Nov 8, 2021

Please describe the bug

Nokogiri permits invalid "xml:id" attributes.

doc = Nokogiri.XML('<bad id="////"></bad>')
doc.xpath('//bad').first['id']
=> "////"

This id value is not valid according to the spec https://www.w3.org/TR/xml-id/#syntax The value of an "xml:id" attribute must an NCName (https://www.w3.org/TR/xml-names11/#NT-NCName)

I would expect that the case above ought to result in some sort of exception.

I'm very sorry... 🥲

@jcoyne jcoyne added the state/needs-triage Inbox for non-installation-related bug reports or help requests label Nov 8, 2021
@jcoyne jcoyne changed the title [bug] Invalid xml:id permitted in docs Nov 8, 2021
@jcoyne jcoyne changed the title Invalid xml:id permitted in docs Invalid xml:id permitted in XML documents Nov 8, 2021
@flavorjones
Copy link
Member

Hey, Justin, thanks for opening this issue. It may be a day or two before I'm able to dig in, but I promise I will take a look.

@flavorjones
Copy link
Member

Whoop, I see what's happening. You're using id and not xml:id. Here's what happens if you do that with xml:id:

#! /usr/bin/env ruby

require "nokogiri"

doc = Nokogiri.XML('<bad xml:id="////"></bad>')
doc.errors
# => [#<Nokogiri::XML::SyntaxError: 1:19: ERROR: xml:id : attribute value //// is not an NCName>]

doc.xpath('//bad').first.attributes
# => {"id"=>#<Nokogiri::XML::Attr:0x50 name="id" namespace=#<Nokogiri::XML::Namespace:0x3c prefix="xml" href="http://www.w3.org/XML/1998/namespace"> value="////">}

So, a few things to note:

  • Nokogiri by default will create a SyntaxError and add it to doc#errors, and not raise an exception.
  • libxml2 doesn't consider this a fatal error! Using ParseOptions::STRICT or unsetting ::RECOVER doesn't change this behavior. This is arguably a bug and Nokogiri should raise an exception anyway.
  • Note that the attribute value is unchanged -- libxml2 doesn't know how to correct this, so it leaves it.

Does this answer your questions?

@flavorjones flavorjones added meta/user-help state/will-close and removed state/needs-triage Inbox for non-installation-related bug reports or help requests state/will-close labels Nov 12, 2021
@flavorjones
Copy link
Member

Going to close this, but please do comment if I haven't addressed your question in my previous post and I'm happy to reopen it.

@jcoyne
Copy link
Author

jcoyne commented Nov 12, 2021

Thanks @flavorjones. This was most helpful. Sorry for misunderstanding the spec.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants