DocumentFragment#xpath fails to find specific attribute for elements at the root of the fragment #213

Closed
Phrogz opened this issue Jan 25, 2010 · 7 comments

Phrogz commented Jan 25, 2010

require 'nokogiri'
html = DATA.read
doc1 = Nokogiri::HTML(html)
doc2 = Nokogiri::HTML::DocumentFragment.parse(html)

ELEMENT_ONLY = ".//h2"
WITH_ID      = ".//h2[@id='foo']"

p doc1.xpath(ELEMENT_ONLY).first['id'],
  doc1.xpath(WITH_ID),
  doc2.xpath(ELEMENT_ONLY).first['id'],
  doc2.xpath(WITH_ID)

#=> "foo"
#=> [#<Nokogiri::XML::Element:0x80a3c168 name="h2" attributes=[#<Nokogiri::XML::Attr:0x80a3bbb8 name="id" value="foo">] children=[#<Nokogiri::XML::Text:0x80a3b288 "Heading 1">]>]
#=> "foo"
#=> []

__END__
<h2 id="foo">Heading 1</h2>

The same problem applies to at_xpath.

Phrogz commented Jan 25, 2010

The workaround is to use css/at_css on the DocumentFragment. That only works if your id values contain no colons or periods, because those characters would have to be escaped in a CSS id selector.

Phrogz commented Jan 27, 2010

The plot thickens. Apparently the XPath fails to find elements at the root of the fragment, but succeeds if they are nested:

require 'nokogiri'
s1 = "<a href='foo'>hi</a>"
s2 = "<a href='foo'>hi</a>\n"
s3 = "<a href='foo'>hi</a><a href='bar'>bye</a>"
s4 = "<a href='foo'>hi</a>\n<a href='bar'>bye</a>"
s5 = "<p><a href='foo'>hi</a></p>"
s6 = "<a href='foo'>hi</a><p><a href='bar'>bye</a></p>"

[s1,s2,s3,s4,s5,s6].each do |s|
  fragment = Nokogiri::HTML::DocumentFragment.parse(s)
  p s, fragment.xpath('.//a[@href]').length
  puts ""
end

#=> "<a href='foo'>hi</a>"
#=> 0
#=> 
#=> "<a href='foo'>hi</a>\n"
#=> 0
#=> 
#=> "<a href='foo'>hi</a><a href='bar'>bye</a>"
#=> 0
#=> 
#=> "<a href='foo'>hi</a>\n<a href='bar'>bye</a>"
#=> 0
#=> 
#=> "<p><a href='foo'>hi</a></p>"
#=> 1
#=> 
#=> "<a href='foo'>hi</a><p><a href='bar'>bye</a></p>"
#=> 1

Similarly, an XPath like .//a/@href only selects the href attributes of elements that are not at the root of the fragment.

@tenderlove
Member

I believe this is related to our partial-document (fragment) implementation, which we need to redo. If you can, grab a prerelease version of Nokogiri and use the Node#parse method.

We're going to try backing the fragment code with Node#parse for the next release.

@tenderlove
Member

I'm starting to think this is either a) expected behavior or b) a bug in libxml2. Switching to the new document fragment code I was working on did not fix this issue.

Anyway, the reason I suspect it's either expected behavior or a bug in libxml2 is that if you adjust the XPath, you can find those elements:

require 'nokogiri'
s1 = "<a href='foo'>hi</a>"
s2 = "<a href='foo'>hi</a>\n"
s3 = "<a href='foo'>hi</a><a href='bar'>bye</a>"
s4 = "<a href='foo'>hi</a>\n<a href='bar'>bye</a>"
s5 = "<p><a href='foo'>hi</a></p>"
s6 = "<a href='foo'>hi</a><p><a href='bar'>bye</a></p>"

[s1,s2,s3,s4,s5,s6].each do |s|
  fragment = Nokogiri::HTML::DocumentFragment.parse(s)
  p s, fragment.xpath('a[@href] | .//a[@href]').length
  puts ""
end

This outputs (using Nokogiri master):

"<a href='foo'>hi</a>"
1

"<a href='foo'>hi</a>\n"
1

"<a href='foo'>hi</a><a href='bar'>bye</a>"
2

"<a href='foo'>hi</a>\n<a href='bar'>bye</a>"
2

"<p><a href='foo'>hi</a></p>"
1

"<a href='foo'>hi</a><p><a href='bar'>bye</a></p>"
2

I am researching more.

bogdan commented Oct 16, 2011

In the meantime, I am using a wrapper element:

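doc = Nokogiri::HTML::DocumentFragment.parse("<div id='__wrapper__'>#{body}</div>")
wrapper = doc.at_css("#__wrapper__")
# process: e.g. wrapper.xpath(".//a[@href]") now reaches every top-level element of body
result = wrapper.inner_html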

Phrogz commented Nov 22, 2011

See also #370 and #572.

flavorjones added a commit that referenced this issue Jan 2, 2015
@flavorjones
Member

Folding this into #572, the underlying issue is the same.


6 participants
@tenderlove @flavorjones @Phrogz @bogdan and others