Replacing with a non-well-formed string doesn't work on pure Java version #490
Description
Reported on the jruby-users mailing list: http://old.nabble.com/Nokogiri-1.5.0-Released-to31981984.html
The following code doesn't work correctly on the pure Java version.
require 'rubygems'
require 'nokogiri'
input = "<root><p>xyz</p></root>"
doc = Nokogiri::XML(input, nil, 'UTF-8')
p = doc.at_css("p")
p.replace("<s/>A:B")
puts doc.to_s
The pure Java version prints:
<?xml version="1.0" encoding="UTF-8"?>
<root><s>A:B</s></root>
while the CRuby version prints:
<?xml version="1.0" encoding="UTF-8"?>
<root><s/>A:B</root>
The reason is that Xerces doesn't parse strings that are not well-formed. While parsing the string passed to the replace method, it raises an exception. Nokogiri then tries to parse the given string as HTML, wrapping it in html and body tags. This time, NekoHTML parses it and creates a document successfully. However, NekoHTML's tag balancer steps in and adds a closing tag for the "s" element after the text, so "A:B" ends up inside the element.
The easiest solution is to change the Ruby code so that the string handed to the parser is always well-formed:
diff --git a/lib/nokogiri/xml/document_fragment.rb b/lib/nokogiri/xml/document_fragment.rb
index f4f7b7d..94f7de5 100644
--- a/lib/nokogiri/xml/document_fragment.rb
+++ b/lib/nokogiri/xml/document_fragment.rb
@@ -11,7 +11,7 @@ module Nokogiri
return self unless tags
children = if ctx
- ctx.parse(tags)
+ ctx.parse("<root>#{tags}</root>").xpath("/root/node()")
else
XML::Document.parse("<root>#{tags}</root>") \
.xpath("/root/node()")
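To illustrate why wrapping helps, here is a minimal sketch of the same idea outside Nokogiri. It uses REXML instead of Nokogiri, only so that it runs without extra gems, and the helper name fragment_children is hypothetical. The point is just that "<s/>A:B" has trailing text after its only element, while "<root><s/>A:B</root>" is a single-rooted, well-formed document whose children are the nodes we actually want.

```ruby
require 'rexml/document'

# Hypothetical helper, not part of Nokogiri: wrap the fragment in a synthetic
# <root> element so the parser always receives a well-formed document, then
# return the children of that wrapper.
def fragment_children(tags)
  doc = REXML::Document.new("<root>#{tags}</root>")
  doc.root.children
end

# The fragment from this report: an empty element followed by bare text.
nodes = fragment_children("<s/>A:B")
nodes.each { |node| puts "#{node.class}: #{node}" }
```

With this approach the "s" element and the text "A:B" come back as two sibling nodes, which matches what the CRuby version prints above.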
As far as I can tell from running rake under JRuby, the change doesn't break anything else.
Is this change acceptable?