
Replacing by not well-formed string doesn't work on pure Java version #490

Closed
@yokolet

Description

Reported on the jruby-users mailing list: http://old.nabble.com/Nokogiri-1.5.0-Released-to31981984.html

The following code doesn't work correctly on the pure Java version.

require 'rubygems'
require 'nokogiri'

input = "<root><p>xyz</p></root>"
doc = Nokogiri::XML(input, nil, 'UTF-8')
p_node = doc.at_xpath("/root/p")
p_node.replace("<s/>A:B")

puts doc.to_s

The pure Java version prints:

<?xml version="1.0" encoding="UTF-8"?>
<root><s>A:B</s></root>

while the CRuby version prints:

<?xml version="1.0" encoding="UTF-8"?>
<root><s/>A:B</root>

The reason is that Xerces doesn't parse a string that is not well-formed. While parsing the string passed to the replace method, an exception is raised. Nokogiri then tries to parse the same string as HTML, adding html and body tags around it. This time, NekoHTML parses it and creates a document successfully. However, NekoHTML's tag balancer steps in and adds a closing tag for the "s" element after the text, so "A:B" ends up inside the element.

The easiest solution is to change the Ruby code so that the string handed to the parser is always well-formed:

diff --git a/lib/nokogiri/xml/document_fragment.rb b/lib/nokogiri/xml/document_fragment.rb
index f4f7b7d..94f7de5 100644
--- a/lib/nokogiri/xml/document_fragment.rb
+++ b/lib/nokogiri/xml/document_fragment.rb
@@ -11,7 +11,7 @@ module Nokogiri
     return self unless tags

     children = if ctx
-                      ctx.parse(tags)
+                     ctx.parse("<root>#{tags}</root>").xpath("/root/node()")
                    else
                      XML::Document.parse("<root>#{tags}</root>") \
                        .xpath("/root/node()")
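
The wrapping idea in the patch can be tried in isolation. The sketch below is only an illustration, not Nokogiri's code: it uses REXML from the Ruby distribution as a stand-in parser so it runs without Nokogiri installed, and `parse_wrapped` is a hypothetical helper name. It wraps the fragment in a temporary root element, parses the result as one well-formed document, and returns the root's children.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of Nokogiri): wrap the fragment in a
# temporary <root> element so the parser sees a single well-formed
# document, then hand back only the children of that root.
def parse_wrapped(tags)
  wrapped = REXML::Document.new("<root>#{tags}</root>")
  wrapped.root.children
end

nodes = parse_wrapped("<s/>A:B")

# The empty <s/> element and the text "A:B" stay siblings.
serialized = nodes.map(&:to_s).join
puts serialized
```

With this approach the text node is kept next to the empty element instead of being pulled inside it, which matches what the CRuby version prints.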

As far as I can tell from running rake under JRuby, the change does no harm to anything else.

Is this an agreeable change?
