[help] XML Builder and Newlines Oddity #2522

Closed
svoop opened this issue Apr 30, 2022 · 3 comments

svoop commented Apr 30, 2022

To build large and rather cumbersome AIXM (XML) files, I'm currently switching from the "builder" gem to Nokogiri's built-in XML builder. DocumentFragments come in quite handy to build the blocks with which to assemble the final document.

Most DocumentFragments have a container element, and with those everything works like a charm:

require 'nokogiri'

Nokogiri::XML::DocumentFragment.parse('').tap do |document|
  Nokogiri::XML::Builder.with(document) do |builder|
    builder.root do |root|
      root.foo('bar')
      root.fii('bir')
    end
  end
end.to_xml

<root>
  <foo>bar</foo>
  <fii>bir</fii>
</root>

Unfortunately, a few DocumentFragments merely contain a bunch of sibling elements without a container. For some reason, the result is no longer formatted and everything ends up on one line:

require 'nokogiri'

Nokogiri::XML::DocumentFragment.parse('').tap do |document|
  Nokogiri::XML::Builder.with(document) do |builder|
    builder.foo('bar')
    builder.fii('bir')
  end
end.to_xml

<foo>bar</foo><fii>bir</fii>

Is there a way to have newlines after each element, as in the version with a container?

(I've unsuccessfully looked for methods to extract part of a document into a new DocumentFragment, say, using the first example to build, then extract a DocumentFragment containing only the children of root.)

Thanks for your help!


svoop commented Apr 30, 2022

From what I've read about other libxml2-based implementations, this appears to be a "feature" of libxml2 rather than something that happens on Nokogiri's end, right?

In any case, I'm working around this readability issue like so for now:

def build_fragment
  Nokogiri::XML::DocumentFragment.parse('').tap do |document|
    Nokogiri::XML::Builder.with(document) do |builder|
      yield builder
    end
    # Append a newline text node after each top-level element (_1 needs Ruby 2.7+)
    document.elements.each { _1.add_next_sibling("\n") }
  end
end

Then use it:

build_fragment do |builder|
  builder.foo('bar')
  builder.fii('bir')
end

To get:

<foo>bar</foo>
<fii>bir</fii>

svoop closed this as completed Apr 30, 2022
@flavorjones
Member

Please see my explanation at #2521 (comment) for how libxml2 decides to format subtrees when serializing.

TL;DR: if any child of a node is a TEXT, CDATA, or ENTITY_REF node, then that node (and, recursively, its subtree) will be printed literally, without formatting or indentation.


svoop commented Apr 30, 2022

I came across this bit somewhere deep down on Stack Overflow, and knowing that this decision is made by libxml2 ultimately led me to this workaround. It's nicely tucked away and doesn't pollute my builder models, so it works for me. 🎉
