Closing slashes in single tags (img, br) #47

Closed
l3x4 opened this issue Sep 21, 2011 · 7 comments

l3x4 commented Sep 21, 2011

There's an example use case here: http://wonko.com/post/sanitize
It's actually very close to the Readme example, but differs a bit.

html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
Sanitize.clean(html, Sanitize::Config::RELAXED)
# => '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'

Please note the closing slash on the image tag - that's what I would actually expect as the result.

In practice, the same call strips that slash:

html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
Sanitize.clean(html, Sanitize::Config::RELAXED)
#=> '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg">'

The same happens to the slashes in br tags, for example.
Is there a way to keep these slashes?

Thank you.

Owner

rgrove commented Sep 21, 2011

That blog post was the initial announcement of Sanitize, back when Sanitize's output was XHTML by default. Since version 2.0, the default output format has been HTML. To preserve XHTML self-closing tags, just set the :output config option to :xhtml.
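One way to set that option without touching the built-in preset is to merge it into a copy of the config. The sketch below is only an illustration of that pattern: `preset` is a hypothetical, much smaller stand-in for `Sanitize::Config::RELAXED`, so the snippet runs without the gem installed. The commented call at the end assumes the Sanitize 2.x API discussed in this thread.

```ruby
# Hypothetical stand-in for a preset such as Sanitize::Config::RELAXED.
# The real preset has many more keys; it is frozen here to show that
# merging does not modify it.
preset = {
  :elements   => %w[a b img br],
  :attributes => { 'a' => ['href'], 'img' => ['src'] }
}.freeze

# Hash#merge returns a new hash, so the preset itself is left untouched.
xhtml_config = preset.merge(:output => :xhtml)

puts xhtml_config[:output].inspect  # the option is present in the copy
puts preset.key?(:output)           # the preset still has no :output key

# With Sanitize 2.x loaded, the equivalent call would be (not run here):
#   Sanitize.clean(html, Sanitize::Config::RELAXED.merge(:output => :xhtml))
```

Merging rather than assigning into the preset keeps the HTML default available for other callers in the same process.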

rgrove closed this as completed Sep 21, 2011

Author

l3x4 commented Sep 21, 2011

Thank you very much for the help and for this very useful library! :)

Author

l3x4 commented Sep 22, 2011

But wait a sec, I still don't see the expected behaviour here.

html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
Sanitize.clean(html, Sanitize::Config::RELAXED.merge({:output => :xhtml}))
#=> "<b>\n<a href=\"http://foo.com/\">foo</a>\n</b><img src=\"http://foo.com/bar.jpg\" />"

Now there are \n characters before and after the a tag.

Owner

rgrove commented Sep 22, 2011

I can't reproduce this. I get the expected output:

#=> "<b><a href=\"http://foo.com/\">foo</a></b><img src=\"http://foo.com/bar.jpg\" />"

What version of Ruby and Nokogiri do you have installed?

Author

l3x4 commented Sep 23, 2011

I use ree-1.8.7-head (via rvm) and nokogiri-1.5.0

Owner

rgrove commented Oct 1, 2011

You're right. With Ruby 1.8.7 and Nokogiri 1.5.0, I get extra newlines. This definitely isn't anything Sanitize is doing though. It also happens when I call Nokogiri directly:

>> Nokogiri::HTML.fragment('<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />').to_xhtml
=> "<b>\n  <a href=\"http://foo.com/\">foo</a>\n</b><img src=\"http://foo.com/bar.jpg\" />"

You may want to file a bug against Nokogiri.
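Until the serializer behaviour is sorted out upstream, one possible stopgap is to post-process the string. This is not something proposed in this thread, just a rough sketch: `strip_intertag_newlines` is a hypothetical helper, and it is lossy, since it also removes any newline-plus-indentation between tags that was present in the original markup. The two strings are the outputs quoted above.

```ruby
# Hypothetical helper: removes a newline plus any following indentation
# when it sits directly between two tags.
# Caution: whitespace that was in the source markup is removed as well.
def strip_intertag_newlines(markup)
  markup.gsub(/>\n[ \t]*</, '><')
end

# Output reported in this thread for Ruby 1.8.7 with Nokogiri 1.5.0.
reported = "<b>\n  <a href=\"http://foo.com/\">foo</a>\n</b><img src=\"http://foo.com/bar.jpg\" />"

# Output the reporter expected.
expected = "<b><a href=\"http://foo.com/\">foo</a></b><img src=\"http://foo.com/bar.jpg\" />"

puts strip_intertag_newlines(reported) == expected
```

The pattern only matches a newline immediately after `>` and before `<`, so newlines inside text content are left alone.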

Author

l3x4 commented Oct 1, 2011

Thank you!
I've raised this with Nokogiri: https://github.com/tenderlove/nokogiri/issues/543
