Shouldn't inner_html convert to UTF8 the same way as inner_text? #117

Closed
naofumi opened this issue Aug 7, 2009 · 4 comments

@naofumi

naofumi commented Aug 7, 2009

There seems to be an inconsistency between how encoding conversion is applied with the inner_text, inner_html and to_html methods.

With #inner_text, I think all output is automatically converted to UTF8.
With #inner_html, encoding conversions are not applied.
With #to_html, you can specify the desired encoding for the result with the :encoding option.

I would prefer that the output of both #inner_html and #to_html be converted to UTF8 by default, with the :encoding option available to override this.

At least, it would be nice to be able to pass the :encoding option to #inner_html.

@naofumi
Author

naofumi commented Aug 7, 2009

To provide an :encoding option for #inner_html, something like the following might work.

In nokogiri/xml/node.rb:

    # Forward any arguments (such as :encoding) to each child's to_html.
    def inner_html(*args)
      children.map { |x| x.to_html(*args) }.join
    end

In nokogiri/xml/node_set.rb:

    # Forward the same arguments to each node's inner_html.
    def inner_html(*args)
      collect { |j| j.inner_html(*args) }.join('')
    end
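The forwarding idea in the proposal above can be tried without Nokogiri. The following is a minimal sketch, assuming two hypothetical stand-in classes (ToyNode and ToyNodeSet, neither of which exists in Nokogiri) whose to_html accepts an :encoding option. It only illustrates how `*args` lets the option travel from a node set down to each child; it is not the library's actual implementation.

```ruby
# Hypothetical stand-ins for illustration only; these are not Nokogiri classes.
class ToyNode
  attr_reader :text, :children

  def initialize(text = '', children = [])
    @text = text
    @children = children
  end

  # Serialize this node, converting to options[:encoding] when one is given.
  def to_html(options = {})
    html = "<p>#{text}</p>"
    options[:encoding] ? html.encode(options[:encoding]) : html
  end

  # Forward whatever arguments were given to each child's to_html.
  def inner_html(*args)
    children.map { |child| child.to_html(*args) }.join
  end
end

class ToyNodeSet
  include Enumerable

  def initialize(nodes)
    @nodes = nodes
  end

  def each(&block)
    @nodes.each(&block)
  end

  # Same forwarding one level up: every node's inner_html gets the arguments.
  def inner_html(*args)
    map { |node| node.inner_html(*args) }.join
  end
end

parent = ToyNode.new('', [ToyNode.new('日本語'), ToyNode.new('text')])
set = ToyNodeSet.new([parent, parent])

default_html = parent.inner_html                      # no conversion requested
sjis_html = parent.inner_html(encoding: 'Shift_JIS')  # option reaches to_html
set_html = set.inner_html(encoding: 'Shift_JIS')      # and passes through the set
```

Because neither inner_html names its parameters, any option that to_html learns to accept later would pass through unchanged.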

@tenderlove
Member

I've added the ability to pass encoding to #inner_html. I don't want to automatically convert all documents to UTF-8 when calling #to_html. I think that would be bad for people who process documents in an encoding other than UTF-8 and want the final output to remain in the specified encoding.

If you always want the output to be UTF-8, just tell the document that it should be encoded with UTF-8 like so:

    require 'nokogiri'
    require 'open-uri' # provides URI.open for fetching the page
    doc = Nokogiri::HTML(URI.open('http://example.com/'))
    doc.encoding = 'UTF-8' # Set the document encoding to UTF-8

After doing that, inner_html and to_html will return UTF-8 documents.
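The behavior described here, where output stays in the document's encoding unless told otherwise, can be modeled in a few lines of plain Ruby. This is a rough sketch under stated assumptions: ToyDocument is a hypothetical class invented for illustration (it is not part of Nokogiri), and its to_html simply falls back to the document's encoding when no :encoding option is passed.

```ruby
# Hypothetical toy model; ToyDocument is not a Nokogiri class.
class ToyDocument
  attr_accessor :encoding

  def initialize(body, encoding)
    @body = body          # held internally as a UTF-8 Ruby string
    @encoding = encoding  # the encoding the document declares for its output
  end

  # Use the :encoding option when given, otherwise the document's own encoding.
  def to_html(options = {})
    target = options[:encoding] || encoding
    "<p>#{@body}</p>".encode(target)
  end
end

doc = ToyDocument.new('日本語', 'Shift_JIS')
before = doc.to_html                        # stays in the declared encoding
doc.encoding = 'UTF-8'                      # opt in to UTF-8 for later output
after = doc.to_html
override = doc.to_html(encoding: 'EUC-JP')  # a per-call option still wins
```

The design choice being illustrated is that nothing is converted behind the caller's back: a Shift_JIS document keeps producing Shift_JIS until the caller changes the document encoding or passes an explicit option.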

@tenderlove
Member

inner_html takes the same arguments as to_html. Closed by ab9a8a0.

@naofumi
Author

naofumi commented Aug 31, 2009

Thanks. Sounds good.

flavorjones pushed a commit that referenced this issue Apr 7, 2021
This issue was closed.