Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: support Encoding class for xml write_to #2798

Merged

Conversation

ellaklara
Copy link
Contributor

What problem is this PR intended to solve?

Fixes #2774

Have you included adequate test coverage?

The PR adds two tests that cover both setting encoding with a string and the Encoding class.

Question

I discovered that the html5 node write_to (lib/nokogiri/html5/node.rb) also doesn't support the Encoding class, is this something you'd like me to add as well? Where would be a good place to test this? I couldn't find any encoding tests for this class.

@flavorjones
Copy link
Member

@ellaklara Thank you for contributing this! I've kicked off CI. I think we should support this in the HTML5 version as well. If there aren't tests we should definitely add that coverage. I'll take a deeper look in the next day or so.

@flavorjones
Copy link
Member

OK! I've looked into this a bit more, which was good for me because the tests for serialization encoding are somewhat implicit and sort of spread all over the place, so I'd like to add a bunch of test coverage to go along with this.

So backing up for a minute to re-describe the problem from #2774 in a bit more detail, #write_to isn't the only method we need to ensure supports Encoding objects, it's the whole set of serialization methods. At a minimum this includes:

  • Node#write_to
  • Node#serialize
  • Node#to_html
  • Node#to_xml
  • Node#to_xhtml

And as you pointed out, we also need to test the HTML5 patch for serialization that lives in lib/nokogiri/html5/node.rb. That code's encoding handling is tested in test/html5/test_encoding.rb and test/html5/test_api.rb.

So I think the approach to testing I'd like to take is:

  • write some thorough explicit tests for the current encoding behavior
  • then add some tests for Encoding (similar to what you have, but more)
  • and then make the tests pass

Let me add a commit to this PR with some explicit tests for the current behavior, if you don't mind? Then, if you're up for it, I'd love for you to add the Encoding tests for the other methods and finish the feature!

@flavorjones
Copy link
Member

Aaaand of course I think I'm finding encoding bugs while writing these tests. 😬

@flavorjones
Copy link
Member

flavorjones commented Feb 28, 2023

OK, bug report filed at #2801

But I think that's not material to what we're trying to do in this PR! So ... onward

flavorjones added a commit to ellaklara/nokogiri that referenced this pull request Feb 28, 2023
@flavorjones flavorjones force-pushed the support-encoding-class-for-write_to branch from 9ca7f96 to 7696d6a Compare February 28, 2023 17:54
@flavorjones
Copy link
Member

@ellaklara OK! I've added a commit on this PR (before yours) that introduces a new file, test/test_serialization_encoding.rb that has some coverage for encoded output for XML, HTML4, and HTML5 serialization methods.

Again, only if you're up for it, it would be great to add Encoding tests there, and make sure they work for all those classes and methods. Let me know either way! ❤️

@flavorjones
Copy link
Member

flavorjones commented Feb 28, 2023

I'm working on the failing JRuby tests, so don't worry about those!

Edit: see #2803, I've rebased this PR on top of those commits.

flavorjones added a commit to ellaklara/nokogiri that referenced this pull request Feb 28, 2023
@flavorjones flavorjones force-pushed the support-encoding-class-for-write_to branch from 7696d6a to 89e49c8 Compare February 28, 2023 20:30
flavorjones added a commit to ellaklara/nokogiri that referenced this pull request Mar 1, 2023
@flavorjones flavorjones force-pushed the support-encoding-class-for-write_to branch from 89e49c8 to e7b641b Compare March 1, 2023 03:08
@flavorjones
Copy link
Member

@ellaklara It's been about a week, so I'm going to go ahead and finish the work you so awesomely started! Will be sure to credit you in the CHANGELOG!

@flavorjones flavorjones force-pushed the support-encoding-class-for-write_to branch from e7b641b to a978f75 Compare March 6, 2023 21:19
flavorjones added a commit to ellaklara/nokogiri that referenced this pull request Mar 6, 2023
flavorjones and others added 4 commits March 6, 2023 18:00
This is what libxml2's `xmlsave.c` functions do, so let's make it
explicit in the caller so that JRuby has the same behavior.
Co-authored-by: Mike Dalessio <mike.dalessio@gmail.com>
@flavorjones flavorjones force-pushed the support-encoding-class-for-write_to branch from ab3c496 to 81faf60 Compare March 6, 2023 23:00
@flavorjones flavorjones merged commit 08c2ad8 into sparklemotion:main Mar 7, 2023
@flavorjones
Copy link
Member

Thank you, @ellaklara !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature] Node#write_to and friends should accept Encoding objects for :encoding kwargs
2 participants