-
-
Notifications
You must be signed in to change notification settings - Fork 905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[bug] DocumentFragment parse behavior wrt body
tag
#3023
Comments
@marcoroth I spent some time on this today, and I have a potential workaround for you that I'd like your feedback on, and if it works for Stimulus Reflex then I'll put some work in to make it easier to use. first some questions about the use caseThe "fragment" above is not a fragment -- it's a complete document. If possible, in these cases, it's better to parse it with I'd like to understand the use case of Stimulus Reflex a bit better to understand if there's a better API that Nokogiri could be offering here. Do you really not know if the content is a document or a fragment? transform the code into an equivalentFirst let me convert the code above to use #!/usr/bin/env ruby
require "bundler/inline"
gemfile do
source "https://rubygems.org"
gem "nokogiri", path: "."
end
html = <<~HTML
<html>
<body class="foo">
<div>hello</div>
</body>
</html>
HTML
puts Nokogiri::HTML4::DocumentFragment.new(Nokogiri::HTML4::Document.new, html).to_html
puts "---"
puts Nokogiri::HTML5::DocumentFragment.new(Nokogiri::HTML5::Document.new, html).to_html Note that the output is the same:
diagnosisBy default, both parsers will try to parse the fragment in the context of a The HTML4 behavior that we see is due to a hack made a long time ago in c23eb88: if a context node isn't specified, and there happens to be a line that matches If you change the script above to format the HTML slightly differently: html = <<~HTML
<html><body class="foo">
<div>hello</div>
</body>
</html>
HTML the output changes for the HTML4 fragment parser:
I really don't like this code very much, for a few reasons:
but it's been there for 14 years now and I can't really change it without breaking people's code. We see what we see for the HTML5 fragment parser because it's working exactly as intended: we parse in the context of a potential workaroundThe puts Nokogiri::HTML5::DocumentFragment.new(Nokogiri::HTML5::Document.new, html, "html").to_html will output
A couple of things to call out:
html = <<~HTML
<html>
<head>
<link rel="stylesheet" href="mystyle.css">
</head>
<body class="foo">
<div>hello</div>
</body>
</html>
HTML
# ... outputs
Perhaps naively, the HTML5 parser's output matches the input more exactly than the HTML4 parser's output. But I have no idea whether this is helpful for Stimulus Reflex's use case. Does this work for you? If not, why not? If so, then I'll do some work to make the context node input available to callers of |
Hey @flavorjones, thanks for taking a look at this!
Yeah this is true, ideally we could pick the right class to parse an HTML document. We have had efforts (like stimulusreflex/stimulus_reflex#601, stimulusreflex/stimulus_reflex#622) that tried to pick the right class depending on the context. But sadly we haven't found one Nokogiri class/method combination that solves all the behaviors we saw/needed. It seems like that we would need a combination of the parts of behavior we saw in these classes:
To explain the use-case a bit more: StimulusReflex exposes a Usually we see people do thing like: morph "#my-element", %(<div id="my-element>My element is updated!</div>) In this case we'd want to use morph "body", %(<body class="body with updated classes">...</body>) In this case In the Ideally Nokogiri would expose an API that would parse HTML as-is. So that it parses the HTML without having any spec in mind. And so, that it also wouldn't try to make it spec-compliant. I guess you could say we are looking for something like an AST for HTML that returns a representation of whatever you give it, be it valid/spec-conform or not. But I would totally understand if this is out-of-scope for a gem like Nokogiri, which is also why I mentioned in stimulusreflex/stimulus_reflex#652 that it might make sense to move off of Nokogiri. The problem is that Rails doesn't really care if you ship spec-compliant HTML to your browser through a Rails request/response cycle. But since StimulusReflex tries to dynamically update the rendered page by re-rendering the view and by passing it through Nokogiri it breaks the flow if you have non-spec-compliant HTML. If people would write HTML5-compliant HTML views we wouldn't have this discussion, but since modern browsers got so relaxed with processing invalid HTML it now gets into the way when StimulusReflex tries to update the view using Nokogiri. Maybe we can also do something in Rails that warns people if their controller action returns invalid HTML?
While I think that a "context node" argument would be useful I don't think it really solves our problem, since we'd still have to know which the right context node is given an arbitrary string of HTML. Working on this parsing topic through different projects and requirements (besides StimulusReflex) I think there's a bigger need for a tool/parser that goes beyond Nokogiri and even Ruby. I'm really unsure how we can and should proceed here. Maybe it would make sense to talk it through face-to-face so we don't talk past each other. Thank you for the support so far! |
@marcoroth Let's chat in slack, I think we'd benefit from higher-fidelity conversation. |
Circling back to close this off: after a good long chat, I think I understand the StimulusReflex use case much better, and I think Marco understands the challenges presented by the HTML5 spec a bit better. In summary, I think we agreed on this set of problem statements, which Nokogiri doesn't help answer today:
Figuring out how to do this as a user of Nokogiri's API is just too hard today. So I made a thing to try to ease this pain: https://github.com/flavorjones/nokogiri-html5-inference I'm going to close this ticket, and hopefully we can continue the conversation about what's working or not in an issue on that repo! Thanks again for bringing this up, @marcoroth! |
Thanks for the summary and for talking this through @flavorjones, it's really appreciated! 🙏🏼 |
I spent a little time testing if this could be parsed as a fragment with a template node as the context but unfortunately a body start tag is ignored in that case. Parsing using a template node as the context node may help the other cases like parsing td elements. |
…t parsing (#696) This pull request implements the [`nokogiri-html5-inference`](https://github.com/flavorjones/nokogiri-html5-inference) to deal with parsing HTML5 fragments to be used with Selector Morphs. The `nokogiri-html5-inference` gem come about out of the discussions in #652 and sparklemotion/nokogiri#3023. Fixes #652, Resolves #692, Resolves #674 Big thanks to @flavorjones to talking and working this through with me! ## Why should this be added This pull request makes the handling of document fragments for selector morphs more natural/predictable and resolves the last open issues to release the final versions of StimulusReflex 3.5.
Please describe the bug
emits:
Expected behavior
I would expect:
Additional context
@marcoroth reported at RubyConf! woot
The text was updated successfully, but these errors were encountered: