-
-
Notifications
You must be signed in to change notification settings - Fork 904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[bug] Nokogiri/JRuby line numbering off by one when XML prolog is in the source document #2380
Comments
👋 Thanks for opening this issue. Line numbering in the JRuby implementation is known to be unreliable, you can read about why at #1223 and see how it was made slightly less unreliable (or at least consistently unreliable) at #2177. You can see by my comments in #1223 that I've exhausted my knowledge of the Java APIs to implement this correctly. If you've got the energy and/or the knowledge to implement this correctly, I would very very happily merge a PR that fixes it. |
@flavorjones thanks! I did read both of those. Minimally I wanted to document this edge case. I saw that you had the XML prolog tag in some of the comments there, but it didn't stand out as a particular edge case, so figured filing a defect could be helpful for tracking purposes. Let me take a look at the line numbering logic to see if there are ways to improve that. |
@dabdine I'm curious if you have plans to work on a fix for this? Again, just to be clear, I'm unlikely to work on this and this feels like a good opportunity for a JRuby user to help out by contributing fixes if they can. |
@flavorjones from my analysis, I think we'd need to rewrite the entire DOM parser using a SAX parser to track line numbers as AFAICT Xerces doesn't track this, and the Java interfaces/classes certainly do not. Even then, I believe the SAX parser will track the line number as the line with the closing tag. So, we would also need to validate that the line number is consistent for multiline XML between libxml. If the SAX parser does return the line number of the closing tag in a multiline XML document, we would need to use sibling traversal to recover the starting tag line number. I don't quite have the time to rewrite the DOM parsing logic at the moment, which would undoubtedly be a significant change the current Java implementation. |
OK, you've reached the same conclusion as me, then, which is a) it will be an invasive change b) which doesn't seem worth the investment of developer time right now. Blurgh. I don't usually like to leave an issue open if nobody's going to work on it (some context around why I feel this way is here, but I'm happy to leave this open for a bit if you think it's important. WDYT? |
I think we close it and maybe document the failing jruby cases? Namely that it doesn't take into account XML prolog and multi line xml comments (however, single line does work), maybe referencing this issue. What do you think? |
See context in sparklemotion#2380. The fix on JRuby is expensive and is probably not worth the investment of developer time right now. But let's preserve those failures as a known issue by introducing the `pending` and `pending_if` test helpers.
@flavorjones works for me |
See #2379 for my observation that JRuby/Xerces is emitting line numbers from the final formatted DOM, while CRuby/libxml2 emits line numbers from the original parsed document. I'd like to accept this difference in semantic behavior and document it. |
See context in sparklemotion#2380. The fix on JRuby is expensive and is probably not worth the investment of developer time right now. But let's preserve those failures as a known issue by introducing the `pending` and `pending_if` test helpers.
Merged #2379 |
Please describe the bug
Nokogiri under JRuby does not properly report line numbers when XML prolog is included at the beginning of a document.
Help us reproduce what you're seeing
See this DRAFT PR for tests that fail under JRuby (but should succeed under CRuby -- I can't yet validate that and I created a PR to attempt to use the Nokogiri CI to validate this for me).
#2379
Expected behavior
The line numbers (as demonstrated by the tests) should not be off by one in Nokogiri under JRuby.
Environment
N/A
Additional context
Test output from the referenced PR:
The text was updated successfully, but these errors were encountered: