-
-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parsing XML preceded by Unicode BOM triggers an error #155
Comments
This would cause a parsing error because the XML library does not handle BOMs. Required to work around netvl/xml-rs#155.
You can see the entire files that reproduce the error in the commit I just referenced above this comment. |
Having the same problem
|
Hello ! An user recently reported a bug to me that tracks down to this. I see there was a pull request opened in 2017 that has never been merged. @netvl, would you be ready to merge a PR that would implement the approach you mention in the PR of just adding a thin |
@lovasoa if you want to create such a wrapper, you don't really need to add it to this library - it would be much easier to publish it as a standalone crate, because it certainly has utility outside of the domain of XML parsing. As I mentioned elsewhere, in the reimplementation branch this problem will be fixed automatically due to how encoding detection with |
Yes, I understand I can work around this bug in my own programs, but I was proposing to work on a patch that will fix it for everyone, since it seems to affect many users. Of course, once you merge your branch adding support for other encodings, you can just ditch this Read wrapper. |
If you agree to merge a temporary solution, you can just close this bug today and hopefully never have to reopen it. |
That's not what I meant. I meant that the wrapper does not really belong to the library, and it certainly won't be enable by default because it would require changing types in the API. Therefore, it would be much more beneficial to the wider community to have this as a separate, independent crate, rather than a part of Having it as a separate crate does not have any downsides (except maybe another crate in dependencies, but with Cargo it's not a big problem), but in terms of general reuse it would be much better, for example, for people who don't need XML parsing but do want to remove BOMs for other purposes. Even when used with this library the code would look almost the same: // external crate
use bom_remover::BomRemover;
let file = File::open("file.xml").unwrap();
let file = BufReader::new(file):
let file = BomRemover::new(file);
let reader = EventReader::new(file); // built into xml_rs
use xml::BomRemover;
let file = File::open("file.xml").unwrap();
let file = BufReader::new(file):
let file = BomRemover::new(file);
let reader = EventReader::new(file); As a bonus point, you won't really need to wait for my review :) And I'm totally all right with adding a link to this new crate somewhere in the docs with a note that if people want to parse XML documents with BOMs, they can use this crate, with some examples. |
Why is that exactly ? Why would we need to expose this wrapper ? It should be completely transparent. The read wrapper will be created and stored as a private field in The problem with having an external crate is that most people who want to parse XML in rust are simply not aware of this bug. They are not actively looking for ways to work around it. Just like me, they write their program, test it, everything works, and some time later, in production, the code encounters an xml string that starts with a BOM, and fails. I really think that this library is the right place to fix it, and I'm ready to help. |
Maybe I misunderstood your comment on the original PR ? You were not proposing a wrapper that would be internal to xml-rs ? How would you then fix the bug when parsing XML from a str, which may also start with a BOM character ? |
In #213 I propose a |
|
BOM is now supported. |
When loading an XML file that begins with a byte order mark, xml-rs raises a parsing error.
Found indirectly when using the
xmltree
crateHere is the first few bytes of my XML file (
cat ATA6616C.atdf|xxd|head -n2
)The text was updated successfully, but these errors were encountered: