Hi there,
I've just been playing around with quick-xml to handle reading and writing of Glyph Interchange Format files, and I had some observations I wanted to share.
`[u8]`/`Cow<[u8]>` vs `Cow<str>`
My understanding is that quick-xml attempts to do the least work possible, and to this end exposes byte slices by default. I can understand the reasoning (don't allocate unless the user has explicitly asked for it), but there are two things I keep thinking about here:
- UTF-8 is common: the majority of the XML I've been dealing with is encoded in UTF-8. In this case we should be able to provide direct access to string slices, allocating only when there is an escape to resolve.
- In a non-UTF-8 encoding, bytes are hard to use correctly: exposing them directly encourages users to write things like `if attr.key == b"mykey"` (which is used in the readme). This is fine with an ASCII-compatible encoding like UTF-8 or Shift JIS, but will fail unexpectedly on, say, UTF-16, and the XML spec requires every processor to accept UTF-16.
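To make both points concrete, here is a minimal, std-only sketch. `unescape` and `decode_utf16le` are hypothetical helpers, not quick-xml API. `unescape` returns a borrowed `Cow<str>` when the text contains no `&` and allocates only when one of the five predefined entities has to be resolved (numeric character references are left out for brevity). `decode_utf16le` shows the step that has to happen before a UTF-16 name can be compared with a Rust string.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not quick-xml API): resolve the five predefined XML
/// entities. Borrows the input unchanged when it contains no '&'.
/// Numeric character references such as `&#38;` are not handled here.
fn unescape(text: &str) -> Cow<'_, str> {
    if !text.contains('&') {
        return Cow::Borrowed(text); // common case: no allocation at all
    }
    const ENTITIES: [(&str, char); 5] = [
        ("&lt;", '<'),
        ("&gt;", '>'),
        ("&amp;", '&'),
        ("&quot;", '"'),
        ("&apos;", '\''),
    ];
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(pos) = rest.find('&') {
        out.push_str(&rest[..pos]);
        rest = &rest[pos..];
        match ENTITIES.iter().find(|(name, _)| rest.starts_with(*name)) {
            Some((name, ch)) => {
                out.push(*ch);
                rest = &rest[name.len()..];
            }
            None => {
                // Unknown entity: keep the '&' as-is (a real parser would report an error).
                out.push('&');
                rest = &rest[1..];
            }
        }
    }
    out.push_str(rest);
    Cow::Owned(out)
}

/// Hypothetical helper: decode UTF-16LE bytes into an owned `String`.
/// Returns `None` for an odd byte count or unpaired surrogates.
fn decode_utf16le(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    String::from_utf16(&units).ok()
}
```

With these, `unescape("plain text")` comes back as `Cow::Borrowed`, and the UTF-16LE encoding of `mykey` is ten bytes that never equal `b"mykey"`, even though it decodes to `"mykey"`.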
API thoughts
I haven't thought about this all that much, but I do worry that the current behaviour has worse ergonomics than necessary in the common (UTF-8) case, while also making it easy to write bugs in the exceptional (UTF-16) case.
One possible alternative I could imagine would be having separate `Event` and `EventRaw` types. `Event` would work with `Cow<str>`, borrowing slices of the underlying buffer where possible (when the encoding is UTF-8) and otherwise decoding and allocating during parsing. `EventRaw` would work like the current `Event`.
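A rough sketch of the `Event`/`EventRaw` split, under some loudly stated assumptions: both enums are cut down to a `Start` name and a `Text` payload, `Encoding` is a hypothetical two-variant enum (UTF-8 and UTF-16LE only, so the example needs nothing beyond std), and `into_event` is an illustrative name. The point is that the UTF-8 path borrows from the input buffer, while the UTF-16 path has to allocate.

```rust
use std::borrow::Cow;

/// Hypothetical: the document encoding, e.g. detected from a BOM or the XML declaration.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Encoding {
    Utf8,
    Utf16Le,
}

/// Hypothetical raw event: undecoded bytes, like today's events.
#[derive(Debug)]
enum EventRaw<'a> {
    Start { name: Cow<'a, [u8]> },
    Text(Cow<'a, [u8]>),
}

/// Hypothetical decoded event: names and text are already strings.
#[derive(Debug, PartialEq)]
enum Event<'a> {
    Start { name: Cow<'a, str> },
    Text(Cow<'a, str>),
}

/// Decode bytes to a string: borrow when the input is borrowed UTF-8,
/// allocate only when transcoding (here, from UTF-16LE) is required.
fn decode<'a>(bytes: Cow<'a, [u8]>, enc: Encoding) -> Result<Cow<'a, str>, String> {
    match (enc, bytes) {
        (Encoding::Utf8, Cow::Borrowed(b)) => std::str::from_utf8(b)
            .map(Cow::Borrowed)
            .map_err(|e| e.to_string()),
        (Encoding::Utf8, Cow::Owned(v)) => String::from_utf8(v)
            .map(Cow::Owned)
            .map_err(|e| e.to_string()),
        (Encoding::Utf16Le, b) => {
            if b.len() % 2 != 0 {
                return Err("odd number of bytes in UTF-16 input".to_string());
            }
            let units: Vec<u16> = b
                .chunks_exact(2)
                .map(|p| u16::from_le_bytes([p[0], p[1]]))
                .collect();
            String::from_utf16(&units)
                .map(Cow::Owned)
                .map_err(|e| e.to_string())
        }
    }
}

impl<'a> EventRaw<'a> {
    /// Convert a raw event into a decoded one.
    fn into_event(self, enc: Encoding) -> Result<Event<'a>, String> {
        Ok(match self {
            EventRaw::Start { name } => Event::Start { name: decode(name, enc)? },
            EventRaw::Text(t) => Event::Text(decode(t, enc)?),
        })
    }
}
```

Keeping something like `EventRaw` around would mean callers who really do want bytes pay nothing extra, while everyone else gets string access by default.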
Another option would be to add a new type, `XmlSlice`, which would include a lazily-allocating `to_str` method while also exposing the underlying bytes and even the encoding; this type would be used everywhere that `[u8]` or `Cow<[u8]>` is used currently. It could also carry information like the location of this text in the original document.
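A similarly hedged sketch of the `XmlSlice` idea. The field and method names (`as_bytes`, `encoding`, `offset`, `to_str`) and the two-variant `Encoding` enum are hypothetical. Comparison against a `&str` goes through `to_str`, so `key == "mykey"` behaves the same whether the document is UTF-8 or UTF-16.

```rust
use std::borrow::Cow;

/// Hypothetical encoding tag (a real implementation would cover more encodings).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Encoding {
    Utf8,
    Utf16Le,
}

/// Hypothetical `XmlSlice`: raw bytes plus the context needed to interpret them.
#[derive(Clone, Copy, Debug)]
struct XmlSlice<'a> {
    bytes: &'a [u8],
    encoding: Encoding,
    /// Byte offset of this slice in the original document.
    offset: usize,
}

impl<'a> XmlSlice<'a> {
    fn as_bytes(&self) -> &'a [u8] {
        self.bytes
    }

    fn encoding(&self) -> Encoding {
        self.encoding
    }

    fn offset(&self) -> usize {
        self.offset
    }

    /// Lazily decode: borrow for UTF-8, allocate only for other encodings.
    /// Returns `None` if the bytes are invalid in the declared encoding.
    fn to_str(&self) -> Option<Cow<'a, str>> {
        match self.encoding {
            Encoding::Utf8 => std::str::from_utf8(self.bytes).ok().map(Cow::Borrowed),
            Encoding::Utf16Le => {
                if self.bytes.len() % 2 != 0 {
                    return None;
                }
                let units: Vec<u16> = self
                    .bytes
                    .chunks_exact(2)
                    .map(|p| u16::from_le_bytes([p[0], p[1]]))
                    .collect();
                String::from_utf16(&units).ok().map(Cow::Owned)
            }
        }
    }
}

/// Compare against a Rust string by decoding first, so the result
/// does not depend on the document's encoding.
impl<'a, 'b> PartialEq<&'b str> for XmlSlice<'a> {
    fn eq(&self, other: &&'b str) -> bool {
        self.to_str().map_or(false, |s| s == *other)
    }
}
```

Because the UTF-8 branch of `to_str` borrows, the common case costs a validation pass but no allocation.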
thanks
I want to clarify that this is not a feature request or a concrete proposal for a new API; I just wanted to voice these thoughts and hear if anyone had anything to add. It may be that what I'm describing should just be an alternative crate, or that there are other design constraints I'm not aware of.
Thank you for the time you've put into this code so far!