ergonomics & encodings

Hi there,

I've just been playing around with quick-xml to handle reading and writing of [Glyph Interchange Format](http://unifiedfontobject.org/versions/ufo3/glyphs/glif/#contour) files, and I had some observations I wanted to share.

## `[u8]`/`Cow<[u8]>` vs `Cow<str>`

My understanding is that `quick-xml` attempts to do the least work possible, and to this end exposes byte slices by default. I can understand the reasoning (don't allocate unless the user has asked for it explicitly) but there are two things I keep thinking about, here:

- _utf-8 is common_: the majority of XML I've been dealing with is encoded in utf-8. In this case, we should be able to provide direct access to string slices, allocating only if we need to escape something.

- _In a non-utf8 encoding, it is hard to use bytes correctly_: exposing bytes directly encourages users to do things like `if attr.key == b"mykey"` (which is used in the readme). This is fine with an ascii-compatible encoding like utf-8 or Shift JIS, but will fail unexpectedly on, say, utf-16; and utf-16 is an explicitly supported encoding according to the [xml spec](https://www.w3.org/TR/REC-xml/#charencoding).
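
To make the second point concrete, here is a minimal std-only sketch of how the byte comparison behaves on utf-16 input. The helper names (`utf16le_bytes`, `key_matches_bytes`, `key_matches_decoded_utf16le`) are made up for illustration and are not part of `quick-xml`; the sketch also assumes little-endian utf-16 with no byte order mark.

```rust
/// Encode a string as UTF-16LE bytes, roughly as it would sit in a UTF-16 document.
/// (Hypothetical helper, for illustration only.)
fn utf16le_bytes(s: &str) -> Vec<u8> {
    s.encode_utf16()
        .flat_map(|unit| unit.to_le_bytes())
        .collect()
}

/// The byte-level comparison that a `[u8]`-based API invites.
fn key_matches_bytes(raw_key: &[u8]) -> bool {
    raw_key == b"mykey"
}

/// Decode the UTF-16LE bytes first, then compare as text.
fn key_matches_decoded_utf16le(raw_key: &[u8]) -> bool {
    if raw_key.len() % 2 != 0 {
        return false; // not a whole number of 16-bit code units
    }
    let units: Vec<u16> = raw_key
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    String::from_utf16(&units)
        .map(|s| s == "mykey")
        .unwrap_or(false)
}
```

With utf-8 input, `key_matches_bytes("mykey".as_bytes())` is true. With `utf16le_bytes("mykey")` it is false, because each character now takes two bytes; only the decoding variant matches.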

## API thoughts

I haven't thought about this too much, but I do worry that the current behaviour has worse ergonomics than necessary in the general (utf-8) case, while also making it easy to write bugs in the exceptional (utf-16) case.

One possible alternative I could imagine would be having separate `Event` and `EventRaw` types. `Event` would work with `Cow<str>`: slices of the underlying buffer where possible (when the encoding is utf-8), and otherwise decoded and allocated during parsing. `EventRaw` would work like the current `Event`.
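
A rough sketch of what that split might look like, reduced to a single text variant. Everything here is hypothetical (`Encoding`, `EventRaw`, this `Event`, `decode`, `to_event`); none of it is existing `quick-xml` API, and the only non-utf-8 encoding handled is little-endian utf-16.

```rust
use std::borrow::Cow;

/// Hypothetical: the encodings this sketch knows about.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Encoding {
    Utf8,
    Utf16Le,
}

/// Hypothetical raw event: undecoded bytes borrowed from the input buffer.
#[derive(Debug, PartialEq)]
enum EventRaw<'a> {
    Text(&'a [u8]),
}

/// Hypothetical decoded event: borrowed when possible, owned otherwise.
#[derive(Debug, PartialEq)]
enum Event<'a> {
    Text(Cow<'a, str>),
}

/// Borrow for utf-8 input; decode and allocate for utf-16.
fn decode<'a>(raw: &'a [u8], encoding: Encoding) -> Option<Cow<'a, str>> {
    match encoding {
        Encoding::Utf8 => std::str::from_utf8(raw).ok().map(Cow::Borrowed),
        Encoding::Utf16Le => {
            if raw.len() % 2 != 0 {
                return None; // truncated code unit
            }
            let units: Vec<u16> = raw
                .chunks_exact(2)
                .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
                .collect();
            String::from_utf16(&units).ok().map(Cow::Owned)
        }
    }
}

/// Turn a raw event into a decoded one, or `None` if the bytes are invalid.
fn to_event<'a>(raw: EventRaw<'a>, encoding: Encoding) -> Option<Event<'a>> {
    match raw {
        EventRaw::Text(bytes) => decode(bytes, encoding).map(Event::Text),
    }
}
```

In the utf-8 case `decode` returns `Cow::Borrowed`, so nothing is allocated; the utf-16 path is the only one that pays for a `String`.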

Another option would be to add a new type, `XmlSlice`, which would include a lazily-allocating `to_str` method, while also exposing access to the underlying bytes and even the encoding; this type would be used everywhere that `[u8]` or `Cow<[u8]>` is used currently. It could also carry information like the location of this text in the original document.
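
As a sketch of the shape such a type could take: the field names, the `Encoding` enum, and the choice of Latin-1 as the example non-utf-8 encoding are all assumptions made to keep the example short, not anything `quick-xml` provides.

```rust
use std::borrow::Cow;

/// Hypothetical: the encodings this sketch knows about.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Encoding {
    Utf8,
    Latin1,
}

/// Hypothetical slice type: raw bytes plus the context needed to interpret them.
#[derive(Debug, Clone, Copy)]
struct XmlSlice<'a> {
    bytes: &'a [u8],
    encoding: Encoding,
    /// Byte offset of this slice in the original document.
    offset: usize,
}

impl<'a> XmlSlice<'a> {
    /// Direct access to the undecoded bytes.
    fn as_bytes(&self) -> &'a [u8] {
        self.bytes
    }

    fn encoding(&self) -> Encoding {
        self.encoding
    }

    fn offset(&self) -> usize {
        self.offset
    }

    /// Decode on demand, allocating only when the bytes cannot be borrowed as-is.
    fn to_str(&self) -> Option<Cow<'a, str>> {
        match self.encoding {
            Encoding::Utf8 => std::str::from_utf8(self.bytes).ok().map(Cow::Borrowed),
            // Pure-ASCII Latin-1 is already valid utf-8, so it can be borrowed.
            Encoding::Latin1 if self.bytes.is_ascii() => {
                std::str::from_utf8(self.bytes).ok().map(Cow::Borrowed)
            }
            // Each Latin-1 byte maps to the Unicode code point of the same value.
            Encoding::Latin1 => Some(Cow::Owned(
                self.bytes.iter().map(|&b| b as char).collect(),
            )),
        }
    }
}
```

A caller who only needs bytes never pays for decoding, while a caller who wants text goes through `to_str` and gets the encoding handled for them.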


## thanks

I want to clarify that this is not a feature request or a concrete proposal for a new API; I just wanted to voice these thoughts and hear if anyone had anything to add. It may be that what I'm describing should just be an alternative crate, or that there are other design constraints I'm not aware of.

Thank you for the time you've put into this code so far!
