Proposal
Problem statement
When Utf8Error::valid_up_to and Utf8Error::error_len are called, their results are almost always used to slice substrings out of the original string. However, since Utf8Error does not hold a reference to the original string, it cannot provide methods that return those substrings.
By comparison, Utf8LossyChunksIter, which is currently unstable behind the internal str_internals feature, is much easier to use.
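For reference, the sketch below shows what the two methods report. error_info and first_error_slices are hypothetical helpers, not std APIs. valid_up_to is the length of the valid prefix, and error_len is Some(n) for an n-byte invalid sequence or None when the input ends partway through a sequence. Turning those numbers back into slices still requires the original bytes to be passed around alongside the error.

```rust
use std::str;

/// Hypothetical helper: returns `(valid_up_to, error_len)` for the first
/// UTF-8 error in `bytes`, or `None` if `bytes` is entirely valid.
fn error_info(bytes: &[u8]) -> Option<(usize, Option<usize>)> {
    str::from_utf8(bytes)
        .err()
        .map(|e| (e.valid_up_to(), e.error_len()))
}

/// Hypothetical helper: splits `bytes` at its first error into the valid
/// prefix and the offending bytes. It needs `bytes` itself because
/// `Utf8Error` only carries the two lengths.
fn first_error_slices(bytes: &[u8]) -> Option<(&[u8], &[u8])> {
    let (valid_len, error_len) = error_info(bytes)?;
    let rest = &bytes[valid_len..];
    // `None` means the input ended mid-sequence: everything left is invalid.
    let invalid = match error_len {
        Some(n) => &rest[..n],
        None => rest,
    };
    Some((&bytes[..valid_len], invalid))
}
```

For example, b"ab\xE2\x82X" reports (2, Some(2)) because the truncated sequence is followed by another character, while b"ab\xE2\x82" reports (2, None) because the input simply runs out.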
Motivation, use-cases
This is useful when creating a custom byte string formatter. Valid UTF-8 portions are usually output using the Display implementation for str or str::escape_debug, but invalid portions might require custom formatting.
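The custom formatting for one valid/invalid pair might look like the following sketch. The helper write_chunk and the \xNN escape style for invalid bytes are illustrative assumptions, not part of the proposal; the valid text goes through str::escape_debug as described above.

```rust
use std::fmt::Write;

/// Hypothetical helper: appends one (valid, invalid) pair to `out`, using
/// `str::escape_debug` for the valid UTF-8 text and a `\xNN` escape for
/// each invalid byte.
fn write_chunk(out: &mut String, valid: &str, invalid: &[u8]) {
    // Writing into a `String` cannot fail, so the `fmt::Result`s are unwrapped.
    write!(out, "{}", valid.escape_debug()).unwrap();
    for byte in invalid {
        write!(out, "\\x{:02X}", byte).unwrap();
    }
}
```

Calling write_chunk(&mut s, "a\tb\n", &[0xFF, 0x80]) appends the text a\tb\n\xFF\x80 with literal backslashes.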
Example
Code using str::from_utf8 (requires unsafe to avoid re-validating the valid prefix):
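```rust
use std::str;

// Illustrative wrapper (`format_bytes` is a placeholder name) so the snippet compiles on its own.
fn format_bytes(mut string: &[u8]) {
    while !string.is_empty() {
        let (valid, invalid) = match str::from_utf8(string) {
            // The rest of the input is valid UTF-8.
            Ok(string) => (string, &[][..]),
            Err(error) => {
                let valid_len = error.valid_up_to();
                // SAFETY: `valid_up_to` guarantees that this prefix is valid UTF-8.
                let valid = unsafe { str::from_utf8_unchecked(&string[..valid_len]) };
                let mut invalid = &string[valid_len..];
                // `error_len` is `None` when the input ends mid-sequence;
                // in that case everything after the valid prefix is invalid.
                if let Some(invalid_len) = error.error_len() {
                    invalid = &invalid[..invalid_len];
                }
                (valid, invalid)
            }
        };
        // formatting for `valid` and `invalid`
        string = &string[valid.len() + invalid.len()..];
    }
}
```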
Code using the new API:
```rust
for chunk in Utf8Chunks::new(string) {
    let valid = chunk.valid();
    let invalid = chunk.invalid();
    // formatting for `valid` and `invalid`
}
```
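The proposed API can be approximated on stable Rust to see its shape. The sketch below is a hypothetical, simplified stand-in for Utf8Chunks and Utf8Chunk built on the same from_utf8 loop as the first example; it is not the standard library's implementation, only an illustration of a new constructor plus valid and invalid getters.

```rust
use std::str;

/// Hypothetical stand-in for the proposed `Utf8Chunk`: a run of valid UTF-8
/// followed by the invalid bytes (possibly none) that end it.
#[derive(Debug, PartialEq)]
pub struct Utf8Chunk<'a> {
    valid: &'a str,
    invalid: &'a [u8],
}

impl<'a> Utf8Chunk<'a> {
    pub fn valid(&self) -> &'a str {
        self.valid
    }

    pub fn invalid(&self) -> &'a [u8] {
        self.invalid
    }
}

/// Hypothetical stand-in for the proposed `Utf8Chunks` iterator,
/// built only on `str::from_utf8` and `Utf8Error`.
pub struct Utf8Chunks<'a> {
    rest: &'a [u8],
}

impl<'a> Utf8Chunks<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        Utf8Chunks { rest: bytes }
    }
}

impl<'a> Iterator for Utf8Chunks<'a> {
    type Item = Utf8Chunk<'a>;

    fn next(&mut self) -> Option<Self::Item> {
        let rest = self.rest;
        if rest.is_empty() {
            return None;
        }
        let (valid, invalid) = match str::from_utf8(rest) {
            Ok(valid) => (valid, &[][..]),
            Err(error) => {
                let (valid, after) = rest.split_at(error.valid_up_to());
                // SAFETY: `valid_up_to` marks the end of a valid UTF-8 prefix.
                let valid = unsafe { str::from_utf8_unchecked(valid) };
                // `error_len() == None`: the input ends mid-sequence, so all of `after` is invalid.
                let invalid_len = error.error_len().unwrap_or(after.len());
                (valid, &after[..invalid_len])
            }
        };
        self.rest = &rest[valid.len() + invalid.len()..];
        Some(Utf8Chunk { valid, invalid })
    }
}
```

With these definitions in scope, the second loop from the example compiles as written when string is a &[u8]; for b"ab\xFFcd\xE2\x82" it yields the chunks ("ab", [0xFF]) and ("cd", [0xE2, 0x82]).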
Solution sketches
Make the following changes, and change the feature gate for these structs from str_internals to utf8_chunks.
- Remove Utf8Lossy.
- Rename Utf8LossyChunksIter to Utf8Chunks.
- Rename Utf8LossyChunk to Utf8Chunk.
- Rename Utf8LossyChunk::broken to invalid.
- Change the fields of Utf8Chunk into getter methods (i.e., valid and invalid).
- Add:

```rust
impl<'a> Utf8Chunks<'a> {
    fn new(bytes: &'a [u8]) -> Self;
}
```
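With Utf8Lossy removed, lossy decoding can still be expressed in terms of chunks: emit the valid text, then one U+FFFD for each non-empty invalid run. The hypothetical to_lossy below sketches this on stable Rust, walking the same valid/invalid split with Utf8Error rather than the proposed iterator; its output should match String::from_utf8_lossy.

```rust
use std::str;

/// Hypothetical sketch: lossy decoding as "valid text, then one U+FFFD per
/// invalid chunk", using only `str::from_utf8` and `Utf8Error`.
fn to_lossy(mut bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len());
    while !bytes.is_empty() {
        match str::from_utf8(bytes) {
            Ok(valid) => {
                // The remainder is entirely valid: copy it and stop.
                out.push_str(valid);
                break;
            }
            Err(error) => {
                let (valid, after) = bytes.split_at(error.valid_up_to());
                // SAFETY: `valid_up_to` marks the end of a valid UTF-8 prefix.
                out.push_str(unsafe { str::from_utf8_unchecked(valid) });
                // An error always means a non-empty invalid chunk, so add one replacement.
                out.push('\u{FFFD}');
                let invalid_len = error.error_len().unwrap_or(after.len());
                bytes = &after[invalid_len..];
            }
        }
    }
    out
}
```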
Links and related work
What happens now?
This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.