Proposal
Problem statement
When Utf8Error::valid_up_to and Utf8Error::error_len are used, their results will almost always be used to get substrings of the original string. However, since Utf8Error does not have a reference to the original string, it cannot have methods to return the substrings.
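Utf8LossyChunksIter, which already yields each valid portion together with the invalid bytes that follow it, is also much easier to use than working with Utf8Error directly.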
Motivation, use-cases
This is useful when creating a custom byte string formatter. UTF-8 portions are usually output using the Display implementation for str or str::escape_debug, but invalid portions might require custom formatting.
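As a rough illustration of such a formatter, the sketch below writes valid portions with str::escape_debug and renders each invalid byte as a `\xNN` escape. The function name `format_byte_string` and the `\xNN` style are assumptions made for this example and are not part of the proposal. It relies only on the existing str::from_utf8 API and re-validates the valid prefix so that it stays in safe code.

```rust
use std::fmt::Write;
use std::str;

/// Hypothetical helper: formats a byte string, escaping invalid UTF-8 as `\xNN`.
fn format_byte_string(mut bytes: &[u8]) -> String {
    let mut out = String::new();
    while !bytes.is_empty() {
        match str::from_utf8(bytes) {
            Ok(valid) => {
                // The rest of the input is valid UTF-8.
                out.extend(valid.escape_debug());
                break;
            }
            Err(error) => {
                let (valid, rest) = bytes.split_at(error.valid_up_to());
                // Re-validating the prefix avoids `unsafe` at the cost of a second pass.
                let valid = str::from_utf8(valid).expect("prefix was reported as valid");
                out.extend(valid.escape_debug());
                // `error_len` is `None` when the input ends in a truncated sequence.
                let invalid_len = error.error_len().unwrap_or(rest.len());
                for byte in &rest[..invalid_len] {
                    write!(out, "\\x{:02X}", byte).expect("writing to a String cannot fail");
                }
                bytes = &rest[invalid_len..];
            }
        }
    }
    out
}
```

With this sketch, `format_byte_string(b"ab\xFFcd")` produces the text `ab\xFFcd`, with the invalid byte spelled out as an escape.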
Example
Code using str::from_utf8 (requires unsafe):

    while !string.is_empty() {
        let (valid, invalid) = match str::from_utf8(string) {
            Ok(string) => (string, &[][..]),
            Err(error) => {
                let valid_len = error.valid_up_to();
                let valid = unsafe { str::from_utf8_unchecked(&string[..valid_len]) };
                let mut invalid = &string[valid_len..];
                if let Some(invalid_len) = error.error_len() {
                    invalid = &invalid[..invalid_len];
                }
                (valid, invalid)
            }
        };
        // formatting for `valid` and `invalid`
        string = &string[valid.len() + invalid.len()..];
    }

Code using the new API:

    for chunk in Utf8Chunks::new(string) {
        let valid = chunk.valid();
        let invalid = chunk.invalid();
        // formatting for `valid` and `invalid`
    }

Solution sketches
Make the following changes, and change the feature for these structs from str_internals to utf8_chunks.
- Remove Utf8Lossy.
- Rename Utf8LossyChunksIter to Utf8Chunks.
- Rename Utf8LossyChunk to Utf8Chunk.
- Rename Utf8LossyChunk::broken to invalid.
- Change fields of Utf8Chunk into getter methods (i.e., valid and invalid).
- Add:

      impl<'a> Utf8Chunks<'a> {
          fn new(bytes: &'a [u8]) -> Self;
      }

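To make the intended behavior concrete, here is a minimal sketch of an iterator with the proposed shape, built on the existing str::from_utf8 API. The types `Chunk` and `Chunks` are hypothetical stand-ins for the proposed Utf8Chunk and Utf8Chunks, not the actual implementation. The sketch re-validates each valid prefix to avoid unsafe, which a real implementation would presumably not need to do.

```rust
use std::str;

/// Hypothetical stand-in for the proposed `Utf8Chunk`.
#[derive(Debug, PartialEq)]
struct Chunk<'a> {
    valid: &'a str,
    invalid: &'a [u8],
}

impl<'a> Chunk<'a> {
    /// The valid UTF-8 portion at the start of this chunk (may be empty).
    fn valid(&self) -> &'a str {
        self.valid
    }

    /// The invalid bytes that follow the valid portion (may be empty).
    fn invalid(&self) -> &'a [u8] {
        self.invalid
    }
}

/// Hypothetical stand-in for the proposed `Utf8Chunks`.
struct Chunks<'a> {
    bytes: &'a [u8],
}

impl<'a> Chunks<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        Chunks { bytes }
    }
}

impl<'a> Iterator for Chunks<'a> {
    type Item = Chunk<'a>;

    fn next(&mut self) -> Option<Chunk<'a>> {
        let bytes = self.bytes;
        if bytes.is_empty() {
            return None;
        }
        let (valid, invalid_len) = match str::from_utf8(bytes) {
            Ok(valid) => (valid, 0),
            Err(error) => {
                let valid_len = error.valid_up_to();
                // Safe alternative to `from_utf8_unchecked`; validates the prefix again.
                let valid = str::from_utf8(&bytes[..valid_len])
                    .expect("prefix was reported as valid");
                // `error_len` is `None` when the input ends in a truncated sequence,
                // in which case all remaining bytes are treated as invalid.
                let remaining = bytes.len() - valid_len;
                (valid, error.error_len().unwrap_or(remaining))
            }
        };
        let (invalid, rest) = bytes[valid.len()..].split_at(invalid_len);
        self.bytes = rest;
        Some(Chunk { valid, invalid })
    }
}
```

The loop from the example works unchanged with this stand-in by writing `Chunks::new(string)` in place of `Utf8Chunks::new(string)`. Every chunk except possibly the last has a non-empty invalid portion, and an empty input yields no chunks.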
Links and related work
What happens now?
This issue is part of the libs-api team API change proposal process. Once this issue is filed, the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.