Tracking issue: UTF-8 decoder in libcore #33906

Closed
@strake


Update (@SimonSapin): this is now the tracking issue for these items in both core::char and std::char:

  • decode_utf8() which takes an iterable of u8 and returns a DecodeUtf8
  • DecodeUtf8 which implements Iterator<Item=Result<char, InvalidSequence>>
  • InvalidSequence which is opaque
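
To make the shape of these three items concrete, here is a minimal sketch of how such an API could look. It is not the libcore implementation: the names mirror the list above, but the body (a Peekable wrapper plus a table of allowed second-byte ranges) is an assumption made for illustration only.

```rust
use std::iter::Peekable;

/// Opaque error: callers learn only that a sequence was invalid.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidSequence(());

/// Iterator adapter (illustrative layout, not the real one).
pub struct DecodeUtf8<I: Iterator<Item = u8>>(Peekable<I>);

/// Takes any iterable of bytes and returns the decoding iterator.
pub fn decode_utf8<I: IntoIterator<Item = u8>>(bytes: I) -> DecodeUtf8<I::IntoIter> {
    DecodeUtf8(bytes.into_iter().peekable())
}

impl<I: Iterator<Item = u8>> Iterator for DecodeUtf8<I> {
    type Item = Result<char, InvalidSequence>;

    fn next(&mut self) -> Option<Self::Item> {
        let first = self.0.next()?;
        // (continuation bytes expected, payload bits of the lead byte,
        //  allowed range for the second byte). The narrowed ranges reject
        // overlong forms, surrogates, and values above U+10FFFF.
        let (extra, mut code, mut lo, mut hi): (u8, u32, u8, u8) = match first {
            0x00..=0x7F => return Some(Ok(first as char)),
            0xC2..=0xDF => (1, (first & 0x1F) as u32, 0x80, 0xBF),
            0xE0 => (2, (first & 0x0F) as u32, 0xA0, 0xBF),
            0xED => (2, (first & 0x0F) as u32, 0x80, 0x9F),
            0xE1..=0xEF => (2, (first & 0x0F) as u32, 0x80, 0xBF),
            0xF0 => (3, (first & 0x07) as u32, 0x90, 0xBF),
            0xF4 => (3, (first & 0x07) as u32, 0x80, 0x8F),
            0xF1..=0xF3 => (3, (first & 0x07) as u32, 0x80, 0xBF),
            // Lone continuation bytes, 0xC0, 0xC1, and 0xF5..=0xFF.
            _ => return Some(Err(InvalidSequence(()))),
        };
        for _ in 0..extra {
            // Peek first so a byte that does not continue this sequence
            // stays in the stream and starts the next item.
            match self.0.peek().copied() {
                Some(b) if lo <= b && b <= hi => {
                    code = (code << 6) | (b & 0x3F) as u32;
                    self.0.next();
                }
                _ => return Some(Err(InvalidSequence(()))),
            }
            // Only the second byte has a restricted range.
            lo = 0x80;
            hi = 0xBF;
        }
        Some(char::from_u32(code).ok_or(InvalidSequence(())))
    }
}
```

A caller who wants lossy behaviour can map each error to U+FFFD, for example `decode_utf8(bytes).map(|r| r.unwrap_or('\u{FFFD}'))`.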

Original issue:

In libcore we have a facility to encode a character as UTF-8, namely char::EncodeUtf8, but no facility to decode a character from potentially invalid UTF-8 that returns 0xFFFD when it reads an invalid sequence. As a libcore user I find this a surprising omission, given that libstd has string::String::from_utf8_lossy.
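
For reference, this is the libstd behaviour being compared against. The wrapper name lossy_demo is hypothetical and exists only to make the example self-contained; String::from_utf8_lossy itself is the real std function.

```rust
/// Hypothetical wrapper around the real `String::from_utf8_lossy`.
/// Invalid sequences in `bytes` are replaced with U+FFFD.
pub fn lossy_demo(bytes: &[u8]) -> String {
    // from_utf8_lossy returns a Cow<str>; into_owned() yields a String
    // whether or not any replacement was needed.
    String::from_utf8_lossy(bytes).into_owned()
}
```

Because it allocates a String when a replacement is needed, this function lives in libstd (with alloc), which is why it cannot serve libcore users directly.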

These options came to mind:

  • A function str::next_code_point_lossy (or similar) which behaves like str::next_code_point but checks whether its input is valid and returns 0xFFFD if not
  • An iterator DecodeUtf8, constructible from an arbitrary iterator of bytes, which decodes them
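
The first option could be sketched roughly as follows, working on a byte slice. The signature (returning the decoded character together with the number of bytes consumed) is an assumption for illustration; the issue does not specify one. The sketch leans on the existing from_utf8 validator and Utf8Error rather than hand-written decoding.

```rust
use std::str::from_utf8;

/// Hypothetical slice-based variant of the proposed function.
/// Returns the next character and how many bytes it consumed,
/// substituting U+FFFD for an invalid or truncated sequence.
pub fn next_code_point_lossy(bytes: &[u8]) -> Option<(char, usize)> {
    if bytes.is_empty() {
        return None;
    }
    // A UTF-8 sequence is at most 4 bytes, so a 4-byte window suffices.
    let window = &bytes[..bytes.len().min(4)];
    let valid = match from_utf8(window) {
        Ok(s) => s,
        // Something later in the window is bad, but the prefix is valid.
        Err(e) if e.valid_up_to() > 0 => from_utf8(&window[..e.valid_up_to()]).ok()?,
        Err(e) => {
            // error_len() is None when the input ends mid-sequence;
            // in that case consume everything that is left.
            let skip = e.error_len().unwrap_or(window.len());
            return Some(('\u{FFFD}', skip));
        }
    };
    let c = valid.chars().next()?;
    Some((c, c.len_utf8()))
}
```

Calling it in a loop and advancing the slice by the returned length decodes a whole buffer without allocating, which is the property a libcore facility would need.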

    Labels

      • B-unstable (Blocker: Implemented in the nightly compiler and unstable.)
      • C-tracking-issue (Category: An issue tracking the progress of sth. like the implementation of an RFC)
      • T-libs-api (Relevant to the library API team, which will review and decide on the PR/issue.)
      • final-comment-period (In the final comment period and will be merged soon unless new substantive objections are raised.)