read(io, Char) doesn't match collect(string) for malformed UTF-8 #50532

Closed
@stevengj

Description

The following mismatch seems undesirable to me: the same data, "\xfc\xa8", is treated as two (malformed) characters by collect but as only one character by read:

julia> s = "\xfc\xa8"
"\xfc\xa8"

julia> io = IOBuffer(s);

julia> read(io, Char)
'\xfc\xa8': Malformed UTF-8 (category Ma: Malformed, bad data)

julia> collect(s)
2-element Vector{Char}:
 '\xfc': Malformed UTF-8 (category Ma: Malformed, bad data)
 '\xa8': Malformed UTF-8 (category Ma: Malformed, bad data)
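A minimal sketch that reproduces the same comparison in a script, assuming a Julia 1.x session; `chars_via_read` is a hypothetical helper, not part of Base. It drains an IOBuffer with repeated read(io, Char) calls and compares the result with collect. At the byte level, 0xfc has six leading one bits, which made it the lead byte of a 6-byte sequence in the obsolete RFC 2279 form of UTF-8 and is never valid under RFC 3629, while 0xa8 (10101000) is a continuation byte. The two code paths apparently disagree on whether a lead byte like 0xfc may absorb the continuation byte that follows it.

```julia
# Hypothetical helper (not part of Base): decode a string by calling
# read(io, Char) until the buffer is exhausted.
function chars_via_read(s::AbstractString)
    io = IOBuffer(s)
    out = Char[]
    while !eof(io)
        push!(out, read(io, Char))
    end
    return out
end

s = "\xfc\xa8"
via_read = chars_via_read(s)  # decoding used by read(io, Char)
via_iter = collect(s)         # decoding used by String iteration

# 0xfc: six leading one bits (a 6-byte lead in RFC 2279, invalid in RFC 3629).
# 0xa8: one leading one bit, i.e. a continuation byte.
@show leading_ones(0xfc) leading_ones(0xa8)

# As reported in this issue, via_read has 1 element and via_iter has 2;
# consistent behavior would make the two counts equal.
println(length(via_read), " vs ", length(via_iter))
```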

cc @StefanKarpinski, the guru of malformed Char, who wrote this read code and the string iteration in #24999: is this intentional?

Metadata

Assignees: no one assigned

Labels: bug (indicates an unexpected problem or unintended behavior), io (involving the I/O subsystem: libuv, read, write, etc.), strings ("Strings!")