
as_mut_vec_for_path_buf in windows breaks UTF-8 is_known_utf8 assumption #126291

Closed
@yhx-12243

Description


pub struct Wtf8Buf {
    bytes: Vec<u8>,
    /// Do we know that `bytes` holds a valid UTF-8 encoding? We can easily
    /// know this if we're constructed from a `String` or `&str`.
    ///
    /// It is possible for `bytes` to have valid UTF-8 without this being
    /// set, such as when we're concatenating `&Wtf8`'s and surrogates become
    /// paired, as we don't bother to rescan the entire string.
    is_known_utf8: bool,
}

However, `as_mut_vec_for_path_buf` hands out mutable access to `bytes` without clearing `is_known_utf8`, so callers can write non-UTF-8 data while the flag still claims the buffer is UTF-8:

pub(crate) fn as_mut_vec_for_path_buf(&mut self) -> &mut Vec<u8> {
    &mut self.bytes
}
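To make the broken invariant concrete, here is a simplified model, not std's actual code. `FlaggedBuf` and its methods are hypothetical stand-ins for `Wtf8Buf`: a byte buffer with a cached "known UTF-8" flag. It sketches one plausible way to keep the flag honest, assuming callers of the raw mutable accessor may write arbitrary bytes: the accessor conservatively clears the flag.

```rust
/// Hypothetical, simplified stand-in for `Wtf8Buf` (not the real std type):
/// a byte buffer plus a cached "definitely valid UTF-8" flag.
struct FlaggedBuf {
    bytes: Vec<u8>,
    is_known_utf8: bool,
}

impl FlaggedBuf {
    /// Built from a `&str`, so the bytes are UTF-8 by construction.
    fn from_utf8_str(s: &str) -> Self {
        FlaggedBuf { bytes: s.as_bytes().to_vec(), is_known_utf8: true }
    }

    /// Raw mutable access. The caller may write arbitrary bytes, so the
    /// cached flag can no longer be trusted and is cleared first.
    fn as_mut_vec(&mut self) -> &mut Vec<u8> {
        self.is_known_utf8 = false;
        &mut self.bytes
    }

    fn into_string(self) -> Result<String, Vec<u8>> {
        if self.is_known_utf8 {
            // Fast path: trusts the flag. This is only sound if every
            // mutation path keeps the flag accurate.
            Ok(unsafe { String::from_utf8_unchecked(self.bytes) })
        } else {
            // Slow path: re-validate the whole buffer.
            String::from_utf8(self.bytes).map_err(|e| e.into_bytes())
        }
    }
}

fn main() {
    let mut buf = FlaggedBuf::from_utf8_str("utf8");
    // Append ".non" + the WTF-8 bytes of lone surrogate U+D800 + "utf8".
    buf.as_mut_vec().extend_from_slice(b".non\xED\xA0\x80utf8");
    match buf.into_string() {
        Ok(s) => println!("Ok({s:?})"),
        Err(bytes) => println!("Err({} bytes)", bytes.len()),
    }
}
```

Clearing the flag gives up the fast path until the next full rescan; tracking validity of only the appended bytes would be cheaper but needs more bookkeeping.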

I tried this code:

use std::{ffi::OsString, os::windows::ffi::OsStringExt, path::PathBuf};

fn f() -> Result<String, OsString> {
    let mut utf8 = PathBuf::from(OsString::from("utf8".to_owned()));
    let non_utf8: OsString = OsStringExt::from_wide(&[0x6e, 0x6f, 0x6e, 0xd800, 0x75, 0x74, 0x66, 0x38]);
    utf8.set_extension(&non_utf8);
    utf8.into_os_string().into_string()
}

fn main() {
    dbg!(f());
}

I expected to see this happen:

[1.rs:11:5] f() = Err(
    "utf8.non\xED\xA0\x80utf8",
)
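The expected `Err` output shows U+D800 as the bytes `\xED\xA0\x80`. WTF-8 stores a lone surrogate using the ordinary three-byte UTF-8 bit layout, which strict UTF-8 forbids for surrogates. The sketch below reproduces that layout; `encode_three_byte` is a hypothetical helper written only for illustration.

```rust
/// Encode a code point in U+0800..=U+FFFF with the three-byte UTF-8 bit
/// layout. WTF-8 applies this same layout to lone surrogates.
fn encode_three_byte(cp: u32) -> [u8; 3] {
    assert!((0x800..=0xFFFF).contains(&cp), "needs a three-byte code point");
    [
        0xE0 | (cp >> 12) as u8,         // 1110xxxx: top 4 bits
        0x80 | ((cp >> 6) & 0x3F) as u8, // 10xxxxxx: middle 6 bits
        0x80 | (cp & 0x3F) as u8,        // 10xxxxxx: low 6 bits
    ]
}

fn main() {
    // Lone high surrogate from the repro's `from_wide` input.
    println!("{:02X?}", encode_three_byte(0xD800));
}
```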

Instead, this happened:

[1.rs:11:5] f() = Ok(
    "utf8.non\u{d800}utf8",
)

(A `String` must always hold valid UTF-8, so it can never contain the lone surrogate `\u{d800}`; returning one from safe code is unsound.)
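The violated invariant can be checked with std alone: U+D800 has no `char`, and its three-byte encoding fails UTF-8 validation. A minimal sketch, where `fits_in_string` is a hypothetical helper:

```rust
/// Hypothetical helper: can these bytes legally back a `String`?
fn fits_in_string(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    // U+D800 is a surrogate code point, not a Unicode scalar value,
    // so no `char` exists for it.
    println!("{:?}", char::from_u32(0xD800)); // prints "None"
    // The buffer from the repro is rejected by UTF-8 validation.
    println!("{}", fits_in_string(b"utf8.non\xED\xA0\x80utf8")); // prints "false"
}
```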


Meta

rustc --version --verbose:

rustc 1.81.0-nightly (d0227c6a1 2024-06-11)
binary: rustc
commit-hash: d0227c6a19c2d6e8dceb87c7a2776dc2b10d2a04
commit-date: 2024-06-11
host: x86_64-pc-windows-gnu
release: 1.81.0-nightly
LLVM version: 18.1.7

Metadata


Labels

C-bug (Category: This is a bug.)
I-unsound (Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness)
O-windows (Operating system: Windows)
T-libs (Relevant to the library team, which will review and decide on the PR/issue.)
