
Unsound bug in ScannerU8SliceAscii #2

Open
@safe4u

Description


Hi, first of all, thanks for your great work on this user-friendly scanner crate.

We have found an unsoundness bug in ScannerU8SliceAscii, which uses from_utf8_unchecked in many places (for example, next_parse and next_u8_until) to convert a [u8] slice to a str and then parse it into the target type.
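To make the pattern concrete, here is a minimal sketch of it. MiniAsciiScanner and its next_parse are hypothetical, simplified stand-ins written for illustration only; they are not the crate's real implementation, and the real method signatures may differ. The point is that the unsafe conversion relies on an ASCII assumption that nothing in new enforces.

```rust
use std::str::FromStr;

/// Hypothetical, simplified stand-in for an ASCII byte-slice scanner.
/// NOT the crate's real code; it only mirrors the pattern in this issue.
struct MiniAsciiScanner<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> MiniAsciiScanner<'a> {
    /// No validation here: this is the gap the issue points out.
    fn new(data: &'a [u8]) -> Self {
        MiniAsciiScanner { data, pos: 0 }
    }

    /// Skips ASCII whitespace, takes bytes up to the next ASCII whitespace,
    /// converts them to &str without checking, and parses them as T.
    fn next_parse<T: FromStr>(&mut self) -> Option<Result<T, T::Err>> {
        while self.pos < self.data.len() && self.data[self.pos].is_ascii_whitespace() {
            self.pos += 1;
        }
        if self.pos == self.data.len() {
            return None;
        }
        let start = self.pos;
        while self.pos < self.data.len() && !self.data[self.pos].is_ascii_whitespace() {
            self.pos += 1;
        }
        let token = &self.data[start..self.pos];
        // SAFETY (assumed, never checked): `token` is valid UTF-8 only if the
        // caller passed ASCII data to `new`. Nothing enforces that.
        let s = unsafe { std::str::from_utf8_unchecked(token) };
        Some(s.parse::<T>())
    }
}

fn main() {
    // Fine to run: this input really is ASCII, so the assumption holds.
    let mut sc = MiniAsciiScanner::new(b"12 -7 ok");
    let a: i32 = sc.next_parse().unwrap().unwrap();
    let b: i64 = sc.next_parse().unwrap().unwrap();
    println!("{} {}", a, b);
}
```

Only ASCII input is used above; feeding non-ASCII bytes to this sketch would reproduce exactly the problem described below.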

The contract of from_utf8_unchecked says "The bytes passed in must be valid UTF-8", which is trivially satisfied as long as the bytes are ASCII.
However, ScannerU8SliceAscii::new performs no validation to guarantee that the data is ASCII.
If ScannerU8SliceAscii is used to scan non-UTF-8 bytes, a str containing invalid UTF-8 is created and passed to parse, which may lead to undefined behavior, since the rest of Rust assumes every str is valid UTF-8.
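The following sketch shows which byte inputs would break the from_utf8_unchecked contract, using only the standard library (std::str::from_utf8 and the slice method is_ascii). The helper classify is hypothetical and exists only for this illustration. It also shows that ASCII is a sufficient but not a necessary condition: non-ASCII text such as "é" is still valid UTF-8.

```rust
/// Hypothetical helper: returns (is valid UTF-8, is ASCII) for `bytes`.
/// Only the first flag matters for the from_utf8_unchecked contract.
fn classify(bytes: &[u8]) -> (bool, bool) {
    let valid_utf8 = std::str::from_utf8(bytes).is_ok();
    let ascii = bytes.is_ascii();
    (valid_utf8, ascii)
}

fn main() {
    // Plain ASCII: valid UTF-8 and ASCII.
    println!("{:?}", classify(b"42"));
    // 0xFF never occurs in UTF-8, so this token is not a valid str.
    println!("{:?}", classify(&[b'4', b'2', 0xFF]));
    // "é" is encoded as 0xC3 0xA9: valid UTF-8, but not ASCII.
    println!("{:?}", classify("é".as_bytes()));
    // A lone continuation byte is invalid UTF-8.
    println!("{:?}", classify(&[0x80]));
}
```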

Suggestions

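This unsoundness can be fixed easily by ensuring that the data is ASCII in ScannerU8SliceAscii::new.
For example, add debug_assert!(data.iter().all(|&x| x < 128)) to it. Note that debug_assert! is disabled in release builds by default, so a plain assert! (or returning an error) is needed if the guarantee must also hold there.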
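A minimal sketch of a constructor that establishes the ASCII invariant in every build profile, assuming a hypothetical CheckedAsciiScanner type and NotAscii error (neither exists in the crate). It offers a fallible new and a panicking variant close to the suggestion above; data.is_ascii() is equivalent to data.iter().all(|&x| x < 128).

```rust
/// Hypothetical error for rejected, non-ASCII input.
#[derive(Debug, PartialEq)]
struct NotAscii {
    /// Index of the first byte >= 128.
    index: usize,
}

/// Hypothetical scanner whose constructors guarantee the ASCII invariant
/// that later unchecked conversions rely on.
struct CheckedAsciiScanner<'a> {
    data: &'a [u8],
}

impl<'a> CheckedAsciiScanner<'a> {
    /// Fallible constructor: one O(n) check, kept in release builds too.
    fn new(data: &'a [u8]) -> Result<Self, NotAscii> {
        // `position` finds the first non-ASCII byte; `None` means all ASCII.
        match data.iter().position(|&x| x >= 128) {
            Some(index) => Err(NotAscii { index }),
            None => Ok(CheckedAsciiScanner { data }),
        }
    }

    /// Panicking variant; uses assert! so the check is never compiled out.
    fn new_or_panic(data: &'a [u8]) -> Self {
        assert!(data.is_ascii(), "input must be ASCII");
        CheckedAsciiScanner { data }
    }

    fn as_str(&self) -> &'a str {
        // SAFETY: every constructor verified that `data` is ASCII,
        // and ASCII bytes are always valid UTF-8.
        unsafe { std::str::from_utf8_unchecked(self.data) }
    }
}

fn main() {
    let ok = CheckedAsciiScanner::new(b"1 2 3").unwrap();
    println!("{}", ok.as_str());
    println!("{:?}", CheckedAsciiScanner::new(&[b'1', 0xFF]).err());
    println!("{}", CheckedAsciiScanner::new_or_panic(b"abc").as_str());
}
```

Returning a Result lets callers handle bad input without a panic, while new_or_panic keeps a constructor shape closer to the current one.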

By the way, ScannerU8Slice has the same unsoundness, which may require further consideration of the trade-off between soundness and efficiency.
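One possible way to handle that trade-off for general UTF-8 input is to pay for a single linear validation up front and then work only on &str, so no per-token unsafe conversion is needed. The sketch below assumes a hypothetical Utf8Scanner type (not part of the crate) that splits on Unicode whitespace; the real ScannerU8Slice may split input differently. Splitting a valid &str at character boundaries always yields valid &str pieces, so soundness holds without further checks.

```rust
use std::str::FromStr;

/// Hypothetical scanner for arbitrary UTF-8 input: validate once in `new`,
/// then operate on &str only, so no `unsafe` is required.
struct Utf8Scanner<'a> {
    rest: &'a str,
}

impl<'a> Utf8Scanner<'a> {
    /// One O(n) validation up front; fails on non-UTF-8 input.
    fn new(data: &'a [u8]) -> Result<Self, std::str::Utf8Error> {
        Ok(Utf8Scanner { rest: std::str::from_utf8(data)? })
    }

    /// Parses the next whitespace-separated token as T.
    fn next_parse<T: FromStr>(&mut self) -> Option<Result<T, T::Err>> {
        let trimmed = self.rest.trim_start();
        if trimmed.is_empty() {
            self.rest = trimmed;
            return None;
        }
        // `find` returns the byte index of a char start, so `split_at`
        // always cuts on a character boundary.
        let end = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        let (token, rest) = trimmed.split_at(end);
        self.rest = rest;
        Some(token.parse::<T>())
    }
}

fn main() {
    let mut sc = Utf8Scanner::new("é 42".as_bytes()).unwrap();
    let word: String = sc.next_parse().unwrap().unwrap();
    let n: u32 = sc.next_parse().unwrap().unwrap();
    println!("{} {}", word, n);
    // Invalid UTF-8 is rejected once, at construction time.
    println!("{}", Utf8Scanner::new(&[0xFF]).is_err());
}
```

The alternative is to keep the byte-slice design and call std::str::from_utf8 on each token instead of from_utf8_unchecked, which costs a check proportional to each token's length on every call rather than one pass at construction.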

Metadata


Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
