Unsound bug in `ScannerU8SliceAscii`

Hi, and thanks for your great work on this user-friendly scanner crate.

We have found an unsoundness bug in `ScannerU8SliceAscii`. It calls `from_utf8_unchecked` in many places to convert a `[u8]` to a `str` before parsing it into the target type, for example in [`next_parse`](https://github.com/magiclen/scanner-rust/blob/118bbcfd64f9b7d4dcdfdedd6d09cacbcf63d2c0/src/scanner_u8_slice_ascii.rs#L334) and [`next_u8_until`](https://github.com/magiclen/scanner-rust/blob/118bbcfd64f9b7d4dcdfdedd6d09cacbcf63d2c0/src/scanner_u8_slice_ascii.rs#L568).

The contract of `from_utf8_unchecked` says "The bytes passed in must be valid UTF-8", which holds whenever the bytes are ASCII, since ASCII is a subset of UTF-8.
However, [`ScannerU8SliceAscii::new`](https://github.com/magiclen/scanner-rust/blob/118bbcfd64f9b7d4dcdfdedd6d09cacbcf63d2c0/src/scanner_u8_slice_ascii.rs#L27) performs no validation to guarantee that `data` is ASCII.
If `ScannerU8SliceAscii` is used to scan non-UTF-8 bytes, an invalid `str` is created and passed to `parse`, which may lead to undefined behavior.
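
To illustrate the violated contract without actually triggering undefined behavior, here is a minimal sketch that uses the safe `std::str::from_utf8` to check the bytes a caller could hand to the scanner. The helper names `satisfies_utf8_contract` and `parse_u8_checked` are hypothetical and are not part of the crate.

```rust
use std::str;

/// Hypothetical helper: returns true if `bytes` would satisfy the safety
/// contract of `str::from_utf8_unchecked`. The check itself uses the safe,
/// validating `str::from_utf8`.
fn satisfies_utf8_contract(bytes: &[u8]) -> bool {
    str::from_utf8(bytes).is_ok()
}

/// Hypothetical helper: parses a token as `u8` using the checked conversion,
/// returning `None` instead of building an invalid `str`.
fn parse_u8_checked(token: &[u8]) -> Option<u8> {
    str::from_utf8(token).ok()?.parse::<u8>().ok()
}
```

A token such as `b"42"` passes the check and parses normally, while a slice containing `0xFF` (a byte that can never appear in valid UTF-8) is rejected. The unchecked path would instead turn that slice into a `str` that breaks the type's invariant.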

## Suggestions
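This unsoundness can be fixed by ensuring that the data is ASCII in `ScannerU8SliceAscii::new`.
For example, add `debug_assert!(data.iter().all(|&x| x < 128))` to it. Note that `debug_assert!` is only checked when debug assertions are enabled, so covering release builds as well would need a plain `assert!` or a fallible constructor.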
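
As a rough sketch of the validation idea, the constructor could check the input in every build profile and return `None` for non-ASCII data. The `AsciiScanner` type and its `as_str` method below are hypothetical stand-ins for illustration, not the crate's actual API. The sketch uses the standard `<[u8]>::is_ascii`, which is equivalent to the `x < 128` check.

```rust
/// Hypothetical stand-in for the scanner; not the crate's actual type.
struct AsciiScanner<'a> {
    data: &'a [u8],
}

impl<'a> AsciiScanner<'a> {
    /// Checked constructor: rejects input containing any non-ASCII byte.
    fn new(data: &'a [u8]) -> Option<Self> {
        if data.is_ascii() {
            Some(Self { data })
        } else {
            None
        }
    }

    /// Views the whole buffer as a `str` without re-validating it.
    fn as_str(&self) -> &'a str {
        // SAFETY: `new` only accepts ASCII bytes, and every ASCII byte
        // sequence is valid UTF-8, so the contract of
        // `from_utf8_unchecked` is upheld.
        unsafe { std::str::from_utf8_unchecked(self.data) }
    }
}
```

With the invariant established once in `new`, later unchecked conversions stay sound and the check costs a single pass over the input. Changing the return type of `new` would be a breaking change, so panicking with `assert!` is the alternative if the existing signature has to stay.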

> By the way, `ScannerU8Slice` has the same unsoundness problem, which may require weighing soundness against efficiency.
