Description
Hi, and first of all, thanks for your great contribution in this user-friendly scanner crate.
We have found a soundness bug in `ScannerU8SliceAscii`, which makes heavy use of `from_utf8_unchecked` to convert `[u8]` to `str` before parsing into the target type, for example in `next_parse` and `next_u8_until`.
The contract of `from_utf8_unchecked` says "The bytes passed in must be valid UTF-8", which is intuitively satisfied as long as the bytes are ASCII.
However, there is no validation in `ScannerU8SliceAscii::new` to guarantee that `data` is ASCII.
If `ScannerU8SliceAscii` is used to scan non-UTF-8 bytes, an invalid `str` is created and passed to `parse`, which may lead to undefined behavior.
Suggestions
This soundness problem can be easily handled by ensuring that the data is ASCII in `ScannerU8SliceAscii::new`.
For example, add `debug_assert!(data.iter().all(|&x| x < 128))` to it.
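
To make the suggestion concrete, here is a minimal sketch of a validating constructor. `AsciiScanner`, its single `data` field, and `as_str` are hypothetical stand-ins and not the crate's actual definitions. Since `debug_assert!` is compiled out in release builds by default, the sketch uses an unconditional `is_ascii()` check that returns `None` on bad input; whether that cost is acceptable is of course up to you.

```rust
/// Hypothetical stand-in for `ScannerU8SliceAscii`; the real struct differs.
pub struct AsciiScanner<'a> {
    data: &'a [u8],
}

impl<'a> AsciiScanner<'a> {
    /// Returns `None` unless every byte is ASCII (< 128).
    pub fn new(data: &'a [u8]) -> Option<Self> {
        if data.is_ascii() {
            Some(Self { data })
        } else {
            None
        }
    }

    /// View the whole input as a `str` without re-validating it.
    pub fn as_str(&self) -> &'a str {
        // SAFETY: `new` checked that every byte is ASCII, and ASCII is valid UTF-8.
        unsafe { std::str::from_utf8_unchecked(self.data) }
    }
}

fn main() {
    let scanner = AsciiScanner::new(b"123").expect("input is ASCII");
    assert_eq!(scanner.as_str().parse::<u8>(), Ok(123));

    // Bytes outside the ASCII range are rejected up front.
    assert!(AsciiScanner::new(&[0xC3u8, 0x28]).is_none());
}
```

With the check in `new`, the later `from_utf8_unchecked` calls can rely on an invariant that safe code cannot break.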
By the way, the same soundness problem exists in `ScannerU8Slice`, which may require further consideration of the trade-off between soundness and efficiency.
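
One possible middle ground for `ScannerU8Slice`, sketched under the assumption that parsing works on one token slice at a time: validate only the token being parsed with the checked `std::str::from_utf8`, so the cost is proportional to the token length rather than the whole input. `parse_token` is a hypothetical helper, not a function from the crate.

```rust
use std::str::FromStr;

/// Hypothetical helper: parse a single token, validating only that token's bytes.
fn parse_token<T: FromStr>(token: &[u8]) -> Option<T> {
    // Checked conversion: returns an error instead of creating an invalid `str`.
    let text = std::str::from_utf8(token).ok()?;
    text.parse::<T>().ok()
}

fn main() {
    assert_eq!(parse_token::<u32>(b"1234"), Some(1234));
    // Invalid UTF-8 is rejected before it can reach `parse`.
    assert_eq!(parse_token::<u32>(&[0xFFu8, b'1']), None);
}
```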