Fix `escaped`, `escaped_transform`, `satisfy`, `one_of` and `none_of` interpreting `[u8]` as UTF-8 (#1679) #1864

micolous · 2025-09-03T01:52:40Z

Fix a number of issues relating to nom decoding [u8] buffers as UTF-8 when it shouldn't (#1679):

escaped and escaped_transform now accept control_char: impl AsChar, rather than char.

This allows the functions to be used with a u8 or b'', which is useful for parsing text-like files that contain binary or non-UTF-8 data (like Lua).
escaped, escaped_transform, satisfy, one_of and none_of now iterate over individual bytes when using a [u8] input, rather than attempting to interpret bytes as UTF-8 sequences.
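To make the byte-versus-UTF-8 distinction concrete, here is a minimal standalone sketch. It does not use nom and is not the code in this PR: `one_of_byte` and `one_of_utf8` are hypothetical helpers written only to contrast the two ways of reading the front of a `[u8]` buffer.

```rust
/// Hypothetical byte-oriented matcher (not nom's API): consumes exactly one
/// byte and compares it against a set of allowed bytes. No UTF-8 decoding.
fn one_of_byte<'a>(allowed: &[u8], input: &'a [u8]) -> Option<(&'a [u8], u8)> {
    let (&first, rest) = input.split_first()?;
    if allowed.contains(&first) {
        Some((rest, first))
    } else {
        None
    }
}

/// Hypothetical UTF-8-oriented matcher, included only for contrast: decodes
/// the leading UTF-8 scalar value (1 to 4 bytes) and compares it as a `char`.
fn one_of_utf8<'a>(allowed: &str, input: &'a [u8]) -> Option<(&'a [u8], char)> {
    // Find the shortest prefix that is valid UTF-8; that prefix is one char.
    for len in 1..=input.len().min(4) {
        if let Ok(s) = std::str::from_utf8(&input[..len]) {
            let c = s.chars().next()?;
            return if allowed.contains(c) {
                Some((&input[len..], c))
            } else {
                None
            };
        }
    }
    // No valid UTF-8 sequence at the start of the buffer.
    None
}
```

With the input `b"\xC3\xA9x"`, `one_of_byte(&[0xC3], input)` consumes a single byte, while `one_of_utf8("é", input)` consumes two. With the input `b"\xFFabc"`, the byte-oriented helper can still match `0xFF`, whereas the UTF-8-oriented helper cannot match anything because the buffer does not start with a valid sequence.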

I've added many tests demonstrating edge cases of these functions. Many existing tests incorrectly used str for [u8] inputs, which can lead to unexpected behaviour when handling binary data.

This will probably break API compatibility for any parser that takes a [u8] buffer as input and assumes everything is decoded as UTF-8. I'd argue this is incorrect usage anyway: those parsers should be using str.
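For a parser in that position, one possible migration is to validate the buffer as UTF-8 once and then parse it as str. The sketch below uses only the standard library; `text_input` is a hypothetical helper name, not part of nom.

```rust
use std::str::Utf8Error;

/// Hypothetical helper: validate a byte buffer as UTF-8 up front so that a
/// text-oriented parser can take `&str` instead of `&[u8]`.
fn text_input(buf: &[u8]) -> Result<&str, Utf8Error> {
    // Borrows the buffer without copying; fails on the first invalid sequence.
    std::str::from_utf8(buf)
}
```

Buffers that fail this check are the binary or non-UTF-8 inputs that a [u8] parser should handle byte by byte.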

There are probably other parts of nom that assume [u8] is encoded as UTF-8, but searching for these is hard.

micolous added 5 commits August 31, 2025 18:58

WIP: make control_char: impl AsChar (rust-bakery#1679)

e4c0cd6

WIP: Update more types

767f2a7

Add tests from rust-bakery#1679, use bytes always when testing with b…

461e41b

…ytes.

Make satisfy, one_of and none_of use AsChar for their predica…

3af9cb6

…te (rust-bakery#1679). This is potentially API breaking, as it won't silently coerce `char` to `u8` when working with `[u8]`.

Roll back satisfy, one_of and none_of to using char as a pred…

8d8509d

…icate, while fixing the issue of silent UTF-8 usage (rust-bakery#1679)

micolous requested a review from Geal as a code owner September 3, 2025 01:52

micolous mentioned this pull request Sep 3, 2025

Support escaped on ASCII + garbage winnow-rs/winnow#819
