Skip to content

Add fix and tests for \x with no hex digits #478

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 17, 2024

Conversation

NWilson
Copy link
Member

@NWilson NWilson commented Sep 17, 2024

Why mess around with something that's broken, and which no users are likely to want?

Grasp the nettle, and simply disallow \x with no following hex digits (following non-hex character, or at end of pattern).

As documented in the code comments added, this diverges from Perl's default behaviour, but aligns with Perl+warnings, so is consistent with best-practices Perl usage.

@PhilipHazel PhilipHazel merged commit 97a2937 into PCRE2Project:master Sep 17, 2024
12 checks passed
@PhilipHazel
Copy link
Collaborator

I hadn't known about Perl's warning. Good point. I have merged and will edit the docs.

@PhilipHazel
Copy link
Collaborator

Carlo asked which version of Perl does the warning - though I can't see his comment here - just his email. Looks like Perl is half baked:

Perl v5.40.0

/.*abc\xz/
Non-hex character 'z' terminates \x early.  Resolved as "\x00z" in regex; marked by <-- HERE in m/.*abc\x <-- HERE z/ at (eval 1) line 1, <INFILE> line 1.

It doesn't give a warning if \x is the last thing in the pattern.

@NWilson
Copy link
Member Author

NWilson commented Sep 18, 2024

I found this helpful site:

https://perlbanjo.com/bca01b0a27

The use re 'strict' error was introduced in Perl 5.22.4 (9 years ago).

https://perlbanjo.com/394c21e056

The use warnings logging (for this \x issue) has a warning going back as far as 5.8.9 (and probably older). The warning message was made more detailed in 5.32.1 (3 years ago).


Finally, the trailing \x at end of pattern generates no warning with use warnings on the latest Perl, but with use re 'strict' it is forbidden since 5.22.4 (9 years ago).

Comment on lines +2099 to +2100
when "use warnings" is enabled, and it's forbidden when "use strict"
is enabled. Because we don't have warnings, we simply forbid it. */
Copy link
Contributor

@carenas carenas Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment needs updating so it correctly states use re 'strict' or even better had that part if removed as shown by:

diff --git a/src/pcre2_compile.c b/src/pcre2_compile.c
index bdf2e8c..d5cd0c2 100644
--- a/src/pcre2_compile.c
+++ b/src/pcre2_compile.c
@@ -2192,8 +2192,8 @@ else
         /* Perl has the surprising/broken behaviour that \x without following
         hex digits is treated as an escape for NUL. Their source code laments
         this but keeps it for backwards compatibility. A warning is printed
-        when "use warnings" is enabled, and it's forbidden when "use strict"
-        is enabled. Because we don't have warnings, we simply forbid it. */
+        when "use warnings" is enabled, Because we don't have warnings, we
+        simply forbid it. */
         if (ptr >= ptrend || (cc = XDIGIT(*ptr)) == 0xff)
           {
           /* Not a hex digit */

I would provide a PR but will be easier if Philip just commits that directly IMHO

Copy link
Contributor

@carenas carenas Sep 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh dear, didn't realize that the "strict" confusion was also reflected in the documentation.

To clarify, Perl also shows warnings when using less than two hexadecimal digits (ex:/\xa/), and the warning we are referring to also apply to /\x/, which is why it doesn't apply to it when it is the last thing in a pattern.

That is documented below, but in the case of the single digit hexadecimals we are ignoring those warnings for backward compatibility. On a side note they all become errors with use re 'strict' as it is also described there.

Again, I am not arguing there should be any changes on behaviour to this code, just pointing out that there is some slight confusion on the rationale for doing them.

@carenas
Copy link
Contributor

carenas commented Sep 20, 2024

use re 'strict'

is still experimental though even in the unreleased version of Perl

the comments in the code referred by mistake to "use strict" though, that doesn't have any effect.

@NWilson
Copy link
Member Author

NWilson commented Sep 20, 2024

The use warnings logging has been around since forever. PCRE2 has precedent of following the dialect of "warning-free Perl" ie we follow what they document as being correct Perl, rather than more permissively follow what they implement. I personally don't feel guilty about fixing this.

@carenas
Copy link
Contributor

carenas commented Sep 20, 2024

I wasn't arguing that we shouldn't fix it, just got tripped by the comment typo and the fact that my test was also unlucky to have \x at the end of the pattern.

BTW thanks for the helpful site tip; \x at the end of the pattern not triggering a warning is not only happening in the last version but also older ones as well, and it is likely because Perl is triggering the warning when it find a character that can't identify as hexadecimal after \x while reading up to two additional characters.

@NWilson NWilson deleted the user/niwilson/fix-bsx-null branch September 23, 2024 09:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants