Skip to content
This repository has been archived by the owner on May 14, 2020. It is now read-only.

Update rx 942450 #1662

Merged
merged 3 commits into from
Jan 22, 2020
Merged

Update rx 942450 #1662

merged 3 commits into from
Jan 22, 2020

Conversation

wjwoodson
Copy link
Contributor

Update regexp in 942450 to reduce false positives. This is essentially the change to require \b before the potential hex string mentioned in #833.

@airween
Copy link
Contributor

airween commented Jan 9, 2020

Hi @wjwoodson,

thanks for the fix - I see that the regression tests passed successfully, but I just checked your regex with pcre2test, and looks like it doesn't work:

  re> /(?i:\b[^\d]?0x[a-f\d]{3,})/
data> %5cA0xf00dsdfdsa
No match
data> foo0xf00
No match

Here is the old version:

  re> /(?i:(?:\A|[^\d])0x[a-f\d]{3,})/
data> %5cA0xf00dsdfdsa
 0: A0xf00d
data> foo0xf00
 0: o0xf00

I think we need more check.

@wjwoodson
Copy link
Contributor Author

Hi @airween, I believe the urlDecodeUni transform in this rule is responsible for the disparity --
%5cA0xf00dsdfdsa should transform to \A0xf00dsdfdsa which will match.

@fgsch
Copy link
Contributor

fgsch commented Jan 9, 2020

I think you could drop the [^\d]? but otherwise looks good to me.

@fgsch
Copy link
Contributor

fgsch commented Jan 9, 2020

Actually, @airween is correct in that your pattern will not match e.g. fooA0xf00 (or %5cA0xf00dsdfdsa for the matter) because the \b after the [^\d]? needs a different type before the A (both o and A are in \w).

That said, I think fooAOxf00 (or %5cA0xf00dsdfdsa) should not be matching.
In other words, it should recognise 0x[a-f0-9]{3,} iff is not part of a longer string.

@wjwoodson
Copy link
Contributor Author

Yes var=foo0xf00 is a negative test and is intended not to match. As for [^\d]? I was really just trying to stay consistent with the existing test var=\A0xf00dsdfdsa. If it makes sense to you I will just simplify the expression to \b0x[a-f0-9]{3,} and update tests to match.

@fgsch
Copy link
Contributor

fgsch commented Jan 9, 2020

I'm inclined to say so (drop the [^\d]?) but let's wait for other folks' input.

@airween
Copy link
Contributor

airween commented Jan 10, 2020

I believe the urlDecodeUni transform in this rule is responsible for the disparity --
%5cA0xf00dsdfdsa should transform to \A0xf00dsdfdsa which will match.

ah, sorry, you're right, I totally forgot to check the rule. Sorry again.

@dune73
Copy link
Contributor

dune73 commented Jan 13, 2020

This looks like a good progress on this front. You guys understand much more of this than I do, but I'd lean on dropping the [^\d] too.

942450 is a bit thin on tests. Here is some stuff to enhance the test suite. If you find it useful @wjwoodson, then it might pay to pick some, rework them a bit and add them to the unit tests.

942450-tests.yaml.tar.gz

@wjwoodson
Copy link
Contributor Author

Hi all, I've updated to remove [^\d]? from the expression and updated/added tests.

@fgsch
Copy link
Contributor

fgsch commented Jan 22, 2020

LGTM.

@dune73
Copy link
Contributor

dune73 commented Jan 22, 2020

Thanks for the PR, the update and the additional tests, @wjwoodson. Merging now.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants