Allow token selection around a negative lookaround for a regex rule #2368

Closed
danny0838 opened this issue Nov 16, 2022 · 1 comment
Labels
bug (Something isn't working), fixed (issue has been addressed)

danny0838 commented Nov 16, 2022

Related code: https://github.com/gorhill/uBlock/blob/2204451514f1a894a7627793a253a89ebbc6d845/src/js/static-filtering-parser.js#L3044-L3048

I suspect that an empty string ('') rather than '\x01' should be returned for a negative lookaround.

A negative lookaround only restricts matches; it never adds a possible match. A URL matching the rule /\.foo(?!bar\.baz)bar\./ must therefore also match /\.foobar\./, so it must contain foobar, and foobar doesn't need to be excluded during token selection.
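
The implication can be spot-checked in JavaScript. This is a minimal sketch, not uBlock Origin code: the sample URLs and the helper name `impliesOnSamples` are made up for illustration, and a handful of samples only illustrates the claim rather than proving it.

```javascript
// Rule from the issue, and the same rule with the negative lookahead dropped.
const strict = /\.foo(?!bar\.baz)bar\./;
const relaxed = /\.foobar\./;

// Hypothetical sample URLs, chosen only for illustration.
const samples = [
  'https://example.com/a.foobar.js',  // matched by both regexes
  'https://example.com/a.foobar.baz', // rejected by the lookahead, matched by relaxed
  'https://example.com/a.foo.js',     // matched by neither regex
];

// True when every input matched by `narrow` is also matched by `wide`.
function impliesOnSamples(narrow, wide, inputs) {
  return inputs.every(s => !narrow.test(s) || wide.test(s));
}

const strictMatches = samples.filter(s => strict.test(s));
const relaxedMatches = samples.filter(s => relaxed.test(s));

console.log(impliesOnSamples(strict, relaxed, samples));
console.log(strictMatches.every(s => s.includes('foobar')));
```

On these samples the strict rule never matches a URL that lacks foobar, which is why the literal text around the negative lookaround looks safe to use as a token.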

danny0838 changed the title from "Allow token selection for a regex rule around a negative lookaround" to "Allow token selection around a negative lookaround for a regex rule" Nov 16, 2022
gorhill added a commit to gorhill/uBlock that referenced this issue Nov 17, 2022
Fixed flawed extraction of tokens with optional sequences, i.e.
when quantifier could be zero.
Related issue:
- uBlockOrigin/uBlock-issues#2367

Ignore look-around sequences as suggested when normalizing into
tokenizable string.
Related issue:
- uBlockOrigin/uBlock-issues#2368

Fix regex analyzer throwing with trailing `-` in character
class sequence.
Related issue:
- AdguardTeam/AdguardFilters#134630
gwarser added the bug and fixed labels Nov 17, 2022
gwarser closed this as completed Nov 17, 2022

danny0838 commented Nov 17, 2022

@gorhill Thank you for the quick fix.

I think positive lookarounds may (maybe "should") be treated differently, as they sometimes work like \b to mark a word separator, and may be used with a back reference to mimic an atomic group, as in (?=(regex))\1.
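
The atomic-group idiom mentioned above can be sketched in plain JavaScript. The patterns and inputs here are illustrative assumptions, not taken from any filter list: the first pair shows that a lookahead is not re-entered on backtracking, and the last regex shows a case where the literal text lives only inside the lookahead yet is still part of every match.

```javascript
// Plain capturing group: `a+` can backtrack and give one `a` back,
// so the trailing `ab` can still match.
const plain = /^(a+)ab$/;

// Lookahead plus back reference: the lookahead captures `aaa` once and is
// not retried with a shorter capture, so the trailing `ab` cannot match.
const atomicLike = /^(?=(a+))\1ab$/;

const plainResult = plain.test('aaab');       // true
const atomicResult = atomicLike.test('aaab'); // false

// The literal `foo` appears only inside the lookahead, but the back
// reference consumes it, so the matched text still contains `foo`.
const viaBackref = /(?=(foo))\1bar/;
const backrefMatch = viaBackref.exec('xfoobarx');

console.log(plainResult, atomicResult, backrefMatch[0]);
```

In a pattern like the last one, dropping the lookahead entirely would discard foo, even though every matching URL must contain it.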

Actually, I used to think the original treatment was intentional, as it works well for something like abc(?=def). Although it is flawed for something like abc(?=def)(?=ghi), that is unlikely to appear in real-world regexes.
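
The difference between the two patterns can be sketched as follows. The input strings are made up for illustration, and the reading that lookahead contents were treated as literal text following the match is an assumption based on the comment above, not on the parser code itself.

```javascript
const single = /abc(?=def)/;
const stacked = /abc(?=def)(?=ghi)/;

// Single lookahead: the matched text is only `abc`, but a match is possible
// only where `abcdef` is present, so `abcdef` behaves like required text.
const singleHit = single.exec('xabcdefx');
const singleMiss = single.test('xabcdxx'); // false

// Stacked lookaheads are both checked at the same position (right after
// `abc`), so they do not describe the concatenated text `abcdefghi`.
const stackedOnConcat = stacked.test('abcdefghi'); // false
const stackedOnEither = stacked.test('abcdef') || stacked.test('abcghi'); // false

console.log(singleHit[0], singleMiss, stackedOnConcat, stackedOnEither);
```

Reading consecutive lookaheads as consecutive text therefore yields a string that the regex does not actually require.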

Maybe we can do something like handling quantifiers to insert optional \x00 or \x01.

gorhill added a commit to gorhill/uBlock that referenced this issue Nov 17, 2022