Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[x-] is not correctly handled #2374

Closed
8 tasks done
danny0838 opened this issue Nov 19, 2022 · 24 comments
Closed
8 tasks done

[x-] is not correctly handled #2374

danny0838 opened this issue Nov 19, 2022 · 24 comments
Labels
bug Something isn't working fixed issue has been addressed

Comments

@danny0838
Copy link

danny0838 commented Nov 19, 2022

Prerequisites

  • I verified that this is not a filter list issue. Report any issues with filter lists or broken website functionality in the uAssets issue tracker.
  • This is not a support issue or a question. For any support, questions or help, visit /r/uBlockOrigin.
  • I performed a cursory search of the issue tracker to avoid opening a duplicate issue.
  • The issue is not present after disabling uBO in the browser.
  • I checked the documentation to understand that the issue I am reporting is not normal behavior.

I tried to reproduce the issue when...

  • uBO is the only extension.
  • uBO uses default lists and settings.
  • using a new, unmodified browser profile.

Description

The revised regex analyzer does not correctly handle [x-], which is treated as [\-] and thus x is not calculated during token extraction.

I have raised an issue to the library developer: foo123/RegexAnalyzer#5. Before the library is fixed, we can preprocess the regex string to escape problem-making "-"s in a character group before sending it to the analyzer:

const REGEX_RE_STR_FIXER = /\\[\s\S]|\[([^\\\]]*(?:\\[\S\s][^\\\]]*)*)\]/g;
const REGEX_RE_STR_FIXER_FUNC = (m, cg) => (cg ? '[' + cg.replace(REGEX_RE_STR_FIXER_CG, REGEX_RE_STR_FIXER_CG_FUNC) + ']' : m);
const REGEX_RE_STR_FIXER_CG = /^\^|(?:[^\\\]]|\\(?:x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}|[^dDsSwW]))-(?:[^\\\]]|\\(?:x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}|[^dDsSwW]))|\\[\s\S]|(-)/g;
const REGEX_RE_STR_FIXER_CG_FUNC = (m, h) => (h ? '\\-' : m);

// fix analyzer bug for e.g. [x-], [-x], [\d-x], [x-\d]
// https://github.com/foo123/RegexAnalyzer/issues/5
reStr = reStr.replace(REGEX_RE_STR_FIXER, REGEX_RE_STR_FIXER_FUNC);

This is somehow ugly and imperformant, but it may be the only way if we're not going to implement a full fix of the library.

Also note that this does not fix the bug in 5th case yet. Fortunately the case should not exist in a real-world URL filter.

A specific URL where the issue occurs.

https://example.com/xfoo

Steps to Reproduce

  1. Add a user rule /^https?://(?:example\.com|ex\.com)/[x-]foo\b/$doc
  2. Visit the URL

Expected behavior

The page should be blocked.

Actual behavior

The page is not blocked.

uBO version

1.45.3b4

Browser name and version

Independant

Operating System and version

Independant

@gwarser
Copy link

gwarser commented Nov 19, 2022

Maybe this is why regex was reported as invalid when - was last in the group?

@danny0838
Copy link
Author

@gwarser Where is it reported?

ECMAscript spec seems to allow such use case.

@gwarser
Copy link

gwarser commented Nov 19, 2022

@danny0838
Copy link
Author

danny0838 commented Nov 19, 2022

@gwarser I have read the thread but I don't think there is a consensus that trailing - is invalid. I think it is the app/library that has to support it as browsers and the spec support it.

@gwarser
Copy link

gwarser commented Nov 19, 2022

I mean, it's this library limitation.

gorhill added a commit to gorhill/uBlock that referenced this issue Nov 19, 2022
@gorhill
Copy link
Member

gorhill commented Nov 19, 2022

Any more of these cases with regex tokenizer, please add here instead of opening new issue.

@danny0838
Copy link
Author

The developer of Regex.js is fixing the issues and it seems that this issue is mostly resolved by the latest release (with a bug that may happen in a very rare case). Maybe we can simply wait for a while for his working.

@danny0838
Copy link
Author

Issues in foo123/RegexAnalyzer#5 seems resolved in 1.2.0

gorhill added a commit to gorhill/uBlock that referenced this issue Jan 1, 2023
gorhill added a commit to gorhill/uBlock that referenced this issue Jan 1, 2023
@gorhill
Copy link
Member

gorhill commented Jan 8, 2023

Version of 1.2.0 was imported.

@gorhill gorhill closed this as completed Jan 8, 2023
@gorhill gorhill added fixed issue has been addressed bug Something isn't working labels Jan 8, 2023
@micsthepick

This comment was marked as off-topic.

@garry-ut99
Copy link

garry-ut99 commented Sep 2, 2024

/a-z/ should be highlighted orange, but is not

Why, since it matches 3 characters literally, like any others: --- or abc.
You mean /a-z/ or /[a-z]/? Didn't you miss [ and ] in your example?
Also since /[a-z]/ means a single letter, then logically your request would also include making orange other single-char expressions like: \w or \W and similar.

A note: the whole discussion is regarding a fact that a single char filters in uBO are invalid (orange): for example: a or /a/, so a point I see in his request is to unify the syntax highlighting behavior towards single-char filters, so that also regexes which match only a single-char, are considered to be invalid (orange) as well.

@micsthepick

This comment was marked as off-topic.

@micsthepick

This comment was marked as off-topic.

@garry-ut99
Copy link

garry-ut99 commented Sep 2, 2024

/a-z/ should be highlighted orange, but is not

Why

Becuase that's what happens to regexes as a general rule, even simple ones

What happens is:

  • that they have yellow color, not orange
  • not always they have yellow or any color

Also do we perceive colors differently?

Orange color means an invalid filter, why do you want /a-z/ become invalid?:

That's a misconception - to be explicitly clear - if you tried putting a regex in that IS valid, and doesn't happen to contain an ungrouped a-z, etc. it WILL be marked orange, error or not.

You didn't provide an exact regex example to reproduce, some of these are yellow, some even don't have colors, but none are orange, which one do you exacty mean:

@micsthepick

This comment was marked as off-topic.

@micsthepick

This comment was marked as off-topic.

@micsthepick

This comment was marked as off-topic.

@micsthepick

This comment was marked as off-topic.

@micsthepick
Copy link

micsthepick commented Sep 2, 2024

Syntax highlighting is confusing:

/ex-ample\.com/ has no coloration
/example\.com/ has coloration
and yet no message to explain why.

@garry-ut99
Copy link

garry-ut99 commented Sep 2, 2024

micsthepick : a less crude example:
/ex-ample\.com/ has no "error"
/example\.com/ has 1 "error"

I see your point, but also one detail by the way: these are warnings, not errors, and such filters still work, just like in browser dev console there are warnings and errors.


micsthepick : #3362 (comment)

SPOILER (REPLY)

#3362 (comment)

micsthepick : the only thing I found aggressive was "unless you have seen that thread already, and you were inspired by it, and you still want to embed the solution into uBO itself to be even better." not anything else. Is that presumption?

You've just quoted my statement and still didn't explain what is bad/wrong in it, hence I am unable to determine what the hell is your problem.

micsthepick : It's also a bit ironic that you link to a post where you call someone a clown and yet continue to call me aggressive.

What is ironic is that I called him like that because he was aggressive towards me for no reason, just like you, and he throwed towards me "is everything okay at home, buddy?" which wasn't nice and is considered calling names as well, hence he deserved to be called accordingly as well. I wonder how did you miss that, and how did you cherry-pick only my statement, maybe because you're blind, but I think that rather you're just purposely trolling, you run out of arguments pretty fast, go troll elsewhere, and don't turn facts upside down.

@gorhill
Copy link
Member

gorhill commented Sep 2, 2024

/ex-ample\.com/ has no "error"
/example\.com/ has 1 "error"

Orange is not an error, it just means no token can be extracted from the regex, and as such the filter will be less efficient.

A token can be extracted from the first one, ample. No token can be extracted from the second one.

@garry-ut99
Copy link

garry-ut99 commented Sep 2, 2024

Also helpful threads related to uBO regex tokenization:

Some similiar examples from old version of uBO tokenizer:

Revision A: gorhill/uBlock#2781 (comment):

token: konto, regex: /moje-konto\/inzeraty/
token: *, regex: /moje-inzeraty/
token: *, regex: /prime-ads/

Revision B: gorhill/uBlock#2781 (comment):

token: konto, regex: /moje-konto\/inzeraty/
token: *, regex: /moje-inzeraty/
token: konto, regex: /moje-konto\/inzeraty/
token: *, regex: /ad-pay/
token: bin, regex: /cgi-bin\/conad.fcgi/
token: *, regex: /prime-ads/

@gorhill
Copy link
Member

gorhill commented Sep 2, 2024

The ultimate revision is the one based on a regex analyzer, introduced in gorhill/uBlock@426395a.

@garry-ut99
Copy link

#2374 (comment) : micsthepick :

Syntax highlighting is confusing:

/ex-ample\.com/ has no coloration
/example\.com/ has coloration
and yet no message to explain why.

Like on the 3rd screenshoot from #2374 (comment) - there is a message (tool tip) displayed when an error is highlighted, but not when a warning is highlighted. I don't know whether it's an inconsistency or intentionally by design. Besides, another related thread (in case u didn't see):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working fixed issue has been addressed
Projects
None yet
Development

No branches or pull requests

5 participants