Add wildcard support for TLD in $domain filters #1008

Closed
8 tasks done
ediowar opened this issue May 3, 2020 · 26 comments
Labels
enhancement New feature or request fixed issue has been addressed

Comments

@ediowar

ediowar commented May 3, 2020

Prerequisites

  • I verified that this is not a filter issue
  • This is not a support issue or a question
  • I performed a cursory search of the issue tracker to avoid opening a duplicate issue
    • Your issue may already be reported.
  • I tried to reproduce the issue when...
    • uBlock Origin is the only extension
    • uBlock Origin with default lists/settings
    • using a new, unmodified browser profile
  • I am running the latest version of uBlock Origin
  • I checked the documentation to understand that the issue I report is not a normal behavior

Description

[Description of the bug or feature]
AdGuard rule example
||block.domain^$all,domain=google.*

A specific URL where the issue occurs

[A specific URL is MANDATORY for issue happening on a web page, even if it happens "everywhere"]

Steps to Reproduce

  1. [First Step]
  2. [Second Step]
  3. [and so on...]

Expected behavior:

[What you expected to happen]

Actual behavior:

[What actually happened]

Your environment

  • uBlock Origin version:
  • Browser Name and version:
  • Operating System and version:
@uBlock-user uBlock-user added the enhancement New feature or request label May 3, 2020
@uBlock-user uBlock-user changed the title Add wildcard support for TLD in Network filters Add wildcard support for TLD in $domain filters May 3, 2020
@gorhill
Member

gorhill commented May 3, 2020

AdGuard rule example

Which AdGuard list contains this filter?

@Alex-302

Alex-302 commented May 3, 2020

That is just an example; we do not have this rule :)

@gorhill
Member

gorhill commented May 3, 2020

So I am being asked to support something for which there is no current use case?

I always need a use case, actually many use cases when it involves adding complexity to the filtering engine.

@gwarser

gwarser commented May 3, 2020

AdGuard has /high-speed-download.png$domain=extramovies.*.

AdGuard doc: https://kb.adguard.com/en/general/how-to-create-your-own-ad-filters#wildcard-for-tld

Also, there are tons of examples in this guy's filters: https://github.com/kano1/I, but I'm not sure he knows what he is doing.

@krystian3w

DandelionSprout/adfilt#63 (comment)

extramovies.*

[rdk@on filterlists.com_resources]$ grep -r '$.*domain=.*\*'
166_AdGuard Base Filter (AdGuard for Chromium).txt:/high-speed-download.png$domain=extramovies.*
166_AdGuard Base Filter.txt:/high-speed-download.png$domain=extramovies.*
1568_AdGuard Base Filter Optimized.txt:/high-speed-download.png$domain=extramovies.*
1528_AdGuard Base Filter without EasyList.txt:/high-speed-download.png$domain=extramovies.*
2210_AdGuard Base Filter (uBlock Origin).txt:/high-speed-download.png$domain=extramovies.*
1568_AdGuard Base Filter (Optimized).txt:/high-speed-download.png$domain=extramovies.*
2214_AdGuard Base Filter without EasyList (uBlock Origin).txt:/high-speed-download.png$domain=extramovies.*

mail.google.*,gmail.*

2061_Cybo's Simplified Domains.txt:@@||ssl.gstatic.com/ui/v1/icons/mail/images/cleardot.gif$domain=mail.google.*,gmail.*
1836_Cybo's Hosts.txt:@@||ssl.gstatic.com/ui/v1/icons/mail/images/cleardot.gif$domain=mail.google.*,gmail.*
2060_Cybo's Hosts - Extra Format.txt:@@||ssl.gstatic.com/ui/v1/icons/mail/images/cleardot.gif$domain=mail.google.*,gmail.*

kdw*.com: invalid in uBO / AG?

2104_ADgk Mobile Advertising Rules - adgk.txt:||cdn-img.tadpoles.xyz/vipgg/pc^$domain=kdw*.com
2104_ADgk Mobile Advertising Rules - adgk.txt:||wx2.sinaimg.cn/mw1024^$domain=kdw*.com
2104_ADgk Mobile Advertising Rules - adgk.txt:||wx*.sinaimg.cn/large^$domain=kdw*.com
2104_ADgk Mobile Advertising Rules - adgk.txt:||wx*.sinaimg.cn/large^$domain=kdw*.com,important
[rdk@on filterlists.com_resources]$
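
The grep above flags any `*` that appears after `domain=`, whatever its form. To tell entity-form entries (`name.*`) apart from other wildcard uses such as `kdw*.com`, a slightly stricter check could look like the Python sketch below. It assumes the usual layout (options after the last `$`, separated by commas; domains inside `domain=` separated by `|`) and ignores edge cases such as `$` inside regex filters. `wildcard_domains` is a hypothetical helper, not part of uBO or AdGuard, and it makes no claim about how either engine treats the non-entity form.

```python
import re
from typing import List, Tuple

# Matches the domain= option inside the comma-separated option list.
DOMAIN_OPT = re.compile(r"(?:^|,)domain=([^,]+)")


def wildcard_domains(line: str) -> List[Tuple[str, str]]:
    """Return (entry, kind) for each domain= entry that contains '*'.

    kind is "entity" for the form 'name.*' (a single trailing '.*'),
    and "other" for any other use of '*' (e.g. 'kdw*.com').
    Hypothetical helper, for illustration only.
    """
    if "$" not in line:
        return []
    options = line.rsplit("$", 1)[1]  # assume options follow the last '$'
    match = DOMAIN_OPT.search(options)
    if match is None:
        return []
    results = []
    for entry in match.group(1).split("|"):
        name = entry.lstrip("~")  # '~' negates a domain; check the bare name
        if "*" not in name:
            continue
        is_entity = name.endswith(".*") and "*" not in name[:-2]
        results.append((entry, "entity" if is_entity else "other"))
    return results


print(wildcard_domains("/high-speed-download.png$domain=extramovies.*"))
# [('extramovies.*', 'entity')]
print(wildcard_domains("||wx*.sinaimg.cn/large^$domain=kdw*.com,important"))
# [('kdw*.com', 'other')]
```

Under these parsing assumptions, the `,gmail.*` part of Cybo's filter follows a comma, so it would be read as a separate (unknown) option rather than as a second domain; only `mail.google.*` is reported.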

@DandelionSprout

DandelionSprout commented May 3, 2020

https://gitlab.com/eyeo/adblockplus/adblockpluscore/-/issues/123 claims that ABP also supports wildcards in $domain as of 3 months ago, which took me so much by surprise that I didn't even think it'd be a possibility until 2 hours ago.

That being said, it doesn't seem to work in ABP 3.8.4 to the degree I've been able to test it in the span of 5 minutes.

@DandelionSprout

DandelionSprout commented May 3, 2020

As for an actual current use example, I'd have loved to be able to distill e.g. ||ssl.p.jwpcdn.com^*/sharing.js$important,script,domain=eurosport.no|eurosport.dk|gamereactor.no|gamereactor.dk into ||ssl.p.jwpcdn.com^*/sharing.js$important,script,domain=eurosport.*|gamereactor.* in the regular version of my Nordic list as well (and not just in the AdGuard version), alongside ~15 other similar entries.

@uBlock-user
Contributor

ABP also supports wildcards in $domain as of 3 months ago, which took me so much by surprise that I didn't even think it'd be a possibility until 2 hours ago.

Neither did I. I asked about this years ago, but gorhill said that ABP syntax doesn't support this, so it never went ahead after that.

@gorhill
Member

gorhill commented May 3, 2020

It's not supported by ABP, what they fixed is to reject those filters when encountered.

@gorhill
Copy link
Member

gorhill commented May 4, 2020

Related: gorhill/uBlock#2133.

@gorhill
Copy link
Member

gorhill commented May 5, 2020

The only reason I didn't decline this request outright is because of this one small part:

AdGuard rule example
||block.domain^$all,domain=google.*

Now it turns out the provided filter did not really exist; I couldn't find it. I invested my own time trying to figure out in which AdGuard filter list the mentioned filter is located. Other people invested their own time to finally find what @ediowar, as the requester, should have taken the time to minimally detail: that there is a single instance of such a filter in an AdGuard list, specifically /high-speed-download.png$domain=extramovies.*.

And furthermore, there is a need to detail what this filter solves. Is what it solves already solved in another way in uBO? If so, what is the real benefit of asking someone else to spend time and effort implementing this in uBO if there is no current benefit to a majority of end users?

So from now on, here is how this will work for requests to add filtering features to static filtering:

If you are not a filter list maintainer with a reasonable enough track record of maintaining good quality filter list(s) in broad use, or if you are not making a convincing case that a specific static filtering feature is of benefit to a majority of end users, the issue will be declined without further comment -- i.e. issues equivalent to "do this kthxbye" are not accepted.

I call these issues drive-by feature requests, i.e. requests from people who are not long-time contributors and who invest little to no time making the case for why someone other than them should invest more time and effort in adding code and complexity, also taking into account the future maintenance work of that code and complexity.

The people who have my utmost attention when it comes to adding static filtering features are those who actually have a track record of maintaining filter lists used by uBO -- they are the ones whose life I want to make easier whenever I can, as far as technically feasible. As an example, the sole reason why I agreed to add the cname option recently is to make the work of filter list maintainers easier (denyallow= was not requested, but I added it for the exact same reason, to avoid the tediousness of having to craft that sort of filters).

The other people who have my attention are those who care enough to spend their own time making a convincing case for why a specific static filtering feature request is of benefit to a majority of users. Here is an example of such an issue.

Now regarding the specific issue of supporting entity syntax in domain= option: @mapx-, @okiehsch, @ryanbr, and all other usual contributors to filter lists, how useful would such feature be?

Is this a must-have feature that you would start using regularly immediately or is it something that won't make a big difference to the workload in the big picture whether it's supported or not? (or any stance in between.)

@mapx-
Contributor

mapx- commented May 5, 2020

It would be useful in cases like this:

://192.168.*/images/$important,domain=pornhub.com|pornhub.net|pornhub.org|pornhubthbh7ap3u.onion|xtube.com
!pornhub.*,pornhubthbh7ap3u.onion,xtube.com##+js(aopw, AdDelivery)

to be able to use the same entity syntax (pornhub.*) in both places.

Moreover, there are lots of sites that often change their TLD (while keeping the same JS code / tricks), and their domain=example.com filters become obsolete:
||googlesyndication.com/pagead/js/adsbygoogle.js$script,redirect=noopjs,domain=vev.io|vev.red
requiring our intervention to adjust the filter (adding / removing TLDs).

@DandelionSprout

DandelionSprout commented May 5, 2020

If it helps on the matter, I've now asked Andrey Meshkov and his pals on Slack about whether they plan to use $domain wildcards in e.g. AdGuard Base on a much larger scale anytime soon.

If they were to say yes to my inquiry, it could lead to e.g. @@http://adsense.google.$document,domain=google.ad|google.ae|google.al|google.am|google.as|google.at|google.az|google.ba|google.be|google.bf|google.bg|google.bi|google.bj|google.bs|google.bt|google.by|google.ca|google.cat|google.cd|google.cf|google.cg|google.ch|google.ci|google.cl|google.cm|google.cn|google.co.ao|google.co.bw|google.co.ck|google.co.cr|google.co.id|google.co.il|google.co.in|google.co.jp|google.co.ke|google.co.kr|google.co.ls|google.co.ma|google.co.mz|google.co.nz|google.co.th|google.co.tz|google.co.ug|google.co.uk|google.co.uz|google.co.ve|google.co.vi|google.co.za|google.co.zm|google.co.zw|google.com|google.com.af|google.com.ag|google.com.ai|google.com.ar|google.com.au|google.com.bd|google.com.bh|google.com.bn|google.com.bo|google.com.br|google.com.bz|google.com.co|google.com.cu|google.com.cy|google.com.do|google.com.ec|google.com.eg|google.com.et|google.com.fj|google.com.gh|google.com.gi|google.com.gt|google.com.hk|google.com.jm|google.com.kh|google.com.kw|google.com.lb|google.com.ly|google.com.mm|google.com.mt|google.com.mx|google.com.my|google.com.na|google.com.nf|google.com.ng|google.com.ni|google.com.np|google.com.om|google.com.pa|google.com.pe|google.com.pg|google.com.ph|google.com.pk|google.com.pr|google.com.py|google.com.qa|google.com.sa|google.com.sb|google.com.sg|google.com.sl|google.com.sv|google.com.tj|google.com.tr|google.com.tw|google.com.ua|google.com.uy|google.com.vc|google.com.vn|google.cv|google.cz|google.de|google.dj|google.dk|google.dm|google.dz|google.ee|google.es|google.fi|google.fm|google.fr|google.ga|google.ge|google.gg|google.gl|google.gm|google.gp|google.gr|google.gy|google.hn|google.hr|google.ht|google.hu|google.ie|google.im|google.iq|google.is|google.it|google.je|google.jo|google.kg|google.ki|google.kz|google.la|google.li|google.lk|google.lt|google.lu|google.lv|google.md|google.me|google.mg|google.mk|google.ml|google.mn|google.ms|google.mu|google.mv|google.mw|google.ne|google.nl|google.no|google.nr|google.nu|google.pl|google.pn|google.ps|google.pt|google.ro|google.rs|google.ru|google.rw|google.sc|google.se|google.sh|google.si|google.sk|google.sm|google.sn|google.so|google.sr|google.st|google.td|google.tg|google.tk|google.tl|google.tm|google.tn|google.to|google.tt|google.vg|google.vu|google.ws being changed into @@http://adsense.google.$document,domain=google.* on short notice, alongside ~40 similar entries for Google, Amazon, Eurogamer, and other sites, and maybe most of all for Yandex in AdGuard Russian.

@gorhill
Copy link
Member

gorhill commented May 5, 2020

I've now asked Andrey Meshkov and his pals on Slack

We can cc him in case he wants to answer here directly: cc @ameshkov

@ameshkov

ameshkov commented May 5, 2020

Hey everyone, yeah, we're going to, but later, when it's properly supported by all AG versions.

edit: which will happen a couple of months from now. I wish I could be more precise :(

@okiehsch

okiehsch commented May 6, 2020

Is this a must-have feature that you would start using regularly immediately or is it something that won't make a big difference to the workload in the big picture whether it's supported or not?

It is a feature that would be useful in the cases that mapx- described, and I would use it if available.
I don't think it would make a big difference to the workload for uAssets.
For EasyList the difference would be bigger if they started using it, which I doubt they will as long as AdblockPlus does not support it.

@gorhill
Copy link
Member

gorhill commented May 6, 2020

as long as AdblockPlus does not support it.

From their discussion thread, it does not look like they want to support this. Their key argument seems to be worries about false positives, but I consider this a secondary argument regarding whether to support the option or not -- the same could be said of many other currently existing filtering options. What matters in the end is that filter list maintainers should be trusted to make the right calls when it comes to using whatever filtering options are at their disposal.

For me, the primary arguments are whether this will be used often enough and whether it makes the task of maintaining filter lists easier. So given the comments above, I decided I will support the syntax -- I don't see any issue implementing this code-wise.

@kulfoon

kulfoon commented May 6, 2020

DandelionSprout in #1008 (comment): As for an actual current use example, I'd have loved to be able to distill e.g. (...) in the regular version of my Nordic list as well (and not just in the AdGuard version), alongside ~15 other similar entries.

Since we're already talking about distilling:

The longest:

From uBlock Unbreak L3441-L3442 (44) x 2 = 88:

@@||static.ziffdavis.com/sitenotice/evidon-barrier.js$script,domain=allestoringen.be|allestoringen.nl|xn--allestrungen-9ib.at|xn--allestrungen-9ib.ch|xn--allestrungen-9ib.de|downdetector.ae|downdetector.ca|downdetector.c|downdetector.co.nz|downdetector.co.uk|downdetector.co.za|downdetector.com.ar|downdetector.com.au|downdetector.com.br|downdetector.com.co|downdetector.com|downdetector.cz|downdetector.dk|downdetector.ec|downdetector.es|downdetector.fi|downdetector.fr|downdetector.gr|downdetector.hk|downdetector.hr|downdetector.hu|downdetector.id|downdetector.ie|downdetector.in|downdetector.it|downdetector.jp|downdetector.mx|downdetector.my|downdetector.no|downdetector.pe|downdetector.pk|downdetector.pl|downdetector.pt|downdetector.ro|downdetector.ru|downdetector.se|downdetector.sg|downdetector.sk|downdetector.web.tr
@@||static.ziffdavis.com/sitenotice/*/translations/$script,domain=allestoringen.be|allestoringen.nl|xn--allestrungen-9ib.at|xn--allestrungen-9ib.ch|xn--allestrungen-9ib.de|downdetector.ae|downdetector.ca|downdetector.c|downdetector.co.nz|downdetector.co.uk|downdetector.co.za|downdetector.com.ar|downdetector.com.au|downdetector.com.br|downdetector.com.co|downdetector.com|downdetector.cz|downdetector.dk|downdetector.ec|downdetector.es|downdetector.fi|downdetector.fr|downdetector.gr|downdetector.hk|downdetector.hr|downdetector.hu|downdetector.id|downdetector.ie|downdetector.in|downdetector.it|downdetector.jp|downdetector.mx|downdetector.my|downdetector.no|downdetector.pe|downdetector.pk|downdetector.pl|downdetector.pt|downdetector.ro|downdetector.ru|downdetector.se|downdetector.sg|downdetector.sk|downdetector.web.tr

From uBlock filters L8679-L8680 (9) x 2 = 18:
@@||imasdk.googleapis.com/js/sdkloader/ima3.js$script,domain=esgentside.com|exclusivomen.com|gentside.com|gentside.co.uk|gentside.de|gentside.it|maxisciences.com|ohmirevista.com|ohmymag.co.uk|ohmymag.com|ohmymag.de|ohmymag.it
*$script,redirect-rule=noopjs,domain=esgentside.com|gentside.com|gentside.it|gentside.com|gentside.de|gentside.co.uk|gentside.com.br|maxisciences.com|ohmirevista.com|ohmymag.com|ohmymag.com.br|ohmymag.de|ohmymag.co.uk,3p

From uBlock filters L8684-L8685 (8) x 2 = 16:
@@||googletagservices.com/tag/js/gpt.js$script,domain=esgentside.com|gentside.com|gentside.it|gentside.com|gentside.de|gentside.co.uk|gentside.com.br|maxisciences.com|ohmirevista.com|ohmymag.com|ohmymag.de|ohmymag.co.uk
@@*/assets/prebid/$script,xhr,1p,domain=esgentside.com|gentside.com|gentside.it|gentside.com|gentside.de|gentside.co.uk|gentside.com.br|maxisciences.com|ohmirevista.com|ohmymag.com|ohmymag.com.br|ohmymag.de|ohmymag.co.uk

From uBlock Unbreak L2148-L2149 (6) x 2 = 12:
@@||adobedtm.com/*/satelliteLib$script,domain=fcbarcelona.cat|fcbarcelona.cn|fcbarcelona.com|fcbarcelona.es|fcbarcelona.fr|fcbarcelona.jp
@@||adobedtm.com/*/mbox-contents-$script,domain=fcbarcelona.cat|fcbarcelona.cn|fcbarcelona.com|fcbarcelona.es|fcbarcelona.fr|fcbarcelona.jp

From uBlock filters L8447 (12): ||booking.com^$popunder,domain=viamichelin.at|viamichelin.be|viamichelin.ch|viamichelin.co.uk|viamichelin.com|viamichelin.de|viamichelin.es|viamichelin.fr|viamichelin.it|viamichelin.nl|viamichelin.pl|viamichelin.pt

From uBlock Annoyance L788 (9):
@@||imasdk.googleapis.com/js/sdkloader/ima3.js$script,domain=gamereactor.asia|gamereactor.de|gamereactor.es|gamereactor.eu|gamereactor.fi|gamereactor.it|gamereactor.nl|gamereactor.no|gamereactor.pt
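
The distilling above can be sketched mechanically: group the domains of a `domain=` value by the part left of their public suffix, and collapse any name that appears with two or more suffixes into `name.*`. This is a hypothetical Python helper; `PUBLIC_SUFFIXES` is a tiny hand-picked stand-in for the real Public Suffix List (an assumption to keep the example self-contained), and negated `~` entries are not handled.

```python
from typing import Dict, List, Optional, Tuple

# Tiny stand-in for the Public Suffix List (assumption, illustration only).
PUBLIC_SUFFIXES = {
    "com", "it", "de", "es", "fr", "cat", "cn", "jp", "io", "red",
    "uk", "co.uk", "br", "com.br",
}


def split_entity(hostname: str) -> Optional[Tuple[str, str]]:
    """Split a hostname into (name, public suffix) using the longest
    matching suffix, e.g. 'gentside.co.uk' -> ('gentside', 'co.uk')."""
    labels = hostname.split(".")
    for i in range(1, len(labels)):  # i=1 tries the longest candidate first
        suffix = ".".join(labels[i:])
        if suffix in PUBLIC_SUFFIXES:
            return ".".join(labels[:i]), suffix
    return None


def distill(domain_value: str) -> str:
    """Collapse names seen with 2+ public suffixes into 'name.*'."""
    groups: Dict[str, List[str]] = {}
    unmatched: List[str] = []
    for domain in domain_value.split("|"):
        parts = split_entity(domain)
        if parts is None:
            unmatched.append(domain)  # no known suffix: keep unchanged
        else:
            groups.setdefault(parts[0], []).append(domain)
    out: List[str] = []
    for name, members in groups.items():
        unique = list(dict.fromkeys(members))  # drop exact duplicates
        if len(unique) >= 2:
            out.append(name + ".*")
        else:
            out.extend(unique)
    out.extend(unmatched)
    return "|".join(out)


print(distill("fcbarcelona.cat|fcbarcelona.cn|fcbarcelona.com|"
              "fcbarcelona.es|fcbarcelona.fr|fcbarcelona.jp"))
# fcbarcelona.*
```

Keep in mind that `name.*` is broader than the list it replaces: it matches every public suffix, not only the listed ones, which is the false-positive concern raised about ABP in this thread.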

Apart from distilling, I also provide a full list of all uBlock filters containing at least 2 domains that differ only by TLD, so you can check whether or not they would benefit from the wildcard feature:

The Full List:

uBlock filters:
L193-L196 (2) x 4 = 8
L355 (3) (mapx-'s)
L1075 (2)
L1080 (2)
L1190 (2) (mapx-'s)
L1911-L1912 (2) x 2 = 4
L1937 (2)
L3140 (2)
L3228 (3)
L5906 (2)
L6358 (3)
L8447 (12)
L8679-L8680 (8) x 2 = 16
L8684-L8685 (9) x 2 = 18
L11339 (2)
L15780-L15781 (3) x 2 = 6
L17041 (3) (similar to mapx-'s L355)
L19586 (2)
L20458 (2)

uBlock Unbreak:
L352 (2)
L448 (3)
L2111-L2112 (2) x 2 = 4
L2148-L2149 (6) x 2 = 12
L3176 (3)
L3441-L3442 (44) x 2 = 88

uBlock Resource Abuse:
L124 (4)
L165 (4)
L234 (4)

uBlock Annoyance:
L788 (9)
L2873 (2)

uBlock Privacy:
L83 (2)

OK, I've already spent 5 hours collecting and formatting the data; that's enough for now.

@LennyFox

LennyFox commented May 12, 2020

I would like to make a plea for entity-like wildcard support for TLDs in domain=.

I use it in AdGuard user filters, and it is really handy for covering all variants of country-specific websites.

@peace2000
Member

peace2000 commented May 12, 2020

Seems that ABP is going to add wildcards as well now: https://gitlab.com/eyeo/adblockplus/adblockpluscore/-/issues/123#note_339550064

@DandelionSprout
Copy link

https://gitlab.com/eyeo/adblockplus/adblockpluscore/-/merge_requests/334 seems to imply it is indeed underway, even if the technical details behind it elude me.

gorhill added a commit to gorhill/uBlock that referenced this issue May 24, 2020
Related issue:
- uBlockOrigin/uBlock-issues#1008

This commit adds support for entity-matching in the filter
option `domain=`. Example:

    pattern$domain=google.*

The `*` above is meant to match any suffix from the Public
Suffix List. The semantic is exactly the same as the
already existing entity-matching support in static
extended filtering:

- https://github.com/gorhill/uBlock/wiki/Static-filter-syntax#entity
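
To make the entity semantics concrete, here is a rough Python sketch of matching a hostname against `name.*`. It assumes the entity is compared with the hostname minus its public suffix, using the same "hostname or any subdomain of it" rule as a plain `domain=` entry. `PUBLIC_SUFFIXES` is a tiny stand-in for the real Public Suffix List, and `matches_entity` is a hypothetical name, not uBO's code.

```python
from typing import Optional

# Tiny stand-in for the Public Suffix List (assumption, illustration only).
PUBLIC_SUFFIXES = {"com", "de", "fr", "uk", "co.uk", "jp", "co.jp"}


def strip_public_suffix(hostname: str) -> Optional[str]:
    """Return hostname minus its longest known public suffix, else None."""
    labels = hostname.lower().split(".")
    for i in range(1, len(labels)):  # i=1 tries the longest candidate first
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[:i])
    return None


def matches_entity(hostname: str, entity: str) -> bool:
    """Check a hostname against an entity pattern such as 'google.*'."""
    if not entity.endswith(".*"):
        raise ValueError("not an entity pattern: " + entity)
    name = entity[:-2].lower()
    rest = strip_public_suffix(hostname)
    if rest is None:
        return False  # no known public suffix: cannot be an entity match
    # Assumed rule: the entity itself or any subdomain of it.
    return rest == name or rest.endswith("." + name)
```

With this table, `www.google.co.uk` reduces to `www.google` and matches `google.*`, while `notgoogle.com` and `google.evil.com` do not.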

Additionally, in this commit:

Fix cases where "just-origin" filters of the form `|http*://`
were erroneously normalized to `|http://`. The proper
normalization of `|http*://` is `*`.

Add support to store hostname strings into the character
buffer of a hntrie container. As of commit time, there are
5,544 instances of FilterOriginHit and 732 instances of
FilterOriginMiss, filters which require storing/matching a
single hostname string. Those strings are now stored in the
character buffer of the already existing origin-related
hntrie container. (The same approach is used for plain
patterns which are not part of a bidi-trie.)
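
The storage change described above (many single-hostname filters pointing into one shared character buffer instead of each holding its own string) can be illustrated with a small Python sketch. `HostnameBuffer` and `SingleHostnameFilter` are hypothetical and greatly simplified: a flat buffer, not uBO's hntrie container. The de-duplication index is an addition of this sketch, not something the commit message describes, and ASCII (punycode) hostnames are assumed.

```python
from typing import Dict, Tuple

Ref = Tuple[int, int]  # (offset, length) into the shared buffer


class HostnameBuffer:
    """Hypothetical sketch: one shared character buffer for many filters."""

    def __init__(self) -> None:
        self.buf = bytearray()
        self.index: Dict[str, Ref] = {}  # sketch-only de-duplication

    def store(self, hostname: str) -> Ref:
        """Append the hostname once and return its (offset, length)."""
        ref = self.index.get(hostname)
        if ref is None:
            data = hostname.encode("ascii")  # assumes punycode/ASCII input
            ref = (len(self.buf), len(data))
            self.buf += data
            self.index[hostname] = ref
        return ref

    def get(self, ref: Ref) -> str:
        offset, length = ref
        return self.buf[offset:offset + length].decode("ascii")

    def matches(self, ref: Ref, hostname: str) -> bool:
        """Exact comparison of a stored hostname with a candidate."""
        offset, length = ref
        data = hostname.encode("ascii")
        return len(data) == length and self.buf[offset:offset + length] == data


class SingleHostnameFilter:
    """Hypothetical filter that keeps only a reference, not a string."""

    def __init__(self, buffer: HostnameBuffer, hostname: str) -> None:
        self.buffer = buffer
        self.ref = buffer.store(hostname)

    def match(self, origin_hostname: str) -> bool:
        return self.buffer.matches(self.ref, origin_hostname)
```

Each filter object then carries two integers instead of a separate string, and all hostname characters live in one contiguous buffer.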
@uBlock-user uBlock-user added the fixed issue has been addressed label May 26, 2020
@uBlock-user
Contributor

Entity support wasn't added for redirect/redirect-rule directives, so re-opening.

@uBlock-user uBlock-user removed the fixed issue has been addressed label Oct 25, 2020
gorhill added a commit to gorhill/uBlock that referenced this issue Nov 3, 2020
This commit moves the parsing, compiling and enforcement
of the `redirect=` and `redirect-rule=` network filter
options into the static network filtering engine as
modifier options -- just like `csp=` and `queryprune=`.

This solves the two following issues:

- #3590
- uBlockOrigin/uBlock-issues#1008 (comment)

Additionally, the `redirect=` option is no longer afflicted
by static network filtering syntax quirks: `redirect=`
filters can be used with any other static filtering
modifier options, can be excepted using `@@`, and can be
badfilter-ed.

Since more than one `redirect=` directive could be found
to apply to a single network request, the concept of
redirect priority is introduced.

By default, `redirect=` directives have an implicit
priority of 0. Filter authors can declare an explicit
priority by appending `:[integer]` to the token of the
`redirect=` option, for example:

    ||example.com/*.js$1p,script,redirect=noopjs:100

The priority dictates which redirect token out of many
will be ultimately used. Cases of multiple `redirect=`
directives applying to a single blocked network request
are expected to be rather unlikely.

Explicit redirect priority should be used if and only if
there is a case of redirect ambiguity to solve.
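
The priority rule can be sketched in a few lines of Python: split an optional `:[integer]` suffix off each `redirect=` value (a missing suffix means the implicit priority 0) and keep the token with the highest priority. How uBO breaks ties is not stated above; this sketch simply keeps the first one seen, which is an assumption. `parse_redirect` and `pick_redirect` are hypothetical names.

```python
import re
from typing import Iterable, Optional, Tuple


def parse_redirect(value: str) -> Tuple[str, int]:
    """Split 'noopjs:100' into ('noopjs', 100); no suffix means priority 0."""
    token, sep, priority = value.rpartition(":")
    if sep and token and re.fullmatch(r"-?[0-9]+", priority):
        return token, int(priority)
    return value, 0


def pick_redirect(values: Iterable[str]) -> Optional[str]:
    """Return the token of the highest-priority redirect= value.

    Ties keep the first value seen (an assumption of this sketch)."""
    best: Optional[str] = None
    best_priority = 0
    for value in values:
        token, priority = parse_redirect(value)
        if best is None or priority > best_priority:
            best, best_priority = token, priority
    return best


print(pick_redirect(["1x1.gif", "noopjs:100", "nooptext:5"]))  # noopjs
```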
@uBlock-user uBlock-user added the fixed issue has been addressed label Nov 3, 2020
@uBlockOrigin uBlockOrigin locked as resolved and limited conversation to collaborators Sep 18, 2022