Upgrade the default rules (#134)
* Upgrade the default rules

Add rules for GMX, DuckDuckGo, Tumblr, YouTube, Amazon and Bing, and replicate functionality of Neat URL webextension at Smile4ever/Neat-URL

* Import rule from 0x01h/gif-tracking-protection

Co-authored-by: Geeknik Labs <466878+geeknik@users.noreply.github.com>

* Upgrade GIF filter to general image filter

Now filters JPG, PNG, GIF and WEBP. It's also based on regular expressions, so it catches URLs with different capitalizations (http://bad.site/TRACKER.GIF?id=666) and registers fewer false positives (http://good.site/giftrackingcounter?lang=en).
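
The matching idea can be sketched in Python as follows. This is only an illustration, not the rule shipped in this commit: the pattern, the `is_image_request` helper, and the extra `jpeg` spelling are assumptions.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: match the file extension at the end of the path,
# ignoring case, so "TRACKER.GIF" and "tracker.gif" are treated alike.
IMAGE_PATH = re.compile(r"\.(?:jpe?g|png|gif|webp)$", re.IGNORECASE)


def is_image_request(url: str) -> bool:
    """Return True when the URL path ends in a known image extension."""
    return IMAGE_PATH.search(urlsplit(url).path) is not None


print(is_image_request("http://bad.site/TRACKER.GIF?id=666"))           # True
print(is_image_request("http://good.site/giftrackingcounter?lang=en"))  # False
```

Testing only the path (not the whole URL) is what keeps a word like "gif" in the middle of a name from triggering the filter.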

Co-authored-by: crssi <herbert@knavs.net>

* Revamp parameter trimming in images

Every request for an image is filtered now, regardless of name or file format. Exemptions for common crop and size parameters, and special rules for particular sites (Facebook, Instagram, WhatsApp) have been added.
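
A minimal Python sketch of this kind of trimming, assuming a keep-list of size and crop parameters. The names in `KEEP`, the `trim_image_query` helper, and the example URL are hypothetical and do not reproduce the actual rule.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical exemptions; the real rule's parameter list is not shown here.
KEEP = {"w", "h", "width", "height", "crop", "size"}


def trim_image_query(url: str) -> str:
    """Drop every query parameter except the exempted size/crop ones."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() in KEEP
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(trim_image_query("https://cdn.example/pic.jpg?w=640&uid=42&crop=1"))
# https://cdn.example/pic.jpg?w=640&crop=1
```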

Co-authored-by: crssi <herbert@knavs.net>
Co-authored-by: Geeknik Labs <466878+geeknik@users.noreply.github.com>

* Tweak image URL trimming and add Google Street View exception

Also adds some optional whitelist rules to restore functionality on YouTube.

* Add more terms to the blacklist of general parameter trimming

As detailed in arkenfox/user.js#149 (comment)

Co-authored-by: crssi <herbert@knavs.net>

* Tweak URL parameter trimming in images

Add exemptions for:
- embedded interactive Google Maps in 3rd-party sites,
- any kind of image shown in DuckDuckGo search results,
- and map and aerial view tilesets following WMS and WMTS standards (like one would find while editing OpenStreetMap).

* Unblock captchas shown when registering a Wikimedia account

by whitelisting two innocuous URL parameters: `title` and `wpCaptchaId`.

* Tweak filter tracking parameters rule:

Exempt common parameters found on websites that identify pictures by ID instead of by static path (as seen on the regional government site www.xunta.gal).

* Unblock Reddit pictures

They can be later targeted by a different rule if necessary.

* Add more image parameters to the whitelist

Trimming any of these parameters blocks captchas when creating an email account at Microsoft.

* Whitelist name parameter

Some webpages (mis)use it for size selection, and some pictures on Twitter don't load without it.

* Remove two duplicate filters

* Add whitelist rules for several sites

- Dafont (font shopping and typesetting tests)
- Fontstruct (idem)
- Fontshop (idem)
- SignBank (transcription of sign languages)

* More exceptions for well-known sites

GMX Mailbox makes extensive use of the SID parameter and will not work without it: the site refuses to load important content or enters a redirection loop.

The other rules are for Reddit, maps in Facebook, Google-based embedded maps, and the Ubuntu wiki.

* Innocuous image parameters to whitelist

Seen in https://incubator.wikimedia.org/wiki/Wp/ase/AS10002S1f548M519x514S1f548481x490S10002489x487_AS1f550S15a37S20e00S26502M531x512S15a37501x488S1f550507x495S20e00487x499S26502469x498 but might appear in typesetting pages and custom banner pages, too.

* More general-purpose changes

- Add more whitelisted URL parameters to the general image filter, in particular, PHP parameters used by Oracle's Site-Satellite cache.
- Improve support for maps embedded in Facebook.
- Trim unnecessary URL parameters in embedded Google maps.

* Clearer and louder whitelist rules

Most whitelist rules are now separate from the general image filter. They're also logged by default.

The URL exclusion in the general image filter is best used for requests that will be processed by another filter.

* IMPORTANT CHANGES

- Whitelist rules completely reworked: they're no longer baked into the
filter rules, and are now more specific, easily disabled, and more
helpful (they are also always logged):
  * GMX web client whitelisted (uses `sid` parameter);
  * DuckDuckGo whitelisted (helpful, and built with privacy in mind);
  * CAPTCHAs are logged as a whitelist rule now, ensuring no parameters
  are removed when creating a new account;
  * user avatars and user karmas are a logged whitelist rule now for
  similar reasons;
  * Reddit external previews are whitelisted by necessity;
  * YouTube seekbar and thumbnail previews are folded into a single
  whitelist filter now.
- New filter to anonymize Reddit's banner and community images.
- New filter to avoid image downsamplers: when enabled, retrieves the
original picture from the original domain. Can be disabled.
- New filter for cdn.embedly.com (an unnecessary wrapper for embedding
videos, seen, for instance, on the site Know Your Meme).
- Google Street View filter disabled by default as it breaks Street View
in Firefox ESR (works correctly in the most current desktop Firefox).
- Redundant filter deleted (for URL parameter `fbclid`).
- General image filter tweaked:
  * whitelisted parameters for a couple systems of signed URLs, like Amazon's
  (https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-signed-urls.html)
  and Facebook's (several URL parameters beginning with `_nc_`);
  * whitelisted innocuous parameters that don't break webpages, but
  result in a much longer log, like `*style`, `version`, `preview`
  and `i10c` (this last one is used for downsampling user avatars);
  * removed `url` from the whitelist (it will break some image in some
  random website but it is NOT worth it);
  * removed `wpCaptchaId` and `userid` as they are covered by the
  whitelist rules for user avatars and CAPTCHAs now.
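
As a rough illustration of how prefix and suffix whitelist entries such as `_nc_` and `*style` could be checked, here is a Python sketch. The glob syntax, the `is_whitelisted` helper, and the sample parameter names `_nc_ht` and `imgstyle` are assumptions made for the example; the actual filter may express these entries differently.

```python
from fnmatch import fnmatchcase

# Entries mirror names from the list above; "_nc_*" is an assumed glob
# spelling of "parameters beginning with _nc_".
WHITELIST = ("_nc_*", "*style", "version", "preview", "i10c")


def is_whitelisted(param: str) -> bool:
    """Return True when the parameter name matches any whitelist pattern."""
    return any(fnmatchcase(param, pattern) for pattern in WHITELIST)


print(is_whitelisted("_nc_ht"))    # True  (hypothetical signed-URL parameter)
print(is_whitelisted("imgstyle"))  # True  (hypothetical name ending in "style")
print(is_whitelisted("userid"))    # False (no longer on this whitelist)
```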

* Improvements for VK, LinkedIn, Facebook, and others

- Some site-specific anti-redirector filters were merged into a generic
one. This new filter is effective against social network VK's
redirector.
- Tags are now more descriptive and unique for every filter.
- Facebook brands:
   * WhatsApp Web filters changed to fit the new avatar URLs.
   * Instagram's redirector is now accounted for.
   * More URL parameters used by Facebook blacklisted. `igshid` blacklisted globally.
- The `wprov` parameter used by Wikipedia is now removed.
- Whitelisted more parameters for a different syntax for Amazon searches.
- The optional YouTube filter now also blocks the "watchtime" images (unblocked recently at EasyPrivacy and AdGuard because of problems with logged-in accounts).
- General image filters:
   * Exceptions to this filter are clearer now.
   * Whitelisted URL parameters `bg` and `fg` (background and foreground colour) and `latex` (used for at least one LaTeX renderer, at WordPress). Also whitelisted `quality`, `sign`, `ssl`, `token-hash` and `token-time` (some of these are necessary at VK).
   * A second filter added for a gallery syntax (a PHP script to select an image based on numeric IDs). `uuid` is removed from the whitelist on the first filter.
- Whitelist rules:
   * Added rules for YouTube icons on profile pages (low impact) and LinkedIn (very high impact).
   * Images from `outlook.office.com` (Outlook's web client) and `www.osapublishing.org` (a site publishing scientific papers) are now allowed in same-domain policy.
   * Images from GettyImages, iStockPhoto, ImageBank and AltMetric are now allowed globally.
   * Tweaked rule for whitelisting avatars.
   * Rule for MoinMoin-powered wikis is now global.
   * ReCAPTCHA rule merged into general CAPTCHA rules.
   * Whitelisted maps and street view for Google, Bing, and HERE
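
The generic anti-redirector and the global blacklist described in this list can be sketched together in Python. This is a hedged illustration only: the `TARGET_PARAMS` names, the `redirector.example` host, and the helper functions are hypothetical, and the real filter's matching logic is not shown in this message.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of query parameters that carry the real destination.
TARGET_PARAMS = ("to", "url", "u")
# Parameters this commit message names as blacklisted or removed.
BLACKLIST = {"igshid", "wprov"}


def unwrap_redirect(url: str) -> str:
    """Return the embedded destination if the URL looks like a redirector."""
    query = dict(parse_qsl(urlsplit(url).query))
    for name in TARGET_PARAMS:
        target = query.get(name, "")
        if target.startswith(("http://", "https://")):
            return target
    return url


def strip_blacklisted(url: str) -> str:
    """Remove blacklisted tracking parameters, keeping everything else."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in BLACKLIST
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def clean(url: str) -> str:
    """Unwrap a redirector first, then trim the destination URL."""
    return strip_blacklisted(unwrap_redirect(url))


wrapped = (
    "https://redirector.example/away"
    "?to=https%3A%2F%2Fexample.org%2Fpost%3Fid%3D7%26igshid%3Dxyz"
)
print(clean(wrapped))  # https://example.org/post?id=7
```

Unwrapping before trimming matters: the tracking parameter in the example sits inside the encoded destination, so it only becomes visible once the redirector layer is removed.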

* General image filters are now more granular and based on regexps

Also, more whitelist rules, and tweaks to other rules.

* Image server from LinkedIn is now whitelisted

URLs are very different, often change, and never use optional query parameters. Whitelisting the whole server will not cause the browser to leak extra data to LinkedIn.

* Add filter for LinkedIn URLs

This filter should cut tracking parameters from hyperlinks at LinkedIn. Exceptions for special pages like lost password retrieval have been made.

Also, tweaks to other filters.

* Unbreak Disqus and LinkedIn, and tweak other rules

* Unbreak Amazon, LinkedIn and Google

- Amazon's cart
- LinkedIn's password management and recovery screens
- Google's reCAPTCHA

Co-authored-by: Geeknik Labs <466878+geeknik@users.noreply.github.com>
Co-authored-by: crssi <herbert@knavs.net>
3 people authored Dec 25, 2020
1 parent aa79c88 commit b75d0e0
Showing 1 changed file with 1,231 additions and 48 deletions.