Answer #551: explore how bundles might but probably don't help anti-adblockers. #573

Merged
merged 2 commits into from
Apr 20, 2020


jyasskin
Member

@pes10k

pes10k commented Apr 15, 2020

I still disagree with this, as I said in the private thread you created. Here is another example that has the same underlying properties. Take a fingerprinting script, or some other resource that many users don't want and that many filter lists target.

Currently, a site could choose to include it in any of the following ways:

  1. Inline the code (best defense against filter lists since there is no URL to target, but expensive for the site because it can hurt performance, can't be deferred / async'ed, etc.)
  2. src= include the code from the original host (easiest option for filter lists to target, easiest deployment option for the site)
  3. Move the code first party and src= it as its own first-party resource (difficult but not impossible for filter lists to target, medium work for the site)
  4. If the code allows it, include the code in a build process (rollup, etc.) (basically impossible for filter lists to target, medium work for the site, since this code again can't be deferred / async'ed)

In a web bundle world, a site wishing to do (in this case) fingerprinting doesn't have to make any of the cost / benefit trade-offs discussed above. It gets a new option that is strictly superior to all of the above: deferrable / async-able code without a URL that filter lists can target.

Since web bundles give sites a new way of delivering code to users that is effectively unblockable by filter lists and has zero additional marginal cost to the site, I don't think you can say this helps adblockers; I think the proposal is at best neutral to adblockers, and almost certainly (very) net negative.
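
To make the trade-off above concrete, here is a minimal sketch (not from the thread) of why a URL matters to a filter list. The rules, hosts, and file names are all hypothetical, and the matching is plain substring search; real filter lists such as EasyList use a richer syntax.

```javascript
// Hypothetical, simplified filter rules: plain substrings matched against a URL.
const rules = ['//tracker.example/', '/fingerprint.'];

// Returns true when any rule occurs in the request URL.
function shouldBlock(url, ruleList) {
  return ruleList.some((rule) => url.includes(rule));
}

// Option 2: third-party src= include, matched by host.
const thirdParty = 'https://tracker.example/fp.js';
// Option 3: first-party copy with a descriptive name, matched by path.
const firstParty = 'https://site.example/js/fingerprint.min.js';
// An opaque name chosen by the site: nothing for these rules to match.
const opaque = 'https://site.example/a/8f3c2d.js';

console.log(shouldBlock(thirdParty, rules)); // true
console.log(shouldBlock(firstParty, rules)); // true
console.log(shouldBlock(opaque, rules));     // false
```

Inlined or build-merged code (options 1 and 4) never reaches a matcher like this as a separate request, which is why those options are described as the hardest to target.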

@twifkak
Collaborator

twifkak commented Apr 15, 2020

@pes10k Can you clarify why bundles have "zero additional marginal cost to the site"? It seems like a specialization of your (3), first-party delivery of the resource, but perhaps I'm missing something.

Furthermore, the same randomization to avoid filtering could be achieved through 1p delivery without bundles. For instance, one could:

  • Set up a URL router that says "if path starts with /blog/ and its sha256 % N == 0, then serve JS".
  • Modify the blog generator tool to try multiple variations of the slug (e.g. by adding/removing stopwords) until a non-matching sha256 is found.
  • Pick an N that's large enough so the expected number of variations needed to try is low, but small enough so it's not too costly to build convincing-looking fake blog URLs.

It seems like this would require a similar amount of work from the site -- the most difficulty likely being in making the HTML template dynamic (e.g. without sacrificing the utility of caching gateways).
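
The routing scheme sketched in the bullets above could look roughly like the following in Node.js. This is an illustration only: the function names, the value of N, and the slug list are assumptions, and the hash is reduced modulo N as a big integer, which is one possible reading of "sha256 % N".

```javascript
const crypto = require('crypto');

// sha256(path) interpreted as a big integer, reduced modulo n.
function hashMod(path, n) {
  const hex = crypto.createHash('sha256').update(path).digest('hex');
  return BigInt('0x' + hex) % BigInt(n);
}

// Router rule from the first bullet: a /blog/ path whose hash is 0 mod n serves JS.
function servesScript(path, n) {
  return path.startsWith('/blog/') && hashMod(path, n) === 0n;
}

// Build-side step from the second bullet: pick the first slug variation that
// does NOT hit the rule, so real articles never collide with script URLs.
// Returns null if every variation collides.
function pickArticleSlug(variations, n) {
  const ok = variations.find((slug) => !servesScript('/blog/' + slug, n));
  return ok === undefined ? null : ok;
}

const N = 16; // assumed value, chosen only for illustration
const slug = pickArticleSlug(
  ['the-state-of-bundles', 'state-of-bundles', 'state-bundles'],
  N
);
console.log(slug);
```

With N = 16, roughly one /blog/ path in sixteen would serve script, so a slug generator would rarely need more than a couple of variations per article.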

@pes10k

pes10k commented Apr 15, 2020

@twifkak

zero marginal cost

I mean that since the code is already delivered / downloaded as part of the bundle, there is no additional cost to making it an async request vs inlining it.

seems like a specialization of your (3)

Trackers are strictly better in web bundling than in case 3 for two reasons:

  • A site needing to do this w/o web bundles would need to take care to avoid name / URL collisions with other resources they were serving. These are the kinds of patterns filter lists look for. Web bundles free the tracker from even this constraint.
  • Blocking benefits the defender less in a web bundle world, since the defender has already paid the network cost for the unwanted resource.

the same randomization to avoid filtering could be achieved through 1p delivery without bundles

Sure, I'm not arguing that web bundling makes something possible that's currently impossible. I'm saying it changes something that's currently moderately difficult and costly into something easy and free (to the tracker / site / etc).

@twifkak
Collaborator

twifkak commented Apr 15, 2020

@pes10k

I mean that since the code is already delivered / downloaded as part of the bundle, there is no additional cost to making it an async request vs inlining it.

I see, I think... but can't the site choose not to include the 3p script in the bundle? IIUC navigation to bundles still allows fetch, albeit with a different Origin header.

A site needing to do this w/o web bundles would need to take care to avoid name / URL collisions with other resources they were serving. These are the kinds of patterns filter lists look for. Web bundles free the tracker from even this constraint

I think a bundler would still need to avoid name collisions within a bundle, at the very least, but I agree it's easy to do that while avoiding detection. As for collisions between bundles and the unbundled web, I think it depends:

  1. Could the UA choose a cached resource over a bundled one?
  2. Could a SW intercept bundled requests?

I'm saying it changes something that's currently moderately difficult and costly into something easy and free (to the tracker / site / etc).

I'm not convinced it's more difficult without bundles, but I could be. My example involves changes that would mostly be internal to the CMS, and hence the cost would be amortized across its customers. Each customer would need to upgrade its version (something it may already be doing regularly), and potentially modify a bit of frontend config (depending on the CMS -- some are end-to-end).

PS -- Sorry I edited your comment. I can't internet. I believe/hope I returned it to its original state.

@jyasskin
Member Author

I definitely don't mean to claim bundles make things harder for attackers. As a new option for attackers that doesn't close off any of the old options, it can't possibly make things harder; the best we can hope for is that it doesn't make things easier. I checked, and I think the text says the right thing about this.

#573 (comment) is wrong about the performance implications of inlining code or including it in an existing build process. defer is basically equivalent to putting the inline <script> at the end of its document or to wrapping the code in window.addEventListener('DOMContentLoaded', ()=>{/* the code */});, and there are more ways to explicitly defer work today.

rollup.js'ed scripts deliver bytes at approximately the same time a web package would and defer work just as well.
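
As a rough illustration of the DOMContentLoaded wrapper mentioned above, the sketch below delays work until parsing has finished. It is not from the thread: `runWhenParsed` and `makeFakeDocument` are hypothetical names, and the fake document exists only so the snippet can run outside a browser. It addresses only when the code runs, not how much parsing the engine does up front.

```javascript
// Run `work` once the document has finished parsing, similar in spirit to
// what <script defer> gives an external script.
function runWhenParsed(doc, work) {
  if (doc.readyState === 'loading') {
    doc.addEventListener('DOMContentLoaded', work, { once: true });
  } else {
    // Parsing already finished: the event will not fire again, so run now.
    work();
  }
}

// In a page: runWhenParsed(document, () => { /* the code */ });

// Stand-in document for running the sketch outside a browser (hypothetical).
function makeFakeDocument() {
  const listeners = [];
  return {
    readyState: 'loading',
    addEventListener(type, fn) {
      if (type === 'DOMContentLoaded') listeners.push(fn);
    },
    finishParsing() {
      this.readyState = 'interactive';
      listeners.splice(0).forEach((fn) => fn());
    },
  };
}

const fakeDoc = makeFakeDocument();
const log = [];
runWhenParsed(fakeDoc, () => log.push('deferred work'));
log.push('parsing');
fakeDoc.finishParsing();
console.log(log); // [ 'parsing', 'deferred work' ]
```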

I've added a paragraph talking about non-ad scripts, since my initial text here ignored that use for "ad" blockers. Fingerprint.js already gives step-by-step instructions for integrating it into a build process, so it's not like web packages could make the bar any lower.

@jyasskin jyasskin requested a review from irori April 17, 2020 16:02
Collaborator

@irori irori left a comment

As we are planning to request a TAG review of this proposal soonish, it would be nice to include this content.

@jyasskin jyasskin merged commit 7f17cc5 into WICG:master Apr 20, 2020
@jyasskin jyasskin deleted the anti-adblock branch April 20, 2020 15:47
@pes10k

pes10k commented Apr 20, 2020

Now that this has been merged, where is the right place to continue this conversation?

In general, I'm happy to continue discussing the points above one by one, but let's not lose the forest for the trees. The general claim (one that seems to be seconded by the content and adblocking tool libraries who 👍'ed the original issue before this was taken private) is that:

  1. consistent, descriptive URLs are useful for adblocking
  2. this proposal reduces the consistency and descriptiveness of URLs by changing them into arbitrary, opaque indexes into an archive.

Are we disagreeing about either of the above points?

Since rollup got mentioned above, it's a perfect example here. Before the world of rollup and the like, content blocking was ideal: each URL described (conceptually, and frequently in practice) one resource, and the user agent could reason about each URL independently. In the post-rollup world, URLs are less useful (though not useless), since a JS URL now often describes many resources, which are increasingly difficult for the UA to reason about individually (ongoing research here, etc.). URLs represent multiple interests that the user will often feel differently about, but on which UAs are (generally) forced into an all-or-nothing position.

This proposal does the same thing, but for entire websites! The UA effectively gets just one URL to reason about (the entire web package), but loses the ability to reason about subresources. This is very (very!) bad if we intend the web to be an open, transparent, user-first system!

Okie, now, replying to individual points, but eager to not lose sight of the above big picture…

@jyasskin

#573 (comment) is wrong about the performance implications

This is not correct. It's partially correct in V8, because in some cases V8 will defer the parsing of function bodies, but (i) even then there are exceptions, and (ii) I have even less familiarity with how other JS engines do this. I know that, for example, SpiderMonkey does not defer parsing in cases where V8 will (e.g. JS in HTML attributes, onclick=X), but I don't have enough information to say in general (and I know even less about JavaScriptCore). But the point is:

  1. there is in all cases some difference, because there is at least some additional parsing and executing going on
  2. there may be significant difference in other platforms
  3. caching makes all this even more different, as platforms may differ on how and when they cache inline script
  4. none of this difference hangs on standards defined behavior, and so is not a sound basis for this standard to rely on

@twifkak

can't the site choose not to include the 3p script in the bundle

Sure, a site could choose this, but I'm not sure I follow the point. My point isn't that sites have to evade content blockers under the proposal; it's that it gives them new options to circumvent the user's goals / aims / wishes.

As for collisions between bundles and the unbundled web…

Again, I'm not sure I follow you here. My point is that it'd be simple to change URLs during "bundling" so that they (i) are impossible for content blockers to reason about, and (ii) don't collide with real-world URLs. Say, every bundled resource has its URL changed to a random 256-character domain and path.
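
A minimal sketch of that kind of renaming step, written for Node.js, might look as follows. It is hypothetical and not tied to any real bundler: the `Map`-based bundle, the function names, and the `.example` hosts are assumptions, and the randomness goes mostly into a 256-character path with a short random host label, since a single DNS label is limited to 63 characters.

```javascript
const crypto = require('crypto');

// Random string of exactly `length` hex characters.
function randomName(length) {
  return crypto.randomBytes(Math.ceil(length / 2)).toString('hex').slice(0, length);
}

// Rewrite every resource URL in a bundle-like map to an opaque one.
// `resources` maps original URL -> body; returns the renamed map plus the
// old -> new table a bundler would use to rewrite references in the page.
function opaqueRename(resources, length = 256) {
  const renamed = new Map();
  const table = new Map();
  for (const [url, body] of resources) {
    let next;
    do {
      next = 'https://' + randomName(20) + '.example/' + randomName(length);
    } while (renamed.has(next)); // avoid collisions within the bundle
    renamed.set(next, body);
    table.set(url, next);
  }
  return { renamed, table };
}

const bundle = new Map([
  ['https://site.example/js/app.js', '/* app */'],
  ['https://site.example/js/fingerprint.min.js', '/* fingerprinting */'],
]);
const { renamed, table } = opaqueRename(bundle);
for (const [from, to] of table) {
  console.log(from, '->', to.slice(0, 40) + '...');
}
```

After renaming, nothing in the new URLs describes what a resource does, so a URL-based rule has only the bundle's own URL left to match.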

My example involves changes that would mostly be internal to the CMS, and hence the cost amortized across its customers

Needing to update the large number of existing CMSs seems like a perfect example of why this is difficult for sites! Let alone the other costs (losing caching; in your hash-guessing scheme, paying for an extra network request and, on some platforms, an OS thread or process; making static sites unworkable; etc.).

@jyasskin
Member Author

The right place to continue discussion is probably #551, which I intentionally didn't close with this PR. FWIW, I disagree with (2): URLs are already arbitrary, opaque indexes into a server if the content wants them to be, and web packages don't make that worse.

@pes10k

pes10k commented Apr 20, 2020

okie i will move this over there.

URLs are already arbitrary, opaque indexes into a server if the content wants them to be, and web packages don't make that worse

  1. Yes URLs can be opaque
  2. They are nevertheless still useful (see Google Safe Browsing, EasyList, Disconnect, caching policies, etc.)
  3. Again, the claim isn't that this proposal does something to fundamentally change URLs; it's that it (i) takes something that is possible but expensive now and makes it free and trivial, and (ii) basically turns every resource on a site into a single yes / no decision.

I'm sure I'm repeating myself on the above, so for everyone's sake, I'll copy it over to #551 and then stop.
