-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Redact ancestorOrigins using "the document's referrer" #2480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I meant to write: "to avoid labels". My bad. This can be fixed when landing, when we'll also need to add a link to web-platform-tests. |
See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.
|
@annevk Is there a reason you went with checking the referrer policy instead of the proposal in w3c/webappsec-referrer-policy#77 (comment) or something along those lines? The reason I ask is that using the document's referrer policy fails because we can have per element referrer policies. See discussion in w3c/webappsec-referrer-policy#77. I know it's long, but you really should read through it to figure out the various constraints here. :( |
|
Note that I think we could still have the "stop the first time there is a mismatch" behavior with that proposal too, as long as we are very careful to handle the sandboxed iframe case to NOT be treated as a mismatch. |
|
What other than taking the iframe element's referrer policy into account is listed there? Tried skimming through it several times, but not really coming up with something. (Accounting for iframes seems trivial, but wasn't mentioned in #1918 so I didn't think about it.) |
|
Here's a scenario not accounted for even if you consider the iframe element's referrer policy. Say page A at origin 1 loads a same-origin page B in a subframe. A and the relevant iframe element have no referrer policy settings, but B has a referrer policy to not send referrers. The user clicks a link in B to go to a page C, which is at at different origin. C looks at ancestorOrigins. In this situation, I claim that A (and B) have a reasonable expectation of not leaking referrer or ancestorOrigins information to C. My proposal handles this case. Examining only referrer policies of anything in A does not. Note, by the way, that #1918 (comment) explicitly said the Blink folks like the idea of tying this to referrer, not referrer policy. |
|
Hmm okay, so we'd tie it to https://html.spec.whatwg.org/#the-document's-referrer instead. That should be equally simple. |
|
(That also means that my testcase where I dynamically change the referrer policy needs to change.) |
|
@bzbarsky time for another look? If this is acceptable than I'll poke at the tests again. |
source
Outdated
| context</span>.</p></li> | ||
| <ol> | ||
| <li><p>If <var>current</var>'s <span>active document</span>'s <span data-x="the document's | ||
| referrer">referrer</span> is the empty string, then <span data-x="list append">append</span> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this doesn't work right.
Say I have a parent A, which has a no-referrer referrer policy. It loads an untrusted thing B. B then proceeds to itself load C, and C queries ancestorOrigins. Since the load of C came from B, and B doesn't have a no-referrer thing going on, C has a nonempty referrer. So it gets access to the information that it's loaded inside A.
source
Outdated
| <li><p>Let <var>current</var> be <var>current</var>'s <span>parent browsing | ||
| context</span>.</p></li> | ||
| <ol> | ||
| <li><p>If <var>current</var>'s <span>active document</span>'s <span data-x="the document's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is preexisting, but I expect that use of "active document" here is wrong, in the "security bug, cross-site information leak" sense. Just like most use of "active document" for pretty much everything is wrong in that "information leak" sense.
Consider page A that loads untrusted page B in a subframe. B creates a same-origin (with B) subframe C. C opens a new window and hands that window its Location object.
Now that window polls ancestorOrigins on the object it was handed. If A navigates its subframe to some new site D, then C's browsing context (which may still be around if things were salvageable) still has B-and-D's browsing context as parent, but its active document is D. So the polling window, which is same-origin with B and C but not A or D, would get to find out that A loaded D, which is a cross-site information leak.
We really need to be walking the creator document chain, not the browser context chain, as should pretty much everyone else who walks up to parents. And to the extent that this is more complicated than just walking up the browser context chain, we should fix that. For example, maybe we should have a concept of "parent document" for documents that are loaded in nested browsing contexts or something, which is the creator document of the document's browsing context.
|
I tried to reword your algorithm in something that's closer to standardese:
I'll look into the "parent document" abstraction now. |
|
On IRC we also discussed about returning "null" for the first failure: bz:
I asked:
bz:
|
|
Okay, as for creator/parent document, we used to have something like that, but reduced it to various properties to avoid leaking too much. See #792 which was inspired by a comment you made in w3c/webappsec-secure-contexts#18 (comment). That still seems like a good idea so maybe we should instead grab those properties from the browsing context rather than from the active document. |
This is a much stronger condition on "going back to appending things other than null" than in my proposal. I expect this condition will pretty much never be met. Is this intentional? Good catch on no longer having creator documents. Snapshotting things on the browsing context will be a bit complicated in this case, I think. Maybe it would be simpler to just have this API throw, or return some sort of nonsense if called on a document that is not "fully active" or something? I'm not entirely sure how "fully active" is defined (or whether it is) if we no longer have a concept of parent documents... |
|
You wrote:
That basically means it has to be an opaque origin, right? How could it be "null" otherwise? I don't understand why this case is more complicated with respect to snapshotting on the browsing context. How is this different from secure contexts or some such? (As far as I can tell we already snapshot the appropriate bits.) Fully active is defined in terms of the document being the active document and parent browsing context's their documents also being fully active. |
Ah, I see. I was unclear. My proposal had:
but it's missing a comma. It should be:
That is, in my proposal when you enter the appendNull state, you remember the current "parent's origin" and you stay in the appendNull state until you reach a "parent's origin" that is not same-origin with the one you remembered. Conceptually, sanitize out all the parts of the parent chain that belong to the site that did not want to be leaked. A bunch of this should probably go in informative text in the spec or something, because reconstructing all this reasoning from the algorithm is hard. :(
OK. How about you write down what you think should be snapshotted, and we'll see whether I was wrong. I might just be mis-modeling this in my head.
Secure contexts require snapshotting one single bit of information. This thing here would require snapshotting a list of stuff, right?
OK, so we still have a concept of a browsing context having a parent document, right? That's the one I was suggesting we use for this... (Obviously we have to handle cases of missing browsing contexts and whatnot.) |
|
No, we have a browsing context having a parent browsing context (which has "snapshotted" creator info). We don't have a parent document. |
They I don't understand how you define "fully active". |
|
The definition is at: https://html.spec.whatwg.org/multipage/browsers.html#fully-active. |
|
OK, so we have this concept of "the document through which it is nested". That's the thing that clearly needed to exist to make "fully active" definable, sure. But the point is, we have such a concept. |
|
Sigh. So we put a bit of effort into putting "creator" slots on browsing contexts, but it's rather pointless as the entire "creator" is leaked anyway through "browsing context container" and "nested through". (And of course, how any of this relates to the lifecycle of all these objects nobody really knows.) |
|
So one difference is that "nested through" only gives you parents, not openers, so it's different in that sense. I filed #2508 to see if we can restructure that better. |
|
So looking at this again, I'm actually not sure the active document stuff is a problem, since we instantiate this list (and cache it "forever") when a Location object is created. Is there a scenario under which a Location object can be created without a browsing context and an associated Document that is fully active? And to be clear, the case we are worried about is documents that are earlier or later than current in a browsing context's session history (or documents like that in the chain going up)? |
|
Question: if origin A embeds B and then B embeds A end then that A embeds C without referrer, should we leak the first A to C? I believe your algorithm says yes (and then one I just committed), due to B, but I can also see how it would be reasonable to make it say no (we'd have to replace lastRedactedOrigin with redactedOrigins and do the check for each origin in that set I think). |
Per spec? I'm not sure. What happens if you insert an In UAs, I think this can't happen. Again, not 100% sure. :(
I think doing so is fine, because B could have done that anyway. For example, C could do (Things are more complicated if the same entity controls both A and B, e.g. both subdomains of the same domain. The security model of the web slightly falls down here.) |
|
In case the Location object creation can happen without a browsing context or without the document being fully active due to prerendering or some such, I'm wondering how many other assumptions such a feature violates in the standard today. I'm not sure anything is actively designed with such a feature in mind. |
|
It would be consistent with |
Because they hate IDN, more or less. Fwiw, location.origin on http://кто.рф/ returns "http://xn--j1ail.xn--p1ai" in both Chrome and Safari. So does self.origin in Chrome. I guess we should get some bugs filed on them... |
Let's not ascribe motive. It's unnecessary and not obviously true either, as to end users Chrome and Safari display this domain just fine. Exposing the interchange format to software seems perfectly reasonable. That's also how we deal with UTF-8 percent-encoded paths of URLs across all browsers. I filed #2568 to sort this out. We can block on that here or deal with it in parallel. |
|
More seriously, Chrome's SecurityOrigin class only has one way to get a string out of it, and that one way returns punycode. See SecurityOrigin::BuildRawString.
Unless you make the mistake of copying from your URL bar. Then you get punycode... |
|
But yes, the "hate IDN" was mostly tongue-in-cheek... though they have in fact pushed back every single time on exposing anything but punycode anywhere on the web. |
|
Does that mean anything significant? The host parser in the URL Standard always returns ASCII too, but you can then apply a separate algorithm on that to get Unicode. Also, you first need to do ToASCII before you can do ToUnicode too to avoid some set of mismatches. |
Which "that"? |
|
|
Well, it means that any time someone needs a string for an origin they use the one hammer they have to hand, which is its ToString method. There's nothing else they can do with it. Yes, there could be another method around which then does the ToUnicode. It's not being called any of the places where you might think it would be called. And as a matter of API design, for something like a spec operation on an object you would probably want to actually have an API for doing it, if you planned to ever do it. |
See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.
See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.
See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.
1407609 to
ee7a6fb
Compare
Also rewrite the algorithm to avoid loops and use variables correctly. Tests: web-platform-tests/wpt#5402. Fixes #1918.
ee7a6fb to
a08939f
Compare
See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.
|
I opened a new PR to replace this one: #11560 |
See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.
This aligns `ancestorOrigins` exposure with referrer policy, so an embedder can prevent revealing its own origin to embedded documents. If an `<iframe>` uses `referrerpolicy="no-referrer"` or `same-origin` (and the parent and child are cross-origin), the parent’s origin and any same-origin ancestors are replaced with opaque origins (until reaching an ancestor that is cross-origin). Other policies continue to expose full origins. If there's no `referrerpolicy` attribute, the embedder document's referrer policy is used. This approach keeps existing behavior by default (for web compat) while addressing privacy concerns with an opt-out. The algorithm reuses the parent's existing list of ancestor origins, avoiding synchronous cross-process lookups and ensuring a stable snapshot even if ancestors mutate their `referrerpolicy` attributes later. Fixes #1918. Closes #2480.
Also rewrite the algorithm to avoid loops and use variables correctly.
Fixes #1918.
💥 Error: Wattsi server error 💥
PR Preview failed to build. (Last tried on Jan 15, 2021, 7:57 AM UTC).
More
PR Preview relies on a number of web services to run. There seems to be an issue with the following one:
🚨 Wattsi Server - Wattsi Server is the web service used to build the WHATWG HTML spec.
🔗 Related URL
If you don't have enough information above to solve the error by yourself (or to understand to which web service the error is related to, if any), please file an issue.