-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MSC3910: Content tokens for media #3910
Conversation
* There is currently no way to delete | ||
media. [matrix-spec#226](https://github.com/matrix-org/matrix-spec/issues/226) | ||
* If a user requests GDPR erasure, their media remains visible to all. | ||
* When all users leave a room, their media is not deleted from the server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@turt2live says: fwiw, while the spec allows servers to GC rooms, Synapse currently doesn't (no server does, I believe)
@richvdh says: yes, this is matrix-org/synapse#4720.
} | ||
} | ||
``` | ||
* Encrypted media messages must include the `content_token` in the cleartext: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@turt2live says:
For clarity, would we do the same as relations and say that if the payload also has a
content_token
that it is ignored? ie: Client sendscontent_token
in payload but not cleartext, or sends it in both.
@richvdh says:
Same as relations, surely.
| [`GET /_matrix/media/v3/download/{serverName}/{mediaId}`](https://spec.matrix.org/v1.4/client-server-api/#get_matrixmediav3downloadservernamemediaid) | `GET /_matrix/client/v1/media/download/{serverName}/{mediaId}` | `GET /_matrix/federation/v1/media/download/{serverName}/{mediaId}` | | ||
| [`GET /_matrix/media/v3/download/{serverName}/{mediaId}/{fileName}`](https://spec.matrix.org/v1.4/client-server-api/#get_matrixmediav3downloadservernamemediaid) | `GET /_matrix/client/v1/media/download/{serverName}/{mediaId}/{fileName}` | N/A | | ||
| [`GET /_matrix/media/v3/thumbnail/{serverName}/{mediaId}`](https://spec.matrix.org/v1.4/client-server-api/#get_matrixmediav3thumbnailservernamemediaid) | `GET /_matrix/client/v1/media/thumbnail/{serverName}/{mediaId}` | `GET /_matrix/federation/v1/media/thumbnail/{serverName}/{mediaId}` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@turt2live says:
(origin,mediaId) tuple - media ID is not globally unique on its own.
... but I can't figure out what he's referring to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
somewhere in the doc it said "look up the media in the cache by media ID" - I think HackMD was trying to be helpful and put the comment in a weird place.
[Question: should we move `/config`, `/upload` and `/preview_url` while | ||
we're at it, per [MSC1902](https://github.com/matrix-org/matrix-spec-proposals/pull/1902)?] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@turt2live says:
yes please, though we should probably leave splitting endpoints out to its own MSC anyways - this to-be MSC could depend on MSC1902.
Context: The title of the MSC imposes "problems" on what can be included in that MSC. Though relevant to the MSC itself, it's not obvious from the title that we'd also be changing where the endpoints live (and making the title a paragraph is really just crap). Suggestion is always to just split out functionality when one would add "and" to the title.
@richvdh says:
The problem with having separate MSCs is that, if we land MSC1902 without the auth changes, then we need another way to distinguish clients that support auth from those that don't.
In this proposal, we can tell when it's safe to turn off the backwards-compatibility "you can get media without auth" behaviour just by looking at the access logs and seeing who is hitting the old endpoint. If we extend existing endpoints, that gets way harder.
|
||
**Two** authorization headers must be given when calling these endpoints: | ||
|
||
* `Authorization` must be set the same as for any other CSAPI or federation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@turt2live says:
nit: clients can still auth with ?access_token, but obviously discouraged. We should probably just kill that feature in the spec (independent of this MSC)
`X-Matrix-Content-Token` header, and returns either the media or a 403 | ||
error. | ||
|
||
If the media is uncached, the user's server makes a request to `GET |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bridge might also choose to embed information such as the room that | ||
referenced the media, and the time that the link was generated, in the URL. | ||
(Such information would require an HMAC to prevent tampering.) This could be | ||
used to help control access to the media. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@turt2live says:
Sending a meta event into the room would be important for the server to flag a "usage" of the content, otherwise it'd be subject to expiration/GC.
(This was previously discussed in | ||
[MSC2858](https://github.com/matrix-org/matrix-spec-proposals/pull/2858#discussion_r543513811).) | ||
|
||
## Potential issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@turt2live says:
With the proposal above, the origin server for a piece of media could count a piece of media (it's original copy) as "unreferenced" if all visible copies are dereferenced. Specifically, if a user on alice.com sends something to a room and a user from bob.com forwards that to a room visible to charlie.com, alice.com could eventually consider the media unreferenced and charlie.com users wouldn't be able to download it.
(This was previously discussed in | ||
[MSC2858](https://github.com/matrix-org/matrix-spec-proposals/pull/2858#discussion_r543513811).) | ||
|
||
## Potential issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@turt2live says:
Issue 2: Custom emoji and such rely on
<img />
tags in the message HTML - clients (and the spec) generally require that the imgsrc
be an MXC URI - presumably we just add amx_content_token
attribute or something to fix this case.
b. Backwards compatibility with older clients and federating servers: | ||
servers may for a short time choose to allow unauthenticated access via the | ||
deprecated endpoints, even for media with an associated `content_token`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@turt2live says:
I'm not sure this would be needed? The origin server (over federation) should still support the request if it also produced a content_token.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider a situation where @alice:orgin.com
is using a new server, and has uploaded some media. The origin.com
server now needs to deal with two sets of legacy implementations:
@bob:origin.com
, who is using an old client which doesn't know how to use content tokens.@charlie:remote.com
, whereremote.com
is an old server implementation. Whether or not charlie's client understands content tokens,remote.com
isn't going to forward them.
In either case, origin.com
will receive a request to the deprecated endpoint, and may choose to honour it (for now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhere it was clarified that this was talking about old clients talking to old servers in rooms where "new" media is posted - this wasn't immediately clear from the text, so I think it just needs a relatively minor clarification?
use the server) *and* the `content_token` (to prove they have the right to see | ||
the media in question). | ||
|
||
Servers can count the number of references to a given `content_token`, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Servers can count the number of references to a given `content_token`, and | |
Servers should count the number of references to a given `content_token`, and |
This should hopefully make things better for every server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we patch the metadata leaks later with another MSC for example?
Because with P2P we will eventually want to minimize the metadata leaked by the protocol
Is more metadata leaked than before this MSC?
* And `m.room.member` must include an `avatar_content_token`: | ||
```json | ||
{ | ||
"type": "m.room.member", | ||
"state_key": "@alice:example.org", | ||
"content": { | ||
"avatar_url": "mxc://example.org/AQwafuaFswefuhsfAFAgsw", | ||
"avatar_content_token": "hunter2", | ||
"displayname": "Alice Margatroid", | ||
"membership": "join" | ||
} | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also m.presence
"body": "filename.jpg", | ||
"msgtype": "m.image", | ||
"url": "mxc://example.com/AQwafuaFswefuhsfAFAgsw", | ||
"content_token": "hunter2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spotted by @uhoreg: a server doesn't, in general, know which event types reference media, and where in their structure the media url would be.
There might also be more than one media item. So instead, we have to define content_token
to be a map at the top level of the event, with keys being the hash of an mxc url (to uniquely identify the media, without leaking the mxc uri outside encrypted events).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't neccesarily solve the concern about media embedded into the content of an event, like custom emoji: #3910 (comment)
(there's no key on the json for such media references)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's a problem. It doesn't need a key in the json for the media references; we can still include a content_token
map with inline images. e.g.
{
"body": "some fallback",
"formatted_body": "<img src=\"mxc://foo/bar\">",
"content_token": {
"hash_of_mxc://foo/bar": "token"
}
}
* Rather than trust users not to pass the content_token around, we could tie the media more | ||
strongly to an event, and then check that the user has access to the event. For example: | ||
|
||
* Uploading does not itself make the content visible by *anyone*; it does return a nonce | ||
* Sending events takes a `media_nonce` param. The media gets associated with | ||
the event and becomes visible to anyone who can see the event. The nonce | ||
gets invalidated. | ||
* For profile pics, the media gets duplicated for each m.room.member event it | ||
ends up in. | ||
* When the media is served over federation, we check if anyone on the server | ||
has a right to see it (as we would with its associated event), and if so, | ||
include the associated event in the response. The requesting server then | ||
filters further. | ||
|
||
(credit: based on ideas at | ||
https://cryptpad.fr/code/#/2/code/view/oWjZciD9N1aWTr1IL6GRZ0k1i+dm7wJQ7juLf4tJRoo/) | ||
|
||
Problems with this approach: | ||
* Forwarded media would require either a re-upload, or a server-side duplication. | ||
(Note that matrix-media-repo has a [LocalCopy](https://github.com/turt2live/matrix-media-repo/blob/master/api/unstable/local_copy.go#L19) implementation for this.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now written up something much like this at MSC3911.
* Put the `content_token` within the encrypted content, for encrypted data. | ||
This reduces the metadata leak, but means that none of the automated media | ||
removal functionality works. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think putting the content_token
in the cleartext actually solves the problem for automated media removal. And I don't think it can, fundamentally, because we can't force clients to cooperate in the metadata leak.
A non-cooperating client can always:
- Implement support for
content_token
within the encrypted payload. - Upload the media.
- Reference it from one or more junk catch-all rooms in the cleartext, to ensure the server considers it referenced.
- Put
content_token
in the encrypted payload for all real references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implement support for content_token within the encrypted payload.
Well, that won't help the non-cooperating client if it wants to talk to cooperating clients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True... Which seems like a mess? Game theoretically speaking, the cooperating clients would lose out on some messages compared to non-cooperating clients worried about metadata leaks, so the equilibrium would tend towards all clients supporting both methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess so.
Reference it from one or more junk catch-all rooms in the cleartext, to ensure the server considers it referenced.
this won't help unless both rooms have similar servers in, though?
... Anyway, I think MSC3911 was a better idea than this msc. It's probably not worth spending too much time on.
Rendered