Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] MSC3898: Native Matrix VoIP signalling for cascaded foci (SFUs, MCUs...) #3898

Draft
wants to merge 30 commits into
base: main
Choose a base branch
from
Draft
Changes from 1 commit
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
750087f
Native Matrix VoIP signalling for cascaded SFUs
SimonBrandner Sep 25, 2022
aa53398
Update MSC number
SimonBrandner Sep 25, 2022
de302cb
Link to diagrams from MSC3401
SimonBrandner Oct 2, 2022
7474782
Use correct number for file
SimonBrandner Oct 2, 2022
5cad46d
Update sub and unsub ops
SimonBrandner Nov 11, 2022
2cbc2d6
Merge remote-tracking branch 'upstream/main' into SimonBrandner/msc/sfu
SimonBrandner Nov 11, 2022
f542fcb
Give a reason for specifying res in metadata
SimonBrandner Nov 11, 2022
6f01a94
Specify foci by `device_id` too
SimonBrandner Nov 12, 2022
575e16c
Fixup some json
SimonBrandner Nov 12, 2022
33b1880
Typo
SimonBrandner Nov 12, 2022
65faee4
Specify how to handle foci better
SimonBrandner Nov 13, 2022
9882c97
Amend TODOs
SimonBrandner Nov 13, 2022
c66bbe4
Add rationale behind usage of data channels
daniel-abramov Nov 15, 2022
1b2d740
Add TODO
SimonBrandner Dec 2, 2022
feb064b
Update event types
SimonBrandner Dec 2, 2022
d96d101
Add unstable prefixes
SimonBrandner Dec 2, 2022
d538e1e
Use `subscribe` instead of `select`
SimonBrandner Dec 6, 2022
91470a2
`op` -> `event`
SimonBrandner Dec 6, 2022
2ef7425
Fixup formatting
SimonBrandner Dec 6, 2022
5a186e4
Use `content`
SimonBrandner Dec 6, 2022
b461525
Namespace things
SimonBrandner Dec 6, 2022
e49e80d
Further namespacing
SimonBrandner Dec 6, 2022
6b3fd47
Update the events to match current Matrix
SimonBrandner Dec 6, 2022
bf52e02
Fix typo
SimonBrandner Dec 7, 2022
f81dd9d
Use `subscribe`/`unsbuscribe`
SimonBrandner Dec 7, 2022
9c32b96
Add informational section on active/preferred foci.
dbkr Dec 8, 2022
6f8c9d1
Change keepalives to ping/pong
dbkr Dec 8, 2022
ecf2425
Add empty line
SimonBrandner Dec 8, 2022
bf04b17
Fix event name
SimonBrandner Dec 9, 2022
1896fc7
Remove encryption section as it's glossing over details
SimonBrandner Dec 12, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Specify how to handle foci better
Signed-off-by: Šimon Brandner <simon.bra.ag@gmail.com>
  • Loading branch information
SimonBrandner committed Nov 13, 2022
commit 65faee445fa2108b79416433333aa1ad178a20e7
167 changes: 98 additions & 69 deletions proposals/3898-sfu.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,39 +26,10 @@ right to subscribe to track?**
The diagrams of how this all looks can be found in
[MSC3401](https://github.com/matrix-org/matrix-spec-proposals/pull/3401).

### State events
### Additions to the `m.call.member` state event

#### `m.call` state event

This MSC proposes adding an _optional_ `m.foci` field to the `m.call` state
event. It as list of recommended SFUs that the call initiator can recommend to
users who do not want to use their own SFU (because they don't have one, or
because they would be the only person on their SFU for their call, and so choose
to connect direct to save bandwidth).

For instance:

```json
{
"type": "m.call",
"state_key": "cvsiu2893",
"content": {
"m.intent": "m.room",
"m.type": "m.voice",
"m.name": "Voice room",
"m.foci": [
{ "user_id": "@sfu-lon:matrix.org", "device_id": "FS5F589EF" },
{ "user_id": "@sfu-nyc:matrix.org", "device_id": "VT4GA35VS" },
],
}
}
```

#### `m.call.member` state event

This MSC proposes adding an _optional_ `m.foci` field to the `m.call.member`
state event. It is used, if the user wants to be contacted via an SFU rather
than called directly (either 1:1 or full mesh).
This MSC proposes adding two _optional_ fields to the `m.call.member` state event:
`m.foci.preferred` and `m.foci.active`.

For instance:

Expand All @@ -70,50 +41,100 @@ For instance:
"m.calls": [
{
"m.call_id": "cvsiu2893",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the call_id does not seem to be necessary.

When the SFU sends To-Device messages to the clients, the conf_id is specified and given that the conf_id is a unique identifier of a conference/call, there seem to be no need to have a call_id in addition to that.

Recently I've ran into an issue where I realized that call_id and conf_id are not the same (despite MSC3401 giving me an impression that they are identical). The conf_id was the ID of a conference (as expected), but the call_id was another random string that was different for each single participant which forced us to use both call_id and conf_id when sending messages back to the clients (otherwise they would be rejected).

It looks like call_id should either be removed or (if we want to keep it for the backward compatibility with the older MSC?) it must be equal to the conf_id.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment belongs on MSC3401 as this line is specified in other MSC, although I thinik the conclusion is just that there's confusion between call_id and conf_id and we should rename this to conf_id (there's no other conf ID in this event so it is necessary, not just for backwards compat).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've also written a comment about it in MSC3401 😛

Basically, the problem is not only that they are called differently, but also that the value of call_id and conf_id is different, so they are different for some reason (and on the SFU we are obligated to take both into account: conf_id for a conference ID and the call_id to set a value in outgoing To-Device messages without which the client would discard the messages).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you agree that the correct resolution is to change this to m.conf_id?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would be great! Though I wonder what the consequence of that would be (i.e. what is that value that the current call_id has? - It's not a conference ID, it's something different, or maybe it's a leftover from a previous implementation for 1:1s where call_id meant something?)

Copy link

@daniel-abramov daniel-abramov Dec 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really?

Yes 🙂 That's something that I discovered a week ago when deploying the first iteration of refactored SFU. I've just tried to join the SFU and the conf_id field is equal to 1668002318158qFQmBWgVHHXTZsPA, while the call_id is 1670443502134bOWVqa3btIfDQMjJ.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conf_id and call_id from where though? There will also be call_id in the individual calls which will definitely be different. Otherwise we need to work out what's going on here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conf_id and call_id from where though?

From To-Device messages that the participants of the conference send to the SFU. We then reply with To-Device messages back (e.g. when we generate an answer), in which case we also set both conf_id or call_id.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean https://github.com/matrix-org/matrix-js-sdk/blob/develop/src/webrtc/call.ts#L2252? conf_id is the ID of the conference call (state key of the m.call event), call_id is the ID of the 1:1 call between the individual group call participants.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, seems like this. But the thing is that, from the SFUs standpoint, the call_id does not have any semantics, but currently we're obligated to store both conf_id and call_id (which have different values), where the call_id is only used in order to send To-Device messages to the clients, i.e. when I e.g. want to send messages from the SFU to the client, I have to set both the conf_id (the ID of a conference) and the call_id (the ID of the 1:1 call between individual group call participants).

So my point is that we probably want to get rid of mandating call_id for the SFU calls since they don't seem any semantic value for this use case. And only use the conf_id instead?

"m.foci": [
{ "user_id": "@sfu-lon:matrix.org", "device_id": "FS5F589EF" },
{ "user_id": "@sfu-nyc:matrix.org", "device_id": "VT4GA35VS" },
],
"m.devices": [...]
"m.devices": [{
"device_id": "U738KDF9WJ",
"m.foci.active": [
{ "user_id": "@sfu-lon:matrix.org", "device_id": "FS5F589EF" }
],
"m.foci.preferred": [
{ "user_id": "@sfu-bon:matrix.org", "device_id": "3FSF589EF" },
{ "user_id": "@sfu-mon:matrix.org", "device_id": "GFSDH93EF" },
]
}]
}
],
"m.expires_ts": 1654616071686
}
}
```

### Choosing an SFU

**TODO: How does a client discover SFUs** **TODO: Is SFU identified by just
`user_id` or `(user_id, device_id)`?**

* When initiating a group call, we need to decide which devices to actually talk
to.
* If the client has no SFU configured, we try to use the `m.foci` in the
`m.call` event.
* If there are multiple `m.foci`, we select the closest one based on
latency, e.g. by trying to connect to all of them simultaneously and
discarding all but the first call to answer.
* If there are no `m.foci` in the `m.call` event, then we look at which foci
in `m.call.member` that are already in use by existing participants, and
select the most common one. (If the focus is overloaded it can reject us
and we should then try the next most populous one, etc).
* If there are no `m.foci` in the `m.call.member`, then we connect full
mesh.
* If subsequently `m.foci` are introduced into the conference, then we
should transfer the call to them (effectively doing a 1:1->group call
upgrade).
* If the client does have an SFU configured, then we decide whether to use it.
* If other conf participants are already using it, then we use it.
* If there are other users from our homeserver in the conference, then we
use it (as presumably they should be using it too)
* If there are no other `m.foci` (either in the `m.call` or in the
participant state) then we use it.
* Otherwise, we save bandwidth on our SFU by not cascading and instead
behaving as if we had no SFU configured.
* We do not recommend that users utilise an SFU to hide behind for privacy, but
instead use a TURN server, only providing relay candidates, rather than
consuming SFU resources and unnecessarily mandating the presence of an SFU.
#### `m.foci.active`

This field is a list of foci the user's device is publishing to. Usually, this
list will have a length of 1, yet a client might publish to multiple foci if
they are on different networks, for instance, or to simultaneously fan-out in
different directions from the client if there is no nearby focus. If the client
is participating full-mesh, it should either omit this field from the state
event or leave the list empty.

#### `m.foci.preferred`

This field is a list of foci the client would prefer to switch to from the
current active focus, if any other client also starts using the given focus. If
the client is already using one of its preferred foci, it should either omit
this field from the state event or leave the list empty.
dbkr marked this conversation as resolved.
Show resolved Hide resolved

### Choosing a focus

#### Discovering foci

**TODO: How does a client discover foci? We could use well-known or a custom endpoint**

Foci are identified by a tuple of `user_id` and `device_id`.

#### Determining the best focus

There are many ways to determine the best focus; this MSC recommends the
following:

- Is the quickest to respond to `m.call.invite` with `m.call.answer`.
- Is the quickest to rapidly reject a spurious HTTPS request to a high-numbered
port on the SFU's IP address, if the SFU exposes its IP somewhere - similar to
the [apenwarr/blip](https://github.com/apenwarr/blip) trick, in order to
measure media-path latency rather than signalling path latency.
- Has the best latency of data-channel traffic flows.
- Has the best latency and bandwidth determined by sending a small splurge of
media down the pipe to probe.

#### Joining a call

The following diagram explains how a client chooses a focus when joining a call.

```mermaid
flowchart TD;
wantsToJoin[Wants to join a call];
hasPreferred(Has preferred focus?);
callPreferred[Calls preferred foci without media to grab a slot];
publishPreferred[Publishes `m.foci.preferred`];
checkMembers(Call has more than 2 members including the client itself?);
callFullMesh[Calls other member full-mesh];
callMembersFoci[Tries calling foci from `m.call.member` events];
orderFoci[Orders foci from best to worst];
findFocusPreferredByOtherMember(Goes through ordered foci to find one which is preferred by at least one other member);
callBestPreferred[Calls the focus];
callBestActive[Calls the best active focus in room];
publishActive[Publishes `m.foci.active`];

wantsToJoin-->hasPreferred;
hasPreferred--->|Yes|callPreferred;
hasPreferred--->|No|checkMembers;
callPreferred--->publishPreferred;
publishPreferred--->checkMembers;
checkMembers--->|Yes|callMembersFoci;
checkMembers--->|No|callFullMesh;
callMembersFoci--->orderFoci;
orderFoci--->findFocusPreferredByOtherMember;
findFocusPreferredByOtherMember--->|Found|callBestPreferred;
callBestPreferred--->publishActive;
findFocusPreferredByOtherMember--->|Not found|callBestActive;
callBestActive--->publishActive;
```

#### Mid-call changes

Once in a call, the client listens for changes to `m.call.member` state events
and if another member starts using one of the client's preferred foci, the client
switches to that focus.

### Initial offer/answer dance

Expand Down Expand Up @@ -269,6 +290,14 @@ participants leave, a new megolm session is created and shared with all
participants over Olm. The new session is only used once all participants have
received it.

### Notes

#### Hiding behind foci

We do not recommend that users utilise a focus to hide behind for privacy, but
instead use a TURN server, only providing relay candidates, rather than
consuming focus resources and unnecessarily mandating the presence of a focus.

## Potential issues

The SFUs participating in a conference end up in a full mesh. Rather than
Expand Down