Add spec for MSC2746 #1511

Merged
merged 38 commits into from
May 23, 2023
Changes from 21 commits
4b1b6dd
Change version field to a string
dbkr May 2, 2023
ef15055
Add spec requiring tracks to be within streams.
dbkr May 2, 2023
0a8362f
Put streams spec in its own section
dbkr May 2, 2023
e49a85c
Add 'invitee' field
dbkr May 2, 2023
2abbc64
Add party_id
dbkr May 3, 2023
774968b
Remember how JSON works
dbkr May 3, 2023
93dd4e5
Add m.call.select_answer
dbkr May 3, 2023
b75850f
Update examples
dbkr May 3, 2023
d8dd3e0
Add select_answer to call flow example diagram
dbkr May 3, 2023
ebed260
Add m.call.reject
dbkr May 3, 2023
efdb1ec
Make party_id required in other events
dbkr May 3, 2023
f4b6c62
Add possible ways for client to handle an invite
dbkr May 3, 2023
d9bd32d
Convert hangup & reject events to YAML
dbkr May 4, 2023
78719b4
Add new reason codes to hangup & reject
dbkr May 4, 2023
ecb3070
Add m.call.negotiate
dbkr May 4, 2023
2801c6f
Add other sections
dbkr May 4, 2023
6a058d0
Revert changes to package lock
dbkr May 4, 2023
e741e3a
Typos
dbkr May 4, 2023
a7f5b8f
Fix type of other version fields, fix anchor.
dbkr May 4, 2023
c079875
Add newsfragment
dbkr May 4, 2023
af22989
Fix reason in hangup/reject
dbkr May 4, 2023
1fda1ec
Change tense
dbkr May 17, 2023
a6c2cf2
Tense, typos & grammar
dbkr May 17, 2023
ea3cbec
Linkify
dbkr May 17, 2023
70d7c36
Remove unnecessary parts from link
dbkr May 17, 2023
5b8e3ef
Capitalise
dbkr May 17, 2023
1023399
Fix hangup reasons
dbkr May 17, 2023
ea056a7
Clarify who can answer
dbkr May 17, 2023
1855205
Linkify
dbkr May 17, 2023
25ea0df
Remove reference to 'this MSC'.
dbkr May 17, 2023
5d53d67
Move common VoIP fields into a call event type.
dbkr May 17, 2023
ffa0a6f
Move common voip events to the content, not the actual event
dbkr May 17, 2023
a5c9911
Remove reason from reject event
dbkr May 17, 2023
d1b4c72
Failure to YAML
dbkr May 17, 2023
163605c
Fix number of room members allowed when sending voip events.
dbkr May 23, 2023
c6ae1a4
Add 'added in' version
dbkr May 23, 2023
d230788
Another added-in
dbkr May 23, 2023
492d052
Add missing comma
turt2live May 23, 2023
1 change: 1 addition & 0 deletions changelogs/client_server/newsfragments/1511.feature
@@ -0,0 +1 @@
Update VoIP spec for [MSC2746](https://github.com/matrix-org/matrix-spec-proposals/pull/2746).
149 changes: 149 additions & 0 deletions content/client-server-api/modules/voip_events.md
@@ -9,6 +9,117 @@ communication is supported (e.g. between two peers, or between a peer
and a multi-point conferencing unit). This means that clients MUST only
send call events to rooms with exactly two participants.

All VoIP events have a `version` field. This will be used to determine whether
devices support this new version of the protocol. For example, clients can use
this field to know whether to expect an `m.call.select_answer` event from their
opponent. If clients see events with `version` other than `0` or `"1"`
(including, for example, the numeric value `1`), they should treat these the
same as if they had `version` == `"1"`.
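
The version rule above can be sketched as a small helper. This is only an illustration: the function name `effective_version` is hypothetical, and it assumes the event content has already been parsed from JSON so that a numeric `0` arrives as a Python `int`.

```python
def effective_version(version):
    """Return the VoIP version a client should act on for a received event.

    Yields the integer 0 for legacy events and the string "1" for everything
    else, including the number 1 and any unknown or future version value.
    """
    # bool is a subclass of int in Python, so exclude True/False explicitly.
    if isinstance(version, int) and not isinstance(version, bool) and version == 0:
        return 0
    # "1", the numeric 1, "2", experimental strings, etc. all behave as "1".
    return "1"
```

A client could then, for example, only expect an `m.call.select_answer` from its opponent when `effective_version(...)` returns `"1"`.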

Note that this implies any and all future versions of VoIP events should be
backwards-compatible. If it does become necessary to introduce a non
backwards-compatible VoIP spec, the intention would be for it to simply use a
separate set of event types.

#### Party Identifiers
Whenever a client first participates in a new call, it generates a `party_id` for itself to use for the
duration of the call. This needs to be long enough that the chance of multiple devices, each generating
an answer at the same time, producing the same party ID is vanishingly small: 8 uppercase +
lowercase alphanumeric characters is recommended. Parties in the call are identified by the tuple of
`(user_id, party_id)`.
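
As a rough sketch of the recommendation above, a client might generate a fresh random ID per call as follows. The names `PARTY_ID_ALPHABET` and `generate_party_id` are hypothetical, and the use of a cryptographically secure random source is an assumption rather than a requirement stated here.

```python
import secrets
import string

# Uppercase + lowercase letters and digits, as recommended above.
PARTY_ID_ALPHABET = string.ascii_letters + string.digits


def generate_party_id(length=8):
    """Generate a random party ID of `length` alphanumeric characters."""
    return "".join(secrets.choice(PARTY_ID_ALPHABET) for _ in range(length))
```

The resulting ID would be generated once when the client first participates in a call and reused in every VoIP event it sends for that call.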

The client adds a `party_id` field containing this ID to the top-level of the content of all VoIP events
it sends on the call, including `m.call.invite`. Clients use this to identify remote echo of their own
events: since a user may now call themselves, they can no longer ignore events from their own user. This
field also identifies different answers sent by different clients to an invite, and matches `m.call.candidates`
events to their respective answer/invite.

A client implementation may choose to use the device ID used in end-to-end cryptography for this purpose.
Alternatively it may, for example, use a different ID for each call, either to avoid leaking information
about which devices were used in a call (in an unencrypted room) or because a single device
(i.e. access token) is used to send signalling for more than one call party.

A grammar for `party_id` is defined [below](#grammar-for-voip-ids).

#### Politeness
In line with WebRTC perfect negotiation (https://w3c.github.io/webrtc-pc/#perfect-negotiation-example),
there are rules to establish which party is polite in the process of renegotiation. The callee is
always the polite party. In a glare situation, the politeness of a party is therefore determined by
whether the inbound or outbound call is used: if a client discards its outbound call in favour of
an inbound call, it becomes the polite party.

#### Call Event Liveness
`m.call.invite` contains a `lifetime` field that indicates how long the offer is valid for. When
a client receives an invite, it should use the event's `age` field in the sync response plus the
time since it received the event from the homeserver to determine whether the invite is still valid.
The use of the `age` field ensures that incorrect clocks on client devices don't break calls.

If the invite is still valid *and will remain valid for long enough for the user to accept the call*,
it should signal an incoming call. The amount of time allowed for the user to accept the call may
vary between clients. For example, it may be longer on a locked mobile device than on an unlocked
desktop device.
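
The validity check described above might look like the following sketch. The function names and the `min_answer_window_ms` parameter are hypothetical, and the choice of a monotonic local clock for measuring time since receipt is an assumption; the point is only that the local wall clock is never compared against a server timestamp.

```python
import time


def invite_remaining_ms(lifetime_ms, age_ms, received_at, now=None):
    """Milliseconds of validity left on an invite.

    lifetime_ms: the invite's `lifetime` field.
    age_ms:      the event's `age` as reported in the sync response.
    received_at: time.monotonic() reading taken when the event was received.
    now:         optional time.monotonic() reading (defaults to the current one).
    """
    if now is None:
        now = time.monotonic()
    elapsed_since_receipt_ms = (now - received_at) * 1000
    return lifetime_ms - (age_ms + elapsed_since_receipt_ms)


def should_ring(lifetime_ms, age_ms, received_at, min_answer_window_ms, now=None):
    """True if the invite stays valid long enough for the user to answer."""
    remaining = invite_remaining_ms(lifetime_ms, age_ms, received_at, now)
    return remaining >= min_answer_window_ms
```

Here `min_answer_window_ms` stands for the client-specific time allowed for the user to accept, which may be longer on a locked mobile device than on an unlocked desktop.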

The client should only signal an incoming call in a given room once it has completed processing the
entire sync response and, for encrypted rooms, attempted to decrypt all encrypted events in the
sync response for that room. This ensures that if the sync response contains subsequent events that
indicate the call has been hung up, rejected, or answered elsewhere, the client does not signal it.

If on startup, after processing locally stored events, the client determines that there is an invite
that is still valid, it should still signal it but only after it has completed a sync from the homeserver.

The minimum recommended lifetime is 90 seconds: this should give the user enough time to actually pick
up the call.

#### ICE Candidate Batching
Clients should aim to send a small number of candidate events, with the following guidelines:
* ICE candidates which can be discovered immediately or almost immediately (eg. host candidates)
  should be sent in the invite/answer event itself. If server reflexive or relay candidates can be
  gathered in a sufficiently short period of time, these should be sent here too. A delay of around
  200ms is suggested as a starting point.
* The client should then allow some time for further candidates to be gathered in order to batch them,
  rather than sending each candidate as it arrives. A delay of 2 seconds after sending the invite, or
  500ms after sending the answer, is suggested as a starting point (since a delay is natural anyway
  after the invite whilst the client waits for the user to accept it).
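
The batching guidelines above can be illustrated with a small planning function. This is a sketch under stated assumptions: `plan_candidate_batches` and its `arrivals` input (a time-sorted list of `(ms_since_gathering_started, candidate)` pairs) are hypothetical, and reusing the same delay for batches after the first is an assumption, since only the first delay is suggested above.

```python
def plan_candidate_batches(arrivals, is_invite, initial_window_ms=200):
    """Split gathered candidates into inline ones and later batches.

    Returns (inline, batches): `inline` goes into the invite/answer event
    itself; each list in `batches` would be sent as one m.call.candidates event.
    """
    # Suggested starting points: 2s after an invite, 500ms after an answer.
    batch_delay_ms = 2000 if is_invite else 500

    inline = [cand for t, cand in arrivals if t <= initial_window_ms]
    later = [(t, cand) for t, cand in arrivals if t > initial_window_ms]

    batches = []
    current = []
    window_end = initial_window_ms + batch_delay_ms
    for t, cand in later:
        # Flush whenever a candidate arrives after the current window closed.
        while t > window_end:
            if current:
                batches.append(current)
                current = []
            window_end += batch_delay_ms  # assumption: same delay for later batches
        current.append(cand)
    if current:
        batches.append(current)
    return inline, batches
```

A real client would drive this from timers rather than a precomputed list; the list form is used here only to keep the sketch deterministic.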

#### End-of-candidates
An ICE candidate whose value is the empty string means that no more ICE candidates will
be sent. Clients must send such a candidate in an `m.call.candidates` message.
The WebRTC spec requires browsers to generate such a candidate. Note, however, that at the time of
writing not all browsers do (Chrome does not, but does generate an `icegatheringstatechange` event). The
client should send any remaining candidates once candidate generation finishes, ignoring the timeouts above.
This allows bridges to batch the candidates together when bridging to protocols that don't support
trickle ICE.
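
As an illustration of the final flush, the content of the last `m.call.candidates` event could be assembled as below. The helper name `candidates_content` is hypothetical. Whether `sdpMid` and `sdpMLineIndex` should accompany the empty candidate is not stated in this section, so the sketch omits them as an assumption.

```python
def candidates_content(call_id, party_id, candidates, gathering_complete):
    """Build the content of an m.call.candidates event.

    When gathering has finished, an end-of-candidates marker (a candidate
    whose value is the empty string) is appended after the remaining ones.
    """
    payload = list(candidates)  # copy so the caller's list is not mutated
    if gathering_complete:
        payload.append({"candidate": ""})
    return {
        "version": "1",
        "call_id": call_id,
        "party_id": party_id,
        "candidates": payload,
    }
```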

#### DTMF
Matrix clients can send DTMF as specified by WebRTC. The WebRTC standard as of August
2020 does not support receiving DTMF but a Matrix client can receive and interpret the DTMF sent
in the RTP payload.

#### Grammar for VoIP IDs
`call_id`s and `party_id`s are explicitly defined to be between 1 and 255 characters long, consisting
of the characters `[0-9a-zA-Z._~-]`.

(Note that this matches the grammar of 'opaque IDs' from
[MSC1597](https://github.com/matrix-org/matrix-spec-proposals/blob/rav/proposals/id_grammar/proposals/1597-id-grammar.md#opaque-ids),
and that of the `id` property of the
[`m.login.sso` flow schema](https://spec.matrix.org/v1.5/client-server-api/#definition-mloginsso-flow-schema).)
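
A minimal sketch of a validator for this grammar follows. The names `VOIP_ID_RE` and `is_valid_voip_id` are hypothetical; the character class and length bounds are taken directly from the definition above.

```python
import re

# 1 to 255 characters drawn from [0-9a-zA-Z._~-].
VOIP_ID_RE = re.compile(r"[0-9a-zA-Z._~-]{1,255}")


def is_valid_voip_id(value):
    """Check a call_id or party_id against the VoIP ID grammar."""
    # fullmatch avoids accepting a trailing newline, which `$` would allow.
    return isinstance(value, str) and VOIP_ID_RE.fullmatch(value) is not None
```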

#### Behaviour on Room Leave
If the client sees the user it is in a call with leave the room, the client should treat this
as a hangup event for any calls that are in progress. No specific requirement is given for the
situation where a client has sent an invite and the invitee leaves the room, but the client may
wish to treat it as a rejection if there are no more users in the room who could answer the call
(eg. the user is now alone or the `invitee` field was set on the invite).

The same behaviour applies when a client is looking at historic calls.

#### Supported Codecs
The Matrix spec does not mandate particular audio or video codecs, but instead defers to the
WebRTC spec. A compliant Matrix VoIP client will behave in the same way as a supported 'browser'
in terms of what codecs it supports and what variants thereof. The latest WebRTC specification
applies, so clients should keep up to date with new versions of the WebRTC specification whether
or not there have been any changes to the Matrix spec.

#### Events

{{% event-group group_name="m.call" %}}
@@ -25,6 +136,7 @@ A call is set up with message events exchanged as follows:
[..candidates..] -------->
[Answers call]
<--------------- m.call.answer
m.call.select_answer ----------->
[Call is active and ongoing]
<--------------- m.call.hangup
```
@@ -42,6 +154,43 @@ Or a rejected call:

Calls are negotiated according to the WebRTC specification.

In response to an incoming invite, a client may do one of several things:
* Attempt to accept the call by sending an `m.call.answer`.
* Actively reject the call everywhere: send an `m.call.reject` as per above. This stops the call from
  ringing on all the user's devices, and the caller's client will inform them that the user has
  rejected their call.
* Ignore the call: send no events, but stop alerting the user about the call. The user's other
devices will continue to ring, and the caller's device will continue to indicate that the call
is ringing, and will time the call out in the normal way if no other device responds.

##### Streams

Clients are expected to send one stream with one track of kind `audio` (creating a
voice call). They can optionally send a second track in the same stream of kind
`video` (creating a video call).

Clients implementing this specification use the first stream and will ignore
any streamless tracks. Note that in the JavaScript WebRTC API, this means
`addTrack()` must be passed two parameters: a track and a stream, not just a
track, and in a video call the stream must be the same for both the audio and
video tracks.

A client may send other streams and tracks but the behaviour of the other party
with respect to presenting such streams and tracks is undefined.

##### Invitees
The `invitee` field should be added whenever the call is intended for one
specific user, and should be set to the Matrix user ID of that user. Invites
without an `invitee` field are defined to be intended for any member of the
room other than the sender of the event.

Clients should consider an incoming call if they see a non-expired invite event where the `invitee` field is either
absent or equal to their user's Matrix ID. However, they should evaluate whether or not to ring based on their
user's trust relationship with the callers and/or where the call was placed. As a starting point, it is
suggested that clients ignore call invites from users in public rooms. It is strongly recommended that
when clients do not ring for an incoming call invite, they still display the call invite in the room and
annotate that it was ignored.
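
The `invitee` rule can be sketched as a simple predicate. The function name `is_invite_for_user` and its parameters are hypothetical, and the sketch deliberately covers only the addressing rule: expiry, trust, and public-room checks would be layered on top by the client.

```python
def is_invite_for_user(invite_content, sender, own_user_id):
    """Decide whether an m.call.invite is addressed to our user.

    invite_content: the `content` of the invite event.
    sender:         Matrix user ID of the event's sender.
    own_user_id:    Matrix user ID of the local user.
    """
    invitee = invite_content.get("invitee")
    if invitee is None:
        # No invitee: intended for any room member other than the sender.
        return sender != own_user_id
    # Explicit invitee: only that user should consider the call.
    return invitee == own_user_id
```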

##### Glare

"Glare" is a problem which occurs when two users call each other at
3 changes: 2 additions & 1 deletion data/event-schemas/examples/m.call.answer.yaml
@@ -2,7 +2,8 @@
  "$ref": "core/room_event.json",
  "type": "m.call.answer",
  "content": {
- "version" : 0,
+ "version" : "1",
+ "party_id": "67890",
  "call_id": "12345",
  "answer": {
  "type" : "answer",
3 changes: 2 additions & 1 deletion data/event-schemas/examples/m.call.candidates.yaml
@@ -2,7 +2,8 @@
  "$ref": "core/room_event.json",
  "type": "m.call.candidates",
  "content": {
- "version" : 0,
+ "version" : "1",
+ "party_id": "67890",
  "call_id": "12345",
  "candidates": [
  {
6 changes: 4 additions & 2 deletions data/event-schemas/examples/m.call.hangup.yaml
@@ -2,7 +2,9 @@
  "$ref": "core/room_event.json",
  "type": "m.call.hangup",
  "content": {
- "version" : 0,
- "call_id": "12345"
+ "version" : "1",
+ "party_id": "67890",
+ "call_id": "12345",
+ "reason": "user_hangup"
  }
  }
3 changes: 2 additions & 1 deletion data/event-schemas/examples/m.call.invite.yaml
@@ -2,7 +2,8 @@
  "$ref": "core/room_event.json",
  "type": "m.call.invite",
  "content": {
- "version" : 0,
+ "version" : "1",
+ "party_id": "67890",
  "call_id": "12345",
  "lifetime": 60000,
  "offer": {
14 changes: 14 additions & 0 deletions data/event-schemas/examples/m.call.negotiate.yaml
@@ -0,0 +1,14 @@
{
"$ref": "core/room_event.json",
"type": "m.call.negotiate",
"content": {
"version" : "1",
"party_id": "67890",
"call_id": "12345",
"lifetime": 10000,
"offer": {
"type" : "offer",
"sdp" : "v=0\r\no=- 6584580628695956864 2 IN IP4 127.0.0.1[...]"
}
}
}
9 changes: 9 additions & 0 deletions data/event-schemas/examples/m.call.reject.yaml
@@ -0,0 +1,9 @@
{
"$ref": "core/room_event.json",
"type": "m.call.reject",
"content": {
"version" : "1",
"party_id": "67890",
"call_id": "12345"
}
}
10 changes: 10 additions & 0 deletions data/event-schemas/examples/m.call.select_answer.yaml
@@ -0,0 +1,10 @@
{
"$ref": "core/room_event.json",
"type": "m.call.select_answer",
"content": {
"version" : "1",
"call_id": "12345",
"party_id": "67890",
"selected_party_id": "111213"
}
}
10 changes: 7 additions & 3 deletions data/event-schemas/schema/m.call.answer.yaml
@@ -30,11 +30,15 @@
  "required": ["type", "sdp"]
  },
  "version": {
- "type": "number",
- "description": "The version of the VoIP specification this messages adheres to. This specification is version 0."
+ "type": "string",
+ "description": "The version of the VoIP specification this message adheres to. This specification is version 1. This field is a string such that experimental implementations can use non-integer versions. This field was an integer in the previous spec version and implementations must accept an integer 0"
+ },
+ "party_id": {
+ "type": "string",
+ "description": "This identifies the party that sent this event. A client may choose to re-use the device ID from end-to-end cryptography for the value of this field. "
  }
  },
- "required": ["call_id", "answer", "version"]
+ "required": ["call_id", "answer", "version", "party_id"]
  },
  "type": {
  "type": "string",
10 changes: 7 additions & 3 deletions data/event-schemas/schema/m.call.candidates.yaml
@@ -35,12 +35,16 @@
  "required": ["candidate", "sdpMLineIndex", "sdpMid"]
  }
  },
+ "party_id": {
+ "type": "string",
+ "description": "This identifies the party that sent this event. A client may choose to re-use the device ID from end-to-end cryptography for the value of this field. "
+ },
  "version": {
- "type": "integer",
- "description": "The version of the VoIP specification this messages adheres to. This specification is version 0."
+ "type": "string",
+ "description": "The version of the VoIP specification this message adheres to. This specification is version 1. This field is a string such that experimental implementations can use non-integer versions. This field was an integer in the previous spec version and implementations must accept an integer 0."
  }
  },
- "required": ["call_id", "candidates", "version"]
+ "required": ["call_id", "candidates", "version", "party_id"]
  },
  "type": {
  "type": "string",
101 changes: 66 additions & 35 deletions data/event-schemas/schema/m.call.hangup.yaml
@@ -1,35 +1,66 @@
{
"type": "object",
"description": "Sent by either party to signal their termination of the call. This can be sent either once the call has has been established or before to abort the call.",
"allOf": [{
"$ref": "core-event-schema/room_event.yaml"
}],
"properties": {
"content": {
"type": "object",
"properties": {
"call_id": {
"type": "string",
"description": "The ID of the call this event relates to."
},
"version": {
"type": "integer",
"description": "The version of the VoIP specification this message adheres to. This specification is version 0."
},
"reason": {
"type": "string",
"description": "Optional error reason for the hangup. This should not be provided when the user naturally ends or rejects the call. When there was an error in the call negotiation, this should be `ice_failed` for when ICE negotiation fails or `invite_timeout` for when the other party did not answer in time.",
"enum": [
"ice_failed",
"invite_timeout"
]
}
},
"required": ["call_id", "version"]
},
"type": {
"type": "string",
"enum": ["m.call.hangup"]
}
}
}
---
type: object
description: |
Sent by either party to signal their termination of the call. This can
be sent either once the call has been established or before to abort the call.

The meanings of the `reason` field are as follows:
* `ice_timeout`: The connection failed after some media was exchanged (as opposed to current
* `ice_failed` which means no media connection could be established). Note that, in the case of

Review comment (Member):

It's kind of weird to have the bullet point start in the middle of the parenthetical. Also, I'm failing to parse what the "current" at the end of the previous line is supposed to mean.

Reply (Member Author):

Mm, yes, fairly sure this didn't come out as intended.

an ICE renegotiation, a client should be sure to send `ice_timeout` rather than `ice_failed` if
media had previously been received successfully, even if the ICE renegotiation itself failed.
* `invite_timeout`: The other party did not answer in time.
* `user_hangup`: Clients must now send this code when the user chooses to end the call, although
for backwards compatibility with version 0, clients should treat an absence of the `reason`
field as `user_hangup`.
* `user_media_failed`: The client was unable to start capturing media in such a way that it is unable
to continue the call.
* `user_busy`: The user is busy. Note that this exists primarily for bridging to other networks such
as the PSTN. A Matrix client that receives a call whilst already in a call would not generally reject
the new call unless the user had specifically chosen to do so.
* `unknown_error`: Some other failure occurred that meant the client was unable to continue the call
rather than the user choosing to end it.
allOf:
- "$ref": core-event-schema/room_event.yaml
properties:
content:
type: object
properties:
call_id:
type: string
description: The ID of the call this event relates to.
version:

Review comment (Member):

I wonder if it would be worthwhile to factor out the call_id, version, and party_id fields into a file that could be included in all the voip events. Maybe a core-event-schema/voip_event.yaml file.

Reply (Member Author):

Yeah, this is neater. The only downside is that it repeats all the fields from a room event and I'm not sure how to make it not do that.


type: string
description: The version of the VoIP specification this message adheres to.
This specification is version 1. This field is a string such that experimental
implementations can use non-integer versions. This field was an integer
in the previous spec version and implementations must accept an integer
0.
party_id:
type: string
description: This identifies the party that sent this event. A client may
choose to re-use the device ID from end-to-end cryptography for the value
of this field.
reason:
type: string
description: Reason for the hangup. Note that this was optional in
previous versions of the spec, so a missing value should be
treated as `user_hangup`.
enum:
- ice_timeout
- ice_failed
- invite_timeout
- user_hangup
- user_media_failed
- user_busy
- unknown_error
required:
- call_id
- version
- party_id
- reason
type:
type: string
enum:
- m.call.hangup
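
The backwards-compatibility rule for `reason` in the schema above could be handled on the receiving side roughly as follows. The names `HANGUP_REASONS`, `hangup_reason` and `is_known_hangup_reason` are hypothetical; how a client treats a value outside the enum is not specified here, so the sketch only reports whether the value is known.

```python
# The reason codes listed in the m.call.hangup schema above.
HANGUP_REASONS = frozenset({
    "ice_timeout",
    "ice_failed",
    "invite_timeout",
    "user_hangup",
    "user_media_failed",
    "user_busy",
    "unknown_error",
})


def hangup_reason(content):
    """Return the hangup reason, treating a missing field as `user_hangup`."""
    return content.get("reason", "user_hangup")


def is_known_hangup_reason(content):
    """True if the (possibly defaulted) reason is one of the listed codes."""
    return hangup_reason(content) in HANGUP_REASONS
```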
