Skip to content

Commit 0e6997d

Browse files
committed
MSC3970: Scope transaction IDs to devices
1 parent 13935ec commit 0e6997d

File tree

1 file changed

+198
-0
lines changed

1 file changed

+198
-0
lines changed

proposals/3970-device-scope-txnid.md

+198
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,198 @@
1+
# MSC3970: Scope transaction IDs to devices
2+
3+
Transaction identifiers in the Client-Server API are currently scoped to the concept of a "client session" which, when
4+
refresh tokens are used, can span a sequence of access tokens.
5+
6+
The spec [reads](https://spec.matrix.org/v1.6/client-server-api/#transaction-identifiers):
7+
8+
> The scope of a transaction ID is a “client session”, where that session is identified by a particular access token.
9+
> When refreshing an access token, the transaction ID’s scope is retained. This means that if a client with token `A`
10+
> uses `TXN1` as their transaction ID, refreshes the token to `B`, and uses `TXN1` again it’ll be assumed to be a
11+
> duplicate request and ignored. If the client logs out and back in between the `A` and `B` tokens, `TXN1` could be used
12+
> once for each.
13+
14+
The transaction IDs appear in two parts of the Client-Server API spec:
15+
16+
1. As a identifier to allow the homeserver to make some `PUT` endpoints [idempotent](https://spec.matrix.org/v1.6/client-server-api/#transaction-identifiers)
17+
2. An unsigned field in the event data model for a client to tell if it sent an event or not. a.k.a. solving the ["local echo"](https://spec.matrix.org/v1.6/client-server-api/#local-echo) problem
18+
19+
For reference, the `PUT` endpoitns that have the a `{txnId}` param are:
20+
21+
- [`PUT /_matrix/client/v3/rooms/{roomId}/send/{eventType}/{txnId}`](https://spec.matrix.org/v1.6/client-server-api/#put_matrixclientv3roomsroomidsendeventtypetxnid)
22+
- [`PUT /_matrix/client/v3/rooms/{roomId}/redact/{eventId}/{txnId}`](https://spec.matrix.org/v1.6/client-server-api/#put_matrixclientv3roomsroomidredacteventidtxnid)
23+
- [`PUT /_matrix/client/v3/sendToDevice/{eventType}/{txnId}`](https://spec.matrix.org/v1.6/client-server-api/#put_matrixclientv3sendtodeviceeventtypetxnid)
24+
25+
## Proposal
26+
27+
It is proposed that the scope of transaction identifiers be changed from a "client session" to a "device".
28+
29+
A "device" is typically represented by a `device_id` elsewhere in the spec.
30+
31+
For idempotency, this means the homeserver changing the method of identifying a request from:
32+
33+
- (`client session`, `HTTP path of request which includes the transaction ID`)
34+
35+
to:
36+
37+
- (`device_id`, `HTTP path of request which includes the transaction ID`)
38+
39+
For local echo, the homeserver would now include the `transaction_id` in the event data when it is serving a sync
40+
request from the same `device_id` as determined from the access token.
41+
42+
## Potential issues
43+
44+
### This is technically a breaking change to the spec
45+
46+
The main "issue" I see with this proposal is that this is technically a breaking change to the spec.
47+
48+
Because a device ID could have multiple sequences of access tokens associated with it, this proposal would widen the
49+
scope of the transaction ID.
50+
51+
Therefore it could potentially lead to a request being treated as "new" where before it would have been identified as a
52+
retransmission and deduplicated.
53+
54+
However, the evidence suggests that nothing would be impacted in reality.
55+
56+
Some evidences have been collated to support this claim:
57+
58+
#### 1. Data from the matrix.org Homeserver suggests the change would have no impact
59+
60+
The matrix.org Homeserver is a reasonable size deployment and could be considered reasonably representative of the
61+
diversity of Matrix client.
62+
63+
The Synapse homeserver that runs matrix.org maintains a `event_txn_id` table that contains a rolling 24 hour window of
64+
(`user_id`, `token_id`, `room_id`, `txn_id`) tuples.
65+
66+
Having analysed the contents of the table, it appears that there are no repeated transaction IDs for a given user, token
67+
and room.
68+
69+
There are other `PUT` endpoints for which transaction IDs are not in the `event_txn_id` table, however because the event
70+
71+
As such, the widening of the scope from token to device would not have caused any issues during the periods sampled.
72+
73+
For reference the following is the schema of the `event_txn_id` table:
74+
75+
```sql
76+
Table "matrix.event_txn_id"
77+
Column | Type | Collation | Nullable | Default
78+
-------------+--------+-----------+----------+---------
79+
event_id | text | | not null |
80+
room_id | text | | not null |
81+
user_id | text | | not null |
82+
token_id | bigint | | not null |
83+
txn_id | text | | not null |
84+
inserted_ts | bigint | | not null |
85+
Indexes:
86+
"event_txn_id_token_id" btree (token_id)
87+
"event_txn_id_event_id" UNIQUE, btree (event_id)
88+
"event_txn_id_ts" btree (inserted_ts)
89+
"event_txn_id_txn_id" UNIQUE, btree (room_id, user_id, token_id, txn_id)
90+
Foreign-key constraints:
91+
"event_txn_id_event_id_fkey" FOREIGN KEY (event_id) REFERENCES matrix.events(event_id) ON DELETE CASCADE
92+
"event_txn_id_token_id_fkey" FOREIGN KEY (token_id) REFERENCES matrix.access_tokens(id) ON DELETE CASCADE
93+
```
94+
95+
And the query to look for repeated transaction IDs:
96+
97+
```sql
98+
SELECT e1.txn_id, LEFT(e1.user_id, 5) AS user_id, e1.token_id, e2.token_id, e1.inserted_ts, e2.inserted_ts FROM matrix.event_txn_id e1, matrix.event_txn_id e2 WHERE e1.txn_id = e2.txn_id AND e1.event_id <> e2.event_id AND e1.event_id < e2.event_id AND e1.user_id = e2.user_id AND e1.room_id = e2.room_id ORDER BY e1.token_id;
99+
txn_id | user_id | token_id | token_id | inserted_ts | inserted_ts
100+
--------+---------+----------+----------+-------------+-------------
101+
(0 rows)
102+
```
103+
104+
#### 2. Conduit homeserver already scopes transaction IDs to devices
105+
106+
As highlighted by the new Complement [tests](https://github.com/matrix-org/complement/pull/613) the Conduit homeserver
107+
is already scoping transaction IDs to devices.
108+
109+
I can't find a related issue [listed](https://gitlab.com/famedly/conduit/-/issues), so presumably this non-compliant
110+
behaviour isn't causing a known issue for admins and users of the Conduit homeserver?
111+
112+
#### 3. Synapse homeserver only checks for retransmits over a 30-60 minute window
113+
114+
The Synapse homeserver uses an in-memory cache to check for idempotency of requests. The cache is vacated after
115+
[30-60 minutes](https://github.com/matrix-org/synapse/blob/adac949a417d064958039ae0918b97388413c824/synapse/rest/client/transactions.py#L50-L52)
116+
and is not persisted between restarts (or across a cluster).
117+
118+
This means that for many existing deployments idempotency is only actually enforced over a 30-60 minutes and not
119+
eternally as the spec might suggest.
120+
121+
#### 4. Synapse homeserver only supports local echo for the previous 24 hours
122+
123+
The Synapse homeserver only supports local echo (by the presence of `transaction_id` on sync responses) for the previous
124+
24 hours. This is because the `event_txn_id` table is only kept for 24 hours.
125+
126+
Again, this suggests that in reality the local echo semantics are not preserved eternally as the spec might suggest.
127+
128+
### Is the "device" concept the right level of abstraction to use?
129+
130+
One way to look at it is that device is already widely used in the end-to-end encryption parts of the spec and so why
131+
isn't it suitable for this use case too?
132+
133+
### What about two clients masquerading as a single device ID?
134+
135+
I don't know if this actually works in practice. If this was a concern then it could be mitigated by clarifying in the
136+
spec that if a client wishes to submit requests using the same `device_id` as another client session that it should
137+
choose transaction identifiers that are unique to that client session.
138+
139+
## Alternatives
140+
141+
### Do nothing
142+
143+
We could leave the transaction ID scope as is.
144+
145+
However, it makes it difficult to implement a features like [MSC3861: Matrix architecture change to delegate authentication via OIDC](https://github.com/matrix-org/matrix-spec-proposals/pull/3861)
146+
as the concept of a "client session" doesn't really exist in OIDC.
147+
148+
As noted above, at least one homeserver implementation is also not implementing the spec as it is today.
149+
150+
It also turns out that the current implementation of refresh tokens in Synapse breaks the transaction ID semantics
151+
already and needs to be [fixed](https://github.com/matrix-org/synapse/issues/15141).
152+
153+
### Make a backwards compatible change
154+
155+
A backwards compatible alternative could be something like:
156+
157+
1. For idempotency have clients opt-in to a new scope of transaction ID, but support the current semantics too for compatibility
158+
2. Have clients opt-in (e.g. request param on the sync endpoint) to receiving transaction ID for all events in the sync response and make the client responsible for identifying which messages they sent
159+
160+
The disadvantage of this is that we create a load of extra maintenance work to support both semantics for a period of
161+
time for (empirically) no gain in return.
162+
163+
## Security considerations
164+
165+
A malicious client can adopt an existing device ID of a user. This could possibly allow some kind of denial of service attack.
166+
167+
However, if such an attack where possible it would be possible to do so without this MSC as device IDs are crucial to
168+
the implementation of end-to-end encryption.
169+
170+
## Other recommendations
171+
172+
I'm not suggesting that these recommendations are address in this proposal, but more notes for future proposals or spec clarifications.
173+
174+
### Clarification on idempotency semantics
175+
176+
I have separately prepared a [spec PR](https://github.com/matrix-org/matrix-spec/pull/1449) to clarify some of the
177+
idempotency semantics that doesn't modify the spec but is useful to understand the context of this proposal.
178+
179+
### Clarification on transaction ID time scope
180+
181+
I also suggest that the spec be clarified over what time periods the transaction ID is scoped for such that clients can
182+
be aware. This cloud simply be to say that the time period is not defined and so may vary by implementation.
183+
184+
### Suggest a lease naive transaction ID format
185+
186+
I think we should sense to update the recommendation on the format of transaction IDs from:
187+
188+
> how is not specified; a monotonically increasing integer is recommended
189+
190+
To something more unique (in my research I have found some clients use a seconds since epoch which doesn't seem ideal)
191+
192+
## Unstable prefix
193+
194+
None needed.
195+
196+
## Dependencies
197+
198+
None.

0 commit comments

Comments
 (0)