Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Faster joins: filter out non local events when a room doesn't have its full state #14404

Merged
merged 9 commits into from
Nov 21, 2022

Conversation

MatMaul
Copy link
Contributor

@MatMaul MatMaul commented Nov 10, 2022

Part of #12997.

This is an alternate to #14057, where we filter out non local events of various endpoints instead of explicitly calculating a safe chain of events. The filtered out events will either be omitted or redacted depending of the endpoint.

We then rely on logic where the other server should try out other participating servers if it received a redacted event. This logic may not yet be available/correct, TBC.

Endpoints covered by filter_events_for_server:

  • get_missing_events
  • backfill
  • catching up federation

Normal streaming federation is already only sending out local events so this should be fine already.

state and state_ids should already bail out for all events of a partially joined room with #13823, and we think that this is fine.

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Pull request includes a sign off
  • Code style is correct
    (run the linters)

@MatMaul MatMaul force-pushed the partial-join-filter-non-local branch from 491c1f7 to c38abfd Compare November 10, 2022 11:14
Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
@MatMaul MatMaul self-assigned this Nov 10, 2022
@MatMaul
Copy link
Contributor Author

MatMaul commented Nov 10, 2022

I am requesting a review mainly to check that it looks globally sane and that my assumptions in the top comment are fine.

I'll take care of having a look at the failing tests next week.

Copy link
Contributor

@squahtx squahtx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now won't assert_host_in_room reject most remote homeservers in partial state rooms?

To resolve #14059, if we send an event out to a remote homeserver, we must be prepared to answer some of these queries that come back. Remote homeservers may either be in the servers_in_room list, or have a membership in the room that we know about. The latter case looks fine, but the former will be rejected I believe. There'll be some extra changes we'll have to make to handle those homeservers.

synapse/visibility.py Outdated Show resolved Hide resolved
@squahtx squahtx requested a review from a team November 11, 2022 17:26
@squahtx
Copy link
Contributor

squahtx commented Nov 11, 2022

Putting this back in the queue in case anyone familiar with faster room joins has extra opinions.

@reivilibre
Copy link
Contributor

Right now won't assert_host_in_room reject most remote homeservers in partial state rooms?

Correct, indeed in the default case it won't allow any remotes to talk to you about partial-state rooms.

To resolve #14059
, if we send an event out to a remote homeserver, we must be prepared to answer some of these queries that come back. Remote homeservers may either be in the servers_in_room list, or have a membership in the room that we know about. The latter case looks fine, but the former will be rejected I believe. There'll be some extra
changes we'll have to make to handle those homeservers.

From memory, you can get away without answering most of them most of the time.

  • You need /get_missing_events (at least partially) — you would have to divulge the prev_events of the PDU you're sending at the very least. This is something we need to spec but is otherwise existing behaviour, so it should be a spec clarification rather than fresh spec.
  • Unsure about /backfill... I'm not sure it plays any part in what happens when a homeserver receives a PDU (worth checking though)
  • /state and/or /state_ids are technically things you need to answer, but the whole point of a partial-state room is that we don't have the state, so we couldn't answer if we wanted to. However, making the simplifying assumption that the remotes we're sending our PDUs to are always up to date, then we can get away without answering these queries!
    • In discussion with Rich, we are happy to accept that, whilst you have a partial-state room, the idea of sending PDUs to remote homeservers who have not caught up is something we can accept is broken. In real cases, the partial-stateness of a room is likely to be shortlived and it's unlikely that a given remote destination falls out of date in that timespan.

@MatMaul
Copy link
Contributor Author

MatMaul commented Nov 14, 2022

Right now won't assert_host_in_room reject most remote homeservers in partial state rooms?

Indeed thanks, I've added allow_partial_state_rooms=True on get_missing_events and backfill since we would do the filtering in filter_events_for_server now.

I also fixed the race I believe, thanks for catching it.

@reivilibre
Copy link
Contributor

reivilibre commented Nov 14, 2022

As a note: to prevent divulging our events from before the most recent partial join, you could do the following:

  • look at partial_state_rooms and get the event ID of the partial join.
  • get the depth of the partial join.
  • filter out events whose depth is less than the depth of the partial join.

That said, I don't think depth is bulletproof, so maybe this is just security theatre 🤔.

This is fine: events from a prior full-state portion of the room will be filtered so only the servers in the room at that time can see them, as far as I can tell.

@MatMaul MatMaul marked this pull request as ready for review November 21, 2022 14:42
@MatMaul MatMaul merged commit 1526ff3 into develop Nov 21, 2022
@MatMaul MatMaul deleted the partial-join-filter-non-local branch November 21, 2022 15:46
MatMaul added a commit that referenced this pull request Nov 21, 2022
Copy link
Contributor

@squahtx squahtx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

assert_host_in_room may be missing logic to check the servers_in_room list for partial state rooms.

@MatMaul
Copy link
Contributor Author

MatMaul commented Nov 21, 2022

First sorry for the merge before final validation, it's a mistake :/

I think we don't really need to do so ?

When allow_partial_state_rooms=True, assert_host_in_room should not raise for local events, since it checks membership and we have our own join event so it should be fine. It should raise for most non-local events, but they would be filtered later on anyway by filter_events_for_server.

@squahtx
Copy link
Contributor

squahtx commented Nov 21, 2022

First sorry for the merge before final validation, it's a mistake :/

I think we don't really need to do so ?

When allow_partial_state_rooms=True, assert_host_in_room should not raise for local events, since it checks membership and we have our own join event so it should be fine. It should raise for most non-local events, but they would be filtered later on anyway by filter_events_for_server.

I'm very confused. assert_host_in_room looks like

async def assert_host_in_room(
self, room_id: str, host: str, allow_partial_state_rooms: bool = False
) -> None:
"""
Asserts that the host is in the room, or raises an AuthError.
If the room is partial-stated, we raise an AuthError with the
UNABLE_DUE_TO_PARTIAL_STATE error code, unless `allow_partial_state_rooms` is true.
If allow_partial_state_rooms is True and the room is partial-stated,
this function may return an incorrect result as we are not able to fully
track server membership in a room without full state.
"""
if not allow_partial_state_rooms and await self._store.is_partial_state_room(
room_id
):
raise AuthError(
403,
"Unable to authorise you right now; room is partial-stated here.",
errcode=Codes.UNABLE_DUE_TO_PARTIAL_STATE,
)
if not await self.is_host_in_room(room_id, host):
raise AuthError(403, "Host not in room.")

and we call it with host=the remote server. is_host_in_room is implemented using room memberships, which we don't have for most remote servers in the room.

@MatMaul
Copy link
Contributor Author

MatMaul commented Nov 22, 2022

Oh yeah I don't know where my brain went but I though we were checking the event origin and not the "origin" server as of the source of the backfill/get_missing_events request.

I'll send another PR to fix that.

H-Shay pushed a commit that referenced this pull request Dec 13, 2022
…s full state (#14404)


Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
Fizzadar added a commit to beeper/synapse-legacy-fork that referenced this pull request Dec 15, 2022
Synapse 1.73.0 (2022-12-06)
===========================

Please note that legacy Prometheus metric names have been removed in this release; see [the upgrade notes](https://github.com/matrix-org/synapse/blob/release-v1.73/docs/upgrade.md#legacy-prometheus-metric-names-have-now-been-removed) for more details.

No significant changes since 1.73.0rc2.

Synapse 1.73.0rc2 (2022-12-01)
==============================

Bugfixes
--------

- Fix a regression in Synapse 1.73.0rc1 where Synapse's main process would stop responding to HTTP requests when a user with a large number of devices logs in. ([\matrix-org#14582](matrix-org#14582))

Synapse 1.73.0rc1 (2022-11-29)
==============================

Features
--------

- Speed-up `/messages` with `filter_events_for_client` optimizations. ([\matrix-org#14527](matrix-org#14527))
- Improve DB performance by reducing amount of data that gets read in `device_lists_changes_in_room`. ([\matrix-org#14534](matrix-org#14534))
- Adds support for handling avatar in SSO OIDC login. Contributed by @ashfame. ([\matrix-org#13917](matrix-org#13917))
- Move MSC3030 `/timestamp_to_event` endpoints to stable `v1` location (`/_matrix/client/v1/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>`, `/_matrix/federation/v1/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction>`). ([\matrix-org#14471](matrix-org#14471))
- Reduce database load of [Client-Server endpoints](https://spec.matrix.org/v1.5/client-server-api/#aggregations) which return bundled aggregations. ([\matrix-org#14491](matrix-org#14491), [\matrix-org#14508](matrix-org#14508), [\matrix-org#14510](matrix-org#14510))
- Add unstable support for an Extensible Events room version (`org.matrix.msc1767.10`) via [MSC1767](matrix-org/matrix-spec-proposals#1767), [MSC3931](matrix-org/matrix-spec-proposals#3931), [MSC3932](matrix-org/matrix-spec-proposals#3932), and [MSC3933](matrix-org/matrix-spec-proposals#3933). ([\matrix-org#14520](matrix-org#14520), [\matrix-org#14521](matrix-org#14521), [\matrix-org#14524](matrix-org#14524))
- Prune user's old devices on login if they have too many. ([\matrix-org#14038](matrix-org#14038), [\matrix-org#14580](matrix-org#14580))

Bugfixes
--------

- Fix a long-standing bug where paginating from the start of a room did not work. Contributed by @gnunicorn. ([\matrix-org#14149](matrix-org#14149))
- Fix a bug introduced in Synapse 1.58.0 where a user with presence state `org.matrix.msc3026.busy` would mistakenly be set to `online` when calling `/sync` or `/events` on a worker process. ([\matrix-org#14393](matrix-org#14393))
- Fix a bug introduced in Synapse 1.70.0 where a receipt's thread ID was not sent over federation. ([\matrix-org#14466](matrix-org#14466))
- Fix a long-standing bug where the [List media admin API](https://matrix-org.github.io/synapse/latest/admin_api/media_admin_api.html#list-all-media-in-a-room) would fail when processing an image with broken thumbnail information. ([\matrix-org#14537](matrix-org#14537))
- Fix a bug introduced in Synapse 1.67.0 where two logging context warnings would be logged on startup. ([\matrix-org#14574](matrix-org#14574))
- In application service transactions that include the experimental `org.matrix.msc3202.device_one_time_key_counts` key, include a duplicate key of `org.matrix.msc3202.device_one_time_keys_count` to match the name proposed by [MSC3202](matrix-org/matrix-spec-proposals#3202). ([\matrix-org#14565](matrix-org#14565))
- Fix a bug introduced in Synapse 0.9 where Synapse would fail to fetch server keys whose IDs contain a forward slash. ([\matrix-org#14490](matrix-org#14490))

Improved Documentation
----------------------

- Fixed link to 'Synapse administration endpoints'. ([\matrix-org#14499](matrix-org#14499))

Deprecations and Removals
-------------------------

- Remove legacy Prometheus metrics names. They were deprecated in Synapse v1.69.0 and disabled by default in Synapse v1.71.0. ([\matrix-org#14538](matrix-org#14538))

Internal Changes
----------------

- Improve type hinting throughout Synapse. ([\matrix-org#14055](matrix-org#14055), [\matrix-org#14412](matrix-org#14412), [\matrix-org#14529](matrix-org#14529), [\matrix-org#14452](matrix-org#14452)).
- Remove old stream ID tracking code. Contributed by Nick @beeper (@Fizzadar). ([\matrix-org#14376](matrix-org#14376), [\matrix-org#14468](matrix-org#14468))
- Remove the `worker_main_http_uri` configuration setting. This is now handled via internal replication. ([\matrix-org#14400](matrix-org#14400), [\matrix-org#14476](matrix-org#14476))
- Refactor `federation_sender` and `pusher` configuration loading. ([\matrix-org#14496](matrix-org#14496))
([\matrix-org#14509](matrix-org#14509), [\matrix-org#14573](matrix-org#14573))
- Faster joins: do not wait for full state when creating events to send. ([\matrix-org#14403](matrix-org#14403))
- Faster joins: filter out non local events when a room doesn't have its full state. ([\matrix-org#14404](matrix-org#14404))
- Faster joins: send events to initial list of servers if we don't have the full state yet. ([\matrix-org#14408](matrix-org#14408))
- Faster joins: use servers list approximation received during `send_join` (potentially updated with received membership events) in `assert_host_in_room`. ([\matrix-org#14515](matrix-org#14515))
- Fix type logic in TCP replication code that prevented correctly ignoring blank commands. ([\matrix-org#14449](matrix-org#14449))
- Remove option to skip locking of tables when performing emulated upserts, to avoid a class of bugs in future. ([\matrix-org#14469](matrix-org#14469))
- `scripts-dev/federation_client`: Fix routing on servers with `.well-known` files. ([\matrix-org#14479](matrix-org#14479))
- Reduce default third party invite rate limit to 216 invites per day. ([\matrix-org#14487](matrix-org#14487))
- Refactor conversion of device list changes in room to outbound pokes to track unconverted rows using a `(stream ID, room ID)` position instead of updating the `converted_to_destinations` flag on every row. ([\matrix-org#14516](matrix-org#14516))
- Add more prompts to the bug report form. ([\matrix-org#14522](matrix-org#14522))
- Extend editorconfig rules on indent and line length to `.pyi` files. ([\matrix-org#14526](matrix-org#14526))
- Run Rust CI when `Cargo.lock` changes. This is particularly useful for dependabot updates. ([\matrix-org#14571](matrix-org#14571))
- Fix a possible variable shadow in `create_new_client_event`. ([\matrix-org#14575](matrix-org#14575))
- Bump various dependencies in the `poetry.lock` file and in CI scripts. ([\matrix-org#14557](matrix-org#14557), [\matrix-org#14559](matrix-org#14559), [\matrix-org#14560](matrix-org#14560), [\matrix-org#14500](matrix-org#14500), [\matrix-org#14501](matrix-org#14501), [\matrix-org#14502](matrix-org#14502), [\matrix-org#14503](matrix-org#14503), [\matrix-org#14504](matrix-org#14504), [\matrix-org#14505](matrix-org#14505)).

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCgAdFiEE8SRSDO7gYkSP4chELS76LzL74EcFAmOPLnYACgkQLS76LzL7
# 4Edwpg/+KXpg2ZdiJ0Yaly9VHVeiqdHRi5D7WPS6n8YBsdRx9EQHzOBkD5HAW8hE
# oz0c+zDS01ORlEWD825NYXjgaE1ijtZFvGxsftYTVuTYlVRR2m+r9jhDv9pVHT53
# TKtQVKpG0IUsuyukRBrweDcEeO0MA0nGpvaaQUhmftzWgy4yD3AjZyIgx0Ckg8pg
# OwgrzGqA7FQs4MEeOxmk1H39fZg4dlo4nmI4whvAodgaGeS9sU8t+3Qj4PVod8v/
# AkVesJcruaTHuVMb+Xp8JKezb09SsIR94gmHalC5sL+41+6XAy9BtQ/cRDfCReG3
# U1I1x1h1+EQjTP6XzMmjQHLbfI2gUJBC4I2p3e2gZ4cMm9rVz94R1dBiRk8ZgRIC
# cJFD9BvaAtb2PSTvyFBoHsrrn/u12i8fYFWu4Z4rO6dOGI83dZHeZzVw4UsVeqIK
# 5+njQwcwQsrwL3AKLjbbdqmbmhXcF6LchIK2L+NuuvdiOfvXvkO0bdjBryVEbMqB
# IOtAAWzwYaoUwVucMbBtXt/EqQS7biGkbDxsL8CDvaBwM/JSsUWXBafsV1FmxF2A
# q6KAeKpfelefoegosTYD0Md+l39xdF8Z19XaKV3GeHZEY+HE3RJXJm+Pa8SJ+IF8
# Y1od9cB/H+fYSsWCWj1OJNqTIAozh6f1Pe2nFuFDxdBwABXc/pg=
# =IBEL
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue Dec  6 11:58:46 2022 GMT
# gpg:                using RSA key F124520CEEE062448FE1C8442D2EFA2F32FBE047
# gpg: Can't check signature: No public key

# Conflicts:
#	poetry.lock
#	synapse/push/bulk_push_rule_evaluator.py
#	synapse/storage/databases/main/account_data.py
#	synapse/storage/databases/main/receipts.py
realtyem added a commit to realtyem/synapse-unraid that referenced this pull request Dec 18, 2022
Synapse 1.73.0 (2022-12-06)
===========================

Please note that legacy Prometheus metric names have been removed in this release; see [the upgrade notes](https://github.com/matrix-org/synapse/blob/release-v1.73/docs/upgrade.md#legacy-prometheus-metric-names-have-now-been-removed) for more details.

No significant changes since 1.73.0rc2.

Synapse 1.73.0rc2 (2022-12-01)
==============================

Bugfixes
--------

- Fix a regression in Synapse 1.73.0rc1 where Synapse's main process would stop responding to HTTP requests when a user with a large number of devices logs in. ([\#14582](matrix-org/synapse#14582))

Synapse 1.73.0rc1 (2022-11-29)
==============================

Features
--------

- Speed-up `/messages` with `filter_events_for_client` optimizations. ([\#14527](matrix-org/synapse#14527))
- Improve DB performance by reducing amount of data that gets read in `device_lists_changes_in_room`. ([\#14534](matrix-org/synapse#14534))
- Adds support for handling avatar in SSO OIDC login. Contributed by @ashfame. ([\#13917](matrix-org/synapse#13917))
- Move MSC3030 `/timestamp_to_event` endpoints to stable `v1` location (`/_matrix/client/v1/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>`, `/_matrix/federation/v1/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction>`). ([\#14471](matrix-org/synapse#14471))
- Reduce database load of [Client-Server endpoints](https://spec.matrix.org/v1.5/client-server-api/#aggregations) which return bundled aggregations. ([\#14491](matrix-org/synapse#14491), [\#14508](matrix-org/synapse#14508), [\#14510](matrix-org/synapse#14510))
- Add unstable support for an Extensible Events room version (`org.matrix.msc1767.10`) via [MSC1767](matrix-org/matrix-spec-proposals#1767), [MSC3931](matrix-org/matrix-spec-proposals#3931), [MSC3932](matrix-org/matrix-spec-proposals#3932), and [MSC3933](matrix-org/matrix-spec-proposals#3933). ([\#14520](matrix-org/synapse#14520), [\#14521](matrix-org/synapse#14521), [\#14524](matrix-org/synapse#14524))
- Prune user's old devices on login if they have too many. ([\#14038](matrix-org/synapse#14038), [\#14580](matrix-org/synapse#14580))

Bugfixes
--------

- Fix a long-standing bug where paginating from the start of a room did not work. Contributed by @gnunicorn. ([\#14149](matrix-org/synapse#14149))
- Fix a bug introduced in Synapse 1.58.0 where a user with presence state `org.matrix.msc3026.busy` would mistakenly be set to `online` when calling `/sync` or `/events` on a worker process. ([\#14393](matrix-org/synapse#14393))
- Fix a bug introduced in Synapse 1.70.0 where a receipt's thread ID was not sent over federation. ([\#14466](matrix-org/synapse#14466))
- Fix a long-standing bug where the [List media admin API](https://matrix-org.github.io/synapse/latest/admin_api/media_admin_api.html#list-all-media-in-a-room) would fail when processing an image with broken thumbnail information. ([\#14537](matrix-org/synapse#14537))
- Fix a bug introduced in Synapse 1.67.0 where two logging context warnings would be logged on startup. ([\#14574](matrix-org/synapse#14574))
- In application service transactions that include the experimental `org.matrix.msc3202.device_one_time_key_counts` key, include a duplicate key of `org.matrix.msc3202.device_one_time_keys_count` to match the name proposed by [MSC3202](matrix-org/matrix-spec-proposals#3202). ([\#14565](matrix-org/synapse#14565))
- Fix a bug introduced in Synapse 0.9 where Synapse would fail to fetch server keys whose IDs contain a forward slash. ([\#14490](matrix-org/synapse#14490))

Improved Documentation
----------------------

- Fixed link to 'Synapse administration endpoints'. ([\#14499](matrix-org/synapse#14499))

Deprecations and Removals
-------------------------

- Remove legacy Prometheus metrics names. They were deprecated in Synapse v1.69.0 and disabled by default in Synapse v1.71.0. ([\#14538](matrix-org/synapse#14538))

Internal Changes
----------------

- Improve type hinting throughout Synapse. ([\#14055](matrix-org/synapse#14055), [\#14412](matrix-org/synapse#14412), [\#14529](matrix-org/synapse#14529), [\#14452](matrix-org/synapse#14452)).
- Remove old stream ID tracking code. Contributed by Nick @beeper (@Fizzadar). ([\#14376](matrix-org/synapse#14376), [\#14468](matrix-org/synapse#14468))
- Remove the `worker_main_http_uri` configuration setting. This is now handled via internal replication. ([\#14400](matrix-org/synapse#14400), [\#14476](matrix-org/synapse#14476))
- Refactor `federation_sender` and `pusher` configuration loading. ([\#14496](matrix-org/synapse#14496))
([\#14509](matrix-org/synapse#14509), [\#14573](matrix-org/synapse#14573))
- Faster joins: do not wait for full state when creating events to send. ([\#14403](matrix-org/synapse#14403))
- Faster joins: filter out non local events when a room doesn't have its full state. ([\#14404](matrix-org/synapse#14404))
- Faster joins: send events to initial list of servers if we don't have the full state yet. ([\#14408](matrix-org/synapse#14408))
- Faster joins: use servers list approximation received during `send_join` (potentially updated with received membership events) in `assert_host_in_room`. ([\#14515](matrix-org/synapse#14515))
- Fix type logic in TCP replication code that prevented correctly ignoring blank commands. ([\#14449](matrix-org/synapse#14449))
- Remove option to skip locking of tables when performing emulated upserts, to avoid a class of bugs in future. ([\#14469](matrix-org/synapse#14469))
- `scripts-dev/federation_client`: Fix routing on servers with `.well-known` files. ([\#14479](matrix-org/synapse#14479))
- Reduce default third party invite rate limit to 216 invites per day. ([\#14487](matrix-org/synapse#14487))
- Refactor conversion of device list changes in room to outbound pokes to track unconverted rows using a `(stream ID, room ID)` position instead of updating the `converted_to_destinations` flag on every row. ([\#14516](matrix-org/synapse#14516))
- Add more prompts to the bug report form. ([\#14522](matrix-org/synapse#14522))
- Extend editorconfig rules on indent and line length to `.pyi` files. ([\#14526](matrix-org/synapse#14526))
- Run Rust CI when `Cargo.lock` changes. This is particularly useful for dependabot updates. ([\#14571](matrix-org/synapse#14571))
- Fix a possible variable shadow in `create_new_client_event`. ([\#14575](matrix-org/synapse#14575))
- Bump various dependencies in the `poetry.lock` file and in CI scripts. ([\#14557](matrix-org/synapse#14557), [\#14559](matrix-org/synapse#14559), [\#14560](matrix-org/synapse#14560), [\#14500](matrix-org/synapse#14500), [\#14501](matrix-org/synapse#14501), [\#14502](matrix-org/synapse#14502), [\#14503](matrix-org/synapse#14503), [\#14504](matrix-org/synapse#14504), [\#14505](matrix-org/synapse#14505)).
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants