This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Fetch fewer events when getting hosts in room #14962

Merged
1 change: 1 addition & 0 deletions changelog.d/14962.feature
@@ -0,0 +1 @@
Improve performance when joining or sending an event in large rooms.
53 changes: 51 additions & 2 deletions synapse/storage/databases/main/roommember.py
@@ -13,6 +13,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
from itertools import chain
from typing import (
TYPE_CHECKING,
AbstractSet,
@@ -1131,12 +1132,33 @@ async def _get_joined_hosts(
else:
# The cache doesn't match the state group or prev state group,
# so we calculate the result from first principles.
#
# We need to fetch all hosts joined to the room according to `state` by
# inspecting all join memberships in `state`. However, if the `state` is
# relatively recent then many of its events are likely to be held in
# the current state of the room, which is easily available and likely
# cached.
#
# We therefore compute the set of `state` events not in the
# current state and only fetch those.
current_memberships = (
await self._get_approximate_current_memberships_in_room(room_id)
)
unknown_state_events = {}
joined_users_in_current_state = []

for (type, state_key), event_id in state.items():
if event_id not in current_memberships:
unknown_state_events[type, state_key] = event_id
elif current_memberships[event_id] == Membership.JOIN:
joined_users_in_current_state.append(state_key)

joined_user_ids = await self.get_joined_user_ids_from_state(
- room_id, state
+ room_id, unknown_state_events
)

cache.hosts_to_joined_users = {}
- for user_id in joined_user_ids:
+ for user_id in chain(joined_user_ids, joined_users_in_current_state):
host = intern_string(get_domain_from_id(user_id))
cache.hosts_to_joined_users.setdefault(host, set()).add(user_id)

@@ -1147,6 +1169,33 @@ async def _get_joined_hosts(

return frozenset(cache.hosts_to_joined_users)
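
The partitioning step in this hunk can be sketched outside Synapse as follows. This is a minimal illustration, not the project's code: `split_state`, `hosts_to_joined_users`, the `JOIN` constant and the sample event IDs are hypothetical names chosen for the example, the host is extracted with a plain string split as a stand-in for `get_domain_from_id`, and the database fetch for the unknown events is replaced by a hard-coded list.

```python
from itertools import chain
from typing import Dict, Iterable, List, Mapping, Optional, Set, Tuple

StateKey = Tuple[str, str]  # (event type, state key)
JOIN = "join"  # assumption: stands in for Membership.JOIN


def split_state(
    state: Mapping[StateKey, str],
    current_memberships: Mapping[str, Optional[str]],
) -> Tuple[Dict[StateKey, str], List[str]]:
    """Split `state` into events that still need fetching and users
    already known to be joined from the current state."""
    unknown_state_events: Dict[StateKey, str] = {}
    joined_users_in_current_state: List[str] = []
    for (event_type, state_key), event_id in state.items():
        if event_id not in current_memberships:
            # Not part of the current state: must be looked up separately.
            unknown_state_events[(event_type, state_key)] = event_id
        elif current_memberships[event_id] == JOIN:
            # Known join membership: the state key is the user ID.
            joined_users_in_current_state.append(state_key)
    return unknown_state_events, joined_users_in_current_state


def hosts_to_joined_users(user_ids: Iterable[str]) -> Dict[str, Set[str]]:
    """Group user IDs by the part after the first colon (hypothetical
    stand-in for get_domain_from_id)."""
    hosts: Dict[str, Set[str]] = {}
    for user_id in user_ids:
        host = user_id.split(":", 1)[1]
        hosts.setdefault(host, set()).add(user_id)
    return hosts


state = {
    ("m.room.member", "@alice:one.example"): "$a",
    ("m.room.member", "@bob:two.example"): "$b",
    ("m.room.member", "@carol:one.example"): "$c_new",
    ("m.room.power_levels", ""): "$pl",
}
current_memberships = {"$a": "join", "$b": "leave", "$pl": None, "$c_old": "join"}

unknown, joined = split_state(state, current_memberships)
# Pretend the lookup of `unknown` found that carol is joined.
fetched_joined = ["@carol:one.example"]
hosts = hosts_to_joined_users(chain(fetched_joined, joined))
```

In this sample only one of the four state events falls outside the current state, so only that one would need to be fetched.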

# TODO: this _might_ turn out to need caching, let's see
Contributor

Can we remove this todo?

Contributor Author

Sure.

Erik pointed out that we should(?) only hit this function if there was a cache miss for get_joined_hosts; an extra layer of caching isn't likely to get us anything.

async def _get_approximate_current_memberships_in_room(
self, room_id: str
) -> Mapping[str, Optional[str]]:
"""Build a map from event id to membership, for all events in the current state.

The event ids of non-membership events (e.g. `m.room.power_levels`) are present
in the result, mapped to values of `None`.

The result is approximate for partially-joined rooms. It is fully accurate
for fully-joined rooms.
"""

def f(txn: LoggingTransaction) -> List[Tuple[str, str]]:
sql = """
SELECT event_id, membership
FROM current_state_events
WHERE room_id = ?;
"""
txn.execute(sql, (room_id,))
return txn.fetchall() # type: ignore[return-value]
Member

I think you could replace this with a simple select, but it doesn't really matter.

Or:

Suggested change
- return txn.fetchall() # type: ignore[return-value]
+ return {row[0]: row[1] for row in txn} # type: ignore[return-value]

to save building a list then a dict

but much of a muchness
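
The suggestion above (build the dict straight from the cursor instead of a list first) can be tried with the standard library's `sqlite3` module. This is only a sketch under assumptions: it uses a plain `sqlite3` connection rather than Synapse's `LoggingTransaction`, the function name `current_memberships_for_room` is hypothetical, and the table is reduced to three columns for illustration.

```python
import sqlite3
from typing import Dict, Optional


def current_memberships_for_room(
    conn: sqlite3.Connection, room_id: str
) -> Dict[str, Optional[str]]:
    """Map event_id -> membership for one room, iterating the cursor
    directly so no intermediate list of rows is built."""
    cursor = conn.execute(
        "SELECT event_id, membership FROM current_state_events WHERE room_id = ?",
        (room_id,),
    )
    return {event_id: membership for event_id, membership in cursor}


# Assumption: a cut-down table with only the columns the query touches.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE current_state_events (room_id TEXT, event_id TEXT, membership TEXT)"
)
conn.executemany(
    "INSERT INTO current_state_events VALUES (?, ?, ?)",
    [
        ("!room:example", "$a", "join"),
        ("!room:example", "$pl", None),  # non-membership event: NULL membership
        ("!other:example", "$x", "join"),
    ],
)
memberships = current_memberships_for_room(conn, "!room:example")
```

As the comment says, the difference is small either way; the direct form just avoids holding the rows twice.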


rows = await self.db_pool.runInteraction(
"_get_approximate_current_memberships_in_room", f
)
return {row[0]: row[1] for row in rows}

@cached(max_entries=10000)
def _get_joined_hosts_cache(self, room_id: str) -> "_JoinedHostsCache":
return _JoinedHostsCache()