This repository has been archived by the owner on Apr 12, 2024. It is now read-only.

Redesigning user discovery in Tchap #125

Closed
babolivier opened this issue Mar 7, 2022 · 9 comments

Comments

@babolivier
Contributor

babolivier commented Mar 7, 2022

Current context

Problem

Consider Alice, a user on server A, and Bob, a user on server B. Also consider that both servers have search_all_users set to true. Currently, Alice can only discover Bob (through the user directory) if server A is in a room that Bob is joined to.

There is no guarantee that this will always be the case, and we want Alice to always be able to look up Bob, or any other user on the platform.

Current solution

Tchap's fork of Synapse (i.e. this repo) automatically replicates profiles to a centralised Sydent instance (see matrix-org/synapse#3112 and matrix-org/sydent#56), and then delegates user directory search to this central Sydent instance (see matrix-org/synapse#3123 and matrix-org/sydent#57). Users can also opt out of the replication by adding an im.vector.hide_profile event to their account data (see matrix-org/synapse#4148).

While working on porting this feature to a module, I realised it was fairly complex, since it needs to handle incremental replication and to replicate metadata on top of profile changes (e.g. deactivation, hiding, etc.). It would also require non-trivial work to support future profile-related features (e.g. extensible profiles) when they come around.

Proposed alternative

Each homeserver has search_all_users turned on and has a single, special-cased room. Every user is joined to their server's room; new users are added to it automatically using Synapse's auto-join feature (auto_join_rooms). We also join to each server's room a dummy user from every other homeserver on the platform. This means that, considering servers A, B and C, server A's room is joined by every user on server A as well as by dummy users from servers B and C. This ensures servers B and C know every user on server A.
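To make the room topology concrete, here is a small Python model of it (not Synapse code). It assumes, as a simplification, that a server learns every member of any room in which it has at least one local user, and that search_all_users then lets its users find all of them. The dummy user localpart `discovery-dummy` is hypothetical.

```python
from typing import Dict, List, Set

DUMMY_LOCALPART = "discovery-dummy"  # hypothetical localpart for the dummy users


def server_of(user_id: str) -> str:
    # Matrix user IDs look like @localpart:server_name (server_name may include a port).
    return user_id.split(":", 1)[1]


def build_discovery_rooms(users_by_server: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Members of each server's discovery room, keyed by the owning server."""
    rooms: Dict[str, Set[str]] = {}
    for server, local_users in users_by_server.items():
        members = set(local_users)  # every local user is auto-joined
        for other in users_by_server:
            if other != server:
                members.add(f"@{DUMMY_LOCALPART}:{other}")  # one dummy per remote server
        rooms[server] = members
    return rooms


def users_known_to(server: str, rooms: Dict[str, Set[str]]) -> Set[str]:
    """Simplification: a server sees every member of each room it has a local member in."""
    known: Set[str] = set()
    for members in rooms.values():
        if any(server_of(user_id) == server for user_id in members):
            known |= members
    return known


if __name__ == "__main__":
    users = {
        "a.example": ["@alice:a.example"],
        "b.example": ["@bob:b.example"],
        "c.example": ["@carol:c.example"],
    }
    rooms = build_discovery_rooms(users)
    real_users = {
        u for u in users_known_to("a.example", rooms)
        if not u.startswith(f"@{DUMMY_LOCALPART}:")
    }
    print(sorted(real_users))  # ['@alice:a.example', '@bob:b.example', '@carol:c.example']
```

In this model each server's room holds that server's own users plus one dummy per other server, so its size tracks the server's own user count rather than the whole platform's.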

Because these rooms are special, we also add a config option to Synapse to hide a room (or a list of rooms, to make it more future-proof) from sync responses, so clients don't know about it (otherwise users would wonder what that weird room is). We tweak the power levels so that only membership events can be sent to that room (to limit the amount of data associated with it), and (optionally) add rules (either in Synapse or in the reverse proxy, TBD) to prevent users from changing their membership via /join or /leave (or the equivalent federation APIs) or by sending m.room.member events directly into it.
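A rough Python sketch of both tweaks follows, under stated assumptions. `discovery_room_power_levels` builds one possible m.room.power_levels content in which ordinary members (level 0) cannot send messages or state events; a user's own join or leave is governed by the room's join rules rather than these levels, and the room admin at level 100 can still send anything. `strip_hidden_rooms` is a hypothetical post-processing helper standing in for the proposed Synapse config option, not an existing Synapse function.

```python
from typing import Any, Dict, Iterable


def discovery_room_power_levels(admin_user_id: str) -> Dict[str, Any]:
    """m.room.power_levels content where members (level 0) cannot send messages or state."""
    return {
        "users": {admin_user_id: 100},
        "users_default": 0,
        "events_default": 100,  # non-state events (e.g. messages) need level 100
        "state_default": 100,  # state events not listed in "events" need level 100
        "events": {},
        "invite": 100,
        "kick": 100,
        "ban": 100,
        "redact": 100,
        "notifications": {"room": 100},
    }


def strip_hidden_rooms(sync_body: Dict[str, Any], hidden_room_ids: Iterable[str]) -> Dict[str, Any]:
    """Drop hidden rooms from every membership section of a /sync response body, in place."""
    hidden = set(hidden_room_ids)
    rooms = sync_body.get("rooms", {})
    for section in ("join", "invite", "leave", "knock"):
        entries = rooms.get(section, {})
        # Collect first, then delete, so the dict is not mutated while iterating.
        for room_id in [rid for rid in entries if rid in hidden]:
            del entries[room_id]
    return sync_body


if __name__ == "__main__":
    body = {"rooms": {"join": {"!discovery:a.example": {}, "!team:a.example": {}}}}
    print(list(strip_hidden_rooms(body, ["!discovery:a.example"])["rooms"]["join"]))  # ['!team:a.example']
```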

We also build a small, simple module that reacts to account data updates (we'll need to create a callback for this), specifically looking for changes to im.vector.hide_profile events, and makes the user join or leave the room accordingly. It will also need to maintain a local table of users on the homeserver whose profiles need hiding, and filter them out using the check_username_for_spam callback (which filters users out of user directory search results).

The end result is:

  • every user on the platform can be looked up from any homeserver
  • users can hide and un-hide themselves from the user directory the same way they could before
  • we don't have to support deactivation because a deactivated user will automatically leave the room
  • for the most part, we use built-in Matrix features rather than complex custom ones (so there is less to maintain)

I've discussed this idea with @erikjohnston and @MatMaul, and the potential issues they have raised are:

  • possible impact on performance, but we agreed that it would be negligible
  • we might want to have a centralised service be the source of truth for a user's profile in the future, in which case we can implement the ability for modules to override user directory search results. This was supposed to be part of the work involved in the profile replication/user directory mainlining, and would fall out of scope if the proposed alternative solution is implemented, but we can always bring that idea back to life in the future if needed.

Other alternatives considered

A first draft of this idea involved using a single massive room for the whole platform with every Tchap user in it, but that would probably perform quite badly and would likely make it really annoying to add a new homeserver to the system should we need to (granted, the current work on fast room joins should eliminate most of this pain, but I'd rather we don't create too many potential issues for ourselves here).

@babolivier
Contributor Author

For the paper trail: I've discussed this with @giomfo this morning and he agrees this is a good idea.

@dklimpel
Contributor

dklimpel commented Mar 23, 2022

Can you solve the following use case with the idea?

  • On my own homeserver I want to be found in the address book
  • Users of foreign homeservers are not allowed to find me
  • Server administrators decide which foreign homeserver users are allowed to find me (room ACL?)

How does the search differentiate between a user searching by username, display name, or third-party ID (email, MSISDN)?

@babolivier
Contributor Author

babolivier commented Mar 23, 2022

  • On my own homeserver I want to be found in the address book
  • Users of foreign homeservers are not allowed to find me
  • Administrators decide which foreign homeserver users are allowed to find me

It wouldn't solve this use case on its own. However, you could record the users you don't want to be discoverable from remote servers in a service central to your platform, and have a module read from that service and exclude those users using the check_username_for_spam callback.

A note on ACLs (since I saw your edit while writing this): using server ACLs in discovery rooms would only work if the discovery room is the only room the remote server shares with the users you want to hide. If they share another room, those users would still show up in the results, which is why you'd need a module to filter them out.

How does the search differentiate between a user searching by username, display name, or third-party ID (email, MSISDN)?

These are two different kinds of lookups, so 3PIDs will need a different solution. As far as I can tell, much of the 3PID lookup involved in searching for users happens at the client level, so you'll probably need a change there.

@babolivier
Contributor Author

I forgot to update this ticket:

For people with access to the Element shared Drive, https://docs.google.com/document/d/1wQw1OMgqn0blbv4Jt7ApZhR8mtN39ZLkNHKuNY-Xd_M/edit# has more information about the steps that need to be taken on the ops side of things to enable this to work.

@babolivier
Contributor Author

Closing this issue since I think there isn't much else to do here (apart from setting up the discovery rooms in the Tchap infrastructure).

@odelcroi

odelcroi commented Feb 16, 2023

Could you check this sentence: "Currently, Alice can only discover Bob (through the user directory) if server A is in a room that Bob is joined to."

Would it be "If server A is aware of a room that Bob has joined"?

@odelcroi

From my understanding:
Alice is on server A, and dummy user A is in discovery room B of server B, so server A knows about discovery room B and all its participants (Synapse federation behaviour).
Then, as a side effect, server A knows all the users of server B, and thus Alice can discover Bob.

@babolivier
Contributor Author

(disclaimer: I'm not an Element employee anymore, so I'm no longer directly involved with Tchap, but I got the notification and the repo is public, so I thought I might as well answer)

Would it be "If server A is aware of a room that Bob has joined"?

Currently, the only way server A can be aware of a room that Bob has joined is if server A is in the room (i.e. has a local user joined to it). When a Synapse server fully leaves a room (i.e. the last remaining local user leaves the room), it's removed from that server's room directory. And if the server never joined it in the first place, it doesn't even know about it. That's a lot of words to say that the two sentences are equivalent 🙂

From my understanding:
Alice is on server A, and dummy user A is in discovery room B of server B, so server A knows about discovery room B and all its participants (Synapse federation behaviour).
Then, as a side effect, server A knows all the users of server B, and thus Alice can discover Bob.

This looks correct.

@odelcroi

Crystal clear, thanks for your contribution :)
