This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Commit bf82b56

Add more user information to export-data command. (#14894)
* The user's profile information.
* The user's devices.
* The user's connections / IP address information.
1 parent 1958f9d commit bf82b56

6 files changed, with 206 additions and 20 deletions

.ci/scripts/test_export_data_command.sh

Lines changed: 6 additions & 4 deletions
@@ -23,8 +23,9 @@ poetry run python -m synapse.app.admin_cmd -c .ci/sqlite-config.yaml export-dat
 --output-directory /tmp/export_data

 # Test that the output directory exists and contains the rooms directory
-dir="/tmp/export_data/rooms"
-if [ -d "$dir" ]; then
+dir_r="/tmp/export_data/rooms"
+dir_u="/tmp/export_data/user_data"
+if [ -d "$dir_r" ] && [ -d "$dir_u" ]; then
   echo "Command successful, this test passes"
 else
   echo "No output directories found, the command fails against a sqlite database."
@@ -43,8 +44,9 @@ poetry run python -m synapse.app.admin_cmd -c .ci/postgres-config.yaml export-d
 --output-directory /tmp/export_data2

 # Test that the output directory exists and contains the rooms directory
-dir2="/tmp/export_data2/rooms"
-if [ -d "$dir2" ]; then
+dir_r2="/tmp/export_data2/rooms"
+dir_u2="/tmp/export_data2/user_data"
+if [ -d "$dir_r2" ] && [ -d "$dir_u2" ]; then
   echo "Command successful, this test passes"
 else
   echo "No output directories found, the command fails against a postgres database."

changelog.d/14894.feature

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Adds profile information, devices and connections to the user data export via command line.

docs/usage/administration/admin_faq.md

Lines changed: 65 additions & 15 deletions
@@ -2,13 +2,19 @@

 How do I become a server admin?
 ---
-If your server already has an admin account you should use the [User Admin API](../../admin_api/user_admin_api.md#change-whether-a-user-is-a-server-administrator-or-not) to promote other accounts to become admins.
+If your server already has an admin account you should use the
+[User Admin API](../../admin_api/user_admin_api.md#change-whether-a-user-is-a-server-administrator-or-not)
+to promote other accounts to become admins.

-If you don't have any admin accounts yet you won't be able to use the admin API, so you'll have to edit the database manually. Manually editing the database is generally not recommended so once you have an admin account: use the admin APIs to make further changes.
+If you don't have any admin accounts yet you won't be able to use the admin API,
+so you'll have to edit the database manually. Manually editing the database is
+generally not recommended so once you have an admin account: use the admin APIs
+to make further changes.

 ```sql
 UPDATE users SET admin = 1 WHERE name = '@foo:bar.com';
 ```
+
 What servers are my server talking to?
 ---
 Run this sql query on your db:
@@ -36,8 +42,38 @@ How can I export user data?
 ---
 Synapse includes a Python command to export data for a specific user. It takes the homeserver
 configuration file and the full Matrix ID of the user to export:
+
 ```console
-python -m synapse.app.admin_cmd -c <config_file> export-data <user_id>
+python -m synapse.app.admin_cmd -c <config_file> export-data <user_id> --output-directory <directory_path>
+```
+
+If you use [Poetry](../../development/dependencies.md#managing-dependencies-with-poetry)
+to run Synapse:
+
+```console
+poetry run python -m synapse.app.admin_cmd -c <config_file> export-data <user_id> --output-directory <directory_path>
+```
+
+The directory to store the export data in can be customised with the
+`--output-directory` parameter; ensure that the provided directory is
+empty. If this parameter is not provided, Synapse defaults to creating
+a temporary directory (which starts with "synapse-exfiltrate") in `/tmp`,
+`/var/tmp`, or `/usr/tmp`, in that order.
+
+The exported data has the following layout:
+
+```
+output-directory
+├───rooms
+│   └───<room_id>
+│       ├───events
+│       ├───state
+│       ├───invite_state
+│       └───knock_state
+└───user_data
+    ├───connections
+    ├───devices
+    └───profile
 ```

 Manually resetting passwords
@@ -50,21 +86,29 @@ I have a problem with my server. Can I just delete my database and start again?
 ---
 Deleting your database is unlikely to make anything better.

-It's easy to make the mistake of thinking that you can start again from a clean slate by dropping your database, but things don't work like that in a federated network: lots of other servers have information about your server.
+It's easy to make the mistake of thinking that you can start again from a clean
+slate by dropping your database, but things don't work like that in a federated
+network: lots of other servers have information about your server.

-For example: other servers might think that you are in a room, your server will think that you are not, and you'll probably be unable to interact with that room in a sensible way ever again.
+For example: other servers might think that you are in a room, your server will
+think that you are not, and you'll probably be unable to interact with that room
+in a sensible way ever again.

-In general, there are better solutions to any problem than dropping the database. Come and seek help in https://matrix.to/#/#synapse:matrix.org.
+In general, there are better solutions to any problem than dropping the database.
+Come and seek help in https://matrix.to/#/#synapse:matrix.org.

 There are two exceptions when it might be sensible to delete your database and start again:
-* You have *never* joined any rooms which are federated with other servers. For instance, a local deployment which the outside world can't talk to.
-* You are changing the `server_name` in the homeserver configuration. In effect this makes your server a completely new one from the point of view of the network, so in this case it makes sense to start with a clean database.
+* You have *never* joined any rooms which are federated with other servers. For
+  instance, a local deployment which the outside world can't talk to.
+* You are changing the `server_name` in the homeserver configuration. In effect
+  this makes your server a completely new one from the point of view of the network,
+  so in this case it makes sense to start with a clean database.
 (In both cases you probably also want to clear out the media_store.)

 I've stuffed up access to my room, how can I delete it to free up the alias?
 ---
 Using the following curl command:
-```
+```console
 curl -H 'Authorization: Bearer <access-token>' -X DELETE https://matrix.org/_matrix/client/r0/directory/room/<room-alias>
 ```
 `<access-token>` - can be obtained in riot by looking in the riot settings, down the bottom is:
@@ -75,19 +119,25 @@ Access Token:\<click to reveal\>
 How can I find the lines corresponding to a given HTTP request in my homeserver log?
 ---

-Synapse tags each log line according to the HTTP request it is processing. When it finishes processing each request, it logs a line containing the words `Processed request: `. For example:
+Synapse tags each log line according to the HTTP request it is processing. When
+it finishes processing each request, it logs a line containing the words
+`Processed request: `. For example:

 ```
 2019-02-14 22:35:08,196 - synapse.access.http.8008 - 302 - INFO - GET-37 - ::1 - 8008 - {@richvdh:localhost} Processed request: 0.173sec/0.001sec (0.002sec, 0.000sec) (0.027sec/0.026sec/2) 687B 200 "GET /_matrix/client/r0/sync HTTP/1.1" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" [0 dbevts]"
 ```

-Here we can see that the request has been tagged with `GET-37`. (The tag depends on the method of the HTTP request, so might start with `GET-`, `PUT-`, `POST-`, `OPTIONS-` or `DELETE-`.) So to find all lines corresponding to this request, we can do:
+Here we can see that the request has been tagged with `GET-37`. (The tag depends
+on the method of the HTTP request, so might start with `GET-`, `PUT-`, `POST-`,
+`OPTIONS-` or `DELETE-`.) So to find all lines corresponding to this request, we can do:

-```
+```console
 grep 'GET-37' homeserver.log
 ```

-If you want to paste that output into a github issue or matrix room, please remember to surround it with triple-backticks (```) to make it legible (see [quoting code](https://help.github.com/en/articles/basic-writing-and-formatting-syntax#quoting-code)).
+If you want to paste that output into a github issue or matrix room, please
+remember to surround it with triple-backticks (```) to make it legible
+(see [quoting code](https://help.github.com/en/articles/basic-writing-and-formatting-syntax#quoting-code)).


 What do all those fields in the 'Processed' line mean?
@@ -127,7 +177,7 @@ This is normally caused by a misconfiguration in your reverse-proxy. See [the re


 Help!! Synapse is slow and eats all my RAM/CPU!
------------------------------------------------
+---

 First, ensure you are running the latest version of Synapse, using Python 3
 with a [PostgreSQL database](../../postgres.md).
@@ -169,7 +219,7 @@ in the Synapse config file: [see here](../configuration/config_documentation.md#


 Running out of File Handles
----------------------------
+---

 If Synapse runs out of file handles, it typically fails badly - live-locking
 at 100% CPU, and/or failing to accept new TCP connections (blocking the
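
Note on the export layout documented in this file: judging from the writer in `synapse/app/admin_cmd.py` in this commit, the `profile`, `devices` and `connections` entries under `user_data` appear to be plain files holding one JSON object per line. A minimal sketch for reading them back, assuming that format; `read_json_lines` and `load_user_data` are hypothetical helper names, not part of Synapse:

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def read_json_lines(path: Path) -> List[Dict[str, Any]]:
    """Parse a file holding one JSON object per line, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def load_user_data(output_directory: str) -> Dict[str, List[Dict[str, Any]]]:
    """Load whichever of profile/devices/connections exist under user_data.

    Assumption: each entry is a file of JSON lines; missing files yield [].
    """
    user_dir = Path(output_directory) / "user_data"
    result: Dict[str, List[Dict[str, Any]]] = {}
    for name in ("profile", "devices", "connections"):
        file_path = user_dir / name
        result[name] = read_json_lines(file_path) if file_path.is_file() else []
    return result
```

Calling `load_user_data("/tmp/export_data")` after running the export command would then return a dict keyed by `profile`, `devices` and `connections`.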

synapse/app/admin_cmd.py

Lines changed: 31 additions & 1 deletion
@@ -35,6 +35,7 @@
     ApplicationServiceTransactionWorkerStore,
     ApplicationServiceWorkerStore,
 )
+from synapse.storage.databases.main.client_ips import ClientIpWorkerStore
 from synapse.storage.databases.main.deviceinbox import DeviceInboxWorkerStore
 from synapse.storage.databases.main.devices import DeviceWorkerStore
 from synapse.storage.databases.main.event_federation import EventFederationWorkerStore
@@ -43,6 +44,7 @@
 )
 from synapse.storage.databases.main.events_worker import EventsWorkerStore
 from synapse.storage.databases.main.filtering import FilteringWorkerStore
+from synapse.storage.databases.main.profile import ProfileWorkerStore
 from synapse.storage.databases.main.push_rule import PushRulesWorkerStore
 from synapse.storage.databases.main.receipts import ReceiptsWorkerStore
 from synapse.storage.databases.main.registration import RegistrationWorkerStore
@@ -54,7 +56,7 @@
 from synapse.storage.databases.main.stream import StreamWorkerStore
 from synapse.storage.databases.main.tags import TagsWorkerStore
 from synapse.storage.databases.main.user_erasure_store import UserErasureWorkerStore
-from synapse.types import StateMap
+from synapse.types import JsonDict, StateMap
 from synapse.util import SYNAPSE_VERSION
 from synapse.util.logcontext import LoggingContext

@@ -63,6 +65,7 @@

 class AdminCmdSlavedStore(
     FilteringWorkerStore,
+    ClientIpWorkerStore,
     DeviceWorkerStore,
     TagsWorkerStore,
     DeviceInboxWorkerStore,
@@ -82,6 +85,7 @@ class AdminCmdSlavedStore(
     EventsWorkerStore,
     RegistrationWorkerStore,
     RoomWorkerStore,
+    ProfileWorkerStore,
 ):
     def __init__(
         self,
@@ -192,6 +196,32 @@ def write_knock(
             for event in state.values():
                 print(json.dumps(event), file=f)

+    def write_profile(self, profile: JsonDict) -> None:
+        user_directory = os.path.join(self.base_directory, "user_data")
+        os.makedirs(user_directory, exist_ok=True)
+        profile_file = os.path.join(user_directory, "profile")
+
+        with open(profile_file, "a") as f:
+            print(json.dumps(profile), file=f)
+
+    def write_devices(self, devices: List[JsonDict]) -> None:
+        user_directory = os.path.join(self.base_directory, "user_data")
+        os.makedirs(user_directory, exist_ok=True)
+        device_file = os.path.join(user_directory, "devices")
+
+        for device in devices:
+            with open(device_file, "a") as f:
+                print(json.dumps(device), file=f)
+
+    def write_connections(self, connections: List[JsonDict]) -> None:
+        user_directory = os.path.join(self.base_directory, "user_data")
+        os.makedirs(user_directory, exist_ok=True)
+        connection_file = os.path.join(user_directory, "connections")
+
+        for connection in connections:
+            with open(connection_file, "a") as f:
+                print(json.dumps(connection), file=f)
+
     def finished(self) -> str:
         return self.base_directory
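
Note: `write_devices` and `write_connections` in this hunk reopen the target file once per item. A possible alternative, shown only as a sketch and not as the commit's code, opens the file once per call and writes every item inside that single context. `append_json_lines` is a hypothetical helper name:

```python
import json
import os
from typing import Any, Dict, Iterable


def append_json_lines(
    directory: str, filename: str, items: Iterable[Dict[str, Any]]
) -> str:
    """Append each item as one JSON line to directory/filename.

    The directory is created if needed and the file is opened a single
    time in append mode, so repeated calls keep adding lines.
    """
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    with open(path, "a") as f:
        for item in items:
            print(json.dumps(item), file=f)
    return path
```

The resulting file contents should match what the per-item version produces; only the number of `open` calls differs.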

synapse/handlers/admin.py

Lines changed: 43 additions & 0 deletions
@@ -30,6 +30,7 @@
 class AdminHandler:
     def __init__(self, hs: "HomeServer"):
         self.store = hs.get_datastores().main
+        self._device_handler = hs.get_device_handler()
         self._storage_controllers = hs.get_storage_controllers()
         self._state_storage_controller = self._storage_controllers.state
         self._msc3866_enabled = hs.config.experimental.msc3866.enabled
@@ -247,6 +248,21 @@ async def export_user_data(self, user_id: str, writer: "ExfiltrationWriter") ->
                 )
                 writer.write_state(room_id, event_id, state)

+        # Get the user profile
+        profile = await self.get_user(UserID.from_string(user_id))
+        if profile is not None:
+            writer.write_profile(profile)
+
+        # Get all devices the user has
+        devices = await self._device_handler.get_devices_by_user(user_id)
+        writer.write_devices(devices)
+
+        # Get all connections the user has
+        connections = await self.get_whois(UserID.from_string(user_id))
+        writer.write_connections(
+            connections["devices"][""]["sessions"][0]["connections"]
+        )
+
         return writer.finished()


@@ -297,6 +313,33 @@ def write_knock(
         """
         raise NotImplementedError()

+    @abc.abstractmethod
+    def write_profile(self, profile: JsonDict) -> None:
+        """Write the profile of a user.
+
+        Args:
+            profile: The user profile.
+        """
+        raise NotImplementedError()
+
+    @abc.abstractmethod
+    def write_devices(self, devices: List[JsonDict]) -> None:
+        """Write the devices of a user.
+
+        Args:
+            devices: The list of devices.
+        """
+        raise NotImplementedError()
+
+    @abc.abstractmethod
+    def write_connections(self, connections: List[JsonDict]) -> None:
+        """Write the connections of a user.
+
+        Args:
+            connections: The list of connections / sessions.
+        """
+        raise NotImplementedError()
+
     @abc.abstractmethod
     def finished(self) -> Any:
         """Called when all data has successfully been exported and written.
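
Note: the `export_user_data` change in this file reads only `connections["devices"][""]["sessions"][0]["connections"]`. As an illustration, assuming the whois result is a nested dict shaped the way that index path suggests, a more defensive traversal could gather connections across every device and session and tolerate missing keys. `collect_connections` is a hypothetical name and not part of this commit:

```python
from typing import Any, Dict, List


def collect_connections(whois: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten all connections found under whois["devices"][*]["sessions"][*].

    Assumed shape (inferred from the index path used in the handler):
    {"devices": {<device_key>: {"sessions": [{"connections": [...]}]}}}
    Missing keys are treated as empty rather than raising KeyError.
    """
    connections: List[Dict[str, Any]] = []
    for device in whois.get("devices", {}).values():
        for session in device.get("sessions", []):
            connections.extend(session.get("connections", []))
    return connections
```

Whether more than one device key or session can occur in practice is not established by this diff; the sketch simply avoids depending on it.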

tests/handlers/test_admin.py

Lines changed: 60 additions & 0 deletions
@@ -38,6 +38,7 @@ class ExfiltrateData(unittest.HomeserverTestCase):

     def prepare(self, reactor: MemoryReactor, clock: Clock, hs: HomeServer) -> None:
         self.admin_handler = hs.get_admin_handler()
+        self._store = hs.get_datastores().main

         self.user1 = self.register_user("user1", "password")
         self.token1 = self.login("user1", "password")
@@ -236,3 +237,62 @@ def test_knock(self) -> None:
         self.assertEqual(args[0], room_id)
         self.assertEqual(args[1].content["membership"], "knock")
         self.assertTrue(args[2])  # Assert there is at least one bit of state
+
+    def test_profile(self) -> None:
+        """Tests that the user profile gets exported."""
+        writer = Mock()
+
+        self.get_success(self.admin_handler.export_user_data(self.user2, writer))
+
+        writer.write_events.assert_not_called()
+        writer.write_profile.assert_called_once()
+
+        # check only a few values, not all available
+        args = writer.write_profile.call_args[0]
+        self.assertEqual(args[0]["name"], self.user2)
+        self.assertIn("displayname", args[0])
+        self.assertIn("avatar_url", args[0])
+        self.assertIn("threepids", args[0])
+        self.assertIn("external_ids", args[0])
+        self.assertIn("creation_ts", args[0])
+
+    def test_devices(self) -> None:
+        """Tests that user devices get exported."""
+        writer = Mock()
+
+        self.get_success(self.admin_handler.export_user_data(self.user2, writer))
+
+        writer.write_events.assert_not_called()
+        writer.write_devices.assert_called_once()
+
+        args = writer.write_devices.call_args[0]
+        self.assertEqual(len(args[0]), 1)
+        self.assertEqual(args[0][0]["user_id"], self.user2)
+        self.assertIn("device_id", args[0][0])
+        self.assertIsNone(args[0][0]["display_name"])
+        self.assertIsNone(args[0][0]["last_seen_user_agent"])
+        self.assertIsNone(args[0][0]["last_seen_ts"])
+        self.assertIsNone(args[0][0]["last_seen_ip"])
+
+    def test_connections(self) -> None:
+        """Tests that user sessions / connections get exported."""
+        # Insert a user IP
+        self.get_success(
+            self._store.insert_client_ip(
+                self.user2, "access_token", "ip", "user_agent", "MY_DEVICE"
+            )
+        )
+
+        writer = Mock()
+
+        self.get_success(self.admin_handler.export_user_data(self.user2, writer))
+
+        writer.write_events.assert_not_called()
+        writer.write_connections.assert_called_once()
+
+        args = writer.write_connections.call_args[0]
+        self.assertEqual(len(args[0]), 1)
+        self.assertEqual(args[0][0]["ip"], "ip")
+        self.assertEqual(args[0][0]["user_agent"], "user_agent")
+        self.assertGreater(args[0][0]["last_seen"], 0)
+        self.assertNotIn("access_token", args[0][0])
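
Note: the tests in this file hand a `Mock` to `export_user_data`. For manual experiments, a hand-written stand-in covering the three methods added in this commit might look like the sketch below. `RecordingWriter` is hypothetical, deliberately does not subclass Synapse's `ExfiltrationWriter` (so it runs without Synapse installed), and omits the pre-existing `write_events`, `write_state`, `write_invite` and `write_knock` methods that a real writer would also need:

```python
from typing import Any, Dict, List


class RecordingWriter:
    """Collects exported user data in memory instead of writing files."""

    def __init__(self) -> None:
        self.profile: Dict[str, Any] = {}
        self.devices: List[Dict[str, Any]] = []
        self.connections: List[Dict[str, Any]] = []

    def write_profile(self, profile: Dict[str, Any]) -> None:
        # Keep a copy so later mutation by the caller is not reflected here.
        self.profile = dict(profile)

    def write_devices(self, devices: List[Dict[str, Any]]) -> None:
        self.devices.extend(devices)

    def write_connections(self, connections: List[Dict[str, Any]]) -> None:
        self.connections.extend(connections)

    def finished(self) -> Dict[str, Any]:
        # Return everything recorded so far as a single dict.
        return {
            "profile": self.profile,
            "devices": self.devices,
            "connections": self.connections,
        }
```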
