SSS: slow incremental sync when an event backlog builds up (i.e. if you're offline for a while) #3223

ara4n · 2024-09-03T11:56:17Z

Steps to reproduce

1. Go offline for a few hours/days/weeks in a busy account.
2. Launch EX.
3. Observe that incr sync takes tens of seconds, proportional to the time spent offline.

Outcome

What did you expect?

Incremental sync should be O(1) not O(N) with time spent offline.

Specifically, the server should reset the SSS connection after 30m offline (or after 2000 events stack up in the backlog) to force the client to do a paginated initial sync when it next launches rather than a slow unpaginated incr sync.
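As a rough illustration of the reset rule proposed above, here is a minimal sketch in Python. Only the two thresholds (30 minutes, 2000 events) come from this issue; the `ConnectionState` class, its fields, and `should_reset_connection` are hypothetical names made up for the example and do not reflect Synapse's actual data model or API.

```python
from dataclasses import dataclass

# Thresholds taken from the proposal in this issue.
MAX_IDLE_SECONDS = 30 * 60
MAX_BACKLOG_EVENTS = 2000


@dataclass
class ConnectionState:
    """Hypothetical per-connection bookkeeping (an assumption, not Synapse's model)."""

    last_request_ts: float  # time of the previous sync request, in seconds
    backlog_events: int     # events queued for this connection since that request


def should_reset_connection(state: ConnectionState, now: float) -> bool:
    """Return True if the server should drop the connection state,
    forcing the client into a paginated initial sync on its next request."""
    idle_for = now - state.last_request_ts
    return idle_for >= MAX_IDLE_SECONDS or state.backlog_events >= MAX_BACKLOG_EVENTS
```

Either condition alone triggers the reset, so a very busy account can be reset after a short absence while a quiet account is reset only after the idle timeout.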

In future, we should probably paginate the incr sync instead so it syncs rapidly (to avoid overloading the server with lots of unnecessary full initial syncs after every 30m of idleness), but that's a separate MSC.

What happened instead?

Slow incr sync. (In theory, although I haven't actually had a chance to spot & check this in practice - this is a theoretical vuln)

Your phone model

No response

Operating system version

No response

Application version

697

Homeserver

No response

Will you send logs?

No

erikjohnston · 2024-09-03T12:02:15Z

From a backend point of view we should add logic to reset the connection if it looks like there are "a lot" of updates to send in response to a request. It's sub-optimal to have to do this, as we end up sending down all the old rooms all over again, wasting server and client resources and bandwidth. It's a good stop-gap, though.

From a client/SDK point of view I think it'd be good to reduce the range back down to [0, 19] after some time of inactivity (but not reset the connection, like we do when we see a connection error/timeout). This will then allow the server to (hopefully) respond quickly to the first request, and the client can fetch the rest of the updates as the list grows again. I don't think it really matters too much if we reduce to [0, 19] relatively quickly, so I'd probably suggest a 30m timer would be a good first try.
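The client-side idea could be sketched roughly as follows. This is Python for illustration only, not the Rust SDK's actual code: the `SlidingListRange` class and its methods are hypothetical, and only the [0, 19] range and the 30m timer come from the comment above. The point it shows is that the range shrinks after inactivity while the `pos` token is kept, so the connection itself is not reset.

```python
import time

INITIAL_RANGE = (0, 19)        # range suggested in the comment above
INACTIVITY_SECONDS = 30 * 60   # "30m timer" suggested as a first try


class SlidingListRange:
    """Hypothetical client-side tracker for one sliding sync list."""

    def __init__(self, pos=None, clock=time.time):
        self._clock = clock          # injectable so the timer can be tested
        self.pos = pos               # connection position token, never cleared here
        self.range = INITIAL_RANGE
        self._last_response = clock()

    def grow(self, batch_size, max_rooms):
        """Widen the range while catching up, capped at the last room index."""
        end = min(self.range[1] + batch_size, max(max_rooms - 1, 0))
        self.range = (0, end)

    def on_response(self, new_pos):
        """Record a successful sync response."""
        self.pos = new_pos
        self._last_response = self._clock()

    def next_range(self):
        """Range to send in the next request; shrinks after inactivity."""
        if self._clock() - self._last_response >= INACTIVITY_SECONDS:
            self.range = INITIAL_RANGE  # shrink the window, keep self.pos
        return self.range
```

After the shrink, the usual growth logic (here `grow`) would widen the range again over subsequent requests, which is how the remaining updates get fetched.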

erikjohnston · 2024-09-03T13:36:08Z

Sounds like since the SDK doesn't persist the pos tokens, we think this case is unlikely to be hit in practice, though we should still do something here.

erikjohnston · 2024-09-03T14:01:21Z

Reduce range issue: matrix-org/matrix-rust-sdk#3935
Persist position across restart issue: matrix-org/matrix-rust-sdk#3936

Reset connection server side: element-hq/synapse#17653

ara4n added the T-Defect label Sep 3, 2024

erikjohnston mentioned this issue Sep 3, 2024

SSS: Enter "recovering" mode if the app has been offline for "a while" matrix-org/matrix-rust-sdk#3935

Closed

erikjohnston closed this as completed Sep 3, 2024
