Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All E2E history taking ~15 minutes to decrypt on launch #13266

Closed
ara4n opened this issue Apr 18, 2020 · 24 comments
Closed

All E2E history taking ~15 minutes to decrypt on launch #13266

ara4n opened this issue Apr 18, 2020 · 24 comments

Comments

@ara4n
Copy link
Member

ara4n commented Apr 18, 2020

...actually, as i filed this, they finally decrypted. But it took about 5 minutes.

@ara4n
Copy link
Member Author

ara4n commented Apr 23, 2020

i'm seeing this every time i launch Nightly.

@ara4n ara4n changed the title All E2E messages refusing to decrypt on latest Nightly. All E2E history taking ~5 minutes to decrypt on launch on Desktop Apr 23, 2020
@ara4n
Copy link
Member Author

ara4n commented Apr 26, 2020

Just had another instance of this - i launched Nightly at 19:16:56, and my E2EE messages only became visible at 19:22:07. The server is relatively unloaded, and i'm on good internet.

@turt2live
Copy link
Member

@ara4n is this messages being invisible or saying the words "unable to decrypt"?

@ara4n
Copy link
Member Author

ara4n commented May 4, 2020

this is the messages being invisible (unless you have invisible events being displayed, and then you see m.room.encrypted placeholders until they finally decrypt).

I just had it again, except this time it literally took 10 minutes before history was visible.

@turt2live
Copy link
Member

This is a long-standing bug then: #7116

@ara4n
Copy link
Member Author

ara4n commented May 4, 2020

yes, except it used to be a rare occurrence, whereas now it is 100% reproducable and really impacts usability.

@turt2live
Copy link
Member

I've seen it every single day since reporting 2 years ago :(

@ara4n
Copy link
Member Author

ara4n commented May 4, 2020

well, this has got to be fixable - during the 10 minutes, incremental /syncs are happily bouncing away - it's very unclear what it's actually blocking on...

@dbkr dbkr self-assigned this May 4, 2020
@dbkr
Copy link
Member

dbkr commented May 4, 2020

So this is because at startup we load 50 events worth of history for each room from disk and decrypt any that are encrypted. This takes some time but is also slowed down quite a lot by it firing Event.decrypted for every one which causes things like breadcrumbs to recalculate badges and the timeline to update.

Things we could do to make this smarter:

  • Decrypt events in a priority queue of some kind, prioritising events for the room that's visible (would obviously need a way for the app to tell js-sdk which to prioritise, and a reasonably invasive change since right now we just start decrypting them all async and leave them to it).
  • Decrypt events lazily: tricker as we'd only be able to do this for events where we knew, without decrypting them, that they wouldn't affect badges & such (ie. work out if they'd already been read).
  • Not show the UI until we've decrypted the history. This would mean we wouldn't need to worry about doing UI update work in Event.decrypted until the initial load had finished, plus we'd only show the UI once it was actually correct. It could make loading feel slower though if you're staring at a spinner or whatever loading UI we use.
  • Actually make the decryption faster: right now each decryption is separate and starts a separate indexeddb txn to get the keys. Batching them up together may save us some time.

@turt2live
Copy link
Member

some of the async store work from the room list rewrite might help the event loop speed a bit - theory is that it would be able to listen for Event.decrypted and not block the js-sdk with breadcrumbs calculations (as the breadcrumbs would be handling updates async and render the changes when ready).

@dbkr dbkr removed their assignment May 5, 2020
@ara4n ara4n changed the title All E2E history taking ~5 minutes to decrypt on launch on Desktop All E2E history taking ~15 minutes to decrypt on launch on Desktop May 18, 2020
@ara4n
Copy link
Member Author

ara4n commented May 18, 2020

This is now literally taking my top-end MBP 15 minutes before i can send E2E messages after launch in Riot Nightly.

I am really surprised the decryption is going this slowly - i can see the messages flickering into the timeline; it feels at a rate of about 10Hz or so. Some kind of batching (whether that's to avoid dispatches, or to avoid hammering indexddb) seems preferable. Or caching badge/unread state somehow so we don't have to decrypt msgs for rooms we're not in?

@dbkr dbkr self-assigned this May 19, 2020
@dbkr
Copy link
Member

dbkr commented May 26, 2020

I did some experiments on a branch to speed up the initial decryption of events: https://github.com/matrix-org/matrix-js-sdk/compare/dbkr/bulk_decryption_experiment_hack (see commit message for more details).

Testing on my account this roughly halves the time taken between launch and the last message in a given room being decrypted. It is fairly invasive though and makes a lot of assumptions, ie. it reads megolm sessions, unpickles them and uses them without pickling them and saving them back. In practice we always keep the earliest point in the megolm ratchet anyway but this is certainly a departure from the intended use.

@dbkr dbkr removed their assignment May 26, 2020
@Half-Shot
Copy link
Member

Half-Shot commented Jul 19, 2020

It's not clear from the issue if this blocks the client completely? I'm finding that after starting desktop, it renders a broken looking Element (userId rather than displayname in the profile spot, no rooms in room list, blank timeline etc) for about 5 minutes and then everything comes to life. I'm unsure if it's the same cause, but feels similar.

EDIT: Looks like I have two different causes of slowness. The first one (which locks up the client) is my incremental syncs taking a while after leaving the client closed for ~36 hours (which is #9198). The second one appears to be the blank timeline problems ala this issue. The attached bug report is still relevant for the second problem.

@ara4n
Copy link
Member Author

ara4n commented Jul 20, 2020

My problem tends not to be entirely blank timeline, but instead stale timeline which takes ~15 minutes to refresh; meanwhile messages slowly decrypt and trickle into the RoomList previews - I see them coming in at roughly 10 per second. Given I'm probably in ~200 encrypted rooms, each with ~50 messages of scrollback, this means 10,000 messages to decrypt, which at 10 per second means waiting up to 16 minutes until the rooms are usable.

Quick fixes could be to:

  • only load the most recent message of each room by default, pulling in the remainder on demand
  • only decrypt the most recent message of rooms
  • don't fire events for every decrypted message during startup

@turt2live
Copy link
Member

On a flip side to this: The full page spinner that takes ~5-10 minutes to go away (then falling into the encryption history taking forever) appears to be too many things listening to startup spam, leading to it slowing down.

@dbkr
Copy link
Member

dbkr commented Jul 24, 2020

Hmm - I may have been reproducing an entirely different bug then, since I was watching all of the e2e messages decrypt and so if it started up looking at an e2e room, that room was blank (apart from any state/unencrypted events). I don't understand how the room is stale - when are the stale messages from? Are they in the set of 200*50 messages that are decrypting?

On the solutions:

  • only load the most recent message of each room by default, pulling in the remainder on demand
    • What do you mean by 'load' here? They come from the sync accumulator which gives us one big blob, so we can't pick & choose what's loaded from disk.
  • only decrypt the most recent message of rooms
    • this sounds like the second of my ideas above I think
  • don't fire events for every decrypted message during startup
    • I think this is my 3rd one and is one of the optimisations I applied on my experimental branch. It didn't help a great deal for me.

@jryans jryans changed the title All E2E history taking ~15 minutes to decrypt on launch on Desktop All E2E history taking ~15 minutes to decrypt on launch Jul 24, 2020
@jryans
Copy link
Collaborator

jryans commented Jul 24, 2020

@ara4n confirms both desktop and browser versions are affected, so clarifying the title.

@ara4n
Copy link
Member Author

ara4n commented Sep 24, 2020

Something seems to have improved significantly on this at some point - circumstantially it's gone back to taking tens of seconds to load e2ee history (at least in the rooms i expect to see) after upgrading Element Nightly. Is anyone else (@turt2live, @dbkr) seeing it still? Or am I just getting lucky that the rooms I try to view on launch are the first to decrypt?

One thought: is there a way to to prioritise decrypting rooms as you view them (whilst keeping the big background decrypt chugging away)?

@turt2live
Copy link
Member

I'm still seeing it, and it's still because we decrypt all rooms on startup

@ara4n
Copy link
Member Author

ara4n commented Sep 26, 2020

So i just measured on launch, picking the arbitrary encrypted room I wanted to read, and it took 3 minutes 13 seconds from launch. As it happens my target room was one of the last ones to decrypt. So, the good news is that it's definitely not 15 minutes any more. But it's still getting on for 5 minutes on a large account.

@AmandineLP
Copy link
Contributor

+1

@ara4n
Copy link
Member Author

ara4n commented Jan 18, 2021

it's back up to 9 minutes on my account now on Nightly (and this is on a faster comp than in Sept).

@jryans
Copy link
Collaborator

jryans commented May 12, 2021

@ara4n @gsouquet Great to see that matrix-org/matrix-js-sdk#1684 is merged! Do we think this is sufficient to close this issue? We can of course still keep exploring more performance improvements anyway. I'm just wondering if we've made "enough" improvement to assuage the main user impact here.

@jryans
Copy link
Collaborator

jryans commented May 17, 2021

Sounds like this is working well and addresses the main performance impact here, so let's close this. We can track further performance tweaks separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants