This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Add compression to Synapse in-memory data caches to decrease total RAM usage #8990

Closed
MurzNN opened this issue Dec 25, 2020 · 5 comments

Comments

@MurzNN

MurzNN commented Dec 25, 2020

Description:

Synapse's in-memory caches become very large on many setups (after manually tuning cache sizes to make Synapse more responsive) and eat a lot of RAM. But the cached information has a very good compression ratio (it is mostly ASCII text), so implementing some level of compression (even the fastest one) should significantly decrease Synapse's overall memory usage. What do you think about this idea?

@clokep
Member

clokep commented Dec 28, 2020

But the cached information has a very good compression ratio (it is mostly ASCII text)

I'm not sure this is really true -- I think most of the caches in Synapse cache Python objects, not just raw strings. Which caches were you looking at in particular to come to this conclusion?

@MurzNN
Author

MurzNN commented Jan 6, 2021

Python objects can be compressed using this trick: https://stackoverflow.com/a/19500651. As I understand it, most of the caches work like key-value stores, so we could simply store the values as compressed strings and decompress them on access. This would increase CPU load a bit, but decrease RAM usage.
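
For illustration, here is a minimal sketch of that pickle-plus-zlib approach on a plain dictionary-backed cache. The `CompressedCache` class and its methods are hypothetical stand-ins for this example, not Synapse's actual cache API:

```python
import pickle
import zlib


class CompressedCache:
    """Hypothetical key-value cache that stores values as compressed pickles."""

    def __init__(self):
        self._store = {}  # key -> compressed pickled bytes

    def set(self, key, value):
        # Serialize the Python object, then compress with the fastest zlib level (1).
        data = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
        self._store[key] = zlib.compress(data, 1)

    def get(self, key, default=None):
        blob = self._store.get(key)
        if blob is None:
            return default
        # Decompress and deserialize on every access: trades CPU time for RAM.
        return pickle.loads(zlib.decompress(blob))


# Example usage with an invented event-like value:
cache = CompressedCache()
cache.set("some_event_id", {"type": "m.room.message", "content": {"body": "hello " * 100}})
assert cache.get("some_event_id")["content"]["body"].startswith("hello")
```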

The main candidates for compression are *getEvent*, _get_joined_profile_from_event_id and _event_auth_cache (to reduce cache misses on my homeserver I have to increase them to 20-40 times their default sizes), but I can't measure the exact size of each cache in bytes; there is an issue about this: #8811 (comment)

Alternatively, we could compress only the large text data inside objects (e.g. the text body of a message), but that makes the compression task more complex.
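
As a rough illustration of that alternative, here is a sketch that compresses only the body field of an event-shaped dict; the threshold constant and function name are made up for this example and are not Synapse code:

```python
import zlib

# Hypothetical cutoff: very small bodies are not worth the compression overhead.
BODY_COMPRESS_THRESHOLD = 1024


def compress_large_body(event_dict):
    """Return a copy of an event-shaped dict with a large text body stored as compressed bytes."""
    body = event_dict.get("content", {}).get("body")
    if not isinstance(body, str) or len(body) < BODY_COMPRESS_THRESHOLD:
        return event_dict
    result = dict(event_dict)
    result["content"] = dict(event_dict["content"])
    # Whoever reads this entry back out of the cache must decompress and decode it again.
    result["content"]["body"] = zlib.compress(body.encode("utf-8"), 1)
    return result
```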

@ptman
Contributor

ptman commented Jan 7, 2021

I think if someone could show that a compressed cache improves performance, it could be considered. But compressing and decompressing take time, and caching is supposed to save time. Of course, (de)compression can be much faster than I/O, but numbers will tell.

@MurzNN
Author

MurzNN commented Jan 7, 2021

Yes, it would be good to measure before implementing!

The average compression ratio for text data is usually better than 4:1, so with compression we could keep about 4 times as many items in the same amount of RAM, which could significantly decrease the rate of SQL queries.
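
As a starting point for getting those numbers, a micro-benchmark sketch like the one below could measure the ratio and the CPU cost. The sample payload here is invented, so real measurements would have to use objects taken from a live cache:

```python
import pickle
import time
import zlib

# Made-up event-like payload, just to exercise the machinery.
sample = {
    "type": "m.room.message",
    "sender": "@alice:example.org",
    "room_id": "!abcdefghijklmnop:example.org",
    "content": {"msgtype": "m.text", "body": "lorem ipsum dolor sit amet " * 40},
}

raw = pickle.dumps(sample, protocol=pickle.HIGHEST_PROTOCOL)

start = time.perf_counter()
compressed = zlib.compress(raw, 1)  # fastest zlib level
compress_time = time.perf_counter() - start

start = time.perf_counter()
zlib.decompress(compressed)
decompress_time = time.perf_counter() - start

print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes "
      f"(ratio {len(raw) / len(compressed):.1f}:1)")
print(f"compress: {compress_time * 1e6:.0f} µs, decompress: {decompress_time * 1e6:.0f} µs")
```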

For example, I manage the public homeserver ru-matrix.org, which has a 16 GB RAM limit. Increasing Synapse's cache sizes decreases cache misses from 1000+ per second to ~50 per second, but then Synapse starts swapping because RAM fills up, and everything becomes slow again :(

@clokep
Member

clokep commented Jan 11, 2021

Note that we do some interning of the strings that get cached (see uses of intern_string and intern_dict), so there's some effort to re-use memory which could go away with compression.
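
To illustrate that interaction, here is a small sketch using Python's built-in sys.intern (rather than Synapse's intern_string helper) showing how per-entry compression loses the cross-entry sharing that interning provides:

```python
import pickle
import sys
import zlib

# Two cache entries containing the same user ID string.
user_id = "@alice:example.org"
a = {"sender": sys.intern(user_id)}
b = {"sender": sys.intern(user_id)}

# With interning, both entries reference one shared string object in memory.
assert a["sender"] is b["sender"]

# Once each entry is pickled and compressed separately, that sharing is gone:
# each compressed blob carries its own copy of the bytes.
blob_a = zlib.compress(pickle.dumps(a))
blob_b = zlib.compress(pickle.dumps(b))
assert blob_a == blob_b          # equal contents...
assert blob_a is not blob_b      # ...but separate allocations, no cross-entry sharing
```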

Additionally, pickling should generally not be done with user-provided data, so we would need to add some safeguards there. Given the additional complexity this would add, I don't think this would be worthwhile unless numbers were provided showing a dramatic reduction in RAM usage.

I'm going to close this for now, but please experiment and let us know if this seems worthwhile!

@clokep clokep closed this as completed Jan 11, 2021