Using an external cache server #2123
Some thoughts on external caches which arose as a result of discussions around #3798. First of all, it's certainly not true that you could just put two synapse masters behind a load balancer and point them at the same database and cache. That said, there are certainly some advantages to an external cache, including:
However, there are also some potential problems with an external cache:
A likely scenario is therefore that we would end up with an in-memory cache as well as an external cache, which leaves us with twice as many problems in terms of cache invalidation. On the other hand, our biggest and most latency-sensitive caches (e.g., the event cache) are never actually invalidated (they are simple LRU caches). A plausible compromise might be to drop invalidation support from the in-memory caches, and for things that might care about invalidation, instead either go straight to the external cache / db, or use a short TTL. However, all of that is a non-trivial amount of work.
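For concreteness, a minimal sketch of that compromise: an in-memory LRU whose entries expire on a short TTL rather than being invalidated, falling through to the external cache or database on a miss. The class name, sizes, and TTL below are illustrative, not anything Synapse actually ships.

```python
import time
from collections import OrderedDict

class TTLLRUCache:
    """In-memory LRU whose entries expire after a short TTL instead of
    supporting invalidation; stale or missing entries fall through to
    the external cache or database. Sizes and TTL are illustrative."""

    def __init__(self, max_size: int = 10_000, ttl: float = 5.0):
        self._data: OrderedDict = OrderedDict()
        self._max_size = max_size
        self._ttl = ttl

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() > expires:
            del self._data[key]       # expired: caller goes to external cache/db
            return None
        self._data.move_to_end(key)   # mark as recently used
        return value

    def set(self, key, value):
        self._data[key] = (value, time.monotonic() + self._ttl)
        self._data.move_to_end(key)
        if len(self._data) > self._max_size:
            self._data.popitem(last=False)  # evict least-recently used
```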
Hey, I want to bump this issue because, by my calculation, time spent accessing memory is/was the majority of time spent in the synapse event loop. Here's my logic: every time a program has to hit main memory, the CPU essentially stalls for what is said to be around 300 nanoseconds. If there is another thread then the processor can schedule it (hyperthreading), but this is too short a timespan for the kernel to swap the thread out, so it appears as though the thread is using 100% CPU during this time. However, we're in a single-threaded event loop, so there's no other thread that can be swapped in; the process is simply stalled. If you're accessing a dictionary then you're looking at several such dependent memory accesses per lookup.

Now let's say you were to shove this off on memcached. The first advantage memcached has is threads: it can use many of them to access memory, so when one stalls, it's not blocking the main event loop. Furthermore, when one memcached thread has 100 requests to process, it can trigger the CPU's prefetch instruction on all of them before it actually tries to access any of them. The prefetch instruction causes the CPU's memory controller to copy the relevant memory location into the processor cache, so that when the memory access is requested, it's a 20ns L3 hit rather than a 300ns main-memory access. The memcached and redis people have no doubt spent many sleepless nights juicing every possible cycle of performance out of the processor, because that's their raison d'être.

Now I want to also bust some myths about an external cache:
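The batching point applies on the client side too. As a rough sketch (using the redis-py client purely for illustration, and assuming a Redis server on localhost), issuing lookups through a pipeline sends them all in one round trip instead of paying a full round trip per key:

```python
import redis  # pip install redis; client choice here is illustrative

r = redis.Redis()
keys = [f"event:{i}" for i in range(100)]

# Sequential: each get() pays a full round trip, analogous to a chain
# of dependent main-memory accesses.
values = [r.get(k) for k in keys]

# Pipelined: all 100 requests go out at once, letting the server (and
# its prefetch-friendly threads) work on them together.
pipe = r.pipeline()
for k in keys:
    pipe.get(k)
values = pipe.execute()
```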
So now you might be thinking "yeah yeah, maybe there's a few percent lost here, but it can't be serious", so I'll show you some data. This goes back to an old version, I believe 0.28, which is when I was admining synapse. This is a [CPU Flame Graph](http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html#OtherLanguages), which is a compilation of stack traces indicating where synapse spends most of its time. The mouse is over the `intern_dict` function, which is interning strings and thus needs to walk over a very large internal dictionary (easily millions or tens of millions of entries), and it's doing that hundreds of times because it's interning every result from the database. According to the flame graph, **synapse spends 57% of its time inside of the cursor_to_dict function**. This might no longer be true, but I highly recommend the use of flame graphs, because my experience has been that whenever there is a noticeable performance issue, it's always a very small piece of the code which is easily improved once one knows about it.

As a final comment, moving state to memcached (or redis, I don't actually have any opinion on them) is a very good thing to do, because you will eventually reach a point where synapse becomes entirely stateless, and at that point there's really nothing preventing you from pointing multiple synapse instances at the same backend. There may be a few places where you'll want to take out a lock, but this is also doable. And once multiple synapse instances can be pointed at the same backend, you can reduce development effort by discontinuing the code for the worker model.
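To make the `cursor_to_dict` pattern concrete, here's a rough reconstruction of the hot path being described (the column names and row counts are made up; this is not Synapse's actual code): every row fetched from the database gets rebuilt as a dict keyed by interned column names.

```python
import sys
import timeit

# Fake query results: 50,000 rows of 6 columns, standing in for
# cursor.fetchall() output.
cols = ["event_id", "room_id", "sender", "type", "state_key", "json"]
rows = [tuple(f"val{i}_{c}" for c in cols) for i in range(50_000)]

def cursor_to_dict(cols, rows):
    # Roughly the pattern described above: intern the column names,
    # then build one dict per row from them.
    col_headers = [sys.intern(c) for c in cols]
    return [dict(zip(col_headers, row)) for row in rows]

print(timeit.timeit(lambda: cursor_to_dict(cols, rows), number=10),
      "seconds for 10 runs")
```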
@cjdelisle thanks for the detailed thoughts on this. Broadly I agree with you - I think there could be real benefits from using an external cache. A couple of things though:
On point 1 you're absolutely right; I'm not admining synapse at this moment, so I don't have anything to collect new ones from. If you are collecting them on matrix.org then that's excellent; if you're not then please consider it, with

On point 2, the number of lookups is not what's important; what's important is the length of the critical path. As per my example, doing a lookup to get the list of rooms for a user and then a lookup for each room that they're in might be hundreds of lookups, but it has a critical path of 2. Of course you would know better than me what the lengths of typical critical paths are...
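To illustrate the critical-path point, here is a small sketch with made-up helper functions (not Synapse's actual API): two hundred lookups issued concurrently still cost only two round trips of latency, because the whole second batch runs in parallel.

```python
import asyncio

# Hypothetical lookups standing in for cache/DB round trips.
async def get_joined_rooms(user_id: str) -> list:
    await asyncio.sleep(0.001)  # simulate one round trip
    return [f"!room{i}:example.org" for i in range(200)]

async def get_room_state(room_id: str) -> dict:
    await asyncio.sleep(0.001)  # simulate one round trip
    return {"room_id": room_id}

async def rooms_for_user(user_id: str) -> list:
    room_ids = await get_joined_rooms(user_id)  # critical-path step 1
    # 200 lookups, but all in flight at once: critical-path step 2.
    return await asyncio.gather(*(get_room_state(r) for r in room_ids))

asyncio.run(rooms_for_user("@alice:example.org"))
```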
We've collected plenty of flame graphs on matrix.org using pyflame, though I don't have one to hand at the moment, I'm afraid. I'd be interested if you have a mechanism for producing them with
Ahh, I think I misremembered, and in fact it was pyflame that I was using. Anyway, if you're on top of the profiling game then I guess I'm just blowing smoke about now-ancient performance issues, in which case I'm sorry for the bother.
In #9198 we implemented the ability to share some information between workers using Redis as a cache. I'm not sure whether this issue is "done" or not, however.
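(For readers wanting a feel for the cross-worker mechanics: the following is a generic pub/sub invalidation sketch using the redis-py client, not the actual replication protocol that #9198 implements. The channel name and cache layout are invented.)

```python
import redis  # pip install redis

r = redis.Redis()
CHANNEL = "cache_invalidations"  # hypothetical channel name

def invalidate(cache_name: str, key: str) -> None:
    # Tell every worker to evict its local copy of this entry.
    r.publish(CHANNEL, f"{cache_name}\x00{key}")

def invalidation_listener(local_caches: dict) -> None:
    # Run in each worker: evict local entries as invalidations arrive.
    pubsub = r.pubsub()
    pubsub.subscribe(CHANNEL)
    for msg in pubsub.listen():
        if msg["type"] != "message":
            continue
        cache_name, _, key = msg["data"].decode().partition("\x00")
        cache = local_caches.get(cache_name)
        if cache is not None:
            cache.pop(key, None)
```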
I don't think it's done done. There is a lot more stuff we could usefully put in an external cache.
I found that synapse uses its own home-grown cache implementation (synapse/util/caches).
This creates some difficulties:
I believe that using an external cache server like memcached or Redis could solve these problems.
The advantages of an external cache:
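As a rough illustration of the idea (the key scheme, TTL, and database helper below are invented for this sketch, and it assumes a Redis server on localhost): a lookup that consults a shared Redis instance first means every worker sees the same entries, and the cache memory lives outside each Python process.

```python
import json

import redis  # pip install redis

r = redis.Redis()

def load_user_from_db(user_id: str) -> dict:
    # Stand-in for the real database query.
    return {"user_id": user_id, "admin": False}

def cached_get_user(user_id: str) -> dict:
    # Shared across all workers/processes, unlike a per-process dict.
    raw = r.get(f"user:{user_id}")
    if raw is not None:
        return json.loads(raw)
    user = load_user_from_db(user_id)
    r.set(f"user:{user_id}", json.dumps(user), ex=60)  # 60-second TTL
    return user
```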