Experimental Federation Speedup #9702
Conversation
…peedup # Conflicts: # synapse/federation/sender/__init__.py
(Note: Some CI tests are failing with
fix error with async fix "frozenset has no discard error"
```python
# Handle all events, skip if handle_event returns None
handled_events = (
    ret
    for ret in [await handle_event(event) for event in events]
```
small warning: if we change this `[]` comprehension into a generator (`()`), python will start throwing errors about async iterators, because to work with an async iterator/generator in a sub-comprehension, you have to prefix every `for` with `async for`.

I had split it before I realised it didn't solve the problem, and then I set this small generator to `[]`, which fixed it, but kept the split.

This'll be a small memory tradeoff for readability, as this'd create a list of `len(events)` items, but the surrounding generator will then filter out all `None` values, itself taking up no space until it is consumed by the dict comprehension.

So this is a `len(events)`-long list of `Optional[Tuple[EventBase, Collection[str]]]`, then comprehended down into `Dict[EventBase, Collection[str]]`.
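The pattern being discussed can be shown with a minimal, self-contained sketch (hypothetical `handle_event` returning `None` for skipped events; not the PR's actual code). The inner `[]` comprehension awaits eagerly inside the async function, so the outer generator never needs `async for`:

```python
import asyncio
from typing import Dict, Optional

async def handle_event(event: int) -> Optional[int]:
    # hypothetical stand-in for the PR's handle_event: None means "skip"
    return event * 10 if event % 2 == 0 else None

async def main() -> Dict[int, int]:
    events = [1, 2, 3, 4]
    # The inner [] list comprehension awaits eagerly; a plain () generator
    # here would require `async for`. The outer generator then filters out
    # None values lazily, taking no extra space until consumed below.
    handled_events = (
        ret
        for ret in [await handle_event(event) for event in events]
        if ret is not None
    )
    return {ret: ret for ret in handled_events}

result = asyncio.run(main())
```

Only the events with non-`None` results survive into the final dict.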
```python
    )

# Send corresponding events to each destination queue
self._distribute_pdus(event_to_dests)
```
distribution happens after database access, because then a database-access crash loop will cause no events to be spammed to other servers; a crash loop will also cause federation to stall, which is much more noticeable in the logs (as only the errors will be visible).

also, I think it's "safer" not to have a race condition where `destination_rooms` values are set after pdus are actually distributed.

anywho, if either of these functions throws, it won't reach `update_federation_out_pos`, and it'll loop around with `finally:` and the next poke from a database sequence.
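The ordering argument above can be sketched as a loop (hypothetical helper names, not Synapse's actual API): because distribution runs before the out-position is persisted, a crash in distribution leaves the stored position untouched, so the same batch is fetched again on the next poke.

```python
def process_batches(fetch_events, distribute, save_position, start=0):
    # Sketch only: distribute() runs before the position is persisted,
    # so an exception here leaves the stored position untouched and the
    # same batch will be fetched again on the next iteration/poke.
    position = start
    while True:
        events, new_position = fetch_events(position)
        if not events:
            return position
        distribute(events)           # may raise: position not yet advanced
        save_position(new_position)  # only persisted after a successful send
        position = new_position
```

Swapping the last two calls would trade this behaviour for the race condition described above: the position would advance even when the events were never sent.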
```python
room_with_dest_stream_ordering = {}  # type: Dict[Tuple[str, str], int]

# List of events to send to each destination
dest_to_events = {}  # type: Dict[str, List[EventBase]]
```
This is actually fairly cheap: the memory for `EventBase` is already allocated, and it being an object, it's simply referenced.

Each list takes 64 bytes up-front*, and then 8 bytes per pointer; this means that for the absolute worst-case scenario (100 events destined for #matrix, with around 2k servers), it'll be about 1728000 bytes (`2000 * (64 + (8 * 100))`), or about 1.7 MB, excluding memory use for destination strings (which are immutably referenced) and `dict` internals for mapping.

(sending 1 event to 2000 servers that way is 144000 bytes, or 144 kB, btw)

(* the 64 bytes is the up-front cost for a python list; python's objects are quite large that way. That number is for python 3.6; in python 3.7 they're 72 bytes, and in python 3.8 they're 58 bytes. Lists use C pointers, which are 8 bytes each.)
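The estimate above reduces to simple arithmetic; a small helper makes the two quoted figures reproducible (the overhead constants are the CPython-version-dependent values cited above, not universal):

```python
def batch_list_bytes(n_servers: int, n_events: int,
                     list_overhead: int = 64, ptr_size: int = 8) -> int:
    # One list per destination, each holding pointers to already-allocated
    # EventBase objects. list_overhead and ptr_size are CPython-dependent
    # assumptions (64-byte list header, 8-byte C pointers, per the comment).
    return n_servers * (list_overhead + ptr_size * n_events)

worst_case = batch_list_bytes(2000, 100)  # 100 events to ~2k servers
single = batch_list_bytes(2000, 1)        # 1 event to 2000 servers
```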
a few initial thoughts here. It seems generally plausible.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
It's fixed on develop, merged it into here to ensure that CI runs again.
this is looking good, save a few outstanding nits.
Have you tried it in practice? Does it make much difference?
I haven't, but I also need to set up a generic "dev server" somewhere, and I may do so after/during this PR.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
this looks good other than the below. I'm going to try running it on sw1v.org.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
🤷♂️
This reverts commit 05e8c70.
We reverted this PR before cutting 1.33.0rc1 due to some performance regressions in sending PDUs over federation. The details are in #9863.
Bit of a continuation from #9639's speedup intent and #9651's confusion.

This basically speeds up federation by "squeezing" each individual dual database call (to `destinations` and `destination_rooms`), which previously happened for every event, into one call for an entire batch (100 max).

Pull Request Checklist

Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>
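The batching idea in the description can be sketched roughly as follows (the `lookup_many` helper and plain-dict events are hypothetical stand-ins for the combined `destinations`/`destination_rooms` query; this is not the PR's actual code):

```python
from typing import Callable, Dict, List, Set

def collect_destinations(
    events: List[dict],
    lookup_many: Callable[[Set[str]], Dict[str, List[str]]],
    batch_size: int = 100,
) -> Dict[str, List[str]]:
    # Instead of one database lookup per event, collect events into
    # batches and resolve all their rooms' destinations in a single
    # call per batch (100 max, mirroring the PR's batch size).
    event_to_dests: Dict[str, List[str]] = {}
    for i in range(0, len(events), batch_size):
        batch = events[i:i + batch_size]
        room_dests = lookup_many({e["room_id"] for e in batch})
        for e in batch:
            event_to_dests[e["event_id"]] = room_dests.get(e["room_id"], [])
    return event_to_dests
```

For a batch of up to 100 events this issues one lookup instead of 100, which is the per-event-to-per-batch squeeze the description refers to.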