This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Consider storing events as pickled binary blobs in the DB #12959

Open
DMRobertson opened this issue Jun 5, 2022 · 3 comments
Labels
A-Database: DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db
T-Enhancement: New features, changes in functionality, improvements in performance, or user-facing enhancements.

Comments

@DMRobertson
Contributor

Folklore has it that Synapse spends a lot of time parsing JSON from text columns in the DB. It might be quicker to store and load a pickled version of the event. The event content is already an opaque blob AFAIK; I'm not sure we'd lose much by making it an opaque binary blob rather than an opaque text blob. The first step would be to make some kind of vaguely representative benchmark and see how long unpickling takes compared to parsing.
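
A minimal sketch of what such a benchmark might look like, using only the standard library. The sample event below is hypothetical (the field values are made up, and real events vary a lot in size and shape), and `encode_both` and `benchmark` are illustrative names rather than anything in Synapse, so treat the numbers as a rough starting point only.

```python
import json
import pickle
import timeit

# Hypothetical event, loosely shaped like a room message event.
# All values are invented for illustration.
SAMPLE_EVENT = {
    "type": "m.room.message",
    "room_id": "!room:example.org",
    "sender": "@alice:example.org",
    "origin_server_ts": 1654387200000,
    "depth": 1234,
    "prev_events": ["$prev1", "$prev2"],
    "auth_events": ["$auth1", "$auth2", "$auth3"],
    "content": {"msgtype": "m.text", "body": "hello world " * 20},
    "unsigned": {"age_ts": 1654387200000},
}


def encode_both(event):
    """Serialise the same event as compact JSON bytes and as a pickle."""
    as_json = json.dumps(event, separators=(",", ":")).encode("utf-8")
    as_pickle = pickle.dumps(event, protocol=pickle.HIGHEST_PROTOCOL)
    return as_json, as_pickle


def benchmark(event, number=100_000):
    """Time `number` decodes of each encoding and report sizes too."""
    as_json, as_pickle = encode_both(event)
    # Sanity check: both encodings must round-trip to the same dict.
    assert json.loads(as_json) == pickle.loads(as_pickle) == event
    json_seconds = timeit.timeit(lambda: json.loads(as_json), number=number)
    pickle_seconds = timeit.timeit(lambda: pickle.loads(as_pickle), number=number)
    return {
        "json_bytes": len(as_json),
        "pickle_bytes": len(as_pickle),
        "json_seconds": json_seconds,
        "pickle_seconds": pickle_seconds,
    }


if __name__ == "__main__":
    for name, value in benchmark(SAMPLE_EVENT).items():
        print(f"{name}: {value}")
```

A more representative run would replay a sample of real event JSON from a homeserver's database instead of one synthetic dict. Note also that unpickling data can execute arbitrary code, so this would only be reasonable for blobs the server wrote itself.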

@jellykells

Possibly related: would storing JSON objects as jsonb in postgres bring any performance improvements?

@DMRobertson
Contributor Author

> Possibly related: would storing JSON objects as jsonb in postgres bring any performance improvements?

That would allow us to make queries on the JSON within the DB, but AFAIK we'd still have to parse the JSON within Python. I think jsonb might also have problems storing strings containing null code points; see e.g. #9341.

@reivilibre
Contributor

There are definitely much faster ways to do this than JSON, yes.
We've discussed writing the event type in Rust; some interesting benchmarks are available in https://github.com/djkoloski/rust_serialization_benchmark (although not all formats are equal; some aren't self-describing, etc).
Plus there are space advantages. I've wanted to be able to compress Synapse events for a long time; a Zstd dictionary or something would go a long way to reduce disk usage. For my homeserver it hasn't been enough of a problem to justify putting the effort in though :).
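
To illustrate why a shared dictionary helps with many small, similar events, here is a rough sketch. It uses zlib's preset-dictionary support from the standard library as a stand-in for a Zstd dictionary (it is not Zstd, and real ratios would differ). The events, the naive "concatenate some samples" dictionary, and the helper names `make_event`, `compress` and `decompress` are all assumptions for illustration.

```python
import json
import zlib


def make_event(i):
    """Build a hypothetical event; values are invented for illustration."""
    return {
        "type": "m.room.message",
        "room_id": "!room:example.org",
        "sender": f"@user{i}:example.org",
        "origin_server_ts": 1654387200000 + i,
        "content": {"msgtype": "m.text", "body": f"message number {i}"},
    }


def encode(event):
    return json.dumps(event, separators=(",", ":")).encode("utf-8")


# Naive dictionary: a concatenation of sample events, truncated to the
# 32 KiB window that DEFLATE can reference. A trained Zstd dictionary
# would be built more carefully than this.
ZDICT = b"".join(encode(make_event(i)) for i in range(20))[-32768:]


def compress(data, zdict=None):
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def decompress(data, zdict=None):
    if zdict is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(data) + decomp.flush()


if __name__ == "__main__":
    raw = encode(make_event(1000))  # an event not used for the dictionary
    plain = compress(raw)
    with_dict = compress(raw, ZDICT)
    assert decompress(with_dict, ZDICT) == raw
    print(f"raw={len(raw)} plain={len(plain)} with_dict={len(with_dict)}")
```

The point is only that per-event compression without a dictionary has little to work with, whereas a dictionary lets each event reference the boilerplate keys and identifiers it shares with other events. The dictionary itself would need to be stored and versioned alongside the data.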

We could also use something like orjson to do the deserialisation as a more 'short-term' gain if this is a concern.

The real question is... is this just folklore, or is this actually worth looking into? I would naïvely have thought that some graphs would show us the time taken to deserialise events.

@reivilibre reivilibre added the T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements. label Jun 13, 2022
@MadLittleMods MadLittleMods added the A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db label Apr 25, 2023
5 participants