Can the FirehoseSubscribeReposClient run faster? #375

emilyhunt · 2024-09-01T14:13:14Z

emilyhunt
Sep 1, 2024

Since the recent wave of Bluesky signups, the firehose of posts from the network is topping out at around 1300 ops/second. This is a lot of commits! And a lot of infra is already struggling to keep up. I wanted to start a discussion about how we could future-proof firehose consumer code in case the network gets even larger.

As I understand it, the maximum amount of ops/second that the FirehoseSubscribeReposClient can keep up with is about 6000, after which it becomes CPU bound based on the benchmark from way back in v0.0.26 when libipld was added. Today, I moved the astronomy feeds to new hosting, and after some benchmarking, I can get a peak of 5100 ops/second in processing speed (which is close to that original benchmark.) Again, it seems the FirehoseSubscribeReposClient gets CPU bound around that point.

However, this begs the question of what happens if Bluesky continues to grow and gets ~10x larger. Currently, the library can only manage about a ~4x-greater size increase, which isn't so much. I know the Bluesky devs have talked about plans for things like filtered firehoses, but that doesn't help if someone is running a feed that uses all types of record.

Thought 1: I wonder whether or not the decoding steps of FirehoseSubscribeReposClient could be multithreaded, as I believe that's the main bottleneck right now.

Thought 2: Python's base multiprocessing library looks like it's also a bit of a limiting factor. When running at peak speed (5000 ops/second), about 20% of the compute time of the firehose subscription process in my feed is spent just sending things to the worker process with a multiprocessing.Pipe (line 63 in this file)*, due to Python's lack of a way to handle shared memory nicely. If the FirehoseSubscribeReposClient was faster, then this would become an even larger limit. Is there a way to multithread firehose code in a way that avoids copies? For instance, maybe if FirehoseSubscribeReposClient was multithreaded, maybe each thread would call on_message_handler itself, instead of putting it all into a single Pipe/Queue at the end and causing a different bottleneck instead.

Thanks for your help as ever, @MarshalX!

N.B.: my implementation is a bit different from the example one in this repo as AWS is a pain and multiprocessing.Queue doesn't work on it. However, I think multiprocessing.Queue still uses multiprocessing.Pipes internally to communicate with workers, so I (think?) it will have the same bottleneck as my single-Pipe solution has.

MarshalX · 2024-09-01T15:19:00Z

MarshalX
Sep 1, 2024
Maintainer

IMHO libipld is not a bottleneck here. Since the first version libipld received a bunch of performance improvements. So comparing to v0.0.26 (where the first version of libipld has been integrated) is not relevant to the last version. You can find benchmark results in comparison with libipld versions here: https://github.com/MarshalX/python-libipld/releases.

It is a shame to say but ~6k per second is my local cap. But it goes neither from pydantic or libipld! Today I updated the https://github.com/MarshalX/atproto/blob/main/examples/firehose/process_commits.py example. And I use it as a real benchmarking tool. You can start it locally by simply replacing start_cursor=None with 0, and comment the 72th line to remove printing noise about new posts.

To exclude pydantic and libipld from the benchmark I put continue right after getting a message from the queue

    while True:
        message = pool_queue.get()
        continue

And the cap is still ~6k! I did not investigate it deeper but invite you to start from here :)

My points to check:

my network speed is shit (I don't think so)
I do real shit inside FirehoseSubscribeReposClient while decoding frames (possible, but hard to believe XD)
websockets library does brrrr and we need to replace it with something faster. I mean it uses threads, but Python ones... so if we want to speed up sync we should choose C/Rust websockets library that will not be blocked by GIL.
bsky server limit backfilling speed

P.S. We should definitely try AsyncFirehoseSubscribeReposClient since it should be truly asynced
P.P.S. To benchmark pydantic + libipld processing we could save events to disk and run benchmarking without websockets and network depend at all. To see the real performance decoding capabilities

UPD. I found something looking good to be a candidate for the new WebSocket client https://github.com/cirospaciari/socketify.py (I hope it has clients)
UPD 2. Nope no client in socketify ;( new candidate: https://github.com/tarasko/picows

15 replies

MarshalX Sep 2, 2024
Maintainer

@emilyhunt do we have 25k ops with disabled frame decoding in the firehose client? pls say that it is enabled ;((

upd. i see that it is enable in all speedcore files

emilyhunt Sep 2, 2024
Author

This is the script I ran - it's almost the same as sub_repos_async.py in the examples, but without printing the message in on_message_handler and with a counter to calculate ops/second instead (which in practice should have almost 0 overhead.)

Update: believe it should be enabled, yeah. Why would it be bad otherwise? =o

MarshalX Sep 2, 2024
Maintainer

awesome! yeah if you did not modify the internals of atproto SDK it is enabled. it would be bad because it would do nothing except .recv(). which is rly nothing to benchmark expect websocket impl and bandwidth

MarshalX Sep 2, 2024
Maintainer

The recipe is pretty simple to scale up to 25k:

run async firehose client
save messages to the local queue
flush queue every N seconds/messages to Redis/Kafka
scale as hard as you want using the count of workers that consume Redis/Kafka

The worker consumes raw messages, parses to commit, decodes CAR of blocks, decides to add to the feed or not, and updates some DB which stores the feed`s posts. You can have as many workers as you want. Even on the different servers.

emilyhunt Sep 2, 2024
Author

This sounds good! I'll have a go at some of the optimization strategies you mentioned at some point in the next few days - thanks for all the help so far!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can the FirehoseSubscribeReposClient run faster? #375

{{title}}

Replies: 1 comment 15 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

Select a reply

Can the FirehoseSubscribeReposClient run faster? #375

emilyhunt Sep 1, 2024

Replies: 1 comment · 15 replies

MarshalX Sep 1, 2024 Maintainer

MarshalX Sep 2, 2024 Maintainer

emilyhunt Sep 2, 2024 Author

MarshalX Sep 2, 2024 Maintainer

MarshalX Sep 2, 2024 Maintainer

emilyhunt Sep 2, 2024 Author

emilyhunt
Sep 1, 2024

Replies: 1 comment 15 replies

MarshalX
Sep 1, 2024
Maintainer

MarshalX Sep 2, 2024
Maintainer

emilyhunt Sep 2, 2024
Author

MarshalX Sep 2, 2024
Maintainer

MarshalX Sep 2, 2024
Maintainer

emilyhunt Sep 2, 2024
Author