
Concurrency issue when using over Grpc/Http #266

Open
kaniosm opened this issue Nov 17, 2024 · 1 comment

kaniosm commented Nov 17, 2024

I tried version 1.7.2, using a singleton service for the model...
When sending multiple requests simultaneously, the transcription results get mixed together.

I used a SemaphoreSlim to ensure that requests are handled sequentially, but compared to a similar service running in Python with CTranslate2, the response times are much worse under multiple concurrent requests (single requests perform comparably).

Any advice?

@sandrohanea
Copy link
Owner

sandrohanea commented Nov 17, 2024

Hello @kaniosm ,

Not sure which service you put in the singleton, but I expect you stored the WhisperProcessor itself.

That won't work: WhisperProcessor is not thread-safe and must not be used by multiple threads concurrently.

However, you can load the model only once (without any additional SemaphoreSlim) if you store the WhisperFactory or the WhisperProcessorBuilder as a singleton and create a new WhisperProcessor for every inference request.

This way, you'll load the model only once (it is loaded on the WhisperFactory) and you can create a lightweight WhisperProcessor for every new (parallel) request.
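
For illustration, here is a minimal ASP.NET Core sketch of that pattern. The model path (`ggml-base.bin`), the `/transcribe` route, and the assumption that the request body is a 16 kHz WAV stream are placeholders for this example, not something from this thread:

```csharp
using System.IO;
using System.Text;
using Whisper.net;

var builder = WebApplication.CreateBuilder(args);

// Load the model once per process: the singleton holds the WhisperFactory, never a processor.
builder.Services.AddSingleton(_ => WhisperFactory.FromPath("ggml-base.bin"));

var app = builder.Build();

app.MapPost("/transcribe", async (HttpRequest request, WhisperFactory factory) =>
{
    // Buffer the body so the wave header can be parsed; assumes a 16 kHz WAV payload.
    using var audio = new MemoryStream();
    await request.Body.CopyToAsync(audio);
    audio.Position = 0;

    // A fresh, lightweight processor per request: safe to run in parallel with other requests.
    await using var processor = factory.CreateBuilder()
        .WithLanguage("auto")
        .Build();

    var text = new StringBuilder();
    await foreach (var segment in processor.ProcessAsync(audio))
    {
        text.Append(segment.Text);
    }
    return Results.Text(text.ToString());
});

app.Run();
```

Because the factory is the only shared state, no SemaphoreSlim is needed and concurrent requests no longer interleave their transcriptions.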

Hope that makes sense.

Can you please confirm whether that was the case, i.e. that you used the same WhisperProcessor for multiple requests?
