I tried version 1.7.2 using a singleton service for the model...
When sending multiple requests simultaneously, the transcriptions get mixed together.
I used a SemaphoreSlim to ensure that requests are handled sequentially, but compared to a similar service running in Python with CTranslate2, the times are much worse with multiple concurrent requests, which is not the case for single requests.
Any advice?
Not sure which service you put in the singleton, but I suspect you stored the WhisperProcessor itself.
That won't work: a WhisperProcessor is not meant to be used by multiple threads concurrently.
Instead, store the WhisperFactory (or the WhisperProcessorBuilder) as the singleton and create a new WhisperProcessor for every inference request.
The model is loaded only once (it lives on the WhisperFactory), and processors are lightweight, so you can create one per (parallel) request without any SemaphoreSlim.
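For illustration, here is a minimal sketch of that pattern as a C# service (the class name, model path, and language option are my own placeholders, not from your report):

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Whisper.net;

// Register this class as the singleton: the WhisperFactory loads the
// model once and is safe to share across threads.
public sealed class TranscriptionService : IDisposable
{
    private readonly WhisperFactory _factory;

    public TranscriptionService(string modelPath)
    {
        // e.g. "ggml-base.bin"; the expensive model load happens here, once.
        _factory = WhisperFactory.FromPath(modelPath);
    }

    public async Task<string> TranscribeAsync(Stream wavStream, CancellationToken ct = default)
    {
        // WhisperProcessor is cheap to create but NOT thread-safe,
        // so build a fresh one for every request.
        using var processor = _factory.CreateBuilder()
            .WithLanguage("auto")
            .Build();

        var text = new StringBuilder();
        await foreach (var segment in processor.ProcessAsync(wavStream, ct))
        {
            text.Append(segment.Text);
        }
        return text.ToString();
    }

    public void Dispose() => _factory.Dispose();
}
```

With this shape, each request gets its own processor, so transcriptions can no longer mix, and the only remaining contention is the native inference itself.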
Hope that makes sense.
Can you please confirm whether that was the case, i.e. that you used the same WhisperProcessor for multiple requests?