I tried version 1.7.2 using a singleton service for the model...
When sending multiple requests simultaneously, the transcriptions get mixed together.
I used a SemaphoreSlim to ensure that requests are handled sequentially, but compared to a similar service running in Python with CTranslate2, the times are much worse with multiple concurrent requests, which is not the case for single requests.
Any advice?
Not sure which service you put in the singleton, but I suspect you stored the WhisperProcessor itself.
That won't work: a WhisperProcessor is not meant to be used by multiple threads concurrently.
Instead, store the WhisperFactory (or the WhisperProcessorBuilder) as the singleton and create a new WhisperProcessor for every inference request.
The model is loaded only once (it lives on the WhisperFactory), and processors are lightweight, so you can create one per (parallel) request without any SemaphoreSlim.
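For illustration, here is a minimal sketch of that pattern as a C# service (the class name, model path, and language option are my own placeholders, not from your report):

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Whisper.net;

// Register this class as the singleton: the WhisperFactory loads the
// model once and is safe to share across threads.
public sealed class TranscriptionService : IDisposable
{
    private readonly WhisperFactory _factory;

    public TranscriptionService(string modelPath)
    {
        // e.g. "ggml-base.bin"; the expensive model load happens here, once.
        _factory = WhisperFactory.FromPath(modelPath);
    }

    public async Task<string> TranscribeAsync(Stream wavStream, CancellationToken ct = default)
    {
        // WhisperProcessor is cheap to create but NOT thread-safe,
        // so build a fresh one for every request.
        using var processor = _factory.CreateBuilder()
            .WithLanguage("auto")
            .Build();

        var text = new StringBuilder();
        await foreach (var segment in processor.ProcessAsync(wavStream, ct))
        {
            text.Append(segment.Text);
        }
        return text.ToString();
    }

    public void Dispose() => _factory.Dispose();
}
```

With this shape, each request gets its own processor, so transcriptions can no longer mix, and the only remaining contention is the native inference itself.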
Hope that makes sense.
Can you please confirm whether that was the case, i.e. that you used the same WhisperProcessor for multiple requests?