As promised, here's the thread I'm making for this.
RE: pre-processing:
In pywhispercpp/model.py we have transcribe, and it can take a numpy ndarray. What I was thinking is: rather than loading the audio, downmixing it to mono, and resampling it to 16 kHz at transcription time, why not do all that pre-processing ahead of time and generate binary blob files containing just the numpy ndarray that we can feed in directly?
It's not a big performance increase, but anything we can do outside of Python land ahead of time will give us a win. And I'm ok chasing micro-optimizations in Python land. I'm useless in C++ land.
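To make the idea concrete, here's a rough sketch of the pre-processing step. I'm assuming librosa for the decode/resample; any decoder that yields 16 kHz mono float32 (the format whisper.cpp wants) would do, and preprocess/load_blob are just names I made up:

```python
import numpy as np
import librosa  # assumption: any decoder yielding 16 kHz mono float32 works


def preprocess(path: str, blob_path: str) -> None:
    # Decode, downmix to mono, and resample to 16 kHz float32 once,
    # ahead of time, then dump the raw samples to disk.
    audio, _ = librosa.load(path, sr=16000, mono=True)
    audio.astype(np.float32).tofile(blob_path)


def load_blob(blob_path: str) -> np.ndarray:
    # At transcription time this is a raw read: no decoding, no resampling.
    return np.fromfile(blob_path, dtype=np.float32)
```

Then load_blob("file.blob") goes straight into model.transcribe().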
Also, let's put all logging behind a flag so it can be disabled. If possible, let's add a flag to silence whisper.cpp's incessant logging to stderr as well. I know it has no impact on the transcription itself, but it should be controllable.
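Until such a flag exists, here's the workaround I'd use: whisper.cpp writes to stderr from the C++ side, so a Python logging filter can't catch it, but swapping file descriptor 2 at the OS level does. This helper is my own sketch, not anything pywhispercpp ships:

```python
import os
from contextlib import contextmanager


@contextmanager
def mute_stderr():
    # Redirect fd 2 to /dev/null so writes from the C++ side are
    # silenced too; Python-level logging config never sees those.
    devnull = os.open(os.devnull, os.O_WRONLY)
    saved = os.dup(2)
    os.dup2(devnull, 2)
    try:
        yield
    finally:
        os.dup2(saved, 2)  # restore the real stderr
        os.close(saved)
        os.close(devnull)
```

Usage would just be with mute_stderr(): model = Model("base.en").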
RE: copy.deepcopy
We need to drop @staticmethod everywhere and implement the deep-copy methods on the C++ side. This is a minor request from me; it would just let us initialize the model in memory once and then create a deep copy that we can treat as a completely independent instance.
The other option is that I write a helper class using BytesIO to hold the model in memory, and we feed that to the Model class, I guess? Something like the sketch below. It would still be better than re-initializing the model from disk to create a sterile instance.
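Roughly what I have in mind, sketched under the assumption of a bytes/buffer entry point (Model.from_buffer is hypothetical; the Model class only takes a file path today):

```python
class ModelBlob:
    """Sketch of the helper: keep the ggml weights resident in RAM
    so fresh, independent instances can be spun up without another
    disk read or full re-initialization."""

    def __init__(self, path: str):
        with open(path, "rb") as f:
            self._blob = f.read()  # weights stay in memory from here on

    def new_instance(self, **params):
        # Hypothetical constructor: a buffer-accepting entry point like
        # this is exactly what the deepcopy request boils down to.
        return Model.from_buffer(self._blob, **params)
```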
RE: micro-optimizations
Under _get_segments we have
assert end <= n, f"{end} > {n}: `End` index must be less or equal than the total number of segments"
but I have to ask: is it even possible to end up in a situation where this assert would actually fire?
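One related nit while we're here: asserts vanish under python -O, so if the guard is worth keeping, it could be an explicit check instead. A sketch, not the library's current code (total_segments is a stand-in for however the count is actually obtained):

```python
def _get_segments(ctx, start: int, end: int):
    n = total_segments(ctx)  # stand-in name for the real segment count
    if not (0 <= start <= end <= n):
        # Unlike an assert, this survives optimized (-O) runs.
        raise IndexError(f"segment range [{start}, {end}) outside [0, {n})")
    ...
```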
RE: features
Let's make the model usable as a context manager so we can do quick and dirty things like:

```python
with Model("base.en", n_threads=6) as model:
    for segment in model.transcribe("file.mp3"):
        print(segment)
```
Not really necessary, just gives a more pleasant way of interacting with the model class.
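For reference, the dunder additions would be tiny. A sketch of what I mean (the cleanup hook is an assumption; it depends on what the bindings actually expose):

```python
class Model:
    # ...existing __init__, transcribe(), etc...

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Assumed cleanup hook: release the underlying whisper.cpp
        # context here, in whatever form the bindings expose it.
        return False  # don't swallow exceptions raised inside the block
```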