As promised, here's the thread I'm making for this.
RE: pre-processing:
In pywhispercpp/model.py we have transcribe, and it can take a numpy ndarray. What I was thinking is: rather than loading the audio, downmixing it to mono, and resampling it to 16 kHz at transcription time, why not do all that pre-processing ahead of time and generate binary blob files containing just the numpy ndarray that we can feed in directly?
It's not a big performance increase, but anything we can do outside of Python land ahead of time will give us a win. And I'm ok chasing micro-optimizations in Python land. I'm useless in C++ land.
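To make the idea concrete, here's a rough sketch of the pre-processing step. I'm assuming librosa for the decode/resample; any decoder that yields 16 kHz mono float32 (the format whisper.cpp wants) would do, and preprocess/load_blob are just names I made up:

```python
import numpy as np
import librosa  # assumption: any decoder yielding 16 kHz mono float32 works


def preprocess(path: str, blob_path: str) -> None:
    # Decode, downmix to mono, and resample to 16 kHz float32 once,
    # ahead of time, then dump the raw samples to disk.
    audio, _ = librosa.load(path, sr=16000, mono=True)
    audio.astype(np.float32).tofile(blob_path)


def load_blob(blob_path: str) -> np.ndarray:
    # At transcription time this is a raw read: no decoding, no resampling.
    return np.fromfile(blob_path, dtype=np.float32)
```

Then load_blob("file.blob") goes straight into model.transcribe().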
Also, let's put all logging behind a flag so it can be disabled. If possible, let's add a flag to silence whisper.cpp's incessant logging to stderr as well. I know it has no impact on the transcription itself, but it should be controllable.
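Until such a flag exists, here's the workaround I'd use: whisper.cpp writes to stderr from the C++ side, so a Python logging filter can't catch it, but swapping file descriptor 2 at the OS level does. This helper is my own sketch, not anything pywhispercpp ships:

```python
import os
from contextlib import contextmanager


@contextmanager
def mute_stderr():
    # Redirect fd 2 to /dev/null so writes from the C++ side are
    # silenced too; Python-level logging config never sees those.
    devnull = os.open(os.devnull, os.O_WRONLY)
    saved = os.dup(2)
    os.dup2(devnull, 2)
    try:
        yield
    finally:
        os.dup2(saved, 2)  # restore the real stderr
        os.close(saved)
        os.close(devnull)
```

Usage would just be with mute_stderr(): model = Model("base.en").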
RE: copy.deepcopy
We need to drop @staticmethod everywhere and implement the deep-copy methods on the C++ side. This is a minor request from me; it would just let us initialize the model in memory once and then create a deep copy that we can treat as a completely independent instance.
The other option is that I write a helper class using BytesIO to hold the model in memory, and we feed that to the Model class, I guess? Something like the sketch below. It would still be better than re-initializing the model from disk to create a sterile instance.
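Roughly what I have in mind, sketched under the assumption of a bytes/buffer entry point (Model.from_buffer is hypothetical; the Model class only takes a file path today):

```python
class ModelBlob:
    """Sketch of the helper: keep the ggml weights resident in RAM
    so fresh, independent instances can be spun up without another
    disk read or full re-initialization."""

    def __init__(self, path: str):
        with open(path, "rb") as f:
            self._blob = f.read()  # weights stay in memory from here on

    def new_instance(self, **params):
        # Hypothetical constructor: a buffer-accepting entry point like
        # this is exactly what the deepcopy request boils down to.
        return Model.from_buffer(self._blob, **params)
```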
RE: micro-optimizations
Under _get_segments we have
assert end <= n, f"{end} > {n}: `End` index must be less or equal than the total number of segments"
but I have to ask: is it even possible to end up in a situation where this assert would actually fire?
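One related nit while we're here: asserts vanish under python -O, so if the guard is worth keeping, it could be an explicit check instead. A sketch, not the library's current code (total_segments is a stand-in for however the count is actually obtained):

```python
def _get_segments(ctx, start: int, end: int):
    n = total_segments(ctx)  # stand-in name for the real segment count
    if not (0 <= start <= end <= n):
        # Unlike an assert, this survives optimized (-O) runs.
        raise IndexError(f"segment range [{start}, {end}) outside [0, {n})")
    ...
```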
RE: features
Let's make the model usable as a context manager so we can do quick and dirty things like:

```python
with Model("base.en", n_threads=6) as model:
    for segment in model.transcribe("file.mp3"):
        print(segment)
```
Not really necessary, just gives a more pleasant way of interacting with the model class.
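For reference, the dunder additions would be tiny. A sketch of what I mean (the cleanup hook is an assumption; it depends on what the bindings actually expose):

```python
class Model:
    # ...existing __init__, transcribe(), etc...

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Assumed cleanup hook: release the underlying whisper.cpp
        # context here, in whatever form the bindings expose it.
        return False  # don't swallow exceptions raised inside the block
```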