GitHub - EtienneAb3d/Whisper4LQR: Whisper for Low-Quality Recordings

Whisper for Low-Quality Recordings

Whisper4LQR is an adaptation of Faster-Whisper.

Whisper4LQR's purpose is to provide better recognition capabilities on low-quality recordings, even if it may take significantly more computation time than Faster-Whisper.

Transcription optimisation:

Pre-processing to extract, clean, and enhance voices with minimal noise.
VAD (Voice Activity Detection) used to segment sentences as effectively as possible.
Prompts applied to each segment to ensure a better vocabulary recognition across the entire recording.
Intensive generation of hypotheses selected based on word count and compression (redundancy) criteria.
Combined parameters, balanced for the whole process optimisation.

To have a look at Faster-Whisper modifications, search for #CBX in the repository.

Developed with the help of our partner Feedae

Installation

Check ffmpeg version >=4.4

ffmpeg -version

Output should be:
=================
ffmpeg version 4.4.3-0ubuntu1~20.04.sav2 Copyright (c) 2000-2022 the FFmpeg developers
[...]

Install latest:
===============
sudo add-apt-repository -y ppa:savoury1/ffmpeg4
sudo apt-get -qq install -y ffmpeg

Whisper4LQR installation

git clone https://github.com/EtienneAb3d/Whisper4LQR.git
cd Whisper4LQR
pip install -r requirements.txt

File pre-processing

from CbxPre import CbxPre

recording_paths = [...path list...]

cbxPre = CbxPre()

for recording_path in recording_paths:
    #Pre-processing (output a ".cbx.wav" file)
    cbxPre.process(recording_path=recording_path)

Transcription

from CbxSTT import CbxSTT

initial_prompt=(""
            # Insert your vocabulary as short expressions, using comma and points.
            # Max 224 tokens, counted while running.
            # Example:
            +" Run the software, save the file, a secure configuration."
            +" Linux, Mac OS X, Windows."
            # Keep these two last lines to ensure a small distance between 
            # the first future transcribed words and the above prompt words
            +" Please be patient."
            +" Beh Hein Ouais Heu Hmm Hum Ok..."
            )

recording_paths = [...path list...]

cbxSTT = CbxSTT(language="en")

for recording_path in recording_paths:
    #Transcribe (needs a ".cbx.wav" file)
    cbxSTT.process(initial_prompt=initial_prompt,recording_path=recording_path)
    #Compare with a previous transcription (".txt" vs ".cbx.txt")
    cbxSTT.align(recording_path=recording_path)

This tool is a demonstration of our know-how.
If you are interested in a commercial/industrial AI linguistic project, contact us:
https://cubaix.com

Name		Name	Last commit message	Last commit date
Latest commit History 234 Commits
.github/workflows		.github/workflows
benchmark		benchmark
docker		docker
faster_whisper		faster_whisper
tests		tests
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
CbxAligner.py		CbxAligner.py
CbxDemucsWrapper.py		CbxDemucsWrapper.py
CbxPre.py		CbxPre.py
CbxSTT.py		CbxSTT.py
CbxTokenizer.py		CbxTokenizer.py
CbxUtils.py		CbxUtils.py
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
requirements.conversion.txt		requirements.conversion.txt
requirements.txt		requirements.txt
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Whisper for Low-Quality Recordings

Installation

File pre-processing

Transcription

About

Packages

Languages

License

EtienneAb3d/Whisper4LQR

Folders and files

Latest commit

History

Repository files navigation

Whisper for Low-Quality Recordings

Installation

File pre-processing

Transcription

About

Topics

Resources

License

Stars

Watchers

Forks

Packages 0

Languages

Packages