This is a simple Python wrapper for the RNNoise noise suppression library. Only Python 3 is supported. The code is based on an issue opened by snakers4 in the RNNoise repository, special thanks to him.
RNNoise is a recurrent neural network with GRU cells designed for real-time audio denoising (it even runs on a Raspberry Pi). The standard model is trained on 6.4 GB of noisy audio recordings and is ready to use out of the box.
RNNoise is written in C and exposes functions for denoising a single 10 millisecond frame. The frame must be mono, 16-bit, with a sampling rate of 48000 Hz.
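For reference, a quick calculation of the size of one such frame (plain arithmetic, not part of the wrapper's API):

SAMPLE_RATE = 48000      # Hz, required by RNNoise
FRAME_MS = 10            # one RNNoise frame is 10 ms long
BYTES_PER_SAMPLE = 2     # 16-bit mono PCM

samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000       # 480 samples
bytes_per_frame = samples_per_frame * BYTES_PER_SAMPLE   # 960 bytes

print(samples_per_frame, bytes_per_frame)  # 480 960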
RNNoise_Wrapper makes working with RNNoise easy:
- eliminates the need to extract frames from the audio recording yourself
- removes restrictions on the parameters of the processed wav audio recording
- hides all the nuances of working with a C library
- eliminates the need to manually compile RNNoise (Linux only)
- adds 2 new binaries with higher quality models that come with the package (Linux only)
RNNoise_Wrapper ships with 2 new, higher-quality models (trained weights and compiled RNNoise binaries for Linux). A dataset from the Microsoft DNS Challenge was used for training.
- librnnoise_5h_ru_500k - trained on 5 hours of Russian speech (mixed with emotional speech and singing in English), obtained by a script from the repository with the dataset. The trained weights are in train_logs/weights_5h_ru_500k.hdf5, the compiled RNNoise binary is in rnnoise_wrapper/libs/librnnoise_5h_ru_500k.so.0.4.1 (Linux only)
- librnnoise_5h_b_500k - trained on 5 hours of mixed speech in English, Russian, German, French, Italian, Spanish and Mandarin Chinese (mixed with emotional speech and singing in English). The dataset for each language was first trimmed to the size of the smallest one (Russian has the least data, about 47 hours). The final training sample was obtained by a script from the repository with the dataset. The trained weights are in train_logs/weights_5h_b_500k.hdf5, the compiled RNNoise binary is in rnnoise_wrapper/libs/librnnoise_5h_b_500k.so.0.4.1 (Linux only)
- librnnoise_default - the standard model from the authors of RNNoise
The librnnoise_5h_ru_500k and librnnoise_5h_b_500k models have almost the same noise reduction quality. librnnoise_5h_ru_500k is best suited for Russian speech, while librnnoise_5h_b_500k is more universal and better suited for mixed speech or speech in languages other than Russian.
Comparative examples of the new models versus the standard one are available in test_audio/comparative_tests.
On an Intel i7-10510U CPU the wrapper runs 28-30 times faster than real time when denoising an entire audio recording, and 18-20 times faster than real time in streaming mode (i.e. when processing audio fragments 20 ms long). Only 1 core was used, with a load of about 80-100%.
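A minimal sketch of how such a real-time factor can be measured with the wrapper (the file name test.wav is only an example; the numbers will vary with your hardware):

import time
from rnnoise_wrapper import RNNoise

denoiser = RNNoise()
audio = denoiser.read_wav('test.wav')

start = time.perf_counter()
denoised_audio = denoiser.filter(audio)
elapsed = time.perf_counter() - start

# real-time factor: audio duration divided by processing time
print('%.1fx faster than real time' % (audio.duration_seconds / elapsed))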
This wrapper has the following dependencies: pydub and numpy.
Installation using pip:
pip install git+https://github.com/Desklop/RNNoise_Wrapper
ATTENTION! Before using the wrapper, RNNoise must be compiled. If you are using Linux or Mac, you can use the pre-compiled RNNoise binary (built on 64-bit Ubuntu 19.10) that comes with the package (it also works on Google Colaboratory). If the standard binary does not work for you, try compiling RNNoise manually. To do this, first prepare your OS (assuming gcc is already installed):
sudo apt-get install autoconf libtool
And execute:
git clone https://github.com/Desklop/RNNoise_Wrapper
cd RNNoise_Wrapper
./compile_rnnoise.sh
After this, the librnnoise_default.so.0.4.1 file will appear in the rnnoise_wrapper/libs folder. The path to this binary must be passed when creating an object of the RNNoise class from this wrapper (see below for more details).
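For example (a sketch; the exact path depends on where you cloned the repository and ran the build script):

from rnnoise_wrapper import RNNoise

# path produced by ./compile_rnnoise.sh in the cloned repository
denoiser = RNNoise(f_name_lib='rnnoise_wrapper/libs/librnnoise_default.so.0.4.1')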
If you are using Windows, you need to compile RNNoise manually. The instructions above will not work; use these links: one, two. After compilation, the path to the binary must be passed when creating an object of the RNNoise class from this wrapper (see below for more details).
Reducing noise in the audio recording test.wav and saving the result as test_denoised.wav:
from rnnoise_wrapper import RNNoise
denoiser = RNNoise()
audio = denoiser.read_wav('test.wav')
denoised_audio = denoiser.filter(audio)
denoiser.write_wav('test_denoised.wav', denoised_audio)
Noise reduction for streaming audio (buffer size of 20 milliseconds, i.e. 2 frames). The example simulates a stream by processing the test.wav audio recording in parts and saving the result as test_denoised_stream.wav:
audio = denoiser.read_wav('test.wav')

denoised_audio = b''
buffer_size_ms = 20

# process the audio recording in 20 ms chunks (pydub slices are in milliseconds)
for i in range(buffer_size_ms, len(audio), buffer_size_ms):
    denoised_audio += denoiser.filter(audio[i-buffer_size_ms:i].raw_data, sample_rate=audio.frame_rate)
# process the tail that is shorter than one full buffer
if len(audio) % buffer_size_ms != 0:
    denoised_audio += denoiser.filter(audio[len(audio)-(len(audio)%buffer_size_ms):].raw_data, sample_rate=audio.frame_rate)

denoiser.write_wav('test_denoised_stream.wav', denoised_audio, sample_rate=audio.frame_rate)
More examples of working with the wrapper can be found in rnnoise_wrapper_functional_tests.py and rnnoise_wrapper_comparative_test.py.
The RNNoise class contains the following methods:
- read_wav() - takes the name of a .wav audio recording, converts it to a supported format (16-bit, mono) and returns a pydub.AudioSegment object containing the audio recording
- write_wav() - takes the name of a .wav audio recording and a pydub.AudioSegment object (or a byte string of audio data without wav headers), and saves the audio recording under the given name
- filter() - takes a pydub.AudioSegment object (or a byte string of audio data without wav headers), resamples it to 48000 Hz, splits the audio into 10 ms frames, denoises them, and returns a pydub.AudioSegment object (or a byte string without wav headers) with the original sampling rate preserved
- filter_frame() - denoises a single frame (10 ms long, 16-bit, mono, 48000 Hz) by calling the RNNoise library binary directly
Detailed information about the supported arguments and operation of each method can be found in the comments in the source code of those methods.
The default model is librnnoise_5h_b_500k. When creating an object of the RNNoise class, you can specify a different model (RNNoise binary) using the f_name_lib argument:
- librnnoise_5h_ru_500k or librnnoise_default to use one of the bundled models
- a full/partial name/path of a compiled RNNoise binary file
denoiser_def = RNNoise(f_name_lib='librnnoise_5h_ru_500k')
denoiser_new = RNNoise(f_name_lib='path/to/librnnoise.so.0.4.1')
Features of the main filter() method:
- for the highest quality results, the audio recording should be at least 1 second long and contain both voice and noise (ideally with noise both before and after the voice); otherwise the noise reduction quality will be worse
- if parts of one audio recording are passed in separately (streaming noise reduction), their length must be at least 10 ms and a multiple of 10 ms (the RNNoise library only supports frames 10 ms long). This does not affect the noise reduction quality
- if the last frame of the passed audio is shorter than 10 ms (or a piece of audio shorter than 10 ms is passed), it is padded with zeros to the required size. Because of this, the final audio recording may become slightly longer after noise reduction
- for each frame, the RNNoise library additionally returns the probability that the frame contains voice (a number from 0 to 1); using the voice_prob_threshold argument you can filter frames by this value. If the probability is lower than voice_prob_threshold, the frame is removed from the audio recording (see the example below)
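For example, a sketch of voice-activity-based filtering with the voice_prob_threshold argument described above (the threshold value 0.9 and the output file name are only illustrative):

from rnnoise_wrapper import RNNoise

denoiser = RNNoise()
audio = denoiser.read_wav('test.wav')

# frames whose voice probability falls below the threshold are removed entirely
denoised_audio = denoiser.filter(audio, voice_prob_threshold=0.9)
denoiser.write_wav('test_denoised_vad.wav', denoised_audio)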
The wrapper can also be used from the command line:

python3 -m rnnoise_wrapper.cli -i input.wav -o output.wav
or
rnnoise_wrapper -i input.wav -o output.wav
Where:
- input.wav - name of the original .wav audio recording
- output.wav - name of the .wav file into which the audio recording will be saved after noise reduction
Instructions for training RNNoise on your own data can be found in TRAINING.md.
If you have any questions or want to collaborate, you can write to me by email: vladsklim@gmail.com or on LinkedIn.