Automatic Speech Recognition (ASR) - DeepSpeech German

This is the project for the paper "German End-to-end Speech Recognition based on DeepSpeech", published at KONVENS 2019.

This project aims to develop a working speech-to-text module using Mozilla DeepSpeech that can be used in any audio processing pipeline. Mozilla DeepSpeech is a state-of-the-art open-source automatic speech recognition (ASR) toolkit. It uses a model trained with machine learning techniques based on Baidu's Deep Speech research paper, and it is implemented on top of Google's TensorFlow.

Important Links:

Paper: https://www.researchgate.net/publication/336532830_German_End-to-end_Speech_Recognition_based_on_DeepSpeech

DeepSpeech-API: https://github.com/AASHISHAG/DeepSpeech-API

This Readme is written for DeepSpeech v0.5.0. Refer to Mozilla DeepSpeech for the latest updates.

$ cd ..
$ git clone https://github.com/AASHISHAG/deepspeech-german.git

# Tuda-De
$ python3 deepspeech-german/pre-processing/prepare_data.py --tuda $tuda_corpus_path $export_path_data_tuda

# Voxforge
$ deepspeech-german/pre-processing/run_to_utf_8.sh
$ python3 deepspeech-german/pre-processing/prepare_data.py --voxforge $voxforge_corpus_path $export_path_data_voxforge

# Mozilla Common Voice
$ python3 DeepSpeech/bin/import_cv2.py --filter_alphabet deepspeech-german/data/alphabet.txt $export_path_data_mozilla

NOTE: Adjust the path in run_to_utf_8.sh to match your setup before running it.
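The --filter_alphabet option above drops samples whose transcripts contain characters outside alphabet.txt. The idea can be sketched roughly as follows. This is not the importer's actual code: the helper names load_alphabet and keep_transcript, the sample alphabet, and the assumption that the file holds one symbol per line with "#" lines as comments are all illustrative.

```python
def load_alphabet(lines):
    """Collect allowed symbols from alphabet-file lines (assumed: one symbol per line)."""
    allowed = set()
    for line in lines:
        symbol = line.rstrip("\n")
        # Assumption: lines starting with "#" are comments, empty lines are ignored.
        if symbol == "" or symbol.startswith("#"):
            continue
        allowed.add(symbol)
    return allowed


def keep_transcript(transcript, allowed):
    """Return True only if every character of the transcript is in the alphabet."""
    return all(ch in allowed for ch in transcript)


# Hypothetical alphabet: space, a-z and the lowercase umlauts.
sample_lines = ["# comment line\n", " \n"] + [c + "\n" for c in "abcdefghijklmnopqrstuvwxyzäöü"]
allowed = load_alphabet(sample_lines)

transcripts = ["guten morgen", "schönes wetter", "café au lait", "drei 3"]
kept = [t for t in transcripts if keep_transcript(t, allowed)]
print(kept)
```

In this sketch the last two transcripts are dropped because "é" and "3" are not part of the sample alphabet.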

Language Model

We used the KenLM toolkit to train a 3-gram language model. KenLM is language model inference code by Kenneth Heafield.

Installation

$ git clone https://github.com/kpu/kenlm.git
$ cd kenlm
$ mkdir -p build
$ cd build
$ cmake ..
$ make -j `nproc`

Corpus

We used an open-source German text corpus released by the University of Hamburg.

Download the data

$ wget http://ltdata1.informatik.uni-hamburg.de/kaldi_tuda_de/German_sentences_8mil_filtered_maryfied.txt.gz
$ gzip -d German_sentences_8mil_filtered_maryfied.txt.gz

Pre-process the data

$ python3 deepspeech-german/pre-processing/prepare_vocab.py $text_corpus_path $exp_path/clean_vocab.txt

Build the Language Model

$ kenlm/build/bin/lmplz --text $exp_path/clean_vocab.txt --arpa $exp_path/words.arpa -o 3
$ kenlm/build/bin/build_binary -T -s $exp_path/words.arpa $exp_path/lm.binary

NOTE: If a malloc exception occurs, limit the memory used by lmplz with -S, given as a percentage of the available memory.

Example:

$ kenlm/build/bin/lmplz --text $exp_path/clean_vocab.txt --arpa $exp_path/words.arpa -o 3 -S 50%
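To illustrate what "3-gram" means here, the following minimal sketch counts all n-grams up to order 3 in a tiny made-up corpus and derives a raw relative frequency. It is only a conceptual illustration: KenLM's lmplz estimates smoothed probabilities (modified Kneser-Ney) rather than raw ratios, and the function name ngram_counts and the two sample sentences are assumptions of this example.

```python
from collections import Counter


def ngram_counts(sentences, order=3):
    """Count all n-grams of length 1..order, with sentence boundary markers."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


# Made-up two-sentence corpus, one sentence per entry.
corpus = ["ich gehe nach hause", "ich gehe nach berlin"]
counts = ngram_counts(corpus, order=3)

# Unsmoothed estimate of P(hause | gehe nach) = count(gehe nach hause) / count(gehe nach)
p_hause = counts[("gehe", "nach", "hause")] / counts[("gehe", "nach")]
print(counts[("ich", "gehe", "nach")], p_hause)
```

A higher order captures longer context but needs more text and memory, which is one reason the -S option above can become necessary on large corpora.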

Trie

Build a trie for the language model trained above.

Requirements

General TensorFlow requirements
libsox
SWIG >= 3.0.12
node-pre-gyp

Build Native Client.

# The DeepSpeech tools are used to create the trie
$ git clone https://github.com/mozilla/tensorflow.git
$ cd tensorflow
$ git checkout origin/r1.13
$ ./configure
$ ln -s ../DeepSpeech/native_client ./
$ bazel build --config=monolithic -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-fvisibility=hidden //native_client:libdeepspeech.so //native_client:generate_trie --config=cuda

NOTE:

Flags used to configure TensorFlow

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
Do you wish to build TensorFlow with ROCm support? [y/N]: N
Do you wish to build TensorFlow with CUDA support? [y/N]: y
Do you wish to build TensorFlow with TensorRT support? [y/N]: N
Do you want to use clang as CUDA compiler? [y/N]: N
Do you wish to build TensorFlow with MPI support? [y/N]: N
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N

Refer to Mozilla's documentation for updates. We used Bazel (build label 0.19.2) with DeepSpeech v0.5.0.

Build Trie

$ DeepSpeech/native_client/generate_trie $path/alphabet.txt $path/lm.binary $exp_path/trie
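For readers unfamiliar with the data structure, a trie stores words character by character so that shared prefixes are shared nodes. The sketch below is a generic in-memory prefix tree, not the binary format written by generate_trie; the class and function names and the three sample words are assumptions made for illustration.

```python
class TrieNode:
    """One node of a character-level prefix tree."""

    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_word = False  # True if a vocabulary word ends at this node


def build_trie(words):
    """Insert every word into a fresh trie and return its root."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def find_node(root, prefix):
    """Walk the trie along prefix; return the final node or None."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return None
    return node


def has_prefix(root, prefix):
    return find_node(root, prefix) is not None


def contains_word(root, word):
    node = find_node(root, word)
    return node is not None and node.is_word


root = build_trie(["haus", "hause", "hund"])
print(has_prefix(root, "hau"), contains_word(root, "hau"), contains_word(root, "haus"))
```

A lookup like has_prefix lets a decoder check cheaply whether a partial character sequence can still become a vocabulary word.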

Training

Define the path of the corpus and the hyperparameters in the deepspeech-german/train_model.sh file.

$ nohup deepspeech-german/train_model.sh &

Hyper-Parameter Optimization

Define the path of the corpus and the hyperparameters in the deepspeech-german/hyperparameter_optimization.sh file.

$ nohup deepspeech-german/hyperparameter_optimization.sh &

Results

Some results from our findings, reported as word error rate (WER; lower is better):

Dataset                     WER
Mozilla                     79.7%
Voxforge                    72.1%
Tuda-De                     26.8%
Tuda-De+Mozilla             57.3%
Tuda-De+Voxforge            15.1%
Tuda-De+Voxforge+Mozilla    21.5%

NOTE: Refer to our paper for more information.
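Assuming the percentages above are word error rates, as is standard for ASR evaluation, the metric can be sketched as the word-level edit distance between reference and hypothesis divided by the number of reference words. The function below is a minimal illustration and not the evaluation code used for the paper; its name and the sample sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of four reference words.
print(word_error_rate("das ist ein test", "das ist test"))
```

Because insertions count as errors, the value can exceed 100% when the hypothesis is much longer than the reference.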

Transfer Learning

1. German to German

Specify the checkpoint directory in transfer_model.sh

$ nohup deepspeech-german/transfer_model.sh &

2. English to German

Replace the umlauts and ß (ä, ö, ü, ß) with ae, oe, ue, ss, so that the transcripts only use characters from the English model's alphabet
Rebuild the language model, trie and corpus
Specify the checkpoint directory in transfer_model.sh

$ nohup deepspeech-german/transfer_model.sh &

NOTE: Transfer learning requires checkpoints from the same DeepSpeech version as the one used for training.
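The umlaut replacement in the English to German step can be sketched as below. This is a minimal example, not the repository's own script; the names UMLAUT_MAP and replace_umlauts are hypothetical, and it assumes the transcripts are already lowercased, so uppercase umlauts are not handled.

```python
# Mapping taken from the step above: ä, ö, ü, ß become ae, oe, ue, ss.
UMLAUT_MAP = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}
UMLAUT_TABLE = str.maketrans(UMLAUT_MAP)


def replace_umlauts(text):
    """Transliterate lowercase German umlauts and ß to ASCII digraphs."""
    return text.translate(UMLAUT_TABLE)


print(replace_umlauts("schöne grüße aus der nähe"))
```

The same replacement has to be applied to the text used for the language model, otherwise the acoustic model and the language model would disagree on spelling.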

Trained Models (Language Model, Trie, Speech Model and Checkpoints)

The DeepSpeech model can be directly re-trained on a new dataset. The required dependencies are available at:

1. v0.5.0

This model is trained on DeepSpeech v0.5.0 with Mozilla_v3 + Voxforge + Tuda-De (please refer to the paper for more details): https://drive.google.com/drive/folders/1nG6xii2FP6PPqmcp4KtNVvUADXxEeakk?usp=sharing

https://drive.google.com/file/d/1VN1xPH0JQNKK6DiSVgyQ4STFyDY_rle3/view

2. v0.6.0

This model is trained on DeepSpeech v0.6.0 with Mozilla_v4 + Voxforge + Tuda-De + MAILABS (454 + 57 + 184 + 233 h = 928 h)

https://drive.google.com/drive/folders/1BKblYaSLnwwkvVOQTQ5roOeN0SuQm8qr?usp=sharing

3. v0.7.4

This model is trained on DeepSpeech v0.7.4, starting from the pre-trained English model released by Mozilla: English + Mozilla_v5 + MAILABS + Tuda-De + Voxforge (1700 + 750 + 233 + 184 + 57 h = 2924 h)

https://drive.google.com/drive/folders/1PFSIdmi4Ge8EB75cYh2nfYOXlCIgiMEL?usp=sharing

4. v0.9.0

This model is trained on DeepSpeech v0.9.0, starting from the pre-trained English model released by Mozilla: English + Mozilla_v5 + SWC + MAILABS + Tuda-De + Voxforge (1700 + 750 + 248 + 233 + 184 + 57 h = 3172 h)

Thanks to @koh-osug for providing the TFLite model.

Link: https://drive.google.com/drive/folders/1L7ILB-TMmzL8IDYi_GW8YixAoYWjDMn1?usp=sharing

Why be SHY to STAR the repository if you use the resources? :D

TODO LIST

Release model for DeepSpeech-v0.6.0
Release model for DeepSpeech-v0.7.4
Release model for DeepSpeech-v0.9.0
Add datasets - SWC

Acknowledgments

Prof. Dr.-Ing. Torsten Zesch - Co-Author
Dipl.-Ling. Andrea Horbach
Matthias

References

If you use our findings/scripts in your academic work, please cite:

@inproceedings{agarwal-zesch-2019-german,
    author = "Aashish Agarwal and Torsten Zesch",
    title = "German End-to-end Speech Recognition based on DeepSpeech",
    booktitle = "Preliminary proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019): Long Papers",
    year = "2019",
    address = "Erlangen, Germany",
    publisher = "German Society for Computational Linguistics \& Language Technology",
    pages = "111--119"
}
