github-actions[bot] edited this page May 25, 2025 · 1 revision

TensorFlowASR ⚡


Almost State-of-the-art Automatic Speech Recognition in TensorFlow 2

TensorFlowASR implements several automatic speech recognition architectures, including DeepSpeech2, Jasper, RNN Transducer, ContextNet, and Conformer. These models can be converted to TFLite to reduce memory use and computation at deployment 😄

What's New?


😋 Supported Models

Baselines

  • Transducer Models (end-to-end models trained with RNN-T loss; currently supported: Conformer, ContextNet, Streaming Transducer)
  • CTCModel (end-to-end models trained with CTC loss; currently supported: DeepSpeech2, Jasper)

Publications

Installation

For training and testing, install from a git clone of the repository so that the required packages from other authors (ctc_decoders, rnnt_loss, etc.) are installed as well.

NOTE (Apple Silicon only): TensorFlowASR requires Python >= 3.12

See the requirements.[extra].txt files for extra dependencies

git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
./setup.sh [apple|tpu|gpu] [dev]

Running in a container

docker-compose up -d

Training & Testing Tutorial

FYI: Keras built-in training uses an infinite dataset, which avoids a potential partial last batch.
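The note about the infinite dataset can be illustrated with a minimal plain-Python sketch. It is not TensorFlowASR or Keras code: the helper names `finite_batches` and `infinite_batches`, the toy data, and the batch size are all assumptions chosen for illustration. The idea is that a single pass over the data can end on a short batch, whereas a stream that repeats forever always yields full batches, so the epoch length has to be fixed as a step count instead.

```python
import itertools


def finite_batches(samples, batch_size):
    """One pass over the data; the last batch may be smaller than batch_size."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def infinite_batches(samples, batch_size):
    """Cycle over the data forever, so every batch holds exactly batch_size items."""
    stream = itertools.cycle(samples)
    while True:
        yield list(itertools.islice(stream, batch_size))


samples = list(range(10))  # hypothetical toy dataset of 10 items
batch_size = 4

# With an endless stream, an "epoch" is defined by a fixed number of steps.
steps_per_epoch = len(samples) // batch_size

finite_sizes = [len(batch) for batch in finite_batches(samples, batch_size)]
epoch = list(itertools.islice(infinite_batches(samples, batch_size), steps_per_epoch))
infinite_sizes = [len(batch) for batch in epoch]

print(finite_sizes)    # [4, 4, 2]: the final batch is partial
print(infinite_sizes)  # [4, 4]: every batch is full
```

In Keras, the step count plays the same role: when `fit` is given a dataset that repeats indefinitely, `steps_per_epoch` tells it where one epoch ends.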

See examples for some predefined ASR models and results

Feature Extraction

See features_extraction

Augmentations

See augmentations

TFLite Conversion

After conversion, the TFLite model behaves like a single function that maps an audio signal directly to text and tokens.

See tflite_convertion

Pretrained Models

See the results in each example folder, e.g. ./examples/models//transducer/conformer/results/sentencepiece/README.md

Corpus Sources

English

| Name         | Source                          | Hours |
|--------------|---------------------------------|-------|
| LibriSpeech  | LibriSpeech                     | 970h  |
| Common Voice | https://commonvoice.mozilla.org | 1932h |

Vietnamese

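| Name                                   | Source                                                 | Hours |
|----------------------------------------|--------------------------------------------------------|-------|
| Vivos                                  | https://ailab.hcmus.edu.vn/vivos                       | 15h   |
| InfoRe Technology 1                    | InfoRe1 (passwd: BroughtToYouByInfoRe)                 | 25h   |
| InfoRe Technology 2 (used in VLSP2019) | InfoRe2 (passwd: BroughtToYouByInfoRe)                 | 415h  |
| VietBud500                             | https://huggingface.co/datasets/linhtran92/viet_bud500 | 500h  |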

How to contribute

  1. Fork the project
  2. Install for development
  3. Create a branch
  4. Make a pull request to this repo

References & Credits

  1. NVIDIA OpenSeq2Seq Toolkit
  2. https://github.com/noahchalifour/warp-transducer
  3. Sequence Transduction with Recurrent Neural Networks
  4. End-to-End Speech Processing Toolkit in PyTorch
  5. https://github.com/iankur/ContextNet

Contact

Huy Le Nguyen

Email: nlhuy.cs.16@gmail.com
