
Build n-gram language model for DeepSpeech2, and add inference interfaces insertable to CTC decoder. #2229

@xinghai-sun

Description

  • Train an English language model (Kneser-Ney smoothed 5-gram, with pruning) with the KenLM toolkit, on cleaned text from the Common Crawl Repository. For detailed requirements please refer to the DS2 paper. A training sketch is given after this list.
  • Add the training script into the DS2 trainer script.
  • Add inference interfaces for this n-gram language model, insertable into the CTC-LM-beam-search decoder. A sketch of such an interface follows this list.
  • Keep in mind that the interfaces should be compatible with both English (word-based LM) and Mandarin (character-based LM).
  • Please work closely with the "Add CTC-LM-beam-search decoder" task.
  • Refer to the DS2 design doc and update it when necessary.
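A minimal sketch of the LM training step, assuming KenLM's `lmplz` and `build_binary` binaries are on PATH. The corpus path, output names, and pruning thresholds below are illustrative placeholders, not settings fixed by this issue or the DS2 paper.

```python
"""Sketch: estimate a pruned, Kneser-Ney smoothed 5-gram LM with KenLM.

Assumes KenLM's command-line tools (lmplz, build_binary) are installed.
Paths and pruning thresholds are placeholders.
"""
import subprocess


def train_kenlm(corpus_path, arpa_path, binary_path, order=5):
    # lmplz reads the cleaned corpus from stdin and writes an ARPA file.
    # --prune 0 0 1 1 1 drops singleton 3-, 4- and 5-grams to keep the
    # model size manageable (example thresholds only).
    with open(corpus_path, "rb") as corpus, open(arpa_path, "wb") as arpa:
        subprocess.check_call(
            ["lmplz", "-o", str(order), "--prune", "0", "0", "1", "1", "1"],
            stdin=corpus, stdout=arpa)
    # Convert the ARPA file to KenLM's binary trie format so it loads
    # quickly at decoding time.
    subprocess.check_call(["build_binary", "trie", arpa_path, binary_path])


if __name__ == "__main__":
    train_kenlm("common_crawl_cleaned.txt", "lm.arpa", "lm.binary")
```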
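Below is a minimal sketch of the scorer interface the CTC-LM-beam-search decoder could call on each candidate prefix, assuming the `kenlm` Python package. The class name, `is_character_based` flag, and the alpha/beta weighting are illustrative assumptions, not a fixed design; the point is that one interface can serve both a word-based English LM and a character-based Mandarin LM by only changing how the prefix is tokenized.

```python
"""Sketch: an LM scorer insertable into CTC beam-search decoding.

Assumes the kenlm Python bindings (pip install kenlm). Names such as
LmScorer, is_character_based, alpha and beta are illustrative.
"""
import kenlm


class LmScorer(object):
    """Returns a weighted language-model score for a decoded prefix."""

    def __init__(self, model_path, is_character_based=False,
                 alpha=1.0, beta=0.0):
        self._model = kenlm.Model(model_path)
        self._is_character_based = is_character_based
        self._alpha = alpha  # LM weight in the beam-search score
        self._beta = beta    # token-insertion bonus

    def __call__(self, prefix):
        # English: split the prefix into words; Mandarin: score it
        # character by character.
        tokens = (list(prefix) if self._is_character_based
                  else prefix.split())
        if not tokens:
            return 0.0
        # kenlm returns the log10 probability of the token sequence.
        log_prob = self._model.score(' '.join(tokens), bos=True, eos=False)
        return self._alpha * log_prob + self._beta * len(tokens)
```

The beam-search decoder would then add `scorer(prefix)` to the CTC score of each candidate prefix when ranking beams, which keeps the LM entirely behind this one callable and independent of whether the model is word- or character-based.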
