Description
- Prepare one or more public English speech recognition datasets (e.g. LibriSpeech) and their respective baselines.
- Convert all audio files to .wav format.
- Add a file manifest generator for each dataset, and a merger for combining manifests when more than one dataset is used. Keep this interface unified across datasets.
- Add a spectrogram feature extractor, a power normalizer, etc.
- Add a transcription text parser (tokenization, dictionary generation, etc.).
- Add a batch data reader with SortaGrad.
- Refer to the DS2 design doc and update it when necessary.
- Please submit your code and docs to PaddlePaddle/models via pull requests.
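For the manifest generator and merger, here is a minimal sketch using only the standard library. It assumes a JSON-lines manifest with hypothetical fields `audio_filepath`, `duration` (seconds) and `text`, and that transcripts are lower-cased; the function names are illustrative, not an existing PaddlePaddle API.

```python
import json
import os
import wave


def wav_duration(path):
    """Return the duration of a PCM .wav file in seconds (stdlib `wave` module)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def write_manifest(pairs, manifest_path):
    """Write one JSON object per line for each (wav_path, transcript) pair.

    The field names (audio_filepath, duration, text) are an assumed schema.
    """
    with open(manifest_path, "w", encoding="utf-8") as out:
        for wav_path, transcript in pairs:
            entry = {
                "audio_filepath": os.path.abspath(wav_path),
                "duration": wav_duration(wav_path),
                "text": transcript.strip().lower(),  # assumed normalization
            }
            out.write(json.dumps(entry) + "\n")


def read_manifest(manifest_path, min_duration=0.0, max_duration=float("inf")):
    """Load manifest entries, keeping only utterances inside the duration range."""
    entries = []
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            entry = json.loads(line)
            if min_duration <= entry["duration"] <= max_duration:
                entries.append(entry)
    return entries


def merge_manifests(manifest_paths, merged_path):
    """Concatenate several per-dataset manifests into one file with the same schema."""
    with open(merged_path, "w", encoding="utf-8") as out:
        for path in manifest_paths:
            for entry in read_manifest(path):
                out.write(json.dumps(entry) + "\n")
```

Because every dataset is reduced to the same three fields, the merger and any downstream reader stay dataset-agnostic; only the code that produces the `(wav_path, transcript)` pairs is dataset-specific.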
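For the feature extractor, a rough sketch of one reading of "power normalizer" (scaling the waveform to a target RMS level in dB) plus a log power spectrogram. Assumptions: samples are floats in [-1, 1], the -20 dB target and the 320/160-sample window/stride (20 ms / 10 ms at 16 kHz) are illustrative defaults, and a naive DFT stands in for an FFT so the example needs only the standard library.

```python
import cmath
import math


def normalize_power(samples, target_db=-20.0):
    """Scale samples so their RMS level equals target_db (dB relative to full scale 1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return list(samples)  # silence: nothing to rescale
    current_db = 10.0 * math.log10(mean_square)
    gain = 10.0 ** ((target_db - current_db) / 20.0)  # amplitude gain
    return [s * gain for s in samples]


def power_spectrogram(samples, window_size=320, stride=160, eps=1e-14):
    """Log power spectrogram: one row of window_size // 2 + 1 bins per frame.

    Uses a Hann window and a naive O(n^2) DFT for readability; a real
    extractor would call an FFT routine instead.
    """
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window_size - 1))
            for n in range(window_size)]
    frames = []
    for start in range(0, len(samples) - window_size + 1, stride):
        frame = [samples[start + n] * hann[n] for n in range(window_size)]
        row = []
        for k in range(window_size // 2 + 1):  # non-negative frequency bins only
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                        for n in range(window_size))
            row.append(math.log(abs(coeff) ** 2 + eps))  # eps avoids log(0)
        frames.append(row)
    return frames
```

Normalizing the waveform level before feature extraction makes recordings from different datasets comparable; per-bin mean/variance normalization of the resulting spectrogram is a common additional step.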
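For the transcription parser, a minimal sketch assuming character-level tokens for English (letters, apostrophe and space; other characters are treated as word separators) and a frequency-sorted dictionary. The function names are hypothetical.

```python
from collections import Counter


def tokenize(transcript):
    """Character-level tokens: lower-case, keep letters and apostrophes, single spaces."""
    kept = "".join(ch if (ch.isalpha() or ch == "'") else " "
                   for ch in transcript.lower())
    return list(" ".join(kept.split()))  # collapse whitespace runs, trim ends


def build_vocabulary(transcripts, min_count=1):
    """Return characters seen at least min_count times.

    Sorted by descending frequency, then by character, so ids are deterministic.
    """
    counts = Counter()
    for transcript in transcripts:
        counts.update(tokenize(transcript))
    kept = [(ch, n) for ch, n in counts.items() if n >= min_count]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [ch for ch, _ in kept]


def encode(transcript, vocabulary):
    """Map a transcript to token ids; characters outside the vocabulary are dropped."""
    index = {ch: i for i, ch in enumerate(vocabulary)}
    return [index[ch] for ch in tokenize(transcript) if ch in index]


def decode(ids, vocabulary):
    """Inverse of encode for in-vocabulary ids."""
    return "".join(vocabulary[i] for i in ids)
```

If the model is trained with CTC, one extra id (for example `len(vocabulary)`) would still need to be reserved for the blank symbol; that choice is left to the model code here.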
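For the batch reader, a sketch of SortaGrad as a plain generator: in the first epoch utterances are visited in increasing order of duration, and in later epochs in random order. It assumes each entry is a dict with a `duration` field in seconds (a hypothetical manifest schema); `sortagrad_batches` and its `seed` parameter are illustrative names.

```python
import random


def sortagrad_batches(entries, batch_size, epoch, seed=0):
    """Yield lists of entries, batch_size at a time (the last batch may be short).

    SortaGrad: in the first epoch (epoch == 0) utterances are ordered by
    increasing duration; in every later epoch they are shuffled randomly.
    """
    if epoch == 0:
        ordered = sorted(entries, key=lambda e: e["duration"])
    else:
        ordered = list(entries)
        # Seed per epoch so each epoch gets a different but reproducible order.
        random.Random(seed + epoch).shuffle(ordered)
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]
```

Starting on short utterances tends to stabilize early training. A variant for later epochs is to build batches from the duration-sorted list and shuffle whole batches, which keeps similar lengths together and reduces padding.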