Add audio data provider and preprocessor for speech recognition datasets. #2226

Closed
PaddlePaddle/models
#55
@xinghai-sun

Description

  • Prepare one or more public English speech recognition datasets (e.g. LibriSpeech) and their respective baselines.
  • Convert all audio files to .wav format.
  • Add a file manifest generator for each dataset, and add a merger if there is more than one dataset. Keep this interface unified across datasets.
  • Add a spectrogram feature extractor, power normalizer, etc.
  • Add a transcription text parser (tokenization, dictionary generation, etc.).
  • Add a batch data reader with SortaGrad.
  • Refer to the DS2 design doc and update it when necessary.
  • Please open a pull request with your code and docs against PaddlePaddle/models.
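
To make the manifest merger, dictionary generation, and SortaGrad reader items above more concrete, here is a rough sketch in plain Python. It is not the actual implementation. The record fields (`audio_filepath`, `duration`, `text`) and the function names `merge_manifests`, `build_vocab`, and `sortagrad_batches` are assumptions made for illustration, and the vocabulary is character-level for simplicity. SortaGrad is taken here to mean: in the first epoch, visit utterances in order of increasing duration; in later epochs, shuffle. Shuffling individual utterances rather than whole minibatches is a simplification.

```python
import random
from collections import Counter


def merge_manifests(*manifests):
    """Concatenate per-dataset manifests (lists of records) into one list."""
    merged = []
    for manifest in manifests:
        merged.extend(manifest)
    return merged


def build_vocab(records):
    """Map each character seen in the transcripts to an integer id.

    Ids are assigned by descending frequency; ties are broken alphabetically
    so the result is deterministic.
    """
    counts = Counter(ch for r in records for ch in r["text"])
    ordered = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {ch: idx for idx, ch in enumerate(ordered)}


def sortagrad_batches(records, batch_size, epoch, seed=0):
    """Yield batches of records.

    Epoch 0: sorted by duration, shortest first (the SortaGrad curriculum).
    Later epochs: shuffled with a seed derived from the epoch number.
    """
    if epoch == 0:
        ordered = sorted(records, key=lambda r: r["duration"])
    else:
        ordered = list(records)
        random.Random(seed + epoch).shuffle(ordered)
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]


# Hypothetical manifests; the file names and transcripts are made up.
librispeech = [
    {"audio_filepath": "a.wav", "duration": 3.2, "text": "hello"},
    {"audio_filepath": "b.wav", "duration": 1.1, "text": "hi"},
]
other = [{"audio_filepath": "c.wav", "duration": 2.0, "text": "hey"}]

records = merge_manifests(librispeech, other)
vocab = build_vocab(records)
first_epoch = list(sortagrad_batches(records, batch_size=2, epoch=0))
later_epoch = list(sortagrad_batches(records, batch_size=2, epoch=1))
```

Keeping every dataset's manifest in the same record shape is what lets the merger and the batch reader stay dataset-agnostic. A spectrogram extractor and power normalizer would consume `audio_filepath` downstream and are left out of this sketch.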
