A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.
Updated Mar 25, 2023 · Forth
Every speaker has an accent, and a particular accent reflects the speaker's linguistic background. The model identifies the accent from an audio recording. Its output could be used to determine a speaker's accent, to help English learners reduce their accent, and to improve pronunciation through training.
This repo contains the recipe to assemble a corpus of foreign-accented English from the crowdsourced Common Voice corpus, which includes optional accent labels.
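The repository's actual recipe is not reproduced here. As a minimal sketch, assuming a Common Voice release whose `validated.tsv` is tab-separated with a `path` column and an `accents` column (older releases name it `accent`), clips that carry a non-empty accent label could be grouped by label like this. The function name `accented_clips` is hypothetical.

```python
import csv
import io
from collections import defaultdict


def accented_clips(tsv_file, accent_column="accents"):
    """Group Common Voice clip paths by their self-reported accent label.

    Assumption: the TSV has a header row with "path" and `accent_column`
    columns (e.g. "accents" in newer releases, "accent" in older ones).
    Rows with an empty or missing accent label are skipped, because the
    label is optional and unlabeled clips cannot be assigned an accent.
    """
    # QUOTE_NONE: transcripts may contain quote characters that are not CSV quoting.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    by_accent = defaultdict(list)
    for row in reader:
        label = (row.get(accent_column) or "").strip()
        if not label:
            continue  # no accent label: leave this clip out of the corpus
        by_accent[label].append(row["path"])
    return dict(by_accent)


if __name__ == "__main__":
    demo = io.StringIO(
        "client_id\tpath\tsentence\taccents\n"
        "c1\ta.mp3\tHello\tScottish English\n"
        "c2\tb.mp3\tHi\t\n"
    )
    print(accented_clips(demo))  # {'Scottish English': ['a.mp3']}
```

With a real release, the same function could be applied to `open("validated.tsv", encoding="utf-8", newline="")`; the per-accent lists could then be balanced or split into train/test sets.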
Megathon 2022 code dump