Feature extraction from the speech signal is the first stage of any speech recognition system.
ManaTTS is the largest open Persian speech dataset with 114+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.
A Python library for generating speech datasets from YouTube videos
EmoTa is an open-access Tamil Speech Emotion Recognition dataset with 936 utterances from 22 native speakers, covering five emotions (anger, happiness, sadness, fear, and neutrality). It supports emotion classification tasks and advances Tamil language processing.
[T-IFS] RNN-SM: Fast Steganalysis of VoIP Streams Using Recurrent Neural Network
A transcribed speech dataset in Wolof, Pulaar and Sereer, to support agriculture. Funded by Lacuna Fund.
The Deepfake Cross-lingual evaluation dataset (DECRO) is designed to evaluate the influence of language differences on deepfake detection.
Construct a speech dataset and implement an algorithm for trigger word detection (sometimes also called keyword detection, or wakeword detection).
Download speech datasets (English and non-English) for Automatic Speech Recognition
Voice activity detection and speaker gender segmentation audiovisual corpus
A Large-Scale Open Persian Speech Dataset
A freely licensed Persian TTS dataset including 6+ hours of audio-text pairs with subject
NumPy-librosa implementation of a speech dataset pipeline
Easy access to speech data across 142 African languages for training TTS and ASR models.
🇧🇮 The first large-scale, open-source speech and text dataset for the Kirundi language. Building AI models for 12M+ Kirundi speakers through community collaboration. Includes ASR, TTS, and MT capabilities.
A full-stack webapp for collecting and managing speech datasets.
Persian spoken digit recognition
A robust forced alignment tool for low-resource languages using multiple ASR models and CER-based matching. Built for noisy data and imperfect transcripts.
A simple CNN-LSTM deep neural model using TensorFlow to classify emotions from a speech dataset
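The forced alignment entry in this list mentions CER-based matching. As a rough illustration of that idea only (not that tool's actual implementation), character error rate can be computed as the Levenshtein distance between a reference transcript and an ASR hypothesis, divided by the reference length. The function names `edit_distance`, `cer`, and `best_match` below are hypothetical and chosen for this sketch.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of hyp against a non-empty reference."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def best_match(hyp: str, candidates: list[str]) -> str:
    """Pick the candidate transcript with the lowest CER against hyp (assumed matching rule)."""
    return min(candidates, key=lambda c: cer(c, hyp))


if __name__ == "__main__":
    print(cer("abcd", "abxd"))
    print(best_match("helo wrld", ["hello world", "goodbye"]))
```

Choosing the lowest-CER candidate is one simple way to pair noisy ASR output with imperfect transcript segments; a real aligner would likely also apply a rejection threshold and combine several ASR models.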