CNN音频分割(音乐/人声/性别)工具集 https://github.com/ina-foss/inaSpeechSegmenter
最强CNN语音识别算法开源了:词错率5%,训练超快,Facebook出品 https://github.com/facebookresearch/wav2letter
轻量语音识别解码框架 https://github.com/robin1001/xdecoder
Espresso:快速端到端神经网络语音识别工具集 https://github.com/freewym/espresso
PyTorch音频处理工具/数据集 https://github.com/audeering/audtorch
AI音源分离:Facebook AI的Demucs项目帮机器像人一样听音乐 https://github.com/facebookresearch/demucs
Youka:基于spleeter音源分离的卡拉OK生成工具 https://github.com/youkaclub/youka-desktop
基于卷积网络的基音检测 https://0xfe.blogspot.com/2020/02/pitch-detection-with-convolutional.html?m=1
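作为上文 CNN 方法的对照,基音检测最经典的基线是时域自相关法:在合理的周期范围内寻找自相关峰值。下面是一个纯 Python 的最小示意(函数名与参数为自拟,并非上文文章的实现):

```python
import math

def estimate_pitch(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """自相关法基音估计:在 [fmin, fmax] 对应的滞后范围内找自相关峰值。"""
    n = len(samples)
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), n - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, lag_max):
        corr = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag if best_lag else 0.0

sr = 8000
wave = [math.sin(2 * math.pi * 440 * t / sr) for t in range(1024)]
f0 = estimate_pitch(wave, sr)  # 约 444 Hz(受整数滞后量化限制)
```

自相关法对噪声和倍频错误较敏感,这也是 CNN 等学习式方法要解决的问题。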
Implementation of "FastSpeech: Fast, Robust and Controllable Text to Speech" https://github.com/Deepest-Project/FastSpeech
'基于Kaldi的aidatatang_200zh的训练之葵花宝典' https://github.com/datatang-ailab/aidatatang_200zh/blob/master/README.zh.md
https://github.com/xcmyz/lightspeech
DeepSpectrum:基于预训练图像CNN的音频数据特征抽取工具包 https://github.com/DeepSpectrum/DeepSpectrum
Landmark音频指纹 https://github.com/dpwe/audfprint
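audfprint 这类 Landmark 指纹的核心思路是:把谱峰两两配对,将 (f1, f2, Δt) 打包成整数哈希,用锚点时间做索引。下面是一个极简示意(位宽分配与数据结构为自拟假设,并非 audfprint 的实际实现):

```python
def landmark_hashes(peaks, fan_out=3, max_dt=63):
    """把每个谱峰与其后 fan_out 个峰配对,打包 (f1, f2, dt) 为整数哈希。
    位宽分配(f2 占 10 位、dt 占 6 位)仅为示意。"""
    peaks = sorted(peaks)  # 按时间排序
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            dt = t2 - t1
            if 0 < dt <= max_dt:
                hashes.append((f1 << 16 | f2 << 6 | dt, t1))  # (哈希, 锚点时间)
    return hashes

peaks = [(0, 100), (10, 150), (20, 120), (90, 90)]
hs = landmark_hashes(peaks)  # 时间间隔过大的峰对被丢弃
```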
基于Kaldi/Tensorflow的神经网络说话人识别/鉴别系统 https://github.com/mycrazycracy/tf-kaldi-speaker
【PyTorch语音识别框架】'patter - speech-to-text framework in PyTorch with initial support for the DeepSpeech2 architecture' https://github.com/ryanleary/patter
(语音)说话人分割相关资源大列表 https://github.com/wq2012/awesome-diarization
Audio samples from ICML2019 "Almost Unsupervised Text to Speech and Automatic Speech Recognition" https://github.com/SpeechResearch/speechresearch.github.io
(PyTorch)Seq2Seq普通话Transformer语音识别 https://github.com/ZhengkunTian/Speech-Tranformer-Pytorch
Deep neural network based speech enhancement toolkit https://github.com/jtkim-kaist/Speech-enhancement
音乐音频标记预训练深度网络模型 https://github.com/jordipons/musiCNN
End-to-End Automatic Speech Recognition on PyTorch https://github.com/gentaiscool/end2end-asr-pytorch
(Pytorch)音源分离语音信号提取 https://github.com/AppleHolic/source_separation
Code and models for evaluating a state-of-the-art lip reading network https://github.com/afourast/deep_lip_reading
Program to benchmark various speech recognition APIs https://github.com/Franck-Dernoncourt/ASR_benchmark
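这类 API 基准测试的核心指标是词错率(WER):词序列间的编辑距离除以参考词数。一个自包含的最小实现:

```python
def wer(ref, hyp):
    """词错率:参考/假设词序列的编辑距离 / 参考词数。"""
    r, h = ref.split(), hyp.split()
    # dp[i][j]:把参考前 i 个词变成假设前 j 个词所需的编辑数
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / len(r)

score = wer("the cat sat on the mat", "the cat sat mat")  # 2 个删除 / 6 词 ≈ 0.333
```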
基于Transformer的TTS语音合成模型 https://github.com/xcmyz/Transformer-TTS
DIY智能音箱(资源列表) https://github.com/voice-engine/make-a-smart-speaker/blob/master/zh.md
用深度学习实时克隆别人的声音 https://towardsdatascience.com/you-can-now-speak-using-someone-elses-voice-with-deep-learning-8be24368fa2b
用卷积网络从立体声音乐中分离乐器 https://towardsdatascience.com/audio-ai-isolating-instruments-from-stereo-music-using-convolutional-neural-networks-584ababf69de
用卷积神经网络从立体声音乐中分离人声 https://towardsdatascience.com/audio-ai-isolating-vocals-from-stereo-music-using-convolutional-neural-networks-210532383785
面向下一代交互设备的开源语音交互操作系统 https://github.com/yodaos-project/yodaos
笑声检测器 https://github.com/ideo/LaughDetection
'ASRT_SpeechRecognition - A Deep-Learning-Based Chinese Speech Recognition System 基于深度学习的中文语音识别系统' by nl8590687 https://github.com/nl8590687/ASRT_SpeechRecognition
A Pytorch Implementation of "Neural Speech Synthesis with Transformer Network" https://github.com/soobinseo/Transformer-TTS
This is research-code for Synthesizing Obama: Learning Lip Sync from Audio. https://github.com/supasorn/synthesizing_obama_network_training
Voice Operated Character Animation https://voca.is.tue.mpg.de/en https://github.com/TimoBolkart/voca
Deezer 的(Tensorflow)音源分离库,可用命令行直接提取音乐中的人声、钢琴、鼓声等 https://github.com/deezer/spleeter
【开源语音分离/增强库】 https://github.com/speechLabBcCuny/onssen
Feature extractor for DL speech processing. https://github.com/bepierre/SpeechVGG
Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data https://github.com/KunZhou9646/Nonparallel-emotional-VC
This is a PyTorch re-implementation of Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition. https://github.com/foamliu/Speech-Transformer
【Athena:开源端到端语音识别引擎】 https://github.com/athena-team/athena
Predicting Expressive Speaking Style from Text in End-to-End Speech Synthesis https://github.com/Yangyangii/TPGST-Tacotron
PyTorch implementation of LF-MMI for End-to-end ASR https://github.com/YiwenShaoStephen/pychain
Audio samples from ICML2019 "Almost Unsupervised Text to Speech and Automatic Speech Recognition" https://github.com/RayeRen/unsuper_tts_asr
Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention. https://github.com/CSTR-Edinburgh/ophelia
Efficient neural speech synthesis https://github.com/MlWoo/LPCNet
Code for Vision-Infused Deep Audio Inpainting (ICCV 2019) https://github.com/Hangz-nju-cuhk/Vision-Infused-Audio-Inpainter-VIAI
deep learning based speech enhancement using keras or pytorch https://github.com/yongxuUSTC/sednn
Multi-voice singing voice synthesis https://github.com/MTG/WGANSing
【用涂鸦“唱歌”:将图像合成为声音】 https://github.com/jeonghopark/SketchSynth-Simple
【面向语音识别的中文/英文发音辞典】 https://github.com/speech-io/BigCiDian
https://github.com/someonefighting/tf-kaldi-speaker-master
https://github.com/facebookresearch/wav2letter/wiki/Inference-Framework
【GridSound:在线数字音频编辑器】 https://github.com/GridSound/daw
【Asteroid:基于PyTorch的音源分离工具集】 https://github.com/mpariente/ASSteroid
【MelGAN 超快音频合成】 https://github.com/descriptinc/melgan-neurips
用深度学习生成钢琴音乐 https://github.com/haryoa/note_music_generator
音频分析/音乐检索相关数据集大列表 https://www.audiocontentanalysis.org/data-sets/
用WaveNet让语音受损用户重拾原声(少样本自适应自然语音合成) https://deepmind.com/blog/article/Using-WaveNet-technology-to-reunite-speech-impaired-users-with-their-original-voices
(C++)音频文件波形图生成 https://github.com/bbc/audiowaveform
【时域卷积DeepFake变音检测】 https://github.com/dessa-public/fake-voice-detection
Athena:(Tensorflow)端到端自动语音识别引擎开源实现 https://github.com/didi/athena
SV2TTS https://github.com/CorentinJ/Real-Time-Voice-Cloning
【(音频)数字信号处理入门(Notebooks)】 https://github.com/earthspecies/from_zero_to_DSP
【at16k:Python语音识别库】'at16k - Trained models for automatic speech recognition (ASR). A library to quickly build applications that require speech to text conversion.' https://github.com/at16k/at16k
一维卷积网络音频处理 https://github.com/KinWaiCheuk/nnAudio
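nnAudio 这类工具常算的 Mel 频谱依赖 Hz 与 Mel 标度的换算(HTK 公式 mel = 2595·log10(1 + f/700))。下面演示按 Mel 标度均匀取若干频带中心频率(纯示意,并非 nnAudio 的实际 API):

```python
import math

def hz_to_mel(f):
    """HTK 风格 Hz → Mel 换算。"""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Mel → Hz 逆换算。"""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centers(n_mels, fmin, fmax):
    """在 Mel 标度上均匀取 n_mels 个频带中心,返回对应的 Hz 频率。"""
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    return [mel_to_hz(lo + (hi - lo) * i / (n_mels + 1)) for i in range(1, n_mels + 1)]

centers = mel_centers(10, 0.0, 8000.0)  # 低频密集、高频稀疏
```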
CRF数据高效端到端语音识别工具集 https://github.com/thu-spmi/CAT
【音乐波形域音源分离】’Music Source Separation in the Waveform Domain - source separation in the waveform domain for music' https://github.com/facebookresearch/demucs
【Python实时音频频谱分析器】’Realtime_PyAudio_FFT - Realtime audio analysis in Python, using PyAudio and Numpy to extract and visualize FFT features from streaming audio.' https://github.com/tr1pzz/Realtime_PyAudio_FFT
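这类实时频谱分析的核心是逐帧做 FFT 取幅度。下面用朴素 DFT(O(N²),仅为说明原理,实际应使用 FFT 库)演示从一帧信号定位频谱峰值:

```python
import cmath
import math

def dft_magnitudes(frame):
    """朴素 DFT:返回前 N/2 个正频率 bin 的幅度。"""
    n = len(frame)
    out = []
    for k in range(n // 2):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        out.append(abs(acc))
    return out

sr, n = 8000, 256
tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(n)]  # 1 kHz 正弦
mags = dft_magnitudes(tone)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * sr / n  # 1000.0,正好落在整数 bin 上
```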
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis https://github.com/NVIDIA/flowtron
【文本语音合成(TTS)文献集】 https://github.com/erogol/TTS-papers
【CPU高性能实时文本语音合成系统】 https://ai.facebook.com/blog/a-highly-efficient-real-time-text-to-speech-system-deployed-on-cpus/
【TensorFlow 2 实现的文本语音合成】 https://github.com/as-ideas/TransformerTTS
【语音增强/语音分离/音源分离相关资源大列表】 https://github.com/Wenzhe-Liu/awesome-speech-enhancement
【AudioMass:全功能网页版音频/波形编辑工具】 https://github.com/pkalogiros/AudioMass
【CTC端到端语音识别&语料库】'CTC-based Automatic Speech Recognition - CTC end-to-end ASR for timit and 863 corpus.' https://github.com/Diamondfan/CTC_pytorch
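CTC 模型最简单的解码方式是贪心解码:逐帧取 argmax,合并相邻重复,再去掉 blank。最小示意:

```python
def ctc_greedy_decode(frame_probs, labels, blank=0):
    """贪心 CTC 解码:逐帧 argmax → 合并相邻重复 → 去掉 blank。"""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out, prev = [], None
    for idx in best:
        if idx != prev and idx != blank:
            out.append(labels[idx])
        prev = idx
    return "".join(out)

labels = ["-", "a", "b", "c"]   # 下标 0 为 CTC blank
probs = [
    [0.1, 0.8, 0.05, 0.05],   # a
    [0.1, 0.8, 0.05, 0.05],   # a(相邻重复,被合并)
    [0.9, 0.05, 0.03, 0.02],  # blank(分隔符)
    [0.1, 0.7, 0.1, 0.1],     # a(blank 之后算新输出)
    [0.1, 0.1, 0.2, 0.6],     # c
]
decoded = ctc_greedy_decode(probs, labels)  # "aac"
```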
【TensorflowTTS:Tensorflow 2实现的最先进实时语音合成】 https://github.com/dathudeptrai/TensorflowTTS
【audino:面向语音活动检测、说话人分割、说话人识别、自动语音识别、情感识别等任务的音频标注工具】 https://github.com/midas-research/audino
【深度学习语音端点检测】 https://github.com/filippogiruzzi/voice_activity_detection
【用Kaldi快速训练语音识别系统】 https://github.com/JRMeyer/easy-kaldi
从一首歌的mp3中分离得到人声、谱子、各种乐器等,转化成符号表示 https://github.com/deezer/spleeter
'aukit - 语音处理工具箱,包含语音降噪、音频格式转换、特征频谱生成等模块' https://github.com/KuangDD/aukit
【Keras示例:说话人识别】《Speaker Recognition》 https://keras.io/examples/audio/speaker_recognition_using_cnn/
基于RNN-Transducer的在线语音识别系统 https://github.com/theblackcat102/Online-Speech-Recognition
https://github.com/lturing/tacotronv2_wavernn_chinese
【miniaudio:C语言单文件音频回放/采集库】 https://github.com/dr-soft/miniaudio https://github.com/irmen/pyminiaudio
【TiramisuASR:用Tensorflow 2实现的语音识别引擎】 https://github.com/usimarit/TiramisuASR
A PyTorch implementation of dual-path RNNs (DPRNNs) based speech separation described in "Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation". https://github.com/ShiZiqiang/dual-path-RNNs-DPRNNs-based-speech-separation
Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings https://github.com/nii-yamagishilab/multi-speaker-tacotron
Code repo for ICME 2020 paper "Style-Conditioned Music Generation". VAE model that allows style-conditioned music generation. https://github.com/daQuincy/DeepMusicvStyle
streaming attention networks for end-to-end automatic speech recognition https://github.com/HaoranMiao/streaming-attention
[InterSpeech 2020] "AutoSpeech: Neural Architecture Search for Speaker Recognition" https://github.com/TAMU-VITA/AutoSpeech
Pytorch implementation of sparse_image_warp and an example of GoogleBrain's SpecAugment is given: A Simple Data Augmentation Method for Automatic Speech Recognition https://github.com/bobchennan/sparse_image_warp_pytorch
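SpecAugment 的核心操作就是在(频率 × 时间)谱图上随机抹零若干频带和时间段。下面是一个纯 Python 示意(参数名为自拟,且省略了原论文中的时间扭曲步骤):

```python
import random

def spec_augment(spec, num_freq_masks=1, num_time_masks=1, max_width=2, rng=None):
    """对 (freq x time) 谱图做频率掩蔽和时间掩蔽,返回新谱图,不修改原输入。"""
    rng = rng or random.Random(0)
    n_freq, n_time = len(spec), len(spec[0])
    out = [row[:] for row in spec]
    for _ in range(num_freq_masks):          # 抹零连续的频率行
        w = rng.randint(1, max_width)
        f0 = rng.randint(0, n_freq - w)
        for f in range(f0, f0 + w):
            out[f] = [0.0] * n_time
    for _ in range(num_time_masks):          # 抹零连续的时间列
        w = rng.randint(1, max_width)
        t0 = rng.randint(0, n_time - w)
        for row in out:
            for t in range(t0, t0 + w):
                row[t] = 0.0
    return out

spec = [[1.0] * 8 for _ in range(6)]
masked = spec_augment(spec)
```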
A Generative Flow for Text-to-Speech via Monotonic Alignment Search https://github.com/jaywalnut310/glow-tts
DeCoAR (self-supervised contextual representations for speech recognition) https://github.com/awslabs/speech-representations
A pytorch implementation of the EATS: End-to-End Adversarial Text-to-Speech https://github.com/yanggeng1995/EATS
Companion repository for the paper "A Comparison of Metric Learning Loss Functions for End-to-End Speaker Verification" https://github.com/juanmc2005/SpeakerEmbeddingLossComparison
Melody extraction using joint detection and classification network https://github.com/keums/melodyExtraction_JDC
Implementation of the AlignTTS https://github.com/Deepest-Project/AlignTTS
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech" https://github.com/ming024/FastSpeech2
A naive implementation of Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech. https://github.com/AppleHolic/multiband_melgan
PyTorch Implementation of FastSpeech 2 : Fast and High-Quality End-to-End Text to Speech https://github.com/rishikksh20/FastSpeech2
GELP: GAN-Excited Linear Prediction https://github.com/ljuvela/GELP
https://github.com/lihanghang/CASR-DEMO
Unofficial PyTorch implementation of Multi-Band MelGAN paper https://github.com/rishikksh20/melgan
https://github.com/chenmingxiang110/Chinese-automatic-speech-recognition
https://github.com/nobody132/masr
Plover:开源跨平台速记引擎,每分钟可录入200+单词 http://www.openstenoproject.org/plover/
【“Python机器学习声源分离”源码】 https://github.com/masahitotogami/python_source_separation
【可移植的C语言声学指纹库】 https://github.com/JorenSix/Olaf
SpeedySpeech:师生网络高质量实时语音合成系统 https://github.com/janvainer/speedyspeech
音频/语音预训练模型集 https://github.com/balavenkatesh3322/audio-pretrained-model
Piano transcription:钢琴曲MIDI文件转写工具 https://arxiv.org/abs/2010.01815 https://github.com/bytedance/piano_transcription
zhrtvc:中文实时语音克隆(基于 CorentinJ/Real-Time-Voice-Cloning) https://github.com/KuangDD/zhrtvc
工业级语音识别文献集(Streaming ASR / Non-autoregressive ASR / WFST based ASR ...) https://github.com/xingchensong/speech-recognition-papers
pyttsx3:Python离线语音合成库 https://github.com/nateshmbhat/pyttsx3
TensorflowASR:Tensorflow2实现的最先进语音识别 https://github.com/Z-yq/TensorflowASR
Cornell鸟鸣识别比赛第二名方案 https://github.com/vlomme/Birdcall-Identification-competition
micmon:从原始音频流分割创建音频数据集并训练声音检测模型的Python库 https://github.com/BlackLight/micmon
https://github.com/iceychris/LibreASR
Voicenet:语音和音频的综合Python处理库 https://github.com/Robofied/Voicenet
musicpy:音乐编程语言,用简洁的语法通过乐理逻辑写出优美音乐 https://github.com/Rainbow-Dreamer/musicpy
SOVA ASR:基于Wav2Letter架构的快速语音识别API https://github.com/sovaai/sova-asr
https://github.com/Jackiexiao/zhtts
SeeWav: 音频波形可视化包 https://github.com/adefossez/seewav
神经网络语音分离必读文献列表 https://github.com/JusperLee/Speech-Separation-Paper-Tutorial
PIKA: 基于Pytorch和(Py)Kaldi的轻量语音处理工具包 https://github.com/tencent-ailab/pika
https://github.com/espressif/esp-skainet
https://github.com/tulasiram58827/TTS_TFLite
Elpis (Accelerated Transcription):开发中的语音识别模型创建工具 https://github.com/CoEDL/elpis
MusicNet:带标注的古典音乐数据集(330+),标注了每个音符的精确时间,演奏每个音符的乐器,以及这些音符在乐曲韵律结构中的位置 https://homes.cs.washington.edu/~thickstn/musicnet.html
AI音乐生成 https://alxmamaev.medium.com/generating-music-with-ai-or-transformers-go-brrrr-3a3ac5a04126
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain. https://github.com/facebookresearch/denoiser
Implementation of MelNet in PyTorch to generate high-fidelity audio samples https://github.com/jgarciapueyo/MelNet-SpeechGeneration
PPSpeech: Phrase based Parallel End-to-End TTS System https://github.com/rishikksh20/PPSpeech
Implementation of Phase-aware speech enhancement with deep complex U-Net https://github.com/mhlevgen/DCUNetTorchSound
Tensorflow 2.0 implementation of the paper: A Fully Convolutional Neural Network for Speech Enhancement https://github.com/daitan-innovation/cnn-audio-denoiser
https://github.com/jackaduma/CycleGAN-VC2
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis https://github.com/jik876/hifi-gan
Real-Time High-Fidelity Speech Synthesis without GPU https://github.com/BogiHsu/WG-WaveNet
Official PyTorch implementation of Speaker Conditional WaveRNN https://github.com/dipjyoti92/SC-WaveRNN
Pytorch implementation of "Efficienttts: an efficient and high-quality text-to-speech architecture" https://github.com/liusongxiang/efficient_tts
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis https://github.com/rishikksh20/HiFi-GAN
An unofficial implementation of the paper "One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization". https://github.com/cyhuang-tw/AdaIN-VC
TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis https://github.com/rishikksh20/TFGAN
Efficient neural networks for analog audio effect modeling https://github.com/csteinmetz1/micro-tcn
End-to-End Multi-Channel Transformer for Speech Recognition https://arxiv.org/abs/2102.03951
Hugging Face的Transformers v4.3.0最新发布,hub模型库增加Facebook的Wav2Vec2自动语音识别模型 https://huggingface.co/facebook/wav2vec2-base-960h
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search https://arxiv.org/abs/2102.04040
End-to-end Audio-visual Speech Recognition with Conformers https://arxiv.org/abs/2102.06657
Memory-efficient Speech Recognition on Smart Devices https://arxiv.org/abs/2102.11531
自监督学习语音识别,wav2vec 2.0框架封装版 https://github.com/mailong25/self-supervised-speech-recognition
音频自动描述相关资源列表 https://github.com/audio-captioning/audio-captioning-resources
PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020) https://github.com/sooftware/conformer
OpenASR:基于Pytorch的端到端语音识别系统 https://github.com/by2101/OpenASR
实时音频频谱生成(网页版) https://borismus.github.io/spectrogram/
【WavEncoder:PyTorch后端的原始音频编码库】 https://github.com/shangeth/wavencoder
【Picovoice:用于大规模语音产品构建的端到端平台】 github.com/Picovoice/picovoice
audlib:以深度学习为重点的Python语音信号处理库 https://github.com/raymondxyy/pyaudlib
Auto-Editor:命令行视频/音频自动编辑工具,自动切除静默部分 https://github.com/WyattBlue/auto-editor
The SpeechBrain Toolkit:PyTorch开源一体化语音工具包,可用来轻松开发最先进的语音系统,包括语音识别、讲话者识别、语音增强、多麦克风信号处理等 github.com/speechbrain/speechbrain
STT:用于训练和部署语音到文本模型的开源深度学习工具包 github.com/coqui-ai/STT
GigaSpeech:用于语音识别的大型现代数据集 github.com/SpeechColab/GigaSpeech
Desed dataset:家庭环境声音事件检测数据集与工具 github.com/turpaultn/DESED
github.com/binzhouchn/masr
端到端语音处理工具集 github.com/espnet/espnet
Spleeter语声分离Demo github.com/deezer/spleeter
基于Tacotron 2 & Waveglow的多说话人情感文本语音合成(TTS) github.com/ide8/tacotron2
《AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss》(2019) github.com/cyhuang-tw/AutoVC
教程(Colab):从头开始语音识别 github.com/speechbrain/speechbrain/
Vosk-Browser:运行在浏览器里的语音识别库(基于WebAssembly) github.com/ccoreilly/vosk-browser ccoreilly.github.io/vosk-browser/
The SpeechBrain Toolkit:PyTorch开源一体化语音工具包,可用来轻松开发最先进的语音系统,包括语音识别、讲话者识别、语音增强、多麦克风信号处理等 speechbrain.github.io/
Speech Algorithms:语音算法集 github.com/Ryuk17/SpeechAlgorithms
torchsynth:面向音频机器学习研究的支持GPU的超快模块化音频合成器,在GPU上合成音频的速度比实时(714MHz)快16200倍 github.com/torchsynth/torchsynth
MevonAI:语音情感识别 github.com/SuyashMore/MevonAI-Speech-Emotion-Recognition
github.com/ZDisket/TensorVox
Music Demixing Challenge - Starter Kit:音乐音源分离挑战入门工具包 github.com/AIcrowd/music-demixing-challenge-starter-kit
LEAF:轻量嵌入式音频框架,用于音频合成和处理的C语言库 github.com/spiricom/LEAF
Word2Wave:基于WaveGAN和COALA的文本音频生成框架 github.com/ilaria-manco/word2wave
基于深度学习的音-视语音增强和分离相关资源集 github.com/danmic/av-se
LAS_Mandarin_PyTorch:端到端的中文语音识别 github.com/jackaduma/LAS_Mandarin_PyTorch
基于PaddlePaddle实现的中文语音识别 github.com/yeyupiaoling/PaddlePaddle-DeepSpeech
PyTorch实现的DNN音源分离 github.com/tky823/DNN-based_source_separation
可在线演示(Colab)的企业级预训练多语言语音识别(STT)模型 https://pytorch.org/hub/snakers4_silero-models_stt/
openspeech:用PyTorch-Lightning和Hydra实现的端到端语音识别开源工具包 github.com/sooftware/openspeech
Audio Augmentations:PyTorch音频增强库 github.com/Spijkervet/torchaudio-augmentations
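波形级音频增强中最常见的两种变换是按目标信噪比加噪和随机增益。下面是纯 Python 的示意实现(函数名为自拟,并非该库的实际 API):

```python
import random

def add_noise(samples, snr_db, rng):
    """按目标信噪比(dB)向波形添加高斯白噪声。"""
    sig_pow = sum(s * s for s in samples) / len(samples)
    noise_pow = sig_pow / (10 ** (snr_db / 10))
    scale = noise_pow ** 0.5
    return [s + rng.gauss(0.0, scale) for s in samples]

def random_gain(samples, low=0.5, high=1.5, rng=None):
    """以随机增益缩放整段波形。"""
    g = (rng or random).uniform(low, high)
    return [s * g for s in samples]

rng = random.Random(0)
clean = [0.1] * 100
noisy = add_noise(clean, snr_db=20, rng=rng)
louder = random_gain(clean, rng=rng)
```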
FRILL:用TensorFlow-Lite实现设备端语音表示 https://arxiv.org/abs/2011.04609 https://ai.googleblog.com/2021/06/frill-on-device-speech-representations.html
DeepPhonemizer:基于Transformer模型的字音转换库,可用于高精度和高效率的文本语音转换生产系统 github.com/as-ideas/DeepPhonemizer
github.com/rishikksh20/multiband-hifigan
《VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech》(2021) github.com/jaywalnut310/vits
基于wav2vec2的自动语音识别 github.com/oliverguhr/wav2vec2-live
CoreAudioML:音频效果处理机器学习库 github.com/Alec-Wright/CoreAudioML
SoundPy:面向研究的语音/声音Python开发包 github.com/a-n-rose/Python-Sound-Tool
kaldifeat:PyTorch的Kaldi兼容特征抽取,支持CUDA & autograd github.com/csukuangfj/kaldifeat
ttskit - 语音合成工具箱,Text To Speech Toolkit,多种音色可供选择的语音合成工具。 github.com/KuangDD/ttskit
Chinese mandarin text to speech (MTTS):基于 FastSpeech 2 的中文(普通话)语音合成,PyTorch 实现,使用 WaveGlow 作为声码器,基于 biaobei 和 aishell3 数据集 github.com/ranchlai/mandarin-tts
Common Voice Dataset:开源、多语言语音数据集 github.com/common-voice/cv-dataset
语音合成技术百花齐放,一篇综述带你全面梳理 https://weibo.com/ttarticle/p/show?id=2309404668701587932163
github.com/rhasspy/larynx
github.com/babysor/Realtime-Voice-Clone-Chinese
Speech Emotion Recognition:用 LSTM、CNN、SVM、MLP 进行语音情感识别,Keras 实现 github.com/Renovamen/Speech-Emotion-Recognition
ParallelTTS:快速语音合成模型,适用于英语、普通话/中文、日语、韩语、俄语和藏语(当前已测试) github.com/atomicoo/ParallelTTS
Neural HMMs are all you need (for high-quality attention-free TTS) https://arxiv.org/abs/2108.13320
praudio:面向深度学习音频应用的音频预处理框架 github.com/musikalkemist/praudio
为言语障碍人士合成自然语音的PnG NAT 模型 https://ai.googleblog.com/2021/08/recreating-natural-voices-for-people.html
大规模多样无序语音数据集的个性化语音识别模型 https://ai.googleblog.com/2021/09/personalized-asr-models-from-large-and.html
功能齐全的语音工具包:SpeechBrain,提供语音识别(支持普通话)、语音增强、语音处理、多麦克风信号处理、模块化定制等功能。此外,该工具还提供了颇为齐全的教程文档,以便帮助开发者更好地入门语音识别技术。 github.com/speechbrain/speechbrain/
Music Demixing Challenge 2021音源分离比赛第四名方案 github.com/yoyololicon/music-demixing-challenge-ismir-2021-entry
Keras实例:CTC自动语音识别 https://keras.io/examples/audio/ctc_asr/
mdx-tutorial:音源分离开源工具教程 github.com/kuielab/mdx-tutorial
github.com/wenet-e2e/wenet-kws
Wav2Vec2 STT Python:基于Wav2Vec2.0的语音识别库 github.com/daanzu/wav2vec2_stt_python
github.com/yeyupiaoling/PPASR
music2video:基于Wav2CLIP和VQGAN-CLIP根据音乐自动生成视频 github.com/joeljang/music2video
'compound-word-transformer-tensorflow - Tensorflow 实现的AI作曲'
MockingBird - AI拟声: 5秒内克隆您的声音并生成任意语音内容 github.com/babysor/MockingBird
Wenet STT Python:基于WeNet的Python语音识别库 github.com/daanzu/wenet_stt_python
github.com/petewarden/spchcat
Open Audio Search:开源音频搜索引擎(基于语音识别) github.com/openaudiosearch/openaudiosearch
语音识别资源集锦 https://wiki.nikitavoloboev.xyz/nlp/speech-recognition
HuggingSound:基于HuggingFace工具包的语音相关任务工具包 github.com/jonatasgrosman/huggingsound
Muskit: 聚焦于端到端歌唱合成基准测试的开源音乐处理工具包,用PyTorch作为深度学习引擎,并遵循 ESPnet 和 Kaldi 风格的数据处理,为各种音乐处理实验提供完整设置 github.com/SJTMusicTeam/Muskits
基于很少样本的神经乐器克隆 https://erlj.notion.site/Neural-Instrument-Cloning-from-very-few-samples-2cf41d8b630842ee8c7eb55036a1bfd6
PaddleSpeech:基于飞桨PaddlePaddle的语音方向的开源模型库,用于语音和音频中的各种关键任务的开发,包含大量基于深度学习前沿和有影响力的模型 github.com/PaddlePaddle/PaddleSpeech
WeNet:面向工业落地应用的语音识别工具包,提供了从语音识别模型的训练到部署的一条龙服务 github.com/wenet-e2e/wenet
IMS-Toucan:支持最新模型的语音合成工具包 github.com/DigitalPhonetics/IMS-Toucan
NeuralSpeech:微软亚研院的研究项目,专注于基于神经网络的语音处理,包括自动语音识别(ASR)、文本到语音转换(TTS)等 github.com/microsoft/NeuralSpeech
Tensorflow 2实现的最先进实时语音合成 github.com/TensorSpeech/TensorflowTTS
Awesome Keyword Spotting:语音关键字检测(唤醒词检测)论文列表 github.com/zycv/awesome-keyword-spotting
ocotillo - A fast, accurate and super simple speech recognition model - Performant and accurate speech recognition built on Pytorch github.com/neonbjb/ocotillo
libspecbleach:C语言音频降噪库 github.com/lucianodato/libspecbleach
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality 提出了完全端到端文本波形生成系统NaturalSpeech,首次在LJSpeech数据集上实现了人类水平的质量。 https://arxiv.org/abs/2205.04421
sherpa:支持流式和非流式识别的Python语音识别服务框架 github.com/k2-fsa/sherpa
audio-preview:VS Code的wav音频文件预览与播放扩展 github.com/sukumo28/vscode-audio-preview
【WeNet:面向工业落地应用的语音识别工具包,提供了从语音识别模型的训练到部署的一条龙服务】’WeNet - Production First and Production Ready End-to-End Speech Recognition Toolkit' by WeNet Open Source Community GitHub: github.com/wenet-e2e/wenet paper:《WeNet: Production First and Production Ready End-to-End Speech Recognition Toolkit》
【WeTTS:产品级端到端文本语音合成工具包】’WeTTS - Production First and Production Ready End-to-End Text-to-Speech Toolkit' by WeNet Open Source Community GitHub: github.com/wenet-e2e/wetts
【音频编码教程资料】’Audio Coding Video Tutorials and Python Notebooks - Audio Coding Notebooks and Tutorials' by Guitars.AI GitHub: github.com/GuitarsAI/AudioCodingTutorials
'wenet_trt8 - 用TRT8部署开源语音识别工具包WeNet,为语音识别模型在TRT8上部署提供参考方案’ by huismiling GitHub: github.com/huismiling/wenet_trt8
'FastASR - 基于PaddleSpeech所使用的conformer模型,使用C++的高效实现模型推理,在树莓派4B等ARM平台运行也可流畅运行' by chenkui164 GitHub: github.com/chenkui164/FastASR
【Open Text to Speech Server:开源多语言文本语音合成服务器】’Open Text to Speech Server - Open Text to Speech Server' by Michael Hansen GitHub: github.com/synesthesiam/opentts
GitHub 上的开源技术教程:《语音增强初探》,主要讲解语音增强技术相关的技术解析,以及模型应用。 GitHub:github.com/WenzheLiu-Speech/The-guidebook-of-speech-enhancement
【StemRoller:免费的音源分离工具,可从从歌曲中分离出人声、鼓声、贝斯和其他乐器声部】’StemRoller - Isolate vocals, drums, bass, and other instrumental stems from any song' by StemRoller GitHub: github.com/stemrollerapp/stemroller
【口语语言识别相关文献资源列表】’Awesome-Spoken-Language-Identification - An awesome spoken LID repository. (Working in progress' by HexinHexin GitHub: github.com/Lhx94As/Awesome-Spoken-Language-Identification
【KAN-TTS:训练自己的TTS语音合成模型】’KAN-TTS - With KAN-TTS you can train your own TTS model from zero to hero’ by Alibaba Research GitHub: github.com/AlibabaResearch/KAN-TTS
【语音合成、文字转语音(TTS)、歌唱声音合成(SVS)、声音转换(VC)、歌唱声音转换(SVC)等相关论文项目列表】’Awesome Singing Voice Synthesis and Singing Voice Conversion - A paper and project list about the cutting edge Speech Synthesis, Text-to-Speech (TTS), Singing Voice Synthesis (SVS), Voice Conversion (VC), Singing Voice Conversion (SVC), and related interesting works.' by GYChen GitHub: github.com/guan-yuan/Awesome-Singing-Voice-Synthesis-and-Singing-Voice-Conversion
'VITS+BigVGAN+SpanPSP 中文TTS - 基于PyTorch的VITS-BigVGAN的tts中文模型,加入韵律预测模型' by Zz-ww GitHub: github.com/Zz-ww/VITS-BigVGAN-SpanPSP-Chinese
【Open Speech Corpora:面向ASR, TTS和其他语音技术的开放语音数据集列表】’Open Speech Corpora - A list of accessible speech corpora for ASR, TTS, and other Speech Technologies' by coqui GitHub: github.com/coqui-ai/open-speech-corpora
【(Interspeech 2022 Tutorial)神经语音合成】'Neural Speech Synthesis' by Xu Tan, Hung-yi Lee GitHub: github.com/tts-tutorial/interspeech2022
'Streamlit Custom Component that enables recording audio from the client's mic in apps that are deployed to the web. (via browser Media-API, REACT-based)' by Stefan Rummer GitHub: github.com/stefanrmmr/streamlit_audio_recorder
'sherpa-ncnn - Real-time speech recognition using next-gen Kaldi with ncnn' by k2-fsa GitHub: github.com/k2-fsa/sherpa-ncnn
'MASR流式与非流式语音识别项目 - Pytorch实现的流式与非流式的自动语音识别框架,同时兼容在线和离线识别,目前支持DeepSpeech2模型,支持多种数据增强方法' by yeyupiaoling GitHub: github.com/yeyupiaoling/MASR
'streamlit-stt-app - Real time web based Speech-to-Text app with Streamlit' by Yuichiro Tachibana (Tsuchiya) GitHub: github.com/whitphx/streamlit-stt-app
【Whisper:OpenAI开源的通用语音识别模型】’Whisper - a general-purpose speech recognition model’ GitHub: github.com/openai/whisper
【用youtube-dl+OpenAI's Whisper为Youtube视频自动生成字幕】’Automatic YouTube subtitle generation - Using OpenAI's Whisper to automatically generate YouTube subtitles' by Miguel Piedrafita GitHub: github.com/m1guelpf/yt-whisper
基于 Tensorflow 实现的音轨分离工具。可以用于提取音乐中的人声、鼓、钢琴等乐器 https://github.com/deezer/spleeter
基于深度学习的中文语音识别系统 https://github.com/nl8590687/ASRT_SpeechRecognition
【OpenAI Whisper语音识别的简单web演示界面】’openai-whisper-webapp - Code for OpenAI Whisper Web App Demo' by amrrs GitHub: github.com/amrrs/openai-whisper-webapp
【Whispering:基于whisper的流语音转录(字幕生成)】’Whispering - Streaming transcriber with whisper' by shirayu GitHub: github.com/shirayu/whispering
【Whisper ASR Webservice:Whisper语音识别的Webservice】’Whisper ASR Webservice - OpenAI Whisper ASR Webservice API' by Ahmet Oner GitHub: github.com/ahmetoner/whisper-asr-webservice
【Automatic subtitles in your videos:用ffmpeg+OpenAI's Whisper为视频文件自动加字幕】’Automatic subtitles in your videos - Automatically generate and overlay subtitles for any video.' by Miguel Piedrafita GitHub: github.com/m1guelpf/auto-subtitle
【whisper.cpp:OpenAI's Whisper高质量语音识别模块C/C++移植版,无依赖低内存支持CPU跨平台】’whisper.cpp - Port of OpenAI's Whisper model in C/C++' by Georgi Gerganov GitHub: github.com/ggerganov/whisper.cpp
【Sound Synthesis Recipes:C++音频合成代码集】’Sound Synthesis Recipes - Code snippets of sound synthesis algorithms in C++' by Matthijs Hollemans GitHub: github.com/hollance/synth-recipes
[AS]《Hierarchical Diffusion Models for Singing Voice Neural Vocoder》N Takahashi, M Kumar, Singh, Y Mitsufuji [Sony Group Corporation] (2022) https://arxiv.org/abs/2210.07508
【ICASSP2022 TTS&VC Summary:总结了ICASSP2022中TTS和VC相关论文,主要是TTS】'ICASSP2022 TTS&VC Summary - ICASSP2022 TTS&VC Summary' by Liumeng Xue GitHub: github.com/lmxue/ICASSP2022_TTS_VC_Summary
【EnCodec: 高保真神经音频压缩编码器】’EnCodec: High Fidelity Neural Audio Compression - State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.' by Meta Research GitHub: github.com/facebookresearch/encodec
【OpenAI Whisper - CPU:将量化方法应用于 OpenAI Whisper ASR 模型以提高基于CPU部署的推理速度和吞吐量的实验】’OpenAI Whisper - CPU - Improving transcription performance of OpenAI Whisper for CPU based deployment' by MiscellaneousStuff GitHub: github.com/MiscellaneousStuff/openai-whisper-cpu
【FunASR: 基础端到端语音识别工具包】'FunASR: A Fundamental End-to-End Speech Recognition Toolkit’ by Alibaba Damo Academy GitHub: github.com/alibaba-damo-academy/FunASR
【mayavoz:PyTorch语音增强工具包】'mayavoz - Pytorch based speech enhancement toolkit.' by Shahul ES GitHub: github.com/shahules786/mayavoz
【libf0:用于音乐录制中基频估计的Python库】'libf0 - A Python Library for Fundamental Frequency Estimation in Music Recordings' by GroupMM GitHub: github.com/groupmm/libf0
【ASR Corpus Creator:用伪标注创建自动语音识别语料库】’ASR Corpus Creator - This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.' by Yehor Smoliakov GitHub: github.com/egorsmkv/asr-corpus-creator
【WhisperX:强制时间对齐的时间戳精确版Whisper语音识别】’WhisperX - WhisperX: Timestamp-Accurate Automatic Speech Recognition.' by m-bain GitHub: github.com/m-bain/whisperX
【Speech-Editing-Toolkit:集成最新深度学习算法的语音编辑工具箱】’Speech-Editing-Toolkit - It's a repository for implementations of neural speech editing algorithms.' by Jiangzy GitHub: github.com/Zain-Jiang/Speech-Editing-Toolkit
【教程:基于视觉Transformer(ViT)的音频分类(Colab)】《Audio classification with Vision Transformers》 https://colab.research.google.com/drive/1mnArj9S7cij3Ua-dHXoasKWqyNA-GCrT?usp=sharing
【whisperer:基于Whisper的文本-音频数据集构建工具】’whisperer - Go from raw audio files to a text-audio dataset automatically with OpenAI's Whisper.' by Miguel Valente GitHub: github.com/miguelvalente/whisperer
【KAN-TTS:支持中英文的语音合成训练框架】’KAN-TTS - a speech-synthesis training framework' by Alibaba Damo Academy GitHub: github.com/alibaba-damo-academy/KAN-TTS
【Speechbox:语音处理工具包】’Speechbox offers a set of speech processing tools, such as punctuation restoration' by Hugging Face GitHub: github.com/huggingface/speechbox
【Larynx:快速的本地部署神经文本语音合成工具,目前支持英语、德语、丹麦语、挪威语、尼泊尔语、越南语等】’Larynx - A fast, local neural text to speech system' Rhasspy GitHub: github.com/rhasspy/larynx2
'Fish Diffusion - 基于 diff-svc 实现的 TTS / SVS / SVC 的训练框架,用于实现歌声音色转换’ Fish Audio GitHub: github.com/fishaudio/fish-diffusion
【Whisper:用 C++ 重写的 OpenAI's Whisper 语音识别程序的高性能 GPGPU 接口,64-bit Win版,比Pytorch版快一倍多】’Whisper - High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model' Konstantin GitHub: github.com/Const-me/Whisper
'SoftVC VITS Singing Voice Conversion - 基于vits与softvc的歌声音色转换模型' innnky GitHub: github.com/innnky/so-vits-svc
【音频AI模型进展追踪】’Audio AI Timeline - A timeline of the latest AI models for audio generation, starting in 2023!' archinet GitHub: github.com/archinetai/audio-ai-timeline
【Real Time Whisper Transcription:基于 OpenAI Whisper 的实时语音转录(语音识别)】’Real Time Whisper Transcription - Real time transcription with OpenAI Whisper.' davabase GitHub: github.com/davabase/whisper_real_time
【WaaS - Whisper as a Service:基于 Whisper 的语音转录服务】’WaaS - Whisper as a Service - Whisper as a Service (GUI and API for OpenAI Whisper)' Schibsted GitHub: github.com/schibsted/WAAS
【基于 CTranslate2 的更快的 Whisper 语音转录】’Faster Whisper transcription with CTranslate2 - Faster Whisper transcription with CTranslate2' Guillaume Klein GitHub: github.com/guillaumekln/faster-whisper
【基于 OpenAI Whisper 的说话人分割】’Speaker Diarization Using OpenAI Whisper - Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper' Mahmoud Ashraf GitHub: github.com/MahmoudAshraf97/whisper-diarization
【Audio Slicer:根据静默片段分割音频的Python脚本】’Audio Slicer - Python script that slices audio with silence detection' Team OpenVPI GitHub: github.com/openvpi/audio-slicer
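按静默切片的基本逻辑:对逐帧幅度包络做阈值判断,收集连续的非静默区间,并丢弃过短的片段。示意实现(阈值与参数为自拟,并非该脚本的实际接口):

```python
def slice_on_silence(envelope, threshold=0.05, min_len=2):
    """把逐帧幅度包络切成 (start, end) 的非静默区间,长度不小于 min_len 帧。"""
    segments, start = [], None
    for i, a in enumerate(envelope):
        if a >= threshold and start is None:
            start = i                         # 进入非静默段
        elif a < threshold and start is not None:
            if i - start >= min_len:          # 段结束,够长才保留
                segments.append((start, i))
            start = None
    if start is not None and len(envelope) - start >= min_len:
        segments.append((start, len(envelope)))
    return segments

env = [0.0, 0.3, 0.4, 0.0, 0.0, 0.2, 0.6, 0.5, 0.0, 0.01]
segs = slice_on_silence(env)  # [(1, 3), (5, 8)]
```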
【transcribe-anything:基于 Whisper 的语音转录服务】’transcribe-anything - Input a local file or url and this service will transcribe it using Whisper AI' Zachary Vorhies GitHub: github.com/zackees/transcribe-anything
【audioFlux:音频/音乐分析与特征提取库】'audioFlux - A library for audio and music analysis, feature extraction.' audioFlux GitHub: github.com/libAudioFlux/audioFlux
【Whisper OpenVINO:OpenVINO版运行更快的 Whisper 语音转录】’Whisper OpenVINO - openvino version of openai/whisper' Zilin Zhu GitHub: github.com/zhuzilin/whisper-openvino
【同声翻译(文本到文本/语音到文本翻译)相关资源大列表】’Awesome Simultaneous Translation - Paper list of simultaneous translation, including text-to-text machine translation and speech-to-text translation.' ZhangShaolei1998 GitHub: github.com/Vily1998/Awesome-Simultaneous-Translation
【Decipher:基于 whisper 给视频自动加字幕】'Decipher - Effortlessly add AI-generated transcription subtitles to your videos' dsymbol GitHub: github.com/dsymbol/decipher
【whisper-timestamped:基于openai-whisper的多语言自动语音识别(ASR)工具,可以将音频文件转换为文本,并为每个单词提供时间戳】'whisper-timestamped - Multilingual Automatic Speech Recognition with word-level timestamps and confidence' linto.ai GitHub: github.com/linto-ai/whisper-timestamped
【Subs AI:基于Whisper及其变体的字幕生成工具】'Subs AI - Subtitles generation tool (Web-UI + CLI + Python package) powered by OpenAI's Whisper and its variants' Abdeladim Sadiki GitHub: github.com/abdeladim-s/subsai
【SpeechGPT:免费、开源的ChatGPT语音聊天应用,支持100多种语言,具备优秀的隐私保护和语音识别、语音合成功能】'SpeechGPT - a web application that enables you to converse with ChatGPT.' Xi GitHub: github.com/hahahumble/speechgpt
【Transcriber:采用 Flet 和 OpenAI Whisper 构建的实时语音转文字转录应用】'Transcriber - Real time speech to text transcription app.' davabase GitHub: github.com/davabase/transcriber_app
【Whispering Tiger (Live Translate/Transcribe):免费的开源工具,可以监听/观看机器上的任意音频流或游戏图像,通过Websockets或OSC将转录或翻译输出到Web浏览器】'Whispering Tiger (Live Translate/Transcribe) - Whispering Tiger - OpenAI's whisper with OSC and Websocket support. Allowing live transcription / translation in VRChat and Overlays in most Streaming Applications' Sharrnah GitHub: github.com/Sharrnah/whispering
针对OpenAI开源的语音转文本模型whisper的UI界面 🔗 gitlab.com/aadnk/whisper-webui
【whisper_streaming:基于Whisper的语音实时转录,面向长语音文本转录和翻译】'whisper_streaming - Whisper realtime streaming for long speech-to-text transcription and translation' ÚFAL GitHub: github.com/ufal/whisper_streaming
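Streaming systems like whisper_streaming stabilize live output with a "local agreement" policy: re-transcribe a growing audio buffer and commit only the prefix on which successive hypotheses agree. A toy sketch of that commit rule (the token lists stand in for real decoder hypotheses):

```python
# Toy "local agreement" commit rule for streaming ASR output.

def common_prefix(a, b):
    """Longest common prefix of two token lists."""
    out = []
    for x, y in zip(a, b):
        if x != y:
            break
        out.append(x)
    return out

def commit_tokens(prev_hyp, new_hyp, committed):
    """Commit the newly agreed-upon tokens beyond what is already committed."""
    agreed = common_prefix(prev_hyp, new_hyp)
    fresh = agreed[len(committed):]
    return committed + fresh, fresh

committed = []
committed, fresh = commit_tokens(["the", "cat"], ["the", "cat", "sat"], committed)
print(fresh)   # ['the', 'cat'] -- both hypotheses agree on this prefix
committed, fresh = commit_tokens(["the", "cat", "sat", "on"],
                                 ["the", "cat", "sat", "in"], committed)
print(fresh)   # ['sat'] -- 'on' vs 'in' still disagree, so those tokens wait
```

Only the committed prefix is ever shown to the user, which is why streaming transcription can be both low-latency and stable.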
【Kesha v3.0 very early (aka Jarvis update): experimental voice assistant built on Silero TTS + Vosk STT + Picovoice Porcupine + ChatGPT】'Kesha v3.0 very early (aka Jarvis update) - Voice Assistant made as an experiment using Silero TTS + Vosk STT + Picovoice Porcupine + ChatGPT.' Abraham Tugalov GitHub: github.com/Priler/jarvis
faster-whisper is a reimplementation of OpenAI's Whisper model on CTranslate2 (github.com/OpenNMT/CTranslate2), a fast inference engine for Transformer models. It runs 4-8x faster than the official Whisper implementation. 🔗 github.com/guillaumekln/faster-whisper
【Text-prompted generative audio with voice cloning, Chinese supported】'Bark...but with the ability to use voice cloning on custom audio/text pairs - Text-prompted Generative Audio Model - With the ability to clone voices' SERP AI GitHub: github.com/serp-ai/bark-with-voice-clone
【Audio Slicer: minimal GUI application that slices audio with silence detection】'Audio Slicer - A simple GUI application that slices audio with silence detection' flutydeer GitHub: github.com/flutydeer/audio-slicer
So-vits-svc (a.k.a. Sovits) is a free, open-source AI voice-conversion tool built on VITS, soft-vc, VISinger2, and related projects. Many AI song covers are trained with Sovits. 🔗 github.com/svc-develop-team/so-vits-svc
【libvits-ncnn: ncnn implementation of the VITS library enabling cross-platform GPU-accelerated speech synthesis, with inference supported on both CPU and GPU】'libvits-ncnn - libvits-ncnn is an ncnn implementation of the VITS library that enables cross-platform GPU-accelerated speech synthesis.' SgDylan GitHub: github.com/Sg4Dylan/libvits-ncnn
【SummerTTS: standalone Chinese text-to-speech (TTS) project in C++ that runs locally with no network connection and almost no dependencies. It implements the neural-network operators with Eigen, so it needs no PyTorch, TensorFlow, ncnn, or other deep-learning runtime. The model is based on the VITS synthesis algorithm and runs on Linux platforms such as Ubuntu, Android, and Raspberry Pi. The project builds with one command: drop a downloaded model into the model directory, compile, and test synthesis from the command line. Models of several sizes are provided to trade off compute cost against audio quality】'SummerTTS - a standalone Chinese speech synthesis(TTS) project that has almost no dependency and could be easily used for Chinese TTS with just one key build out' huakunyang GitHub: github.com/huakunyang/SummerTTS
【Whisper API Streaming: a streaming interface for OpenAI's Whisper API; currently only response streaming is supported】'Whisper API Streaming - Thin wrapper around OpenAI Whisper API with streaming support' George Korepanov GitHub: github.com/gkorepanov/whisper-stream
【whisper-ctranslate2: command-line client compatible with the original OpenAI client, built on CTranslate2 and faster-whisper; up to 4x faster than openai/whisper with lower memory use】'whisper-ctranslate2 - Whisper command line client compatible with original OpenAI client based on CTranslate2.' Softcatalà GitHub: github.com/Softcatala/whisper-ctranslate2
【Voice activity detection (VAD) papers and code】'Voice activity detection (VAD) paper and code - Voice activity detection (VAD) paper(From 198*~2019)and its classification. The arrangement of these papers was arranged when I was studying for a double master degree in UNOKI LAB of JAIST. Now share it with those in need to learn.' LI NAN GitHub: github.com/linan2/Voice-activity-detection-VAD-paper-and-code
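The classic baseline running through this literature is a short-time energy detector with hysteresis: speech starts when frame energy exceeds an upper threshold and ends when it drops below a lower one. A minimal sketch with made-up threshold values:

```python
# Energy-based voice activity detection (VAD) with hysteresis:
# two thresholds prevent rapid flickering between speech and silence.

def energy_vad(frames, on_thresh=0.02, off_thresh=0.01):
    """Return a per-frame speech/non-speech decision (True = speech)."""
    speaking = False
    decisions = []
    for energy in frames:
        if not speaking and energy > on_thresh:
            speaking = True          # rising edge: enter the speech state
        elif speaking and energy < off_thresh:
            speaking = False         # falling edge: leave the speech state
        decisions.append(speaking)
    return decisions

frames = [0.005, 0.03, 0.015, 0.008, 0.005]
print(energy_vad(frames))  # [False, True, True, False, False]
```

Note how the 0.015 frame stays classified as speech: it is below the on-threshold but above the off-threshold, which is exactly what the hysteresis is for. Modern VADs replace the energy feature with a learned classifier but keep the same state machine.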
【whisper-onnx-cpu: ONNX implementation of Whisper that runs without PyTorch or TensorFlow】'whisper-onnx-cpu - ONNX implementation of Whisper. PyTorch free.' Katsuya Hyodo GitHub: github.com/PINTO0309/whisper-onnx-cpu
Introduces LibriTTS-R, a speech dataset whose sample quality was improved with speech-restoration techniques, accelerating TTS research. https://arxiv.org/abs/2305.18802 [AS]《LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus》Y Koizumi, H Zen, S Karita, Y Ding, K Yatabe, N Morioka, M Bacchiani, Y Zhang, W Han, A Bapna [Google & Tokyo University of Agriculture] (2023)
Audiocraft, a Python library Meta open-sourced on GitHub today, generates music directly with AI. GitHub: github.com/facebookresearch/audiocraft. Its core is a music generation model called MusicGen, a single-stage autoregressive Transformer trained on a 32 kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz.
'TTS Generation WebUI (Bark v2, MusicGen, Tortoise, Vocos)' Roberts Slisans GitHub: github.com/rsxdalv/tts-generation-webui
【Generate and train short audio samples on ordinary consumer hardware (a GPU with under 2 GB of VRAM)】'A repository for generating and training short audio samples with unconditional waveform diffusion on accessible consumer hardware (<2GB VRAM GPU)' Christopher Landschoot GitHub: github.com/crlandsc/tiny-audio-diffusion
【PhoneLM: text-to-speech (TTS) that takes phonemes as input and audio-codec codewords as output, loosely based on MegaByte, VALL-E, and Encodec; uses G2P to encode text into phonemes and EnCodec to encode and decode audio】'PhoneLM - (R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec.' MiscellaneousStuff GitHub: github.com/MiscellaneousStuff/PhoneLM
【free-music-demixer: free client-side static website for music demixing (a.k.a. source separation) using the Open-Unmix AI model (UMX-L weights)】'free-music-demixer - Open-Unmix (UMX-L) running client-side in the browser with WebAssembly' Sevag H GitHub: github.com/sevagh/free-music-demixer
Memo: an AI-powered tool for transcribing and subtitling videos and podcasts. Memo runs on multiple platforms and uses Whisper to transcribe speech into subtitles, which can then be lightly edited. Recognized subtitles can also be translated, with Google Translate and OpenAI supported (the latter requires your own API key). The interface is friendly and recognition quality is good; ordinary sentences translate well, though complex ones can suffer, and merging subtitle lines is a bit awkward. https://mxmefbp9p0g.feishu.cn/docx/ZI3ldweTXorTvMxYLbucT00Un5n
【RTVC: Real-Time Voice Conversion GUI: interface for real-time voice conversion (voice changing)】'RTVC: Real-Time Voice Conversion GUI' Fish Audio GitHub: github.com/fishaudio/realtime-vc-gui
【Wordcab Transcribe: FastAPI service for speech recognition (ASR) using faster-whisper and multi-scale auto-tuning spectral clustering for diarization】'Wordcab Transcribe - ASR FastAPI server using faster-whisper and Multi-Scale Auto-Tuning Spectral Clustering for diarization.' Wordcab GitHub: github.com/Wordcab/wordcab-transcribe
【april-asr: speech-to-text (STT) library written in C】'april-asr - Speech-to-text library in C' abb128 GitHub: github.com/abb128/april-asr
'SummerAsr - a local Chinese speech recognizer in C++ that can be built standalone with almost no extra dependency libraries. Summer Asr is a Chinese automatic speech recognition project written in C++ that can be easily built standalone without any dependency.' huakunyang GitHub: github.com/huakunyang/SummerAsr
【Singing voice conversion based on Grad-TTS】'Grad-SVC based Grad-TTS from HUAWEI Noah's Ark Lab - Singing Voice Conversion based on Grad-TTS. The core algorithm is diffusion.' PlayVoice GitHub: github.com/PlayVoice/Grad-SVC
【SpeechMOS: predict subjective speech quality (MOS) scores with just 2 lines of code; supports multiple MOS prediction systems】'SpeechMOS - Easy-to-Use Speech MOS predictors' tarepan GitHub: github.com/tarepan/SpeechMOS
【Leaderboard of open automatic speech recognition (ASR) models】《Open ASR Leaderboard - a Hugging Face Space by hf-audio》 https://huggingface.co/spaces/hf-audio/
【Light Speed: open-source VITS-based text-to-speech model】'Light Speed - A modified VITS that utilizes phoneme duration's ground truth for better robustness' NTT123 GitHub: github.com/NTT123/light-speed
lalal.ai is a remarkable audio-processing tool: it cleanly separates and losslessly extracts individual tracks from complex mixes, and in my testing the results were very good. It targets two scenarios, stem separation and sound removal: it can extract vocals, drums, bass, guitar, strings, and so on, or strip out background music, microphone rumble, and other unwanted noise. The video below demonstrates splitting a track into vocals and accompaniment. Digging into how this works, I found a survey of MSS (Musical Source Separation): inria.hal.science/hal-01945345/document. It covers the two traditional approaches, model-based and signal-processing-based, and notes that deep neural networks are increasingly applied to the problem; the biggest limitation remains the scarcity of training data. Ask a tool to isolate, say, birdsong from a recording, and it may still struggle.
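The DNN approaches the MSS survey describes typically predict a time-frequency mask: the network estimates each source's magnitude spectrogram, and the mixture is then weighted bin by bin according to each source's share. A toy ratio mask over plain lists of spectrogram magnitudes (the estimates here are hand-picked, standing in for network outputs):

```python
# Toy soft (ratio) mask for source separation: split each time-frequency
# bin of the mixture among sources in proportion to their estimated
# magnitudes. eps avoids division by zero in empty bins.

def ratio_mask_separate(mixture, estimates, eps=1e-8):
    """Split `mixture` among sources according to estimated magnitudes."""
    separated = []
    for est in estimates:
        source = []
        for i, m in enumerate(mixture):
            total = sum(e[i] for e in estimates) + eps
            source.append(m * est[i] / total)   # soft mask: est / sum(est)
        separated.append(source)
    return separated

mixture = [1.0, 2.0]                      # two spectrogram bins of the mix
vocals_est, drums_est = [0.8, 0.5], [0.2, 1.5]
vocals, drums = ratio_mask_separate(mixture, [vocals_est, drums_est])
print(vocals)  # ~[0.8, 0.5]
print(drums)   # ~[0.2, 1.5]
```

A useful property of the ratio mask is conservation: the separated sources sum back to the mixture in every bin, so nothing is lost, only reassigned.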
INT4 low-precision builds of the Whisper speech recognition models, which run faster in compute-constrained environments: huggingface.co/Intel/whisper-tiny-onnx-int4 huggingface.co/Intel/whisper-base-onnx-int4 huggingface.co/Intel/whisper-small-onnx-int4 huggingface.co/Intel/whisper-medium-onnx-int4 huggingface.co/Intel/whisper-large-onnx-int4 huggingface.co/Intel/whisper-large-v2-onnx-int4
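The general idea behind INT4 post-training quantization of this kind can be sketched as symmetric per-tensor quantization: scale the weights into the signed 4-bit range [-8, 7], round, and dequantize at inference time. This is an illustrative simplification, not Intel's actual recipe:

```python
# Toy symmetric INT4 quantization: map the largest-magnitude weight to 7,
# round everything else to the nearest 4-bit integer, keep one float scale.

def quantize_int4(weights):
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.5, -1.0, 0.25]
q, scale = quantize_int4(weights)
print(q)                     # [4, -7, 2] -- 4-bit integers
print(dequantize(q, scale))  # approximate reconstruction of the weights
```

Each weight now fits in 4 bits instead of 32, an 8x size reduction, at the cost of the rounding error visible in the reconstruction; production schemes reduce that error with per-channel scales and calibration data.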
'Open-Lyrics - Transcribe (whisper) and translate (gpt) voice into LRC file. Transcribes audio with Whisper and translates it with GPT into LRC subtitle files' zh-plus GitHub: github.com/zh-plus/openlrc
【Insanely Fast Whisper: ultra-fast Whisper transcription script; transcribes 5 hours of audio in under 10 minutes with OpenAI's Whisper Large v2】'Insanely Fast Whisper' by Vaibhav Srivastav GitHub: github.com/Vaibhavs10/insanely-fast-whisper
Voice Changer is a real-time voice conversion client for Windows and Mac. It changes your voice in real time into another person's or a virtual character's timbre, and it can plug into several voice-conversion backends, for example:
- MMVC (github.com/isletennos/MMVC_Trainer)
- so-vits-svc (github.com/svc-develop-team/so-vits-svc)
- RVC (Retrieval-based Voice Conversion) (github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI)
- DDSP-SVC (github.com/yxlllc/DDSP-SVC)
A YouTube tutorial covers usage in detail: www.youtube.com/watch?v=_JXbvSTGPoo Project: github.com/w-okada/voice-changer
【Distil-Whisper: distilled Whisper that speeds up speech recognition 6x with a 49% smaller model】'Distil-Whisper' by Hugging Face GitHub: github.com/huggingface/distil-whisper
【RealtimeSTT: real-time speech-to-text library implementing the mainstream speech-to-text algorithms, with strong performance and easy integration; useful for building real-time voice interaction systems such as voice assistants and voice-driven forms】'RealtimeSTT - A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription. Designed for real-time applications like voice assistants.' Kolja Beigel GitHub: github.com/KoljaB/RealtimeSTT
A voice-cloning project that creates an AI voice clone from just a few seconds of audio. XTTS v2 was just released, with these major updates: ✅ better zero-shot cloning ✅ cloning from more data ✅ more natural intonation and expressiveness ✅ support for Hungarian and Korean. Project: github.com/coqui-ai/tts
【whisper-cpp-python: Python bindings for whisper.cpp】'whisper-cpp-python - whisper.cpp bindings for python' Carlos Cardoso Dias GitHub: github.com/carloscdias/whisper-cpp-python
Seamless, Meta's new real-time speech translation model, preserves the speaker's expression and style. What sets it apart is that it judges whether the current context is sufficient to produce output; if the true meaning of the speech cannot yet be determined, it waits for more input before emitting a translation. It reportedly surpasses Whisper and AudioPalm 2 at speech-to-text and speech translation. Seamless comprises a family of speech models:
- SeamlessM4Tv2: the foundational multilingual model
- SeamlessStreaming: real-time translation
- SeamlessExpressive: preserves the original expression and style during translation
- Seamless: integrates all of the above. GitHub: github.com/facebookresearch/seamless_communication
【Insanely Fast Whisper (CLI): ultra-fast command-line audio transcription tool based on the Whisper speech recognition model; transcribes 300 minutes of audio in 10 minutes with Whisper Large v2】'Insanely Fast Whisper (CLI) - The fastest Whisper optimization for automatic speech recognition as a command-line interface' ochen1 GitHub: github.com/ochen1/insanely-fast-whisper-cli
【abracadabra: song-recognition tool in Python implementing the audio-search algorithm from the Shazam paper; it can identify the song playing through your computer's microphone and is useful for tasks like aligning audio across multiple videos and deduplicating a music library】'abracadabra: Sound recognition in Python' Cameron MacLeod GitHub: github.com/notexactlyawe/abracadabra
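The Shazam-style algorithm such tools implement boils down to: find spectrogram peaks, pair each peak with a few later "target" peaks, and hash each pair (freq1, freq2, time-delta) into a landmark that is robust to noise and independent of where the clip starts. A toy sketch over pre-extracted (time, frequency) peaks, not abracadabra's actual code:

```python
# Toy Shazam-style landmark fingerprinting over (time, freq) spectrogram
# peaks: hash (f1, f2, dt) triples so the fingerprint is invariant to the
# absolute start time of the recording.
import hashlib

def fingerprint(peaks, fan_out=3):
    """Yield (hash, anchor_time) landmarks from (time, freq) peaks."""
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            key = f"{f1}|{f2}|{t2 - t1}".encode()
            yield hashlib.sha1(key).hexdigest()[:10], t1

song = [(0, 440), (1, 660), (2, 440), (3, 880)]
clip = [(10, 440), (11, 660), (12, 440), (13, 880)]  # same audio, offset +10
song_hashes = {h for h, _ in fingerprint(song)}
clip_hashes = {h for h, _ in fingerprint(clip)}
print(clip_hashes <= song_hashes)  # True: time offsets cancel in the deltas
```

Matching then reduces to counting, per candidate song, how many clip landmarks hit the database with a consistent time offset; the anchor times yielded alongside each hash are what make that offset vote possible.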