My research interests span speech and language intelligence, including speech foundation models, large language models (LLMs), text-to-speech synthesis (TTS), voice conversion (VC), singing synthesis, cross-modal representation learning, and audio adversarial attacks and defenses.
🎯 Focusing: Working on spoken language processing (general audio synthesis, TTS, VC, SVS, SVC, etc.)
Pinned
- StarGAN-Voice-Conversion (Public): A PyTorch implementation of the paper "StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks"
- efficient_tts (Public): PyTorch implementation of "EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture"
- BNE-Seq2SeqMoL-VC (Public): Demo for "Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling"
- Large-Audio-Models (Public): Keeps track of large models in the audio domain, including speech, singing, and music.