NeuralSpeech is a research project at Microsoft Research Asia focusing on neural network-based speech processing, including automatic speech recognition (ASR), text to speech (TTS), spatial audio, etc.
Currently, this repo covers the following research areas:
- Automatic Speech Recognition
- Text to Speech
- Spatial Audio
For more research from the NeuralSpeech project, you can refer to this page: https://speechresearch.github.io/. We will release more research work in the future.
For our research on AI music, you can refer to our Muzic project: https://github.com/microsoft/muzic.
We are hiring researchers on speech (speech synthesis, speech recognition, voice conversion, audio processing), natural language processing, and machine learning. Please contact Xu Tan (xuta@microsoft.com) if you are interested.
If you find the NeuralSpeech project useful in your work, you can cite the following papers:
- FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition, Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin and Tie-Yan Liu, NeurIPS 2021.
- FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition, Yichong Leng, Xu Tan, Rui Wang, Linchen Zhu, Jin Xu, Wenjie Liu, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin and Tie-Yan Liu, Findings of EMNLP 2021.
- LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search, Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen and Tie-Yan Liu, ICASSP 2021.
- PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior, Sang-gil Lee, Heeseung Kim, Chaehun Shin, Xu Tan, Chang Liu, Qi Meng, Tao Qin, Wei Chen, Sungroh Yoon and Tie-Yan Liu, ICLR 2022.
- [CMatch] Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching, Wenxin Hou, Jindong Wang, Xu Tan, Tao Qin and Takahiro Shinozaki, Interspeech 2021.
- [Adapter] Exploiting Adapters for Cross-lingual Low-resource Speech Recognition, Wenxin Hou, Han Zhu, Yidong Wang, Jindong Wang, Tao Qin, Renjun Xu and Takahiro Shinozaki, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP) 2022.
- BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis, Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen, Xu Tan, Danilo Mandic, Lei He, Xiang-Yang Li, Tao Qin, Sheng Zhao and Tie-Yan Liu, NeurIPS 2022.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.