I am a PhD candidate at the College of Computer Science and Technology, Zhejiang University (浙江大学计算机学院).
I work on the Audio Research Team at Zhejiang University, under the supervision of Prof. Zhou Zhao (赵洲). Previously, I graduated from Chu Kochen Honors College, Zhejiang University (浙江大学竺可桢学院), with dual bachelor's degrees in Computer Science and Automation. I have also been a visiting scholar at the University of Rochester with Prof. Zhiyao Duan and at the University of Massachusetts Amherst with Prof. Przemyslaw Grabowicz.
My research interests primarily focus on Multi-Modal Generative AI, specifically in Spatial Audio, Music, Singing, and Speech. I have published first-author papers at top international AI conferences, including NeurIPS, AAAI, and EMNLP. Currently, I am working on spatial audio generation (binaural or FOA speech/audio/music) with multimodal prompts.
I am actively seeking research collaborations. Please feel free to contact me via email at yuzhang34@zju.edu.cn.
- Personal Pages: https://aaronz345.github.io (updated recently🔥)
- LinkedIn: www.linkedin.com/in/yuzhang34
- Google Scholar: https://scholar.google.com/citations?user=kA9A6LsAAAAJ
- DBLP: https://dblp.org/pid/50/671-126.html
Preprint
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting, Yu Zhang, Wenxiang Guo, Changhao Pan, et al.
Preprint
Versatile Framework for Song Generation with Prompt-based Control, Yu Zhang, Wenxiang Guo, Changhao Pan, et al.
EMNLP 2024
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control, Yu Zhang, Ziyue Jiang, Ruiqi Li, et al.
NeurIPS 2024 Spotlight
GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks, Yu Zhang, Changhao Pan, Wenxiang Guo, et al.
AAAI 2024
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis, Yu Zhang, Rongjie Huang, Ruiqi Li, et al.
AAAI 2025
TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching, Wenxiang Guo, Yu Zhang, Changhao Pan, et al.
ACL 2024
Robust Singing Voice Transcription Serves Synthesis, Ruiqi Li, Yu Zhang, Yongqi Wang, et al.
Preprint
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis, Ziyue Jiang, Yi Ren, Ruiqi Li, Shengpeng Ji, Zhenhui Ye, Chen Zhang, Bai Jionghao, Xiaoda Yang, Jialong Zuo, Yu Zhang, et al.
IJCAI 2025
Leveraging Pretrained Diffusion Models for Zero-Shot Part Assembly, Ruiyuan Zhang, Qi Wang, Jiaxiang Liu, Yu Zhang, et al.