<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="generator" content="Hugo 0.88.1" />
<meta name="viewport" content="width=device-width, initial-scale=1">
<link href="https://fonts.googleapis.com/css?family=Roboto:300,400,700" rel="stylesheet" type="text/css">
<link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/8.4/styles/github.min.css">
<link rel="stylesheet" href="css/normalize.css">
<link rel="stylesheet" href="css/skeleton.css">
<link rel="stylesheet" href="css/custom.css">
<link rel="alternate" href="index.xml" type="application/rss+xml" title="Speech Research">
<link rel="shortcut icon" href="favicon.png" type="image/png" />
<title>Speech Research</title>
</head>
<body>
<div class="container">
<header role="banner">
<h1 class="site-title">Speech Research </h1>
</header>
This page lists speech-related research at Microsoft Research Asia, conducted by the team led by <a href="https://www.microsoft.com/en-us/research/people/xuta/">Xu Tan</a>. The research topics include text to speech, singing voice synthesis, music generation, and automatic speech recognition. Some of this research is open-sourced via <a href="https://github.com/microsoft/neuralspeech">NeuralSpeech</a> and <a href="https://github.com/microsoft/muzic">Muzic</a>.
<br>
<br>
We are hiring researchers in speech, NLP, and deep learning at Microsoft Research Asia. Please contact xuta@microsoft.com if you are interested.
<br>
<br>
<main role="main">
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/prompttts2/">PromptTTS 2: Describing and Generating Voices with Text Prompt</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2023-09-07">September 07, 2023</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/naturalspeech2/">NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2023-04-19">April 19, 2023</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/prompttts/">PromptTTS: Controllable Text-to-Speech with Text Descriptions</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2022-11-22">November 22, 2022</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/videodubbing/">VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2022-08-30">August 30, 2022</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/binauralgrad/">BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2022-05-29">May 29, 2022</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/naturalspeech/">NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2022-05-03">May 03, 2022</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/mpbert/">Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2022-04-02">April 02, 2022</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/adaspeech4/">AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2022-03-06">March 06, 2022</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/speechtransducer/">Speech-T: Transducer for Text to Speech and Beyond</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2021-10-06">October 06, 2021</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/telemelody/">TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2021-09-21">September 21, 2021</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/deeprapper/">DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2021-08-16">August 16, 2021</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/priorgrad/">PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Driven Adaptive Prior</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2021-06-11">June 11, 2021</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/adaspeech3/">AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2021-06-02">June 02, 2021</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/adaspeech2/">AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2021-03-05">March 05, 2021</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/adaspeech/">AdaSpeech: Adaptive Text to Speech for Custom Voice</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2021-03-01">March 01, 2021</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/fastspeech2/">FastSpeech 2: Fast and High-Quality End-to-End Text to Speech</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2021-02-10">February 10, 2021</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/songmass/">SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2020-12-14">December 14, 2020</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/lightspeech/">LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2020-11-03">November 03, 2020</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/denoispeech/">DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2020-10-14">October 14, 2020</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/hifisinger/">HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2020-09-02">September 02, 2020</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/popmag/">PopMAG: Pop Music Accompaniment Generation</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2020-08-01">August 01, 2020</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/uwspeech/">UWSpeech: Speech to Speech Translation for Unwritten Languages</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2020-06-12">June 12, 2020</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/multispeech/">MultiSpeech: Multi-Speaker Text to Speech with Transformer</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2020-05-09">May 09, 2020</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/seminas/">Semi-Supervised Neural Architecture Search</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2020-03-01">March 01, 2020</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/deepsinger/">DeepSinger: Singing Voice Synthesis with Data Mined From the Web</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2020-02-14">February 14, 2020</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/lrspeech/">LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2020-02-02">February 02, 2020</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/fastspeech/">FastSpeech: Fast, Robust and Controllable Text to Speech</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2019-05-10">May 10, 2019</time></span>
</article>
<article itemscope itemtype="http://schema.org/Blog">
<h2 class="entry-title" itemprop="headline"><a href="/unsuper/">Almost Unsupervised Text to Speech and Automatic Speech Recognition</a></h2>
<span class="entry-meta"><time itemprop="datePublished" datetime="2019-04-10">April 10, 2019</time></span>
</article>
</main>
</div>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-139981676-1', 'auto');
ga('send', 'pageview');
</script>
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/8.4/highlight.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
HTML: ["input/TeX","output/HTML-CSS"],
TeX: {
Macros: {
bm: ["\\boldsymbol{#1}", 1],
argmax: ["\\mathop{\\rm arg\\,max}\\limits"],
argmin: ["\\mathop{\\rm arg\\,min}\\limits"]},
extensions: ["AMSmath.js","AMSsymbols.js"],
equationNumbers: { autoNumber: "AMS" } },
extensions: ["tex2jax.js"],
jax: ["input/TeX","output/HTML-CSS"],
tex2jax: { inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
processEscapes: true },
"HTML-CSS": { availableFonts: ["TeX"],
linebreaks: { automatic: true } }
});
</script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
skipTags: ['script', 'noscript', 'style', 'textarea', 'pre', 'code']
}
});
</script>
<script type="text/javascript" async
src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.4/MathJax.js?config=TeX-MML-AM_CHTML">
</script>
</body>
</html>