Commit

Update index.html
yongxuUSTC authored May 12, 2020
1 parent 2702618 commit 26930fe
Showing 1 changed file with 13 additions and 1 deletion.
14 changes: 13 additions & 1 deletion index.html
@@ -28,7 +28,7 @@ <h2 id="neural-spatio-temporal-filtering-for-target-speech-separation">Neural Sp
</tr>
</table>
<p>&nbsp;</p>
<p><strong>Demo 2: Real-world recording and testing:</strong></p>
<p><strong>Demo 2: Real-world far-field recording and testing:</strong></p>
<p><img src="180degree-camera-and-15-element-mic-array.png" width="573" height="94" /></p>
<p>15-element non-uniform linear microphone array and colocated 180 degree wide-angle camera for our real-world video and audio recording</p>
<p>For the real-world videos, as the 180-degree wide-angle camera is colocated with the linear mic array, the rough DOA of the target speaker could be estimated according to the location of the target speaker in the whole camera view [1]. Face detection and face tracking are conducted to track the target speaker's DOA and lip movement. (Note that in our simulation data, there is no need to do face tracking, because the video of each overlapped speaker is already in single-face mode after the data cleaning and filtering process.)</p>
@@ -42,6 +42,18 @@ <h2 id="neural-spatio-temporal-filtering-for-target-speech-separation">Neural Sp
<td width="572">Real-world <strong>separated male speaker's speech</strong> by the proposed multi-tap MVDR method (face detected and tracked in the red rectangle)</td>
</tr>
</table>
<p><strong>Demo 3: Real-world far-field recording and testing 2:</strong></p>
<p>
<video src="video/real_world_demo2_mixture.mp4" width="574" height="273" controls preload></video>
<video src="video/real_world_demo2_female_enh.mp4" width="574" height="273" controls preload></video>
</p>
<table width="1157" border="1">
<tr>
<td width="569">Real-world far-field <strong>two-speaker mixture</strong> recorded by the hardware (camera and microphone array) above</td>
<td width="572">Real-world <strong>separated female speaker's speech</strong> by the proposed multi-tap MVDR method (face detected and tracked in the red rectangle).</td>
</tr>
</table>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><strong>Reference: </strong></p>
<p>[1] Audio-Visual Speech Separation and Dereverberation with a Two-Stage Multimodal Network, Ke Tan, Yong XU, Shixiong Zhang, Meng Yu, Dong Yu, accepted to IEEE Journal of Selected Topics in Signal Processing, 2020</p>