Update index.html

yongxuUSTC · May 12, 2020 · 26930fe
1 parent 2702618
commit 26930fe
Showing 1 changed file with 13 additions and 1 deletion.
diff --git a/index.html b/index.html
@@ -28,7 +28,7 @@ <h2 id="neural-spatio-temporal-filtering-for-target-speech-separation">Neural Sp
   </tr>
 </table>
 <p>&nbsp;</p>
-<p><strong>Demo 2: Real-world recording and testing:</strong></p>
+<p><strong>Demo 2: Real-world far-field recording and testing:</strong></p>
 <p><img src="180degree-camera-and-15-element-mic-array.png" width="573" height="94" /></p>
 <p>15-element non-uniform linear microphone array and colocated 180-degree wide-angle camera for our real-world video and audio recording</p>
 <p>For the real-world videos, as the 180-degree wide-angle camera is colocated with the linear mic array, the rough DOA of the target speaker could be estimated according to the location of the target speaker in the whole camera view [1]. Face detection and face tracking are conducted to track the target speaker's DOA and lip movement. (Note that in our simulation data, there is no need to do face tracking, because the video of each overlapped speaker is already in single-face mode after the data cleaning and filtering process.)</p>
@@ -42,6 +42,18 @@ <h2 id="neural-spatio-temporal-filtering-for-target-speech-separation">Neural Sp
     <td width="572">Real-world <strong>separated male speaker's speech</strong> by the proposed multi-tap MVDR method (face detected and tracked in the red rectangle)</td>
   </tr>
 </table>
+  <p><strong>Demo 3: Real-world far-field recording and testing 2:</strong></p>
+<p>
+  <video src="video/real_world_demo2_mixture.mp4" width="574" height="273" controls preload></video>
+  <video src="video/real_world_demo2_female_enh.mp4" width="574" height="273" controls preload></video>
+</p>
+<table width="1157" border="1">
+  <tr>
+    <td width="569">Real-world far-field <strong>two-speaker mixture</strong> recorded by the hardware (camera and microphone array) above</td>
+    <td width="572">Real-world <strong>separated female speaker's speech</strong> by the proposed multi-tap MVDR method (face detected and tracked in the red rectangle).</td>
+  </tr>
+</table>
+<p>&nbsp;</p>
 <p>&nbsp;</p>
 <p><strong>Reference: </strong></p>
 <p>[1] Audio-Visual Speech Separation and Dereverberation with a Two-Stage Multimodal Network, Ke Tan, Yong XU, Shixiong Zhang, Meng Yu, Dong Yu, accepted to IEEE Journal of Selected Topics in Signal Processing, 2020</p>