ICASSP 2019

Speech Language Processing

Lecture 1: End-to-end Speech Recognition I: General Topics

Bo Li, Yu Zhang, Tara Sainath, Yonghui Wu, William Chan. Bytes Are All You Need: End-to-end Multilingual Speech Recognition and Synthesis with Bytes. [ICASSP 2019] [CoRR 2018]
- ASR TTS Multilingual
- Use Unicode bytes instead of characters, sub-words or words as the unit of text representation.
- ASR: outperforms grapheme models in both multilingual and monolingual settings.
- TTS: match the performance of monolingual grapheme models.
- Small vocabulary size (256 for UTF-8) helps to build compact models.
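
To make the byte-level representation concrete, here is a minimal sketch that maps text in several scripts to UTF-8 byte IDs and back. The function names are hypothetical and chosen for illustration; the paper's own pipeline is not shown here.

```python
from typing import List


def text_to_byte_ids(text: str) -> List[int]:
    """Encode text as a list of UTF-8 byte values, each in the range 0..255."""
    return list(text.encode("utf-8"))


def byte_ids_to_text(ids: List[int]) -> str:
    """Decode byte IDs back to text; invalid byte sequences are replaced, not raised."""
    return bytes(ids).decode("utf-8", errors="replace")


# The same 256-symbol vocabulary covers every script.
for sample in ["hello", "こんにちは", "नमस्ते"]:
    ids = text_to_byte_ids(sample)
    assert all(0 <= i < 256 for i in ids)
    assert byte_ids_to_text(ids) == sample
    print(sample, "->", len(sample), "characters,", len(ids), "bytes")
```

The output vocabulary stays fixed at 256 regardless of language, at the cost of longer sequences for scripts whose characters need several bytes each.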
Shuo-Yiin Chang, Rohit Prabhavalkar, Yanzhang He, Tara N. Sainath, Gabor Simko. Joint Endpointing and Decoding with End-to-end Models. [ICASSP 2019]
Changhao Shan, Chao Weng, Guangsen Wang, Dan Su, Min Luo, Dong Yu, Lei Xie. Component Fusion: Learning Replaceable Language Model Component for End-to-end Speech Recognition System. [ICASSP 2019]
- ASR LM Fusion Component Fusion
- Use Component Fusion to incorporate an externally trained LM into an attention-based ASR system.
- Make use of large text corpora.
- Concatenate the hidden states of the ASR model and the LM. Unlike Cold Fusion, first train the ASR model jointly with a fixed LM pre-trained on the transcript text, then at decoding time replace that LM with another one trained on a larger text corpus or on out-of-domain text.
- Achieve better performance in both in-domain and out-of-domain scenarios.
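
The concatenation step can be sketched as follows. This is only an illustration of "concatenate the two hidden states, then project"; the dimensions, the random weights, the single tanh layer, and the name `concat_fuse` are all assumptions, not the paper's actual fusion layer.

```python
import math
import random


def concat_fuse(decoder_state, lm_state, weights, bias):
    """Concatenate two hidden vectors and apply one tanh projection layer."""
    joint = decoder_state + lm_state  # list concatenation: [s_dec ; h_lm]
    return [
        math.tanh(sum(w * x for w, x in zip(row, joint)) + b)
        for row, b in zip(weights, bias)
    ]


random.seed(0)
DEC_DIM, LM_DIM, OUT_DIM = 4, 3, 2  # illustrative sizes only
weights = [
    [random.uniform(-0.5, 0.5) for _ in range(DEC_DIM + LM_DIM)]
    for _ in range(OUT_DIM)
]
bias = [0.0] * OUT_DIM

decoder_state = [0.1, -0.2, 0.3, 0.0]   # stand-in for the ASR decoder state
lm_state_transcript = [0.5, 0.1, -0.4]  # stand-in for the LM used in training
lm_state_external = [0.2, 0.7, -0.1]    # stand-in for the LM swapped in at decoding

# The fusion weights stay the same; only the LM that supplies the state changes.
fused_a = concat_fuse(decoder_state, lm_state_transcript, weights, bias)
fused_b = concat_fuse(decoder_state, lm_state_external, weights, bias)
print(fused_a, fused_b)
```

Because the fusion layer only sees a fixed-size LM state, any LM that exposes a state of the same dimension can be substituted without changing the layer's shape.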
Stefan Braun, Shih-Chii Liu. Parameter Uncertainty for End-to-end Speech Recognition. [ICASSP 2019]
Shane Settle, Kartik Audhkhasi, Karen Livescu, Michael Picheny. Acoustically Grounded Word Embeddings for Improved Acoustics-to-Word Speech Recognition. [ICASSP 2019] [CoRR 2019]
Murali Karthick Baskar, Lukás Burget, Shinji Watanabe, Martin Karafiát, Takaaki Hori, Jan Honza Cernocký. Promising Accurate Prefix Boosting for sequence-to-sequence ASR. [ICASSP 2019] [CoRR 2018]

Lecture 2: End-to-end Speech Recognition II: New Models

Lecture 6: Systems for Speaker Recognition and Identification

Poster 2: Speaker Verification and Identification I

Poster 4: Speaker Verification and Identification II

Poster 6: Features and Robustness for Speaker Identification

Poster 8: Features and Learning for Speaker Identification and Diarization

Lecture 8: Speech Synthesis I

Jean-Marc Valin, Jan Skoglund. LPCNET: Improving Neural Speech Synthesis through Linear Prediction. [ICASSP 2019] [CoRR 2018]
- TTS Spectrogram-to-waveform LPCNet
- Use a linear predictor to simplify the prediction task, so that the network directly predicts the excitation (residual).
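
The relation between signal, linear prediction, and excitation can be sketched as below. This is a toy illustration that assumes the predictor coefficients are already given; how LPCNet obtains them, and the network that models the excitation, are not shown. All function names are hypothetical.

```python
import math


def lpc_predict(history, coeffs):
    """Predict the next sample as a weighted sum of past samples.

    history[-1] is the most recent sample and is weighted by coeffs[0].
    """
    return sum(a * history[-(k + 1)] for k, a in enumerate(coeffs))


def analyze(signal, coeffs):
    """Return the excitation e_t = s_t - p_t, with zeros assumed before t = 0."""
    order = len(coeffs)
    padded = [0.0] * order + list(signal)
    return [
        padded[t + order] - lpc_predict(padded[t:t + order], coeffs)
        for t in range(len(signal))
    ]


def synthesize(excitation, coeffs):
    """Rebuild the signal sample by sample as s_t = p_t + e_t."""
    order = len(coeffs)
    out = [0.0] * order
    for e in excitation:
        out.append(lpc_predict(out[-order:], coeffs) + e)
    return out[order:]


# A pure sinusoid obeys s_t = 2*cos(w)*s_{t-1} - s_{t-2}, so a 2-tap
# predictor leaves almost nothing in the residual after the first samples.
w = 2 * math.pi * 0.05
signal = [math.sin(w * t) for t in range(50)]
coeffs = [2 * math.cos(w), -1.0]

excitation = analyze(signal, coeffs)
rebuilt = synthesize(excitation, coeffs)
print(max(abs(e) for e in excitation[2:]))
```

The residual is far smaller than the signal itself here, which is the sense in which the linear predictor leaves an easier target for the network.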
Jungbae Park, Kijong Han, Yuneui Jeong, Sang Wan Lee. Phonemic-level Duration Control Using Attention Alignment for Natural Speech Synthesis. [ICASSP 2019]
Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Yu-An Chung, Yuxuan Wang, Yonghui Wu, James Glass. Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization. [ICASSP 2019]
Kyle Kastner, João Felipe Santos, Yoshua Bengio, Aaron Courville. Representation Mixing for TTS Synthesis. [ICASSP 2019] [CoRR 2018]
- TTS Text-to-spectrogram Representation Mixing
- A single word can have different pronunciations.
- Combine grapheme and phoneme inputs in a single encoder flexibly (with mask).
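
A minimal sketch of mixing the two input types with a mask is given below. The toy lexicon, the per-word random choice, and the name `mix_representations` are assumptions made for illustration; the paper's actual sampling scheme and encoder are not reproduced here.

```python
import random

# Hypothetical toy pronunciation lexicon (word -> phoneme symbols).
LEXICON = {
    "read": ["R", "IY", "D"],
    "the": ["DH", "AH"],
    "book": ["B", "UH", "K"],
}


def mix_representations(words, p_phoneme, rng):
    """Pick phonemes or graphemes per word and record the choice in a mask.

    Returns (symbols, mask), where mask[i] is 1 if symbols[i] is a phoneme
    and 0 if it is a grapheme. Words missing from the lexicon always fall
    back to graphemes.
    """
    symbols, mask = [], []
    for word in words:
        use_phonemes = word in LEXICON and rng.random() < p_phoneme
        units = LEXICON[word] if use_phonemes else list(word)
        symbols.extend(units)
        mask.extend([1 if use_phonemes else 0] * len(units))
    return symbols, mask


rng = random.Random(0)
symbols, mask = mix_representations(["read", "the", "book"], 0.5, rng)
print(symbols)
print(mask)
```

An encoder can then use the mask to select the right embedding table for each position, so one model accepts graphemes, phonemes, or a mixture of both.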
Younggun Lee, Taesu Kim. Robust and Fine-grained Prosody Control of End-to-end Speech Synthesis. [ICASSP 2019] [CoRR 2018]
Xin Wang, Shinji Takaki, Junichi Yamagishi. Neural Source-filter-based Waveform Model for Statistical Parametric Speech Synthesis. [ICASSP 2019] [CoRR 2018]
- TTS Spectrogram-to-waveform
- Use a source module to generate a sine-based excitation signal, then use a filter module to transform the excitation signal into a waveform.
- Faster than autoregressive models (e.g., WaveNet).
- Simpler than non-autoregressive models (e.g., Parallel WaveNet), since it does not need a complicated training method (e.g., distillation).
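
The source module's role can be sketched as follows: turn a per-sample F0 contour into a sine-based excitation for voiced samples and noise for unvoiced ones. The constants, the unvoiced noise level, and the name `sine_excitation` are illustrative assumptions, and the neural filter module is not shown.

```python
import math
import random


def sine_excitation(f0, sample_rate, amplitude=0.1, noise_std=0.003, rng=None):
    """Generate an excitation signal from a per-sample F0 contour.

    Voiced samples (f0 > 0) get a sine whose phase is the running sum of
    the instantaneous frequency, plus a little Gaussian noise. Unvoiced
    samples (f0 == 0) get Gaussian noise only.
    """
    rng = rng or random.Random(0)
    phase = 0.0
    out = []
    for f in f0:
        if f > 0:
            phase += 2 * math.pi * f / sample_rate
            out.append(amplitude * math.sin(phase) + rng.gauss(0.0, noise_std))
        else:
            out.append(rng.gauss(0.0, amplitude / 3))
    return out


# 100 Hz voiced segment followed by an unvoiced segment, at 16 kHz.
f0_contour = [100.0] * 320 + [0.0] * 160
excitation_signal = sine_excitation(f0_contour, sample_rate=16000)
print(len(excitation_signal))
```

Every sample depends only on the F0 contour and not on previously generated waveform samples, which is why this kind of source avoids the sample-by-sample loop of autoregressive generation in a real model (the Python loop here only accumulates phase).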

Poster 20: Speech Synthesis II

Poster 22: Speech Synthesis III

TangHaitao1994/ICASSP2019