Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis
Lina-Speech is a neural codec language model that achieves state-of-the-art performance on zero-shot TTS. It replaces self-attention with Gated Linear Attention (GLA), which we believe is a sound choice for audio (see the sketch after the list below). It features:
- Voice cloning from short samples by prompt continuation.
- High throughput: large-batch inference comes at little extra cost on a consumer-grade GPU.
- Initial-state tuning (shout-out to RWKV, with a fast implementation by FLA): fast speaker adaptation by tuning only the initial recurrent state, keeping your context window free of long prompts.
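For intuition, here is a minimal, unfused sketch of the GLA recurrence and of what initial-state tuning optimizes. It is illustrative only: the shapes, the per-channel gating, and the toy loss are assumptions, and the real model uses the chunked kernels from flash-linear-attention rather than a Python loop.

```python
import torch

def gla_recurrence(q, k, v, g, S0=None):
    """Unfused gated linear attention recurrence over one sequence.

    q, k, g: (T, d_k) tensors, with gates g in (0, 1); v: (T, d_v).
    The state S is a (d_k, d_v) matrix updated as
        S_t = diag(g_t) @ S_{t-1} + outer(k_t, v_t),   o_t = q_t @ S_t.
    S0 is the initial state: the only quantity trained in initial-state tuning.
    """
    T, d_k = q.shape
    d_v = v.shape[-1]
    S = torch.zeros(d_k, d_v, dtype=q.dtype) if S0 is None else S0
    outs = []
    for t in range(T):
        # Per-channel decay of the state, then write the new key/value pair.
        S = g[t].unsqueeze(-1) * S + torch.outer(k[t], v[t])
        outs.append(q[t] @ S)  # read out with the current query
    return torch.stack(outs), S

# Initial-state tuning in a nutshell: freeze every model weight and
# optimize only S0 on a few samples of the target speaker.
d_k, d_v, T = 64, 64, 128
S0 = torch.zeros(d_k, d_v, requires_grad=True)
opt = torch.optim.Adam([S0], lr=1e-2)
q, k, v = (torch.randn(T, d) for d in (d_k, d_k, d_v))
g = torch.sigmoid(torch.randn(T, d_k))
out, _ = gla_recurrence(q, k, v, g, S0)
out.pow(2).mean().backward()  # stand-in loss; gradients flow only into S0
opt.step()
```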
```bash
conda create -n lina python=3.10
conda activate lina
pip install torch==2.5.1
pip install causal-conv1d==1.3.0.post1
pip install -r requirements.txt
ln -s 3rdparty/flash-linear-attention/fla fla
ln -s 3rdparty/encoder encoder
ln -s 3rdparty/decoder decoder
# Pin flash-linear-attention to the tested commit.
cd 3rdparty/flash-linear-attention
git checkout 739ef15f97cff06366c97dfdf346f2ceaadf05ce
```
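A quick way to verify the environment, assuming you run Python from the repository root where the symlinks were created:

```python
# Run from the repository root so the `fla` symlink is importable.
import torch
import fla  # flash-linear-attention, via the symlink created above

print(torch.__version__, torch.cuda.is_available())
```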
You will need this WavTokenizer checkpoint and its config file: [WavTokenizer-ckpt] [config file]
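A loading and round-trip sketch, following the usage shown in the WavTokenizer repository (reachable here through the `decoder` symlink); the paths and the input file are placeholders:

```python
import torch
import torchaudio
from decoder.pretrained import WavTokenizer  # via the 3rdparty/decoder symlink

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholders: point these at the downloaded config and checkpoint.
wavtokenizer = WavTokenizer.from_pretrained0802("path/to/config.yaml", "path/to/wavtokenizer.ckpt")
wavtokenizer = wavtokenizer.to(device)

# Encode a 24 kHz mono waveform into discrete codes, then decode it back.
wav, sr = torchaudio.load("sample.wav")  # placeholder input file
wav = torchaudio.functional.resample(wav, sr, 24000).to(device)
bandwidth_id = torch.tensor([0], device=device)
features, codes = wavtokenizer.encode_infer(wav, bandwidth_id=bandwidth_id)
audio = wavtokenizer.decode(features, bandwidth_id=bandwidth_id)
```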
Dataset: LibriTTS + LibriTTS-R + an English split of MLS (10k hours) + GigaSpeech XL.
A 169M-parameter version trained for 100B tokens: [Lina-Speech 169M]
See InferenceLina.ipynb and complete the first cells with the correct checkpoint and config paths.
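Purely for orientation, zero-shot cloning by prompt continuation has the following shape; `encode_text`, `model.generate`, and the token layout below are hypothetical stand-ins, not the actual notebook API:

```python
# Hypothetical sketch -- `encode_text` and `model.generate` are illustrative
# stand-ins for what InferenceLina.ipynb actually does.
prompt_codes = codes  # WavTokenizer codes of a short reference sample (see above)
text_tokens = encode_text("Hello from a cloned voice.")

# The LM continues the reference audio tokens conditioned on the text,
# so the generated continuation keeps the reference speaker's voice.
generated_codes = model.generate(text=text_tokens, audio_prompt=prompt_codes)

features = wavtokenizer.codes_to_features(generated_codes)
audio = wavtokenizer.decode(features, bandwidth_id=torch.tensor([0]))
```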
Demo videos: `horse.mp4`, `count_it_up.mp4`
- The RWKV authors and the community for carrying out high-level, truly open-source research.
- @SmerkyG for making it easy to test cutting-edge language models.
- The GLA/flash-linear-attention authors for their outstanding work.
- The WavTokenizer authors for releasing such a brilliant speech codec.
- 🤗 for supporting this project.
```bibtex
@misc{lemerle2024linaspeechgatedlinearattention,
      title={Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis},
      author={Théodor Lemerle and Harrison Vanderbyl and Vaibhav Srivastav and Nicolas Obin and Axel Roebel},
      year={2024},
      eprint={2410.23320},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2410.23320},
}
```
Before using these pre-trained models, you agree to only clone voices whose speakers have granted permission, either directly or by license. If you do not have such permission, you must publicly announce that the voices are synthesized before making them public, i.e., inform listeners that the speech samples were produced by the pre-trained models.
This work was initiated in the Analysis/Synthesis team of the STMS Laboratory at IRCAM and was funded by the following project: