Exploring "linear attention" for text-to-speech.
It predicts audio codec tokens à la MusicGen: a delayed residual vector quantizer pattern lets a single model predict all codebook levels, so we do not need one model per quantizer stage (see the sketch below).
Featuring RWKV, Mamba, Gated Linear Attention.
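For intuition, here is a minimal sketch of the MusicGen-style delay pattern (illustration only, not the repo's actual code; the `pad_id` handling is an assumption): RVQ level k is shifted right by k frames, so each LM step emits one token per codebook, each belonging to a slightly offset frame.

```python
import torch

def delay_pattern(codes: torch.Tensor, pad_id: int) -> torch.Tensor:
    """Shift RVQ level k right by k frames (MusicGen's delay pattern).

    codes:  (K, T) integer codec tokens, K = number of quantizer levels.
    return: (K, T + K - 1), with pad_id filling the shifted-in positions.
    """
    K, T = codes.shape
    out = torch.full((K, T + K - 1), pad_id, dtype=codes.dtype)
    for k in range(K):
        out[k, k:k + T] = codes[k]
    return out

# Example with 4 codebooks and 6 frames:
codes = torch.arange(24).reshape(4, 6)
print(delay_pattern(codes, pad_id=-1))
```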
Compared to other LM-based TTS models:
- Can be easily pretrained and fine-tuned on midrange GPUs.
- Tiny memory footprint.
- Trained on long context (up to 2000 tokens: ~27 s of audio).
| Model   | #Params   | Dataset           | Checkpoint | Steps | Note               |
|---------|-----------|-------------------|------------|-------|--------------------|
| GLA     | 60M, 130M | Librilight-medium | Download   | 300k  | GPU inference only |
| Mamba   | 60M       | Librilight-medium | Download   | 300k  | GPU inference only |
| RWKV v6 | 60M       | LibriTTS          | Download   | 150k  | GPU inference only |
Depending on which linear-complexity LM you choose, follow the respective instructions first:
- For Mamba, check the official repo.
- For GLA/RWKV inference, check flash-linear-attention.
- For RWKV training, check RWKV-LM.
Download the configuration and weights above, then check Inference.ipynb.
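Before running the notebook, a quick way to verify that flash-linear-attention and its Triton GPU kernels are set up is to instantiate a layer directly (this also matches the "GPU inference only" note above). This is a sketch: `fla.layers.GatedLinearAttention` is part of FLA, but its exact constructor arguments and return format may differ across versions.

```python
import torch
from fla.layers import GatedLinearAttention  # requires a CUDA GPU (Triton kernels)

layer = GatedLinearAttention(hidden_size=512, num_heads=4).cuda()
x = torch.randn(1, 128, 512, device="cuda")  # (batch, seq_len, hidden)
out = layer(x)
# Depending on the FLA version, forward may return a tensor or a tuple
# (output, attentions, cache); handle both.
y = out[0] if isinstance(out, tuple) else out
print(y.shape)  # expected: torch.Size([1, 128, 512])
```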
TODO:
- Fix RWKV v6 inference and/or switch to the FLA implementation.
- Provide a DataModule for training (lhotse might also work well).
- Implement CFG (classifier-free guidance); see the sketch below.
- Scale up.
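The CFG item would presumably follow the usual recipe for autoregressive LMs (as in MusicGen): run the model with and without the text condition and extrapolate the logits. A minimal sketch, where `gamma` is a hypothetical guidance-scale parameter:

```python
import torch

def cfg_logits(logits_cond: torch.Tensor,
               logits_uncond: torch.Tensor,
               gamma: float = 3.0) -> torch.Tensor:
    """Classifier-free guidance over next-token logits.

    Run the LM twice per step (with and without the text prompt) and push
    the distribution away from the unconditional one:
        guided = uncond + gamma * (cond - uncond)
    gamma = 1.0 recovers the conditional logits; gamma > 1 strengthens
    prompt adherence at some cost in diversity.
    """
    return logits_uncond + gamma * (logits_cond - logits_uncond)
```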
Thanks to:
- The RWKV authors and the community around them, for carrying out high-level, truly open-source research.
- @SmerkyG, for making my life easy when testing cutting-edge language models.
- @lucidrains, for their huge codebase.
- @sustcsonglin, who made GLA and FLA.
- @harrisonvanderbyl, for fixing RWKV inference.
@software{lemerle2024linaspeech,
title = {LinaSpeech: Exploring "linear attention" for text-to-speech.},
author = {Lemerle, Théodor},
url = {https://github.com/theodorblackbird/lina-speech},
  month = apr,
year = {2024}
}
This work was performed in the Analysis/Synthesis team of the STMS Laboratory at IRCAM, as part of the ANR project Exovoices.