SNAC 🍿

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate. For more information, read the paper: https://arxiv.org/abs/2410.14411


🎧 More audio samples available at https://hubertsiuzdak.github.io/snac/

Overview

SNAC encodes audio into hierarchical tokens similarly to SoundStream, EnCodec, and DAC (see the image on the left). However, SNAC introduces a simple change where coarse tokens are sampled less frequently, covering a broader time span (see the image on the right).

This not only saves bitrate but, more importantly, may be very useful for language-modeling approaches to audio generation. For example, with coarse tokens at ~10 Hz and a context window of 2048 tokens, a model can cover about 3 minutes of audio (2048 / 10 ≈ 205 s), enough to keep the structure of a track consistent.
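The arithmetic behind the ~3 minute figure can be sketched as follows. This is a minimal illustration that assumes the context window holds only coarse-level tokens; the function name `context_seconds` is hypothetical and not part of the snac package.

```python
def context_seconds(context_tokens: int, token_rate_hz: float) -> float:
    """Seconds of audio covered by a context window at a fixed token rate."""
    return context_tokens / token_rate_hz


# Values quoted in the text: ~10 Hz coarse tokens, context window of 2048.
seconds = context_seconds(2048, 10.0)
print(f"{seconds:.1f} s = {seconds / 60:.1f} min")  # 204.8 s = 3.4 min
```

If finer-resolution tokens share the same context window, the covered time span shrinks accordingly.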

Pretrained models

Currently, all models support only a single audio channel (mono).

Model	Bitrate	Sample Rate	Params	Recommended use case
hubertsiuzdak/snac_24khz	0.98 kbps	24 kHz	19.8 M	🗣️ Speech
hubertsiuzdak/snac_32khz	1.9 kbps	32 kHz	54.5 M	🎸 Music / Sound Effects
hubertsiuzdak/snac_44khz	2.6 kbps	44 kHz	54.5 M	🎸 Music / Sound Effects
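To get a feel for these bitrates, the sketch below converts them into an approximate storage size per minute of audio. It uses the rounded values from the table and assumes 1 kbps = 1000 bits per second; the helper `bytes_per_minute` is hypothetical and only illustrates the arithmetic.

```python
# Rounded bitrates copied from the table above.
BITRATES_KBPS = {
    "hubertsiuzdak/snac_24khz": 0.98,
    "hubertsiuzdak/snac_32khz": 1.9,
    "hubertsiuzdak/snac_44khz": 2.6,
}


def bytes_per_minute(bitrate_kbps: float) -> float:
    """Approximate size in bytes of one minute of codes (1 kbps = 1000 bits/s)."""
    return bitrate_kbps * 1000 * 60 / 8


for name, kbps in BITRATES_KBPS.items():
    print(f"{name}: ~{bytes_per_minute(kbps) / 1000:.1f} kB per minute")
```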

Usage

Install it using:

pip install snac

To encode (and decode) audio with SNAC in Python, use the following code:

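import torch
from snac import SNAC

# Use a GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = SNAC.from_pretrained("hubertsiuzdak/snac_32khz").eval().to(device)
audio = torch.randn(1, 1, 32000).to(device)  # placeholder for actual audio with shape (B, 1, T)

with torch.inference_mode():
    codes = model.encode(audio)      # list of code tensors, one per temporal resolution
    audio_hat = model.decode(codes)  # reconstructed waveform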

You can also encode and reconstruct in a single call:

with torch.inference_mode():
    audio_hat, codes = model(audio)

⚠️ Note that codes is a list of token sequences of variable lengths, each corresponding to a different temporal resolution.

>>> [code.shape[1] for code in codes]
[12, 24, 48, 96]
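For language modeling, these variable-length sequences usually have to be merged into a single token stream. The following is a minimal sketch of one possible layout, grouping all tokens that belong to the same coarse time step into one frame. It assumes each level is exactly twice as long as the previous one (as in the output above) and that the codes of a single batch item have been converted to plain Python lists, for example with `[c[0].tolist() for c in codes]`. The helpers `flatten_codes` and `unflatten_codes` are hypothetical and not part of the snac package.

```python
def flatten_codes(levels):
    """Interleave multi-scale code lists into one sequence, one frame per coarse step."""
    n_frames = len(levels[0])
    flat = []
    for t in range(n_frames):
        for level in levels:
            ratio = len(level) // n_frames  # tokens of this level per coarse token
            flat.extend(level[t * ratio:(t + 1) * ratio])
    return flat


def unflatten_codes(flat, ratios):
    """Invert flatten_codes, given tokens per coarse step for each level, e.g. [1, 2, 4, 8]."""
    frame_size = sum(ratios)
    levels = [[] for _ in ratios]
    for start in range(0, len(flat), frame_size):
        offset = start
        for level, ratio in zip(levels, ratios):
            level.extend(flat[offset:offset + ratio])
            offset += ratio
    return levels


# Toy codes with the same 1:2:4:8 length pattern as above (3 coarse steps).
levels = [list(range(3 * r)) for r in (1, 2, 4, 8)]
flat = flatten_codes(levels)
assert len(flat) == 3 * (1 + 2 + 4 + 8)
assert unflatten_codes(flat, [1, 2, 4, 8]) == levels
```

After unflattening, each list can be turned back into a tensor of shape (1, T) before being passed to `model.decode`.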

Acknowledgements

Module definitions are adapted from the Descript Audio Codec.

Citation

If this code contributes to your research, please cite our work:

@inproceedings{siuzdak2024snac,
  title={SNAC: Multi-Scale Neural Audio Codec},
  author={Siuzdak, Hubert and Gr{\"o}tschla, Florian and Lanzend{\"o}rfer, Luca A},
  booktitle={Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation},
  year={2024}
}
