🔀 DyCAST

A variable-frame-rate 16 kHz speech codec based on FocalCodec.

📜 Papers:
🔊 Downstream Tasks: https://github.com/lucadellalib/audiocodecs

📌 Available Checkpoints

Checkpoint	Sample Rate (kHz)	Frame Rate (Hz)	Codebooks	Bitrate (kbps)	Dataset
lucadellalib/dycast	16	6.2 - 17.5	1 x (4^32)	0.40 - 1.12	LibriTTS-960
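
The bitrate range in the table follows from the frame rate range and the codebook size. The sketch below reproduces that arithmetic, reading the Codebooks column as one codebook with 4^32 entries, i.e. 32 x log2(4) = 64 bits per token. The function name and parameters are illustrative and not part of the DyCAST API.

```python
import math


def bitrate_kbps(frame_rate_hz, num_codebooks=1, levels=4, dims=32):
    """Bitrate in kbps for a codec emitting one token per frame.

    Assumes each token indexes a codebook with levels**dims entries,
    so it costs dims * log2(levels) bits (hypothetical helper).
    """
    bits_per_token = num_codebooks * dims * math.log2(levels)
    return frame_rate_hz * bits_per_token / 1000.0


low = bitrate_kbps(6.2)    # lowest frame rate in the table
high = bitrate_kbps(17.5)  # highest frame rate in the table
print(f"{low:.2f} - {high:.2f} kbps")  # 0.40 - 1.12 kbps
```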

🛠️️ Installation

Install Python 3.8 or later, then open a terminal and run:

pip install faiss-cpu huggingface-hub numpy safetensors soundfile torch torchaudio torchcodec transformers
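
To confirm that the dependencies are importable before loading the model, a small check along these lines can help. It is a minimal sketch using only the standard library. The mapping from pip names to import names is an assumption based on the usual module names of these packages (for example, faiss-cpu is imported as faiss), and missing_packages is a hypothetical helper.

```python
import importlib.util

# pip package name -> top-level import name (assumed usual module names)
REQUIRED = {
    "faiss-cpu": "faiss",
    "huggingface-hub": "huggingface_hub",
    "numpy": "numpy",
    "safetensors": "safetensors",
    "soundfile": "soundfile",
    "torch": "torch",
    "torchaudio": "torchaudio",
    "torchcodec": "torchcodec",
    "transformers": "transformers",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


print("Missing packages:", missing_packages())
```

An empty list means every listed module was found in the current environment.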

▶️ Quickstart

NOTE: the audios directory contains audio samples that you can download and use to test the codec.

You can easily load the model using torch.hub without cloning the repository:

import torch
import torchaudio

# Load DyCAST model
codec = torch.hub.load(
    repo_or_dir="lucadellalib/dycast",
    model="dycast",
    config="lucadellalib/dycast",
    force_reload=True,  # Fetch the latest version from Torch Hub
)
codec.eval().requires_grad_(False)

# Load and preprocess the input audio
audio_file = "audios/librispeech-dev-clean/251-118436-0003.wav"
sig, sample_rate = torchaudio.load(audio_file)
sig = torchaudio.functional.resample(sig, sample_rate, codec.sample_rate_input)

# Forward
toks, pcodes, rec_sig = codec(sig)
print("Tokens:")
print(toks.shape)
print(toks)

print("Pooled codes:")
print(pcodes.shape)
print(pcodes)

# Save the reconstructed audio
rec_sig = torchaudio.functional.resample(rec_sig, codec.sample_rate_output, sample_rate)
torchaudio.save("reconstruction.wav", rec_sig, sample_rate)
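
Because the frame rate varies with the input, it can be useful to measure the effective rate for a given utterance. The helper below is a minimal sketch: effective_frame_rate is a hypothetical function, not part of DyCAST, and it only divides a token count by the clip duration.

```python
def effective_frame_rate(num_tokens, num_samples, sample_rate):
    """Tokens per second for a clip of num_samples at sample_rate Hz."""
    duration_s = num_samples / sample_rate
    return num_tokens / duration_s


# Made-up numbers: 40 tokens for a 4-second clip sampled at 16 kHz
rate = effective_frame_rate(40, 64000, 16000)
print(rate)  # 10.0
```

With the Quickstart variables, this could be called as effective_frame_rate(toks.shape[-1], sig.shape[-1], codec.sample_rate_input), assuming the last dimension of toks is the token axis, which the text above does not confirm.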

Alternatively, you can install DyCAST as a standard Python package using pip:

pip install dycast@git+https://github.com/lucadellalib/dycast.git@main#egg=dycast

Once installed, you can import it in your scripts:

import dycast

config = "lucadellalib/dycast"
codec = dycast.DyCAST.from_pretrained(config)

Check the code documentation for more details on model usage and available configurations.

🎤 Running the Demo

Clone the repository (or download and extract it), navigate to <path-to-repository>, open a terminal, and run:

python dycast/codec.py audios/librispeech-dev-clean/251-118436-0003.wav

Reconstructed audio samples using different inference modes can be found in the reconstructions directory.

@ Citing

@article{dellalibera2026dycast,
    title   = {Beyond Fixed Frames: Dynamic Character-Aligned Speech Tokenization},
    author  = {Luca {Della Libera} and Cem Subakan and Mirco Ravanelli},
    journal = {arXiv preprint arXiv:2601.23174},
    year    = {2026},
}

@article{dellalibera2025focalcodecstream,
    title   = {{FocalCodec-Stream}: Streaming Low-Bitrate Speech Coding via Causal Distillation},
    author  = {Luca {Della Libera} and Cem Subakan and Mirco Ravanelli},
    journal = {arXiv preprint arXiv:2509.16195},
    year    = {2025},
}

@inproceedings{dellalibera2025focalcodec,
    title     = {{FocalCodec}: Low-Bitrate Speech Coding via Focal Modulation Networks},
    author    = {Luca {Della Libera} and Francesco Paissan and Cem Subakan and Mirco Ravanelli},
    booktitle = {Advances in Neural Information Processing Systems},
    year      = {2025},
}

📧 Contact

luca.dellalib@gmail.com
