CSM Voice Clone - CPU Edition

This is a modified version of Sesame's CSM (Conversational Speech Model) that adds voice cloning capabilities and runs CPU-only on Windows under WSL (no GPU required).

CSM generates high-quality speech from text using the CSM-1B model and Llama-3.2-1B backbone.

What's New

  • Voice cloning support via voice_clone.py
  • CPU-only execution on Windows WSL
  • Optimized dependencies for CPU builds

Full Setup Guide: WSL (CPU-only)

1. Install system dependencies

sudo apt update && sudo apt upgrade -y
sudo apt install -y python3 python3-venv python3-pip git build-essential \
                    libsndfile1 ffmpeg

Why these packages?

  • libsndfile1, ffmpeg → Required for torchaudio and pydub
  • build-essential → Compilation tools for Python packages

2. Clone the repo into Linux home

mkdir -p ~/code && cd ~/code
git clone <your-csm-repo-url> csm
cd csm

⚠️ Important: Avoid /mnt/c/... paths. Always keep the project in your Linux home directory (~) for better speed and stability.

3. Create & activate a virtual environment

python3 -m venv .venv
source .venv/bin/activate

You should see (.venv) in your shell prompt.

4. Fix requirements.txt for CPU builds

At the top of requirements.txt, add these lines:

--index-url https://download.pytorch.org/whl/cpu
--extra-index-url https://pypi.org/simple

Then ensure it contains:

torch==2.4.1
torchaudio==2.4.1
tokenizers==0.21.0
transformers==4.49.0
huggingface_hub==0.28.1
moshi==0.2.2
torchtune==0.4.0
torchao==0.9.0
silentcipher @ git+https://github.com/SesameAILabs/silentcipher@master

5. Upgrade pip tooling inside venv

python -m pip install --upgrade pip setuptools wheel certifi

6. Install dependencies

This will pull CPU-only wheels for PyTorch:

python -m pip install --no-cache-dir -r requirements.txt

7. Sanity check your environment

python - <<'PY'
import torch, torchaudio, soundfile
print("torch:", torch.__version__)
print("torchaudio:", torchaudio.__version__)
print("cuda available:", torch.cuda.is_available())
print("torch.version.cuda:", torch.version.cuda)
print("libsndfile OK")
PY

Expected output:

torch: 2.4.1+cpu
torchaudio: 2.4.1
cuda available: False
torch.version.cuda: None
libsndfile OK

8. Disable Mimi lazy compilation

export NO_TORCH_COMPILE=1

Add this to your ~/.bashrc to make it permanent:

echo 'export NO_TORCH_COMPILE=1' >> ~/.bashrc
source ~/.bashrc
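The flag works because model code typically guards its `torch.compile` calls on this environment variable. A minimal sketch of that pattern (the helper name `maybe_compile` is illustrative, not taken from the repo):

```python
import os

def maybe_compile(fn):
    """Return fn unchanged when NO_TORCH_COMPILE is set; otherwise compile it."""
    if os.environ.get("NO_TORCH_COMPILE"):
        return fn  # eager execution: the safe path on CPU-only setups
    import torch
    return torch.compile(fn)
```

With the variable exported, every guarded function runs eagerly, avoiding `torch.compile`'s warm-up overhead on CPU.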

9. Login to Hugging Face

You need a Hugging Face account with access to the CSM-1B and Llama-3.2-1B model repositories (the Llama models are gated, so request access on their model pages first):

huggingface-cli login

Enter your Hugging Face token when prompted.


Running Voice Clone

Prepare Your Voice Sample

Before you can clone your voice, you need to prepare a voice sample:

  1. Record a 30-second audio sample of your voice

    • Use any recording app (phone, computer, etc.)
    • Speak naturally and clearly
    • Choose content that represents your normal speaking style
    • Save as .wav or .m4a format
  2. Create a transcript of exactly what you said in the recording

    • This should match the audio word-for-word
    • Accuracy is important for better cloning results
  3. Place your audio file in the data/ folder

    • Example: data/my_voice_sample.wav

Configure Voice Clone Settings

  1. Copy the example config:

    cp data/voice_clone_config.example.json data/voice_clone_config.json
  2. Edit data/voice_clone_config.json:

    {
      "voice_prompt_file": "data/my_voice_sample.wav",
      "prompt_transcript": "Your exact transcript here...",
      "voiceover_script": [
        "First sentence to generate in your cloned voice.",
        "Second sentence to generate.",
        "Add as many as you need."
      ]
    }
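A quick way to catch typos before a long model load is to validate the config up front (a sketch; `load_config` is a hypothetical helper, and the required keys are exactly the three shown above):

```python
import json
from pathlib import Path

REQUIRED_KEYS = {"voice_prompt_file", "prompt_transcript", "voiceover_script"}

def load_config(path="data/voice_clone_config.json"):
    """Load the voice-clone config and fail fast on missing keys or files."""
    cfg = json.loads(Path(path).read_text())
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise KeyError(f"config missing keys: {sorted(missing)}")
    if not Path(cfg["voice_prompt_file"]).is_file():
        raise FileNotFoundError(cfg["voice_prompt_file"])
    if not isinstance(cfg["voiceover_script"], list):
        raise TypeError("voiceover_script must be a list of strings")
    return cfg
```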

Basic Usage

python voice_clone.py

This will:

  1. Load the CSM-1B model
  2. Use your voice prompt audio (configure in data/voice_clone_config.json)
  3. Generate cloned speech from text input

Configuration

Edit data/voice_clone_config.json to customize:

  • Voice prompt audio file
  • Text to generate
  • Output settings
  • Model parameters

Troubleshooting

Issue: torch.cuda.is_available() returns True but you want CPU

Hide any GPUs from PyTorch:

export CUDA_VISIBLE_DEVICES=""
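The same thing can be done from Python, as long as it happens before the first `import torch` anywhere in the process (a sketch):

```python
import os

# An empty value hides every CUDA device from PyTorch;
# this must run before torch is first imported.
os.environ["CUDA_VISIBLE_DEVICES"] = ""
```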

Issue: Slow performance on CPU

This is expected. CPU inference is significantly slower than GPU. For faster generation:

  • Use shorter text prompts
  • Reduce max_audio_length_ms
  • Consider cloud GPU options if speed is critical

Issue: ImportError: libsndfile.so

Install the missing library:

sudo apt install -y libsndfile1

Issue: ffmpeg errors

Install ffmpeg:

sudo apt install -y ffmpeg

License

This project is based on CSM by Sesame AI Labs. Please refer to the LICENSE file for terms and conditions.

Ethical Use ⚠️

This tool provides high-quality voice cloning capabilities. Please use it responsibly:

  • Get explicit consent before cloning someone's voice
  • Do not use for impersonation, fraud, or deception
  • Do not create misleading or harmful content
  • Do not violate any laws or regulations

You are responsible for how you use this technology. Use it ethically and legally.


Credits

  • Original CSM: Sesame AI Labs
  • Authors: Johan Schalkwyk, Ankit Kumar, Dan Lyth, Sefik Emre Eskimez, Zack Hodari, Cinjon Resnick, Ramon Sanabria, Raven Jiang, and the Sesame team
  • Modifications: Voice cloning and CPU support added by this fork
