Commit

Adding dependency versioning via poetry (metavoiceio#92)
lucapericlp authored Mar 13, 2024
1 parent 9078234 commit b236bcf
Showing 7 changed files with 4,475 additions and 44 deletions.
34 changes: 25 additions & 9 deletions Dockerfile
@@ -1,14 +1,28 @@
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04 as base

ENV POETRY_NO_INTERACTION=1 \
POETRY_VIRTUALENVS_IN_PROJECT=1 \
POETRY_VIRTUALENVS_CREATE=1 \
POETRY_CACHE_DIR=/tmp/poetry_cache \
DEBIAN_FRONTEND=noninteractive

# Install system dependencies in a single RUN command to reduce layers
# Combine apt-get update, upgrade, and installation of packages. Clean up in the same layer to reduce image size.
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y python3.10 python3-pip git wget curl build-essential && \
apt-get install -y python3.10 python3-pip git wget curl build-essential pipx && \
apt-get autoremove -y && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

# install via pip given ubuntu 22.04 as per docs https://pipx.pypa.io/stable/installation/
RUN python3 -m pip install --user pipx && \
python3 -m pipx ensurepath && \
python3 -m pipx install poetry==1.8.2

# make pipx installs (i.e. poetry) available on PATH
ENV PATH="/root/.local/bin:${PATH}"

# install ffmpeg
RUN wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz &&\
wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz.md5 &&\
@@ -19,15 +33,17 @@ RUN wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz

WORKDIR /app

COPY requirements.txt requirements.txt
COPY pyproject.toml poetry.lock ./
RUN touch README.md # poetry will complain otherwise

RUN pip install --no-cache-dir packaging wheel torch
RUN pip install --no-cache-dir audiocraft # HACK: installation fails within the requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
RUN pip install --no-cache-dir --upgrade torch torchaudio
RUN poetry install --without dev --no-root
RUN poetry run python -m pip install torch==2.2.1 torchaudio==2.2.1 && \
rm -rf $POETRY_CACHE_DIR

COPY . .
COPY fam ./fam
COPY serving.py ./
COPY app.py ./

RUN pip install --no-cache-dir -e .
RUN poetry install --only-root

ENTRYPOINT ["python3.10", "serving.py"]
ENTRYPOINT ["poetry", "run", "python", "serving.py"]
26 changes: 15 additions & 11 deletions README.md
@@ -4,7 +4,7 @@
<p>
<a href="https://ttsdemo.themetavoice.xyz/"><b>Playground</b></a> | <a target="_blank" style="display: inline-block; vertical-align: middle" href="https://colab.research.google.com/github/metavoiceio/metavoice-src/blob/main/colab_demo.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
</a>
</p>

MetaVoice-1B is a 1.2B parameter base model trained on 100K hours of speech for TTS (text-to-speech). It has been built with the following priorities:
@@ -29,11 +29,12 @@ Server
docker-compose up -d server && docker-compose ps && docker-compose logs -f
```

## Installation
## Installation

**Pre-requisites:**
- GPU VRAM >=12GB
- Python >=3.10,<3.12
- pipx ([installation instructions](https://pipx.pypa.io/stable/installation/))

**Environment setup**
```bash
@@ -47,16 +48,19 @@ rm -rf ffmpeg-git-*

# install rust if not installed (ensure you've restarted your terminal after installation)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# install poetry if not installed (ensure you've restarted your terminal after installation)
pipx install poetry

pip install -r requirements.txt
pip install --upgrade torch torchaudio  # for torch.compile improvements
pip install -e .
# if running from Linux, keyring backend can hang on `poetry install`. This prevents that.
export PYTHON_KEYRING_BACKEND=keyring.backends.fail.Keyring

poetry install && poetry run pip install torch==2.2.1 torchaudio==2.2.1
```
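
The `Python >=3.10,<3.12` pre-requisite can be checked before running `poetry install`. The following is a minimal sketch, not part of the repository: the helper name `python_version_supported` and both constants are hypothetical, and the bounds are simply copied from the pre-requisites list above.

```python
import sys

# Hypothetical helper, not part of metavoice-src.
# Bounds are taken from the README pre-requisite "Python >=3.10,<3.12".
SUPPORTED_MIN = (3, 10)  # inclusive lower bound
SUPPORTED_MAX = (3, 12)  # exclusive upper bound


def python_version_supported(version=None):
    """Return True if the (major, minor) version satisfies >=3.10,<3.12.

    `version` is any tuple starting with (major, minor); when omitted,
    the running interpreter's version is used.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return SUPPORTED_MIN <= major_minor < SUPPORTED_MAX
```

Calling `python_version_supported()` with no arguments checks the interpreter that runs the script, so it should be run with the same Python that Poetry will use for the virtual environment.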

## Usage
1. Download it and use it anywhere (including locally) with our [reference implementation](/fam/llm/fast_inference.py)
```bash
python -i fam/llm/fast_inference.py
poetry run python -i fam/llm/fast_inference.py

# Example API usage within the interactive Python session
tts.synthesise(text="This is a demo of text to speech by MetaVoice-1B, an open-source foundational audio model.", spk_ref_path="assets/bria.mp3")
@@ -67,8 +71,8 @@ tts.synthesise(text="This is a demo of text to speech by MetaVoice-1B, an open-s
2. Deploy it on any cloud (AWS/GCP/Azure), using our [inference server](serving.py) or [web UI](app.py)
```bash
python serving.py
python app.py
poetry run python serving.py
poetry run python app.py
```

3. Use it via [Hugging Face](https://huggingface.co/metavoiceio)
@@ -91,11 +95,11 @@ We predict EnCodec tokens from text, and speaker information. This is then diffu
- Note that we've skipped predicting semantic tokens as done in other works, as we found that this isn't strictly necessary.
* We use a non-causal (encoder-style) transformer to predict the rest of the 6 hierarchies from the first two hierarchies. This is a super small model (~10Mn parameters), and has extensive zero-shot generalisation to most speakers we've tried. Since it's non-causal, we're also able to predict all the timesteps in parallel.
* We use multi-band diffusion to generate waveforms from the EnCodec tokens. We noticed that the speech is clearer than using the original RVQ decoder or VOCOS. However, the diffusion at waveform level leaves some background artifacts which are quite unpleasant to the ear. We clean this up in the next step.
* We use DeepFilterNet to clear up the artifacts introduced by the multi-band diffusion.
* We use DeepFilterNet to clear up the artifacts introduced by the multi-band diffusion.

## Optimizations
The model supports:
1. KV-caching via Flash Decoding
The model supports:
1. KV-caching via Flash Decoding
2. Batching (including texts of different lengths)

## Contribute
2 changes: 1 addition & 1 deletion docker-compose.yml
@@ -51,7 +51,7 @@ services:
ui:
<<: *common-settings
container_name: metavoice-ui
entrypoint: [ "python3.10", "app.py" ]
entrypoint: [ "poetry", "run", "python", "app.py" ]
ports:
- 7861:7861
healthcheck: