docs: info on python version and min. gpu memory (metavoiceio#34)
* docs: info on python version and min. gpu mem

* revert: edit

* remove: duplicate

* remove: duplicate

* revert: to union typing

* update: python versions

---------

Co-authored-by: sid <sid@themetavoice.xyz>
sidroopdaska and sid authored Feb 10, 2024
1 parent 0df797a commit 7282924
Showing 5 changed files with 17 additions and 8 deletions.
5 changes: 4 additions & 1 deletion README.md
@@ -11,7 +11,10 @@ We’re releasing MetaVoice-1B under the Apache 2.0 license, *it can be used wit

Try out the [demo](https://ttsdemo.themetavoice.xyz/)!

-## Installation
+## Installation
+
+**Pre-requisites:** Python >=3.10,<3.12; GPU with >=24GB RAM.
+
```bash
# install ffmpeg
wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz
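The README hunk above adds a hardware prerequisite. Below is a minimal sketch for checking the GPU side of it before pulling model weights. It assumes PyTorch with CUDA is the runtime, which fits the `torch` imports elsewhere in this commit. It also reads "24GB" as 24 * 10**9 bytes, because cards sold as 24 GB may report slightly less than 24 GiB. The function names and the threshold constant are hypothetical and are not part of the metavoice codebase.

```python
from typing import Optional

# Hypothetical threshold: the README's "24GB", read as decimal gigabytes.
MIN_GPU_MEMORY_BYTES = 24 * 10**9


def meets_gpu_requirement(total_bytes: int, min_bytes: int = MIN_GPU_MEMORY_BYTES) -> bool:
    """Return True if a device with `total_bytes` of memory meets the minimum."""
    return total_bytes >= min_bytes


def detect_gpu_memory(device: int = 0) -> Optional[int]:
    """Return the total memory of CUDA device `device` in bytes, or None if unavailable."""
    try:
        import torch  # optional: the sketch still runs without PyTorch installed
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device).total_memory


if __name__ == "__main__":
    total = detect_gpu_memory()
    if total is None:
        print("No CUDA device detected; MetaVoice-1B expects a GPU with >=24GB.")
    else:
        ok = meets_gpu_requirement(total)
        print(f"GPU 0 reports {total / 10**9:.1f} GB; meets the 24GB prerequisite: {ok}")
```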
4 changes: 2 additions & 2 deletions fam/llm/decoders.py
@@ -2,7 +2,7 @@
import pathlib
import uuid
from abc import ABC, abstractmethod
-from typing import Callable, Optional
+from typing import Callable, Optional, Union

import julius
import torch
@@ -63,7 +63,7 @@ def get_tokens(self, audio_path: str) -> list[list[int]]:

    def decode(
        self, tokens: list[list[int]], causal: bool = True, ref_audio_path: Optional[str] = None
-    ) -> str | torch.Tensor:
+    ) -> Union[str, torch.Tensor]:
        # TODO: this has strange behaviour -- if causal is True, it returns tokens. if causal is False, it SAVES the audio file.
        text_ids, extracted_audio_ids = self._data_adapter_fn(tokens)
        text = self.tokeniser_decode_fn(text_ids)
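This hunk changes only how the return annotation is spelled. On Python 3.10 and later, the PEP 604 form `str | torch.Tensor` and `typing.Union[str, torch.Tensor]` describe the same type. Under the new `requires-python` bound the change therefore has no runtime effect. `Union` also evaluates on interpreters older than 3.10. There, `X | Y` between plain classes raises `TypeError` when the `def` statement runs, unless `from __future__ import annotations` is in effect. The sketch uses a hypothetical `Tensor` stand-in so it runs without PyTorch. Going by the TODO comment, it also assumes that the `str` branch is a path to saved audio.

```python
import sys
from typing import Union, get_args


class Tensor:
    """Hypothetical stand-in for torch.Tensor so the sketch runs without PyTorch."""


def describe(result: Union[str, Tensor]) -> str:
    # A caller of a Union-returning function narrows the value with isinstance.
    if isinstance(result, str):
        # Assumption: the str branch is a file path to the saved audio.
        return f"audio saved to {result}"
    return "decoder returned a tensor"


# The Union spelling lists its member types in order.
print(get_args(Union[str, Tensor]))

if sys.version_info >= (3, 10):
    # PEP 604 unions compare equal to the equivalent typing.Union.
    print((str | Tensor) == Union[str, Tensor])
```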
6 changes: 3 additions & 3 deletions fam/llm/sample.py
@@ -8,7 +8,7 @@
import tempfile
from contextlib import nullcontext
from dataclasses import dataclass
-from typing import List, Literal, Optional, Type
+from typing import List, Literal, Optional, Type, Union

import librosa
import torch
@@ -452,7 +452,7 @@ def _sample_utterance_batch(
    spkemb_model,
    first_stage_model,
    second_stage_model,
-    enhancer: Optional[Literal["df"] | BaseEnhancer],
+    enhancer: Optional[Union[Literal["df"], BaseEnhancer]],
    first_stage_ckpt_path: str,
    second_stage_ckpt_path: str,
    guidance_scale: Optional[float],
@@ -530,7 +530,7 @@ def sample_utterance(
    spkemb_model,
    first_stage_model,
    second_stage_model,
-    enhancer: Optional[Literal["df"] | BaseEnhancer],
+    enhancer: Optional[Union[Literal["df"], BaseEnhancer]],
    first_stage_ckpt_path: str,
    second_stage_ckpt_path: str,
    guidance_scale: Optional[float],
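Both `sample.py` signatures now spell the `enhancer` parameter as `Optional[Union[Literal["df"], BaseEnhancer]]`. It accepts `None`, the literal string `"df"`, or an enhancer object. Below is a minimal sketch of how a function might resolve such an argument. `BaseEnhancer` here is a stand-in class, and `DFEnhancer` and `resolve_enhancer` are hypothetical names. This diff does not show how MetaVoice actually maps `"df"` to a concrete enhancer.

```python
from typing import Literal, Optional, Union, get_args


class BaseEnhancer:
    """Stand-in for the project's BaseEnhancer; only the interface shape matters here."""

    def __call__(self, audio_path: str) -> str:
        return audio_path


class DFEnhancer(BaseEnhancer):
    """Hypothetical enhancer chosen by the "df" shorthand."""


EnhancerArg = Optional[Union[Literal["df"], BaseEnhancer]]


def resolve_enhancer(enhancer: EnhancerArg) -> Optional[BaseEnhancer]:
    """Turn the user-facing argument into an enhancer object, or None to skip enhancement."""
    if enhancer is None:
        return None
    if isinstance(enhancer, BaseEnhancer):
        return enhancer
    if enhancer == "df":
        return DFEnhancer()
    raise ValueError(f"unknown enhancer: {enhancer!r}")


# typing flattens Optional[Union[...]] into one union with three members.
print(get_args(EnhancerArg))
```

`Optional[X]` is shorthand for `Union[X, None]`, and nested unions are flattened. The annotation is therefore equivalent to `Union[Literal["df"], BaseEnhancer, None]`.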
8 changes: 7 additions & 1 deletion pyproject.toml
@@ -1,3 +1,9 @@
+[project]
+name = "metavoice"
+version = "0.1.0"
+description = "Foundational model for text to speech"
+requires-python = ">=3.10,<3.12"
+
[tool.black]
line-length = 120
exclude = '''
@@ -12,4 +18,4 @@ exclude = '''
'''

[tool.isort]
-profile = "black"
+profile = "black"
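The new `[project]` table declares `requires-python = ">=3.10,<3.12"`, which is the same bound the README states. Every comma-separated clause must hold. So 3.10.x and 3.11.x qualify, while 3.9 and 3.12 do not. pip normally declines to install a project whose `requires-python` excludes the running interpreter. The sketch below evaluates such a specifier using only the standard library. It is a simplified, hypothetical helper that handles plain `>=`, `<=`, `>`, `<`, `==` and `!=` clauses on numeric versions. Real tools apply the full PEP 440 rules, for example through the `packaging` library, which also cover `~=`, wildcards and pre-releases.

```python
import operator
import sys
from typing import Callable, Dict, Sequence, Tuple

Version = Tuple[int, ...]

# Two-character operators come first so ">=" is never matched as ">".
_OPS: Dict[str, Callable[[Version, Version], bool]] = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def _pad(a: Version, b: Version) -> Tuple[Version, Version]:
    """Zero-pad two release tuples to equal length, so (3, 10) compares like (3, 10, 0)."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version: Sequence[int], spec: str) -> bool:
    """Return True if `version` meets every comma-separated clause of `spec` (simplified)."""
    current = tuple(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS.items():
            if clause.startswith(symbol):
                bound = tuple(int(part) for part in clause[len(symbol):].strip().split("."))
                lhs, rhs = _pad(current, bound)
                if not compare(lhs, rhs):
                    return False
                break
        else:
            # No supported operator matched (e.g. "~=" or a wildcard).
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


if __name__ == "__main__":
    spec = ">=3.10,<3.12"  # the requires-python value from pyproject.toml
    running = sys.version_info[:3]
    print(f"Python {running} satisfies {spec!r}: {satisfies(running, spec)}")
```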
2 changes: 1 addition & 1 deletion requirements.txt
@@ -4,7 +4,7 @@ librosa
tqdm
tiktoken==0.5.1
audiocraft
-numpy<1.25
+numpy
ninja
flash-attn
fastapi
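Dropping the `numpy<1.25` upper bound lets pip pick any numpy release that satisfies the remaining requirements. As a result, the installed version can now differ between environments. Below is a small sketch for reporting it with `importlib.metadata` from the standard library. The helper name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version string of `distribution`, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("numpy", "torch", "librosa"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```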