Skip to content

Commit

Permalink
Add bark docs
Browse files Browse the repository at this point in the history
  • Loading branch information
erogol committed Jun 29, 2023
1 parent a035b25 commit 797eab2
Show file tree
Hide file tree
Showing 2 changed files with 104 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/source/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@
models/tacotron1-2.md
models/overflow.md
models/tortoise.md
models/bark.md
.. toctree::
:maxdepth: 2
Expand Down
103 changes: 103 additions & 0 deletions docs/source/models/bark.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
# Bark 🐶

Bark is a multi-lingual TTS model created by [Suno-AI](https://www.suno.ai/). It can generate conversational speech as well as music and sound effects.
It is architecturally very similar to Google's [AudioLM](https://arxiv.org/abs/2209.03143). For more information, please refer to the [Suno-AI's repo](https://github.com/suno-ai/bark).


## Acknowledgements
- 👑[Suno-AI](https://www.suno.ai/) for training and open-sourcing this model.
- 👑[serp-ai](https://github.com/serp-ai/bark-with-voice-clone) for controlled voice cloning.


## Example Use

```python
text = "Hello, my name is Manmay , how are you?"

from TTS.tts.configs.bark_config import BarkConfig
from TTS.tts.models.bark import Bark

config = BarkConfig()
model = Bark.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="path/to/model/dir/", eval=True)

# with random speaker
output_dict = model.synthesize(text, config, speaker_id="random", voice_dirs=None)

# cloning a speaker.
# It assumes that you have a speaker file in `bark_voices/speaker_n/speaker.wav` or `bark_voices/speaker_n/speaker.npz`
output_dict = model.synthesize(text, config, speaker_id="ljspeech", voice_dirs="bark_voices/")
```

Using 🐸TTS API:

```python
from TTS.api import TTS

# Load the model to GPU
# Bark is really slow on CPU, so we recommend using GPU.
tts = TTS("tts_models/multilingual/multi-dataset/bark", gpu=True)


# Cloning a new speaker
# This expects to find a mp3 or wav file like `bark_voices/new_speaker/speaker.wav`
# It computes the cloning values and stores in `bark_voices/new_speaker/speaker.npz`
tts.tts_to_file(text="Hello, my name is Manmay , how are you?",
file_path="output.wav",
voice_dir="bark_voices/",
speaker="ljspeech")


# When you run it again it uses the stored values to generate the voice.
tts.tts_to_file(text="Hello, my name is Manmay , how are you?",
file_path="output.wav",
voice_dir="bark_voices/",
speaker="ljspeech")


# random speaker
tts = TTS("tts_models/multilingual/multi-dataset/bark", gpu=True)
tts.tts_to_file("hello world", file_path="out.wav")
```

Using 🐸TTS Command line:

```console
# cloning the `ljspeech` voice
tts --model_name tts_models/multilingual/multi-dataset/bark \
--text "This is an example." \
--out_path "output.wav" \
--voice_dir bark_voices/ \
--speaker_idx "ljspeech" \
--progress_bar True

# Random voice generation
tts --model_name tts_models/multilingual/multi-dataset/bark \
--text "This is an example." \
--out_path "output.wav" \
--progress_bar True
```


## Important resources & papers
- Original Repo: https://github.com/suno-ai/bark
- Cloning implementation: https://github.com/serp-ai/bark-with-voice-clone
- AudioLM: https://arxiv.org/abs/2209.03143

## BarkConfig
```{eval-rst}
.. autoclass:: TTS.tts.configs.bark_config.BarkConfig
:members:
```

## BarkArgs
```{eval-rst}
.. autoclass:: TTS.tts.models.bark.BarkArgs
:members:
```

## Bark Model
```{eval-rst}
.. autoclass:: TTS.tts.models.bark.Bark
:members:
```

0 comments on commit 797eab2

Please sign in to comment.