docs: use nested contents for easier overview
eginhard committed Dec 12, 2024
1 parent e23766d commit ae2f8d2
Showing 14 changed files with 71 additions and 26 deletions.
@@ -1,7 +1,9 @@
(formatting_your_dataset)=
# Formatting your dataset

-For training a TTS model, you need a dataset with speech recordings and transcriptions. The speech must be divided into audio clips and each clip needs transcription.
+For training a TTS model, you need a dataset with speech recordings and
+transcriptions. The speech must be divided into audio clips and each clip needs
+a transcription.

If you have a single audio file and need to split it into clips, several open-source tools can help. We recommend Audacity, a free and open-source audio editor.

12 changes: 12 additions & 0 deletions docs/source/datasets/index.md
@@ -0,0 +1,12 @@
# Datasets

For training a TTS model, you need a dataset with speech recordings and
transcriptions. See the following pages for more information on:

```{toctree}
:maxdepth: 1
formatting_your_dataset
what_makes_a_good_dataset
tts_datasets
```
@@ -1,6 +1,6 @@
-# TTS datasets
+# Public TTS datasets

-Some of the known public datasets that we successfully applied 🐸TTS:
+Some of the known public datasets that were successfully used for 🐸TTS:

- [English - LJ Speech](https://keithito.com/LJ-Speech-Dataset/)
- [English - Nancy](http://www.cstr.ed.ac.uk/projects/blizzard/2011/lessac_blizzard2011/)
File renamed without changes.
@@ -36,7 +36,8 @@
There is also the `callback` interface by which you can manipulate both the model and the `Trainer` states. Callbacks give you
infinite flexibility to add custom behaviours for your model and training routines.

-For more details, see [BaseTTS](main_classes/model_api.md#base-tts-model) and :obj:`TTS.utils.callbacks`.
+For more details, see [BaseTTS](../main_classes/model_api.md#base-tts-model)
+and `TTS.utils.callbacks`.

6. Optionally, define `MyModelArgs`.

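The callback interface mentioned above can be sketched in plain Python. The hook names (`on_epoch_start`, `on_epoch_end`) and the toy trainer wiring are illustrative assumptions, not the exact `TTS.utils.callbacks` API:

```python
class Callback:
    """Illustrative callback base class; see TTS.utils.callbacks for the real hooks."""

    def on_epoch_start(self, trainer):
        pass

    def on_epoch_end(self, trainer):
        pass


class LRLoggerCallback(Callback):
    """Example callback: record the learning rate after every epoch."""

    def __init__(self):
        self.history = []

    def on_epoch_end(self, trainer):
        self.history.append(trainer.lr)


class ToyTrainer:
    """Minimal stand-in for a trainer that fires callbacks (an assumption,
    not the real Trainer)."""

    def __init__(self, callbacks, lr=1e-3):
        self.callbacks = callbacks
        self.lr = lr

    def fit(self, epochs):
        for _ in range(epochs):
            for cb in self.callbacks:
                cb.on_epoch_start(self)
            self.lr *= 0.5  # pretend a scheduler halves the LR each epoch
            for cb in self.callbacks:
                cb.on_epoch_end(self)
```

Because each callback receives the trainer instance, it can both read and modify training state, which is where the flexibility comes from.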
14 changes: 14 additions & 0 deletions docs/source/extension/index.md
@@ -0,0 +1,14 @@
# Adding models or languages

You can extend Coqui by implementing new model architectures or adding front
ends for new languages. See the pages below for more details. The [project
structure](../project_structure.md) and [contribution
guidelines](../contributing.md) may also be helpful. Please open a pull request
to share your improvements with the community.

```{toctree}
:maxdepth: 1
implementing_a_new_model
implementing_a_new_language_frontend
```
4 changes: 2 additions & 2 deletions docs/source/faq.md
@@ -7,7 +7,7 @@ We tried to collect common issues and questions we receive about 🐸TTS. It is
- If you feel like it's a bug to be fixed, then prefer Github issues with the same level of scrutiny.

## What are the requirements of a good 🐸TTS dataset?
-- [See this page](what_makes_a_good_dataset.md)
+- [See this page](datasets/what_makes_a_good_dataset.md)

## How should I choose the right model?
- First, train Tacotron. It is smaller and faster to experiment with. If it performs poorly, try Tacotron2.
@@ -18,7 +18,7 @@ We tried to collect common issues and questions we receive about 🐸TTS. It is
## How can I train my own `tts` model?
0. Check your dataset with notebooks in [dataset_analysis](https://github.com/idiap/coqui-ai-TTS/tree/main/notebooks/dataset_analysis) folder. Use [this notebook](https://github.com/idiap/coqui-ai-TTS/blob/main/notebooks/dataset_analysis/CheckSpectrograms.ipynb) to find the right audio processing parameters. A better set of parameters results in better audio synthesis.

-1. Write your own dataset `formatter` in `datasets/formatters.py` or format your dataset as one of the supported datasets, like LJSpeech.
+1. Write your own dataset `formatter` in `datasets/formatters.py` or [format](datasets/formatting_your_dataset) your dataset as one of the supported datasets, like LJSpeech.
A `formatter` parses the metadata file and converts it into a list of training samples.

2. If you have a dataset with a different alphabet than English, you need to set your own character list in the ```config.json```.
16 changes: 7 additions & 9 deletions docs/source/index.md
@@ -4,10 +4,10 @@
```
----

-# Documentation Content
```{toctree}
:maxdepth: 1
:caption: Get started
+:hidden:
tutorial_for_nervous_beginners
installation
@@ -20,22 +20,19 @@ contributing
```{toctree}
:maxdepth: 1
:caption: Using Coqui
+:hidden:
inference
-training_a_model
-finetuning
-implementing_a_new_model
-implementing_a_new_language_frontend
-formatting_your_dataset
-what_makes_a_good_dataset
-tts_datasets
-marytts
+training/index
+extension/index
+datasets/index
```


```{toctree}
:maxdepth: 1
:caption: Main Classes
+:hidden:
configuration
main_classes/trainer_api
@@ -50,6 +50,7 @@ main_classes/speaker_manager
```{toctree}
:maxdepth: 1
:caption: TTS Models
+:hidden:
models/glow_tts.md
models/vits.md
9 changes: 7 additions & 2 deletions docs/source/inference.md
@@ -86,8 +86,8 @@ tts --model_name "voice_conversion/<language>/<dataset>/<model_name>"

You can boot up a demo 🐸TTS server to run an inference with your models (make
sure to install the additional dependencies with `pip install coqui-tts[server]`).
-Note that the server is not optimized for performance but gives you an easy way
-to interact with the models.
+Note that the server is not optimized for performance and does not support all
+Coqui models yet.

The demo server provides pretty much the same interface as the CLI command.

@@ -192,3 +192,8 @@ api.tts_with_vc_to_file(
file_path="output.wav"
)
```

+```{toctree}
+:hidden:
+marytts
+```
@@ -22,7 +22,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
speech dataset and achieve reasonable results with only a couple of hours of data.

However, note that, fine-tuning does not ensure great results. The model
-performance still depends on the [dataset quality](what_makes_a_good_dataset.md)
+performance still depends on the [dataset quality](../datasets/what_makes_a_good_dataset.md)
and the hyper-parameters you choose for fine-tuning. Therefore,
it still takes a bit of tinkering.

@@ -32,7 +32,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
1. Setup your dataset.

You need to format your target dataset in a certain way so that 🐸TTS data loader will be able to load it for the
-training. Please see [this page](formatting_your_dataset.md) for more information about formatting.
+training. Please see [this page](../datasets/formatting_your_dataset.md) for more information about formatting.

2. Choose the model you want to fine-tune.

@@ -49,7 +49,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
You should choose the model based on your requirements. Some models are fast and some are better in speech quality.
One lazy way to test a model is to run it on the hardware you want to use and see how it works. For
simple testing, you can use the `tts` command on the terminal. For more info
-see [here](inference.md).
+see [here](../inference.md).

3. Download the model.

10 changes: 10 additions & 0 deletions docs/source/training/index.md
@@ -0,0 +1,10 @@
# Training and fine-tuning

The following pages show you how to train and fine-tune Coqui models:

```{toctree}
:maxdepth: 1
training_a_model
finetuning
```
@@ -11,11 +11,10 @@

3. Check the recipes.

-Recipes are located under `TTS/recipes/`. They do not promise perfect models but they provide a good start point for
-`Nervous Beginners`.
+Recipes are located under `TTS/recipes/`. They do not promise perfect models but they provide a good starting point.
A recipe for `GlowTTS` using the `LJSpeech` dataset looks like the one below. Let's be creative and call this `train_glowtts.py`.

-```{literalinclude} ../../recipes/ljspeech/glow_tts/train_glowtts.py
+```{literalinclude} ../../../recipes/ljspeech/glow_tts/train_glowtts.py
```

You need to change fields of the `BaseDatasetConfig` to match your dataset and then update `GlowTTSConfig`
@@ -113,7 +112,7 @@

Note that different models have different metrics, visuals and outputs.

-You should also check the [FAQ page](https://github.com/coqui-ai/TTS/wiki/FAQ) for common problems and solutions
+You should also check the [FAQ page](../faq.md) for common problems and solutions
that occur during training.

7. Use your best model for inference.
@@ -142,5 +141,5 @@ d-vectors. For using d-vectors, you first need to compute the d-vectors using the

The same Glow-TTS model above can be trained on a multi-speaker VCTK dataset with the script below.

-```{literalinclude} ../../recipes/vctk/glow_tts/train_glow_tts.py
+```{literalinclude} ../../../recipes/vctk/glow_tts/train_glow_tts.py
```
6 changes: 5 additions & 1 deletion docs/source/tutorial_for_nervous_beginners.md
@@ -24,10 +24,14 @@ $ tts-server --list_models # list the available models.
```
![server.gif](https://github.com/idiap/coqui-ai-TTS/raw/main/images/demo_server.gif)

+See [this page](inference.md) for more details on synthesizing speech with the
+CLI, server or Python API.

## Training a `tts` Model

-A breakdown of a simple script that trains a GlowTTS model on the LJspeech dataset. See the comments for more details.
+A breakdown of a simple script that trains a GlowTTS model on the LJspeech
+dataset. For a more in-depth guide to training and fine-tuning also see [this
+page](training/index.md).

### Pure Python Way

