Adding neural HMM TTS #2271

Closed
shivammehta25 wants to merge 83 commits
Changes from 1 commit (of 83 total)
405bffe
Adding encoder
shivammehta25 Nov 26, 2022
d607993
currently modifying hmm
shivammehta25 Nov 27, 2022
a324920
Adding hmm
shivammehta25 Nov 28, 2022
8628648
Adding overflow
shivammehta25 Nov 30, 2022
6ec83c4
Adding overflow setting up flat start
shivammehta25 Dec 1, 2022
783a982
Removing runs
shivammehta25 Dec 1, 2022
10f15e0
adding normalization parameters
shivammehta25 Dec 1, 2022
aff8b1f
Fixing models on same device
shivammehta25 Dec 1, 2022
62941d6
Training overflow and plotting evaluations
shivammehta25 Dec 2, 2022
f448ea4
Adding inference
shivammehta25 Dec 3, 2022
ff33837
At the end of epoch the test sentences are coming on cpu instead of gpu
shivammehta25 Dec 4, 2022
3edb0d2
Adding figures from model during training to monitor
shivammehta25 Dec 5, 2022
5fc800c
reverting tacotron2 training recipe
shivammehta25 Dec 5, 2022
427dfe5
fixing inference on gpu for test sentences on config
shivammehta25 Dec 5, 2022
ecc12c6
moving helpers and texts within overflows source code
shivammehta25 Dec 5, 2022
b86f3f8
renaming to overflow
shivammehta25 Dec 5, 2022
995ee93
moving loss to the model file
shivammehta25 Dec 5, 2022
5b0fe46
Fixing the rename
shivammehta25 Dec 5, 2022
5377f87
Model training but not plotting the test config sentences's audios
shivammehta25 Dec 5, 2022
bd5be6c
Formatting logs
shivammehta25 Dec 5, 2022
755aa6f
Changing model name to camelcase
shivammehta25 Dec 5, 2022
1350a4b
Fixing test log
shivammehta25 Dec 5, 2022
3c986fd
Fixing plotting bug
shivammehta25 Dec 6, 2022
4a5b1a0
Adding some tests
shivammehta25 Dec 6, 2022
5b1dabc
Merge branch 'coqui-ai:dev' into dev
shivammehta25 Dec 7, 2022
f43d7e3
Adding more tests to overflow
shivammehta25 Dec 8, 2022
c3d0167
Adding all tests for overflow
shivammehta25 Dec 9, 2022
ddefe34
making changes to camel case in config
shivammehta25 Dec 9, 2022
c2df9f3
Adding information about parameters and docstring
shivammehta25 Dec 10, 2022
9927434
removing compute_mel_statistics moved statistic computation to the mo…
shivammehta25 Dec 10, 2022
340cd0b
Added overflow in readme
shivammehta25 Dec 10, 2022
aca3fe1
Adding more test cases, now it doesn't saves transition_p like tensor…
shivammehta25 Dec 11, 2022
e7c11dd
Merge branch 'coqui-ai:dev' into dev
shivammehta25 Dec 14, 2022
7e2dbb1
uncommenting the approximation to stablize the training
shivammehta25 Dec 14, 2022
be09d6c
Merge branch 'coqui-ai:dev' into dev
shivammehta25 Dec 14, 2022
282de93
Merge branch 'coqui-ai:dev' into dev
shivammehta25 Dec 22, 2022
5df4fe8
Adding encoder
shivammehta25 Nov 26, 2022
fa25825
currently modifying hmm
shivammehta25 Nov 27, 2022
3cb0f78
Adding hmm
shivammehta25 Nov 28, 2022
9984afa
Adding overflow
shivammehta25 Nov 30, 2022
4dad45c
Adding overflow setting up flat start
shivammehta25 Dec 1, 2022
377bd3e
Removing runs
shivammehta25 Dec 1, 2022
a441c71
adding normalization parameters
shivammehta25 Dec 1, 2022
995ac14
Fixing models on same device
shivammehta25 Dec 1, 2022
97b985b
Training overflow and plotting evaluations
shivammehta25 Dec 2, 2022
227077a
Adding inference
shivammehta25 Dec 3, 2022
bea46cc
At the end of epoch the test sentences are coming on cpu instead of gpu
shivammehta25 Dec 4, 2022
03d028e
Adding figures from model during training to monitor
shivammehta25 Dec 5, 2022
fc3c641
reverting tacotron2 training recipe
shivammehta25 Dec 5, 2022
c429837
fixing inference on gpu for test sentences on config
shivammehta25 Dec 5, 2022
b804a12
moving helpers and texts within overflows source code
shivammehta25 Dec 5, 2022
3149b43
renaming to overflow
shivammehta25 Dec 5, 2022
8aff87a
moving loss to the model file
shivammehta25 Dec 5, 2022
8d7b0e7
Fixing the rename
shivammehta25 Dec 5, 2022
8aaffed
Model training but not plotting the test config sentences's audios
shivammehta25 Dec 5, 2022
648b2c3
Formatting logs
shivammehta25 Dec 5, 2022
d22c6c0
Changing model name to camelcase
shivammehta25 Dec 5, 2022
6e08e4f
Fixing test log
shivammehta25 Dec 5, 2022
9394ce0
Fixing plotting bug
shivammehta25 Dec 6, 2022
e115361
Adding some tests
shivammehta25 Dec 6, 2022
7a541b9
Adding more tests to overflow
shivammehta25 Dec 8, 2022
1dccc29
Adding all tests for overflow
shivammehta25 Dec 9, 2022
1b1bf1f
making changes to camel case in config
shivammehta25 Dec 9, 2022
916b98e
Adding information about parameters and docstring
shivammehta25 Dec 10, 2022
6eff37c
removing compute_mel_statistics moved statistic computation to the mo…
shivammehta25 Dec 10, 2022
8a8dd1d
Added overflow in readme
shivammehta25 Dec 10, 2022
e738c0c
Adding more test cases, now it doesn't saves transition_p like tensor…
shivammehta25 Dec 11, 2022
479c0cf
Handle espeak 1.48.15 (#2203)
erogol Dec 12, 2022
4f02e2c
Python API implementation (#2195)
erogol Dec 12, 2022
89b9868
Update README (#2204)
erogol Dec 12, 2022
684adb0
Adding missing key to formatter (#2194)
p0p4k Dec 12, 2022
55801cc
Add YourTTS VCTK recipe (#2198)
Edresson Dec 12, 2022
a0be902
Add Original YourTTS vocabulary for full transfer learning (#2206)
Edresson Dec 13, 2022
f3fe409
uncommenting the approximation to stablize the training
shivammehta25 Dec 14, 2022
aedd795
Adding pre-trained Overflow model (#2211)
erogol Dec 14, 2022
253b03f
Fixup overflow (#2218)
erogol Dec 14, 2022
c2ce4fb
Bump up to v0.10.0
erogol Dec 15, 2022
fd5ad8c
Add Ukrainian LADA (female) voice
egorsmkv Dec 16, 2022
1260c7f
Merge branch 'coqui-ai:dev' into dev
shivammehta25 Dec 30, 2022
f73cd29
Merge branch 'coqui-ai:dev' into dev
shivammehta25 Jan 3, 2023
2abbc97
Merge branch 'dev' of github.com:shivammehta25/TTS into dev
shivammehta25 Jan 5, 2023
790b846
Adding a config flag to train neural HMM TTS instead of overflow
shivammehta25 Jan 9, 2023
a8d0b22
Backwards compatibility: Fixing model zoo if the flag is not set, set it
shivammehta25 Jan 9, 2023
Adding overflow setting up flat start
shivammehta25 committed Dec 23, 2022
commit 4dad45c639cfcb1df15dfe73b0afc2edf48be071
110 changes: 110 additions & 0 deletions TTS/tts/configs/overflow_config.py
@@ -0,0 +1,110 @@
from dataclasses import dataclass, field
from typing import List

from TTS.tts.configs.shared_configs import BaseTTSConfig


@dataclass
class OverFlowConfig(BaseTTSConfig):
    """Define the hyperparameters for the OverFlow model.

    Extends `BaseTTSConfig` with the data, encoder, neural HMM, and decoder
    parameters listed below.
    """

    model: str = "overflow"

    # data parameters
    normalize_mel: bool = True
    normalized_mel_parameter_path: str = None

    # Encoder parameters
    num_chars: int = None
    state_per_phone: int = 2
    encoder_in_out_features: int = 512
    encoder_n_convolutions: int = 3

    # HMM parameters
    out_channels: int = 80
    ar_order: int = 1
    sampling_temp: float = 0.667
    deterministic_transition: bool = True
    duration_threshold: float = 0.55
    use_grad_checkpointing: bool = True

    # Prenet parameters
    prenet_type: str = "original"
    prenet_dim: int = 256
    prenet_n_layers: int = 2
    prenet_dropout: float = 0.5
    prenet_dropout_at_inference: bool = False
    memory_rnn_dim: int = 1024

    # Outputnet parameters
    outputnet_size: List[int] = field(default_factory=lambda: [256, 256])
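    # Flat-start initialization (editor's note, an interpretation of the code
    # below): these values bias the output network so that, early in training,
    # every HMM state emits with roughly these statistics. transition_p is
    # about 1 / (average frames per state); 0.14 corresponds to roughly 7
    # frames per state (see OverFlowUtils.get_data_parameters_for_flat_start
    # later in this diff).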
    flat_start_params: dict = field(
        default_factory=lambda: {
            "mean": 0.0,
            "std": 1.0,
            "transition_p": 0.14,
        }
    )
    std_floor: float = 0.01

    # Decoder parameters
    hidden_channels_dec: int = 150
    kernel_size_dec: int = 5
    dilation_rate: int = 1
    num_flow_blocks_dec: int = 12
    num_block_layers: int = 4
    dropout_p_dec: float = 0.05
    num_splits: int = 4
    num_squeeze: int = 2
    sigmoid_scale: bool = False
    c_in_channels: int = 0

    # optimizer parameters
    optimizer: str = "RAdam"
    optimizer_params: dict = field(default_factory=lambda: {"betas": [0.9, 0.998], "weight_decay": 1e-6})
    lr_scheduler: str = "NoamLR"
    lr_scheduler_params: dict = field(default_factory=lambda: {"warmup_steps": 4000})
    grad_clip: float = 40000.0
    lr: float = 1e-3

    # overrides
    min_seq_len: int = 3
    max_seq_len: int = 500

    # testing
    test_sentences: List[str] = field(
        default_factory=lambda: [
            "It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.",
            "Be a voice, not an echo.",
            "I'm sorry Dave. I'm afraid I can't do that.",
            "This cake is great. It's so delicious and moist.",
            "Prior to November 22, 1963.",
        ]
    )

    # Extra fields required by the base config.
    # Do not change them; OverFlow does not use them.
    r: int = 1
    use_d_vector_file: bool = False

    def check_values(self):
        """Validate the hyperparameters.

        Raises:
            AssertionError: when the parameter network has no layers
            AssertionError: when the transition probability is not between 0 and 1
        """
        assert (
            len(self.outputnet_size) >= 1
        ), f"Parameter network must have at least one layer. Check `outputnet_size` in the config. Provided: {self.outputnet_size}"
        assert (
            0 < self.flat_start_params["transition_p"] < 1
        ), f"Transition probability must be between 0 and 1. Provided: {self.flat_start_params['transition_p']}"

        if self.normalize_mel:
            assert (
                self.normalized_mel_parameter_path is not None
            ), "`normalized_mel_parameter_path` must be provided when `normalize_mel` is True."
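
Editor's note: for orientation, a minimal usage sketch of this new config, not part of the PR. It assumes the module path shown in the file header above; the field values are illustrative and mel_stats.pt is a hypothetical file name.

from TTS.tts.configs.overflow_config import OverFlowConfig

# Hypothetical values; only fields visible in this diff are exercised.
config = OverFlowConfig(
    num_chars=120,  # illustrative symbol-set size
    normalize_mel=True,
    normalized_mel_parameter_path="mel_stats.pt",  # hypothetical path
)
config.check_values()  # raises AssertionError if a setting is invalid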
2 changes: 2 additions & 0 deletions TTS/tts/datasets/dataset.py
@@ -65,6 +65,7 @@ def __init__(
        use_noise_augment: bool = False,
        start_by_longest: bool = False,
        verbose: bool = False,
+        compute_mel_statistics: bool = False,
    ):
        """Generic 📂 data loader for `tts` models. It is configurable for different outputs and needs.

@@ -140,6 +141,7 @@ def __init__(
        self.language_id_mapping = language_id_mapping
        self.use_noise_augment = use_noise_augment
        self.start_by_longest = start_by_longest
+        self.compute_mel_statistics = compute_mel_statistics

        self.verbose = verbose
        self.rescue_item_idx = 1
22 changes: 22 additions & 0 deletions TTS/tts/layers/losses.py
@@ -872,3 +872,25 @@ def forward(

        return_dict["loss"] = loss
        return return_dict


class NLLLoss(nn.Module):
    """Negative log likelihood loss."""

    def __init__(self):
        super().__init__()

    def forward(self, log_prob: torch.Tensor) -> dict:
        """Compute the loss.

        Args:
            log_prob (Tensor): batch of log-likelihoods from the HMM

        Returns:
            dict: {"loss": negated mean log-likelihood, a scalar Tensor}
        """
        return_dict = {}
        return_dict["loss"] = -log_prob.mean()
        return return_dict
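
Editor's note: for illustration, not part of the diff. The loss just negates the mean of the log-likelihoods, so it can drive an optimizer like any other criterion; a minimal sketch with a stand-in tensor:

import torch

from TTS.tts.layers.losses import NLLLoss

criterion = NLLLoss()
log_prob = torch.randn(8, requires_grad=True)  # stand-in for the HMM's per-utterance log-likelihoods
loss = criterion(log_prob)["loss"]             # scalar: -log_prob.mean()
loss.backward()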

74 changes: 50 additions & 24 deletions TTS/tts/layers/neural_hmm/common_layers.py
@@ -25,17 +25,18 @@ class Encoder(nn.Module):

    def __init__(
        self, num_chars,
        state_per_phone,
-        in_out_channels=512
+        in_out_channels=512,
+        n_convolutions=3
    ):

        super().__init__()

        self.state_per_phone = state_per_phone
        self.in_out_channels = in_out_channels

-        self.emb = nn.Embedding(num_chars, hidden_channels)
+        self.emb = nn.Embedding(num_chars, in_out_channels)
        self.convolutions = nn.ModuleList()
-        for _ in range(3):
+        for _ in range(n_convolutions):
            self.convolutions.append(ConvBNBlock(in_out_channels, in_out_channels, 5, "relu"))
        self.lstm = nn.LSTM(
            in_out_channels,
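
Editor's note: a hypothetical construction of the updated encoder, assuming the module path in the file header above; the argument values are illustrative, and the class's forward pass lies outside this hunk.

from TTS.tts.layers.neural_hmm.common_layers import Encoder

encoder = Encoder(
    num_chars=120,       # illustrative symbol-set size
    state_per_phone=2,   # matches OverFlowConfig.state_per_phone
    in_out_channels=512,
    n_convolutions=3,    # new argument introduced by this commit
)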
@@ -68,22 +69,20 @@ class ParameterModel(nn.Module):

    Note: Do not put dropout layers here, the model will not converge.

    Args:
-        parameternetwork (List[int]): the architecture of the parameter model
+        outputnet_size (List[int]): the architecture of the parameter model
        input_size (int): size of input for the first layer
        output_size (int): size of output i.e size of the feature dim
        frame_channels (int): feature dim to set the flat start bias
-        init_transition_probability (float): flat start transition probability
-        init_mean (float): flat start mean
-        init_std (float): flat start std
+        flat_start_params (dict): flat start parameters to set the bias
    """

    def __init__(
        self,
        outputnet_size: List[int],
        input_size: int,
        output_size: int,
-        flat_start_params: dict,
        frame_channels: int,
+        flat_start_params: dict,
    ):
        super().__init__()
        self.flat_start_params = flat_start_params
@@ -134,8 +133,6 @@ def __init__(
        input_size = memory_rnn_dim + encoder_dim
        output_size = 2 * frame_channels + 1

-        self._validate_parameters()
-
        self.parametermodel = ParameterModel(
            outputnet_size=outputnet_size,
            input_size=input_size,
@@ -144,20 +141,6 @@
            frame_channels=frame_channels,
        )

-    def _validate_parameters(self):
-        """Validate the hyperparameters.
-
-        Raises:
-            AssertionError: when the parameters network is not defined
-            AssertionError: transition probability is not between 0 and 1
-        """
-        assert (
-            self.parameternetwork >= 1
-        ), f"Parameter Network must have atleast one layer check the config file for parameter network. Provided: {self.parameternetwork}"
-        assert (
-            0 < self.flat_start_params["transition_p"] < 1
-        ), f"Transition probability must be between 0 and 1. Provided: {self.flat_start_params['transition_p']}"
-
    def forward(self, ar_mels, inputs):
        r"""Inputs observation and returns the means, stds and transition probability for the current state

@@ -205,3 +188,46 @@ def _floor_std(self, std):
                "[*] Standard deviation was floored! The model is preventing overfitting, nothing serious to worry about"
            )
        return std


class OverFlowUtils:
    @staticmethod
    def get_data_parameters_for_flat_start(
        data_loader: torch.utils.data.DataLoader, out_channels: int, states_per_phone: int
    ):
        """Generate data parameters for flat-starting the HMM.

        Args:
            data_loader (torch.utils.data.DataLoader): loader yielding mels and their lengths
            out_channels (int): mel spectrogram channels
            states_per_phone (int): HMM states per phone
        """

        # State-related information for transition_p
        total_state_len = 0
        total_mel_len = 0

        # Useful for data mean and std
        total_mel_sum = 0
        total_mel_sq_sum = 0

        for batch in tqdm(data_loader, leave=False):
            text_lengths = batch["token_id_lengths"]
            mels = batch["mel"]
            mel_lengths = batch["mel_lengths"]

            total_state_len += torch.sum(text_lengths)
            total_mel_len += torch.sum(mel_lengths)
            total_mel_sum += torch.sum(mels)
            total_mel_sq_sum += torch.sum(torch.pow(mels, 2))

        data_mean = total_mel_sum / (total_mel_len * out_channels)
        data_std = torch.sqrt((total_mel_sq_sum / (total_mel_len * out_channels)) - torch.pow(data_mean, 2))
        average_num_states = total_state_len / len(data_loader.dataset)
        average_mel_len = total_mel_len / len(data_loader.dataset)
        average_duration_each_state = average_mel_len / average_num_states
        init_transition_prob = 1 / average_duration_each_state

        return data_mean, data_std, init_transition_prob
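
Editor's note: to make the arithmetic concrete, a hedged usage sketch assuming an already-built data loader whose batches carry the keys read in the loop above. Note that at this commit states_per_phone is accepted but not yet used in the computation.

# data_loader yields dicts with "token_id_lengths", "mel", and "mel_lengths".
data_mean, data_std, init_transition_prob = OverFlowUtils.get_data_parameters_for_flat_start(
    data_loader, out_channels=80, states_per_phone=2
)
# data_std is sqrt(E[x^2] - E[x]^2) over all mel bins, and init_transition_prob
# is 1 / (average frames per state), e.g. ~7 frames per state gives ~0.14.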



