Adding neural HMM TTS #2271

Closed
wants to merge 83 commits
Changes from 1 commit
Commits
83 commits
405bffe
Adding encoder
shivammehta25 Nov 26, 2022
d607993
currently modifying hmm
shivammehta25 Nov 27, 2022
a324920
Adding hmm
shivammehta25 Nov 28, 2022
8628648
Adding overflow
shivammehta25 Nov 30, 2022
6ec83c4
Adding overflow setting up flat start
shivammehta25 Dec 1, 2022
783a982
Removing runs
shivammehta25 Dec 1, 2022
10f15e0
adding normalization parameters
shivammehta25 Dec 1, 2022
aff8b1f
Fixing models on same device
shivammehta25 Dec 1, 2022
62941d6
Training overflow and plotting evaluations
shivammehta25 Dec 2, 2022
f448ea4
Adding inference
shivammehta25 Dec 3, 2022
ff33837
At the end of epoch the test sentences are coming on cpu instead of gpu
shivammehta25 Dec 4, 2022
3edb0d2
Adding figures from model during training to monitor
shivammehta25 Dec 5, 2022
5fc800c
reverting tacotron2 training recipe
shivammehta25 Dec 5, 2022
427dfe5
fixing inference on gpu for test sentences on config
shivammehta25 Dec 5, 2022
ecc12c6
moving helpers and texts within overflows source code
shivammehta25 Dec 5, 2022
b86f3f8
renaming to overflow
shivammehta25 Dec 5, 2022
995ee93
moving loss to the model file
shivammehta25 Dec 5, 2022
5b0fe46
Fixing the rename
shivammehta25 Dec 5, 2022
5377f87
Model training but not plotting the test config sentences's audios
shivammehta25 Dec 5, 2022
bd5be6c
Formatting logs
shivammehta25 Dec 5, 2022
755aa6f
Changing model name to camelcase
shivammehta25 Dec 5, 2022
1350a4b
Fixing test log
shivammehta25 Dec 5, 2022
3c986fd
Fixing plotting bug
shivammehta25 Dec 6, 2022
4a5b1a0
Adding some tests
shivammehta25 Dec 6, 2022
5b1dabc
Merge branch 'coqui-ai:dev' into dev
shivammehta25 Dec 7, 2022
f43d7e3
Adding more tests to overflow
shivammehta25 Dec 8, 2022
c3d0167
Adding all tests for overflow
shivammehta25 Dec 9, 2022
ddefe34
making changes to camel case in config
shivammehta25 Dec 9, 2022
c2df9f3
Adding information about parameters and docstring
shivammehta25 Dec 10, 2022
9927434
removing compute_mel_statistics moved statistic computation to the mo…
shivammehta25 Dec 10, 2022
340cd0b
Added overflow in readme
shivammehta25 Dec 10, 2022
aca3fe1
Adding more test cases, now it doesn't saves transition_p like tensor…
shivammehta25 Dec 11, 2022
e7c11dd
Merge branch 'coqui-ai:dev' into dev
shivammehta25 Dec 14, 2022
7e2dbb1
uncommenting the approximation to stablize the training
shivammehta25 Dec 14, 2022
be09d6c
Merge branch 'coqui-ai:dev' into dev
shivammehta25 Dec 14, 2022
282de93
Merge branch 'coqui-ai:dev' into dev
shivammehta25 Dec 22, 2022
5df4fe8
Adding encoder
shivammehta25 Nov 26, 2022
fa25825
currently modifying hmm
shivammehta25 Nov 27, 2022
3cb0f78
Adding hmm
shivammehta25 Nov 28, 2022
9984afa
Adding overflow
shivammehta25 Nov 30, 2022
4dad45c
Adding overflow setting up flat start
shivammehta25 Dec 1, 2022
377bd3e
Removing runs
shivammehta25 Dec 1, 2022
a441c71
adding normalization parameters
shivammehta25 Dec 1, 2022
995ac14
Fixing models on same device
shivammehta25 Dec 1, 2022
97b985b
Training overflow and plotting evaluations
shivammehta25 Dec 2, 2022
227077a
Adding inference
shivammehta25 Dec 3, 2022
bea46cc
At the end of epoch the test sentences are coming on cpu instead of gpu
shivammehta25 Dec 4, 2022
03d028e
Adding figures from model during training to monitor
shivammehta25 Dec 5, 2022
fc3c641
reverting tacotron2 training recipe
shivammehta25 Dec 5, 2022
c429837
fixing inference on gpu for test sentences on config
shivammehta25 Dec 5, 2022
b804a12
moving helpers and texts within overflows source code
shivammehta25 Dec 5, 2022
3149b43
renaming to overflow
shivammehta25 Dec 5, 2022
8aff87a
moving loss to the model file
shivammehta25 Dec 5, 2022
8d7b0e7
Fixing the rename
shivammehta25 Dec 5, 2022
8aaffed
Model training but not plotting the test config sentences's audios
shivammehta25 Dec 5, 2022
648b2c3
Formatting logs
shivammehta25 Dec 5, 2022
d22c6c0
Changing model name to camelcase
shivammehta25 Dec 5, 2022
6e08e4f
Fixing test log
shivammehta25 Dec 5, 2022
9394ce0
Fixing plotting bug
shivammehta25 Dec 6, 2022
e115361
Adding some tests
shivammehta25 Dec 6, 2022
7a541b9
Adding more tests to overflow
shivammehta25 Dec 8, 2022
1dccc29
Adding all tests for overflow
shivammehta25 Dec 9, 2022
1b1bf1f
making changes to camel case in config
shivammehta25 Dec 9, 2022
916b98e
Adding information about parameters and docstring
shivammehta25 Dec 10, 2022
6eff37c
removing compute_mel_statistics moved statistic computation to the mo…
shivammehta25 Dec 10, 2022
8a8dd1d
Added overflow in readme
shivammehta25 Dec 10, 2022
e738c0c
Adding more test cases, now it doesn't saves transition_p like tensor…
shivammehta25 Dec 11, 2022
479c0cf
Handle espeak 1.48.15 (#2203)
erogol Dec 12, 2022
4f02e2c
Python API implementation (#2195)
erogol Dec 12, 2022
89b9868
Update README (#2204)
erogol Dec 12, 2022
684adb0
Adding missing key to formatter (#2194)
p0p4k Dec 12, 2022
55801cc
Add YourTTS VCTK recipe (#2198)
Edresson Dec 12, 2022
a0be902
Add Original YourTTS vocabulary for full transfer learning (#2206)
Edresson Dec 13, 2022
f3fe409
uncommenting the approximation to stablize the training
shivammehta25 Dec 14, 2022
aedd795
Adding pre-trained Overflow model (#2211)
erogol Dec 14, 2022
253b03f
Fixup overflow (#2218)
erogol Dec 14, 2022
c2ce4fb
Bump up to v0.10.0
erogol Dec 15, 2022
fd5ad8c
Add Ukrainian LADA (female) voice
egorsmkv Dec 16, 2022
1260c7f
Merge branch 'coqui-ai:dev' into dev
shivammehta25 Dec 30, 2022
f73cd29
Merge branch 'coqui-ai:dev' into dev
shivammehta25 Jan 3, 2023
2abbc97
Merge branch 'dev' of github.com:shivammehta25/TTS into dev
shivammehta25 Jan 5, 2023
790b846
Adding a config flag to train neural HMM TTS instead of overflow
shivammehta25 Jan 9, 2023
a8d0b22
Backwards compatibility: Fixing model zoo if the flag is not set, set it
shivammehta25 Jan 9, 2023
Adding overflow
shivammehta25 committed Dec 23, 2022
commit 9984afae653c28e1d76bf2183e901e898786895b
24 changes: 15 additions & 9 deletions TTS/tts/layers/neural_hmm/common_layers.py
@@ -22,12 +22,18 @@ class Encoder(nn.Module):
         - output: (B, C_in, T)
     """

-    def __init__(self, state_per_phone, in_out_channels=512):
+    def __init__(
+        self,
+        num_chars,
+        state_per_phone,
+        in_out_channels=512,
+    ):
         super().__init__()

         self.state_per_phone = state_per_phone
         self.in_out_channels = in_out_channels

+        self.emb = nn.Embedding(num_chars, in_out_channels)
         self.convolutions = nn.ModuleList()
         for _ in range(3):
             self.convolutions.append(ConvBNBlock(in_out_channels, in_out_channels, 5, "relu"))
@@ -42,8 +48,8 @@ def __init__(self, state_per_phone, in_out_channels=512):
         self.rnn_state = None

     def forward(self, x, input_lengths):
-        b, _, T = x.shape
-        o = x
+        b, T = x.shape
+        o = self.emb(x).transpose(1, 2)
         for layer in self.convolutions:
             o = layer(o)
         o = o.transpose(1, 2)
@@ -73,7 +79,7 @@ class ParameterModel(nn.Module):

     def __init__(
         self,
-        parameternetwork: List[int],
+        outputnet_size: List[int],
         input_size: int,
         output_size: int,
         flat_start_params: dict,
@@ -83,9 +89,9 @@ def __init__(
         self.flat_start_params = flat_start_params

         self.layers = nn.ModuleList(
-            [Linear(inp, out) for inp, out in zip([input_size] + parameternetwork[:-1], parameternetwork)]
+            [Linear(inp, out) for inp, out in zip([input_size] + outputnet_size[:-1], outputnet_size)]
         )
-        last_layer = self._flat_start_output_layer(parameternetwork[-1], output_size, frame_channels)
+        last_layer = self._flat_start_output_layer(outputnet_size[-1], output_size, frame_channels)
         self.layers.append(last_layer)

     def _flat_start_output_layer(self, input_size, output_size, frame_channels):
@@ -115,7 +121,7 @@ def __init__(
         encoder_dim: int,
         memory_rnn_dim: int,
         frame_channels: int,
-        parameternetwork: List[int],
+        outputnet_size: List[int],
         flat_start_params: dict,
         std_floor: float = 1e-2,
     ):
@@ -131,7 +137,7 @@ def __init__(
         self._validate_parameters()

         self.parametermodel = ParameterModel(
-            parameternetwork=parameternetwork,
+            outputnet_size=outputnet_size,
            input_size=input_size,
            output_size=output_size,
            flat_start_params=flat_start_params,
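With this change the encoder consumes integer token IDs directly rather than pre-embedded features. A minimal standalone sketch of the new input path (the vocabulary size and shapes below are illustrative assumptions, not values from this PR):

    import torch
    import torch.nn as nn

    num_chars, in_out_channels = 100, 512       # illustrative sizes
    emb = nn.Embedding(num_chars, in_out_channels)

    x = torch.randint(0, num_chars, (8, 50))    # [B, T] token IDs, as in Encoder.forward
    o = emb(x).transpose(1, 2)                  # [B, C, T], ready for the conv stack
    print(o.shape)                              # torch.Size([8, 512, 50])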
69 changes: 69 additions & 0 deletions TTS/tts/layers/neural_hmm/decoder.py
@@ -0,0 +1,69 @@
import torch

from TTS.tts.layers.glow_tts.decoder import Decoder as GlowDecoder
from TTS.tts.utils.helpers import sequence_mask


class Decoder(GlowDecoder):
    """Uses the glow decoder with some modifications.
    ::

        Squeeze -> ActNorm -> InvertibleConv1x1 -> AffineCoupling -> Unsqueeze

    Args:
        in_channels (int): channels of the input tensor.
        hidden_channels (int): hidden decoder channels.
        kernel_size (int): coupling block kernel size (WaveNet filter kernel size).
        dilation_rate (int): rate to increase dilation by in each layer of a decoder block.
        num_flow_blocks (int): number of decoder blocks.
        num_coupling_layers (int): number of coupling layers (number of WaveNet layers).
        dropout_p (float): WaveNet dropout rate.
        sigmoid_scale (bool): enable/disable sigmoid scaling in the coupling layer.
    """

    def __init__(
        self,
        in_channels,
        hidden_channels,
        kernel_size,
        dilation_rate,
        num_flow_blocks,
        num_coupling_layers,
        dropout_p=0.0,
        num_splits=4,
        num_squeeze=2,
        sigmoid_scale=False,
        c_in_channels=0,
    ):
        super().__init__(
            in_channels,
            hidden_channels,
            kernel_size,
            dilation_rate,
            num_flow_blocks,
            num_coupling_layers,
            dropout_p,
            num_splits,
            num_squeeze,
            sigmoid_scale,
            c_in_channels,
        )

    def forward(self, x, x_len, g=None, reverse=False):
        """
        Shapes:
            - x: :math:`[B, C, T]`
            - x_len: :math:`[B]`
            - g: :math:`[B, C]`
        """
        x, x_len, x_max_len = self.preprocess(x, x_len, x_len.max())
        x_mask = torch.unsqueeze(sequence_mask(x_len, x_max_len), 1).to(x.dtype)
        x, logdet_tot = super().forward(x, x_mask, g, reverse)
        return x, x_len, logdet_tot

    def preprocess(self, y, y_lengths, y_max_length):
        if y_max_length is not None:
            y_max_length = torch.div(y_max_length, self.n_sqz, rounding_mode="floor") * self.n_sqz
            y = y[:, :, :y_max_length]
        y_lengths = torch.div(y_lengths, self.n_sqz, rounding_mode="floor") * self.n_sqz
        return y, y_lengths, y_max_length
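`preprocess` rounds mel lengths down to a multiple of the squeeze factor `self.n_sqz` so that the squeeze step divides evenly. A quick standalone check of that arithmetic (assuming a squeeze factor of 2 for illustration):

    import torch

    n_sqz = 2                                   # assumed squeeze factor
    y_lengths = torch.tensor([103, 80, 77])
    trimmed = torch.div(y_lengths, n_sqz, rounding_mode="floor") * n_sqz
    print(trimmed)                              # tensor([102,  80,  76])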
6 changes: 4 additions & 2 deletions TTS/tts/layers/neural_hmm/hmm.py
@@ -39,7 +39,7 @@ def __init__(
         prenet_dropout: float,
         memory_rnn_dim: int,
         prenet_dropout_at_inference: bool,
-        parameternetwork: List[int],
+        outputnet_size: List[int],
         flat_start_params: dict,
         std_floor: float,
     ):
@@ -64,7 +64,9 @@ def __init__(
             bias=False,
         )
         self.memory_rnn = nn.LSTMCell(input_size=prenet_dim, hidden_size=memory_rnn_dim)
-        self.output_net = Outputnet(encoder_dim, memory_rnn_dim, frame_channels, parameternetwork, flat_start_params, std_floor)
+        self.output_net = Outputnet(
+            encoder_dim, memory_rnn_dim, frame_channels, outputnet_size, flat_start_params, std_floor
+        )
         self.register_buffer("go_tokens", torch.zeros(ar_order, 1))

     def forward(self, inputs, inputs_len, mels, mel_lens):
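The renamed `outputnet_size` argument lists the hidden widths of the MLP that maps the RNN state to HMM emission parameters; `ParameterModel` chains them with the `zip` pattern shown earlier. A standalone sketch of that construction (sizes are illustrative, not the PR's defaults):

    import torch.nn as nn

    input_size, outputnet_size = 1024, [256, 256]   # illustrative sizes
    layers = [
        nn.Linear(inp, out)
        for inp, out in zip([input_size] + outputnet_size[:-1], outputnet_size)
    ]
    print([(l.in_features, l.out_features) for l in layers])
    # [(1024, 256), (256, 256)]; a final layer then maps 256 -> output_size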
106 changes: 106 additions & 0 deletions TTS/tts/models/overflow.py
@@ -0,0 +1,106 @@
import torch
import torch.nn as nn

from TTS.tts.layers.neural_hmm.common_layers import Encoder
from TTS.tts.layers.neural_hmm.decoder import Decoder
from TTS.tts.layers.neural_hmm.hmm import HMM
from TTS.tts.models.base_tts import BaseTTS
from TTS.tts.utils.speakers import SpeakerManager
from TTS.tts.utils.text.tokenizer import TTSTokenizer


class OverFlow(BaseTTS):
    """OverFlow TTS model.

    Paper::
        https://arxiv.org/abs/2211.06892

    Paper abstract::
        Neural HMMs are a type of neural transducer recently proposed for
        sequence-to-sequence modelling in text-to-speech. They combine the best features
        of classic statistical speech synthesis and modern neural TTS, requiring less
        data and fewer training updates, and are less prone to gibberish output caused
        by neural attention failures. In this paper, we combine neural HMM TTS with
        normalising flows for describing the highly non-Gaussian distribution of speech
        acoustics. The result is a powerful, fully probabilistic model of durations and
        acoustics that can be trained using exact maximum likelihood. Compared to
        dominant flow-based acoustic models, our approach integrates autoregression for
        improved modelling of long-range dependences such as utterance-level prosody.
        Experiments show that a system based on our proposal gives more accurate
        pronunciations and better subjective speech quality than comparable methods,
        whilst retaining the original advantages of neural HMMs. Audio examples and code
        are available at https://shivammehta25.github.io/OverFlow/.

    Check :class:`TTS.tts.configs.overflow.OverFlowConfig` for class arguments.
    """

    def __init__(
        self,
        config: "OverFlowConfig",
        ap: "AudioProcessor" = None,
        tokenizer: "TTSTokenizer" = None,
        speaker_manager: SpeakerManager = None,
    ):
        super().__init__(config, ap, tokenizer, speaker_manager)

        # Pass all config fields to `self` to keep code changes small.
        self.config = config
        for key in config:
            setattr(self, key, config[key])

        self.decoder_output_dim = config.out_channels

        self.encoder = Encoder(config.num_chars, config.state_per_phone, config.encoder_in_features)
        self.hmm = HMM(
            self.out_channels,
            self.ar_order,
            self.encoder_dim,
            self.prenet_type,
            self.prenet_dim,
            self.prenet_dropout,
            self.memory_rnn_dim,
            self.prenet_dropout_at_inference,
            self.outputnet_size,
            self.flat_start_params,
            self.std_floor,
        )
        self.decoder = Decoder(
            self.out_channels,
            self.hidden_channels_dec,
            self.kernel_size_dec,
            self.dilation_rate,
            self.num_flow_blocks_dec,
            self.num_block_layers,
            dropout_p=self.dropout_p_dec,
            num_splits=self.num_splits,
            num_squeeze=self.num_squeeze,
            sigmoid_scale=self.sigmoid_scale,
            c_in_channels=self.c_in_channels,
        )

    def forward(self, text, text_len, mels, mel_len):
        """
        Forward pass for training, computing the log-likelihood of the given batch.

        Shapes:
            text: :math:`[B, T_in]`
            text_len: :math:`[B]`
            mels: :math:`[B, T_out, C]`
            mel_len: :math:`[B]`
        """
        encoder_outputs, text_len = self.encoder(text, text_len)
        # The flow decoder expects [B, C, T]; lengths may be trimmed to a multiple of num_squeeze.
        z, z_lengths, log_det = self.decoder(mels.transpose(1, 2), mel_len)
        log_probs = self.hmm(encoder_outputs, text_len, z, z_lengths)

        outputs = {"log_probs": log_probs + log_det}  # the flow log-determinant enters the exact likelihood
        return outputs
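For orientation, a shape-level sketch of driving this forward pass with dummy tensors (batch size, vocabulary size, and 80 mel channels are illustrative assumptions; `model` stands for a constructed OverFlow instance):

    import torch

    B, T_in, T_out, C = 8, 50, 120, 80
    text = torch.randint(0, 100, (B, T_in))      # [B, T_in] token IDs
    text_len = torch.full((B,), T_in)            # [B]
    mels = torch.randn(B, T_out, C)              # [B, T_out, C]
    mel_len = torch.full((B,), T_out)            # [B]

    # outputs = model(text, text_len, mels, mel_len)
    # loss = -outputs["log_probs"].mean()        # train with exact maximum likelihood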
1 change: 0 additions & 1 deletion TTS/tts/utils/helpers.py
@@ -282,6 +282,5 @@ def logsumexp(x, dim):

     m, _ = x.max(dim=dim)
     mask = m == -float("inf")
-
     s = (x - m.masked_fill_(mask, 0).unsqueeze(dim=dim)).exp().sum(dim=dim)
     return s.masked_fill_(mask, 1).log() + m.masked_fill_(mask, -float("inf"))
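For reference, this helper implements the standard max-shift identity for a numerically stable log-sum-exp, with the mask keeping an all-negative-infinity input at negative infinity instead of producing NaN:

    \log \sum_i e^{x_i} = m + \log \sum_i e^{x_i - m}, \qquad m = \max_i x_i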
19 changes: 18 additions & 1 deletion tests/tts_tests/test_helpers.py
@@ -1,6 +1,13 @@
 import torch as T

-from TTS.tts.utils.helpers import average_over_durations, generate_path, rand_segments, segment, sequence_mask
+from TTS.tts.utils.helpers import (
+    average_over_durations,
+    generate_path,
+    logsumexp,
+    rand_segments,
+    segment,
+    sequence_mask,
+)


 def average_over_durations_test():  # pylint: disable=no-self-use
@@ -86,3 +93,13 @@ def generate_path_test():
             assert all(path[b, t, :current_idx] == 0.0)
             assert all(path[b, t, current_idx + durations[b, t].item() :] == 0.0)
             current_idx += durations[b, t].item()
+
+
+def logsumexp_test():
+    a = T.randn(10)  # random numbers
+    assert T.eq(T.logsumexp(a, dim=0), logsumexp(a, dim=0)).all()
+
+    a = T.zeros(10)  # all zeros
+    assert T.eq(T.logsumexp(a, dim=0), logsumexp(a, dim=0)).all()
+
+    a = T.ones(10)  # all ones
+    assert T.eq(T.logsumexp(a, dim=0), logsumexp(a, dim=0)).all()