Description
I've been trying to get TensorFlowTTS to train on Cloud TPUs, since they're really fast and easy to access through the TRC, starting with MB-MelGAN plus the HiFi-GAN discriminator. I've already implemented all the required changes, including dataloader overhauls to use TFRecords and Google Cloud, here. When I try to train, however, I get this cryptic error on both TF 2.5.0 and nightly (I didn't use TF 2.3.1 because it wrongly allocates something to the CPU, which causes a different error).
```
[[cond_1]]
[[TPUReplicate/_compile/_10135486412832257275/_4]]
[[TPUReplicate/_compile/_10135486412832257275/_4/_76]]
(4) Invalid argument: {{function_node __inference__one_step_forward_179257}} Output shapes of then and else branches do not match: (f32[64,<=8192], f32[64,<=8192]) vs. (f32[64,<=8192], f32[0])
```
The dimensions in `[64,<=8192]` are `[batch_size, batch_max_steps]`.
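To narrow down which output is affected, the two shape tuples in the message can be compared position by position. The snippet below is only a minimal diagnostic sketch: the helper name `mismatched_outputs` and the regular expression are made up for illustration, and it assumes the message keeps the `(...) vs. (...)` layout shown in the log above.

```python
import re

# Error text copied from the log above.
ERROR = (
    "Output shapes of then and else branches do not match: "
    "(f32[64,<=8192], f32[64,<=8192]) vs. (f32[64,<=8192], f32[0])"
)

# Matches one typed shape such as f32[64,<=8192] or f32[0].
SHAPE = re.compile(r"\w+\[[^\]]*\]")


def mismatched_outputs(message):
    """Return (index, first_shape, second_shape) for each output that differs.

    Hypothetical helper: assumes exactly one " vs. " separator in the message.
    """
    first, second = message.split(" vs. ")
    first_shapes = SHAPE.findall(first)
    second_shapes = SHAPE.findall(second)
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(first_shapes, second_shapes))
        if a != b
    ]


diffs = mismatched_outputs(ERROR)
for index, a, b in diffs:
    print(f"output {index}: {a} vs. {b}")
```

Read this way, only the second output differs: one tuple has `f32[64,<=8192]` there and the other has an empty `f32[0]`. That would be consistent with a conditional inside the one-step forward function returning an empty tensor from one of its branches, though the log alone does not show which line of code that is.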
Here's the full training log:
train_log.txt
I can't figure out what causes this issue, no matter what I try. Any ideas? Being able to train on TPUs would be really beneficial, and it seems within reach. I can provide specific instructions to replicate the issue, but it requires Google Cloud with storage even when using a Colab TPU (TensorFlow 2.x refuses to save and load data from the local filesystem when using a TPU). The same code, including the TFRecord dataloader, trains fine on GPU.