Skip to content

First step training very slow and high GPU memory #318

@zaidato

Description

@zaidato

Thank you for your great works.
I am using 3x40GB A100 GPUs to train the first step. I set max_len=300 frames. These GPUs just enough for batch_size=5 and training speed is very slow for both mix_precision=fp16 and fp32. Is this normal with styletts2?

Another problem is that loss become NaN after more than 10k steps

INFO:2025-03-05 18:39:06,022: Epoch [1/10], Step [13340/394954], Mel Loss: 0.73000, Gen Loss: 0.00000, Disc Loss: 0.00000, Mono Loss: 0.00000, S2S Loss: 0.00000, SLM Loss: 0.00000
INFO:2025-03-05 18:39:18,793: Epoch [1/10], Step [13350/394954], Mel Loss: 0.71747, Gen Loss: 0.00000, Disc Loss: 0.00000, Mono Loss: 0.00000, S2S Loss: 0.00000, SLM Loss: 0.00000
INFO:2025-03-05 18:39:30,779: Epoch [1/10], Step [13360/394954], Mel Loss: 0.72086, Gen Loss: 0.00000, Disc Loss: 0.00000, Mono Loss: 0.00000, S2S Loss: 0.00000, SLM Loss: 0.00000
INFO:2025-03-05 18:39:43,441: Epoch [1/10], Step [13370/394954], Mel Loss: nan, Gen Loss: 0.00000, Disc Loss: 0.00000, Mono Loss: 0.00000, S2S Loss: 0.00000, SLM Loss: 0.00000
INFO:2025-03-05 18:39:55,656: Epoch [1/10], Step [13380/394954], Mel Loss: nan, Gen Loss: 0.00000, Disc Loss: 0.00000, Mono Loss: 0.00000, S2S Loss: 0.00000, SLM Loss: 0.00000

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions