
Error during training · Issue #2 · AlexandaJerry/vits-mandarin-biaobei

Open
@zzhbb2002

Description

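I am training on Ubuntu, and during training I get an error saying that CUDA does not support ComplexHalf computation. Is this a problem with my CUDA installation?
Error log: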
python train.py -c configs/biaobei_base.json -m biaobei_base
[INFO] {'train': {'log_interval': 200, 'eval_interval': 1000, 'seed': 1234, 'epochs': 200, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 4, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 8192, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'training_files': 'filelists/train_filelist.txt.cleaned', 'validation_files': 'filelists/val_filelist.txt.cleaned', 'text_cleaners': ['chinese_cleaners1'], 'max_wav_value': 32768.0, 'sampling_rate': 22050, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 0, 'cleaned_text': True}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'n_layers_q': 3, 'use_spectral_norm': False}, 'model_dir': './logs/biaobei_base'}
[WARNING] /home/zzh/下载/vits-mandarin-biaobei-main is not a git repository, therefore hash value comparison will be ignored.
/home/zzh/.local/lib/python3.8/site-packages/torch/functional.py:572: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:659.)
  return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
Traceback (most recent call last):
  File "train.py", line 295, in <module>
    main()
  File "train.py", line 55, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/home/zzh/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/zzh/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home/zzh/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/zzh/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/zzh/下载/vits-mandarin-biaobei-main/train.py", line 122, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
  File "/home/zzh/下载/vits-mandarin-biaobei-main/train.py", line 195, in train_and_evaluate
    scaler.scale(loss_gen_all).backward()
  File "/home/zzh/.local/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/zzh/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: "fill_cuda" not implemented for 'ComplexHalf'

Activity

AlexandaJerry (Owner) commented on Oct 22, 2022

Hi, yes, this is because CUDA cannot run this operation.

zzhbb2002 (Author) commented on Oct 24, 2022

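Thanks for your reply, but I ran ./bandwidthTest and it reported PASS.
I ran mnistCUDNN and it reported Test passed.
I installed a Java environment so I could run Visual Profiler, and every tool installed with CUDA opens correctly.
torch.cuda.is_available() returns True
torch 1.10.2+cu111
torchaudio 0.10.2+cu111
torchvision 0.11.3+cu111
My CUDA version is 11.1. Is this a version problem, or something else?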
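Because the thread compares specific builds (torch 1.10.2+cu111 in this report, torch 1.9 with CUDA 11.1 in a later comment), it can help to split a PyTorch version string such as `torch.__version__` into its numeric release and its local CUDA tag before comparing. The following is a minimal stdlib-only sketch; the function name `parse_torch_version` is hypothetical and is not part of PyTorch.

```python
import re
from typing import Optional, Tuple


def parse_torch_version(version: str) -> Tuple[Tuple[int, ...], Optional[str]]:
    """Split a version like '1.10.2+cu111' into ((1, 10, 2), 'cu111').

    Hypothetical helper: the part after '+' is returned as the local tag
    (e.g. 'cu111'), or None when there is no '+'.
    """
    release, _, local = version.partition("+")
    parts = []
    for piece in release.split("."):
        # Keep only the leading digits of each component, so '0a0' yields 0.
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        # Stop after a component with a non-numeric suffix (pre/dev releases).
        if match.end() != len(piece):
            break
    return tuple(parts), (local or None)
```

For example, `parse_torch_version(torch.__version__)[0] >= (1, 10)` is true for the 1.10.2 build reported above and false for a 1.9 build, because Python compares the tuples element by element.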

AlexandaJerry (Owner) commented on Oct 27, 2022

According to the issues below, the problem is that the PyTorch version is too new:
pytorch/pytorch#67324
jaywalnut310/vits#15
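The config printed at the start of the log contains `'fp16_run': True`, and the failure occurs in `scaler.scale(loss_gen_all).backward()`, which belongs to the mixed-precision training path. One possible workaround, which is not the fix described in the linked issues, is to train in full precision by setting `fp16_run` to `false`; with mixed precision disabled, the STFT inside the training step should then work on float32 inputs rather than producing ComplexHalf tensors. The following is a minimal stdlib-only sketch. The helper names `disable_fp16` and `write_fp32_config` and the output filename are hypothetical.

```python
import json
from pathlib import Path


def disable_fp16(config: dict) -> dict:
    """Return a copy of a VITS-style config with fp16_run switched off.

    Assumes the config has a "train" section, as in the log above.
    """
    new_config = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    new_config["train"]["fp16_run"] = False
    return new_config


def write_fp32_config(src: str, dst: str) -> None:
    """Read the JSON config at `src` and write an fp16-disabled copy to `dst`."""
    config = json.loads(Path(src).read_text(encoding="utf-8"))
    Path(dst).write_text(
        json.dumps(disable_fp16(config), indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
```

Usage, assuming a hypothetical output name: call `write_fp32_config("configs/biaobei_base.json", "configs/biaobei_base_fp32.json")`, then train with `python train.py -c configs/biaobei_base_fp32.json -m biaobei_base`. Full-precision training typically uses more GPU memory and runs more slowly, so `batch_size` may need to be reduced.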

zzhbb2002 (Author) commented on Nov 8, 2022

Could you tell me how to run inference with this model?

ttkrpink commented on Mar 24, 2023

jaywalnut310/vits#15 (comment)
This works for me with torch 1.9 + CUDA 11.1.
