Description
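I am training on Ubuntu, and during training I get an error saying that CUDA does not support ComplexHalf computation. Is this a problem with my CUDA installation?
Error log: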
python train.py -c configs/biaobei_base.json -m biaobei_base
[INFO] {'train': {'log_interval': 200, 'eval_interval': 1000, 'seed': 1234, 'epochs': 200, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 4, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 8192, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'training_files': 'filelists/train_filelist.txt.cleaned', 'validation_files': 'filelists/val_filelist.txt.cleaned', 'text_cleaners': ['chinese_cleaners1'], 'max_wav_value': 32768.0, 'sampling_rate': 22050, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 0, 'cleaned_text': True}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'n_layers_q': 3, 'use_spectral_norm': False}, 'model_dir': './logs/biaobei_base'}
[WARNING] /home/zzh/下载/vits-mandarin-biaobei-main is not a git repository, therefore hash value comparison will be ignored.
/home/zzh/.local/lib/python3.8/site-packages/torch/functional.py:572: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:659.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
Traceback (most recent call last):
File "train.py", line 295, in <module>
main()
File "train.py", line 55, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "/home/zzh/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/zzh/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/home/zzh/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/zzh/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/zzh/下载/vits-mandarin-biaobei-main/train.py", line 122, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "/home/zzh/下载/vits-mandarin-biaobei-main/train.py", line 195, in train_and_evaluate
scaler.scale(loss_gen_all).backward()
File "/home/zzh/.local/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/zzh/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
Variable._execution_engine.run_backward(
RuntimeError: "fill_cuda" not implemented for 'ComplexHalf'
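The config dump above shows 'fp16_run': True, and the traceback fails inside scaler.scale(loss_gen_all).backward(), so the ComplexHalf tensor presumably appears on the mixed-precision path. One thing that could be tried is turning fp16_run off in a copy of the config. The following is only a sketch: it assumes the config file is plain JSON with a train.fp16_run key, as the dump suggests. The helper names and the output path are hypothetical, and nothing in this thread confirms that this avoids the error.

```python
import copy
import json


def with_fp16_disabled(config: dict) -> dict:
    """Return a deep copy of a config dict with train.fp16_run set to False."""
    patched = copy.deepcopy(config)
    # setdefault guards against a config that has no "train" section.
    patched.setdefault("train", {})["fp16_run"] = False
    return patched


def write_fp32_config(src_path: str, dst_path: str) -> None:
    """Read a JSON config, disable fp16_run, and write the result to dst_path."""
    with open(src_path, encoding="utf-8") as f:
        config = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(with_fp16_disabled(config), f, indent=2)
```

For example, write_fp32_config("configs/biaobei_base.json", "configs/biaobei_base_fp32.json") would produce a second config (the fp32 file name is made up here) that can be passed to train.py with -c. Training in full precision generally uses more GPU memory.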
Activity
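AlexandaJerry commented on Oct 22, 2022
Hello, this is indeed because CUDA cannot run this computation.
zzhbb2002 commented on Oct 24, 2022
Thank you for the reply, but I ran ./bandwidthTest and it reported pass.
I also ran mnistcudnn, and it reported testpass.
To run Visual Profiler I installed a Java environment, and I verified that the tools installed with CUDA all launch.
torch.cuda.is_available() returns True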
torch 1.10.2+cu111
torchaudio 0.10.2+cu111
torchvision 0.11.3+cu111
The CUDA version is 11.1. Is this a version problem, or something else?
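A report like the one above can be collected in one step. This is a minimal sketch that assumes torch may or may not be importable. describe_env is a hypothetical helper name, and it relies only on torch.__version__, torch.version.cuda, and torch.cuda.is_available().

```python
def describe_env() -> dict:
    """Collect the PyTorch facts that matter for this kind of report."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this interpreter.
        return {"torch": None, "cuda_build": None, "cuda_available": False}
    return {
        "torch": torch.__version__,          # e.g. "1.10.2+cu111"
        "cuda_build": torch.version.cuda,    # CUDA version torch was built with
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_env().items():
        print(f"{key}: {value}")
```

Note that the RuntimeError in the log is raised by PyTorch itself, so a passing bandwidthTest or cuDNN sample shows that the driver and toolkit work but does not by itself show that a given PyTorch build implements a given operation for a given dtype.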
AlexandaJerry commented on Oct 27, 2022
According to the issues below, the problem is caused by a version that is too new:
pytorch/pytorch#67324
jaywalnut310/vits#15
zzhbb2002 commented on Nov 8, 2022
Could you please tell me how to run prediction (inference) with this model?
ttkrpink commented on Mar 24, 2023
jaywalnut310/vits#15 (comment)
Works for me with torch 1.9 + CUDA 11.1.
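To compare an installed version string against the combination reported as working here (torch 1.9), the local build suffix such as +cu111 has to be split off first. The following is a sketch with hypothetical helper names. The 1.9 threshold is a single data point from this thread, not a documented compatibility boundary.

```python
import re


def parse_torch_version(version: str) -> tuple:
    """Split a string like '1.10.2+cu111' into a number tuple and a build tag."""
    release, _, local = version.partition("+")
    # Keep at most major, minor, patch; ignore any further digits.
    numbers = tuple(int(n) for n in re.findall(r"\d+", release)[:3])
    return numbers, local


def is_newer_than(version: str, reference: tuple = (1, 9)) -> bool:
    """True if the major.minor part of `version` is above `reference`."""
    numbers, _ = parse_torch_version(version)
    return numbers[:2] > reference
```

With the version reported earlier in the thread, is_newer_than("1.10.2+cu111") returns True, which is consistent with the suggestion that the installed version is newer than the one that worked.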