
Runtime error when training #12

Open

Man1978-scd opened this issue Apr 10, 2023 · 1 comment

Comments


Man1978-scd commented Apr 10, 2023

When I use torch 1.10.0 and run the training script

bash tools/dist_train.sh work_configs/tamper/tamper_convx_b_exp.py 2

I get the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [5, 512, 32, 32]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

The error is raised inside the mmcv library, in runner/epoch_based_runner.py.
I assumed it was a version problem, so I downgraded torch to 1.5, which produced a different error instead:

RuntimeError: The size of tensor a (2) must match the size of tensor b (128) at non-singleton dimension 3

I don't know how to track this problem down 🤔. Could the author please provide a requirements file with the matching versions, and a usage/tutorial document? 🙀
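The hint in the first traceback points at autograd anomaly detection, which reports the forward op that produced the tensor later modified in place. A minimal sketch of using it, assuming a plain PyTorch training loop (the model, optimizer, and data below are illustrative stand-ins, not code from this repo or mmcv):

import torch

# Enable anomaly detection before training; autograd then records the
# forward stack trace for every op, so a version-mismatch error during
# backward() names the forward operation that caused it.
torch.autograd.set_detect_anomaly(True)

# Illustrative stand-ins for the real model/optimizer/data (assumption).
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(4, 8)

optimizer.zero_grad()
loss = model(x).sum()
loss.backward()  # on failure, the report now includes the offending forward op
optimizer.step()

Anomaly detection slows training noticeably, so it is meant for debugging runs only.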

Man1978-scd (Author) commented

Solved 🤮

While debugging, I found that when the model runs with mixed-precision optimization, it goes through the weight-copying function in mmcv.runner.hooks.optimizer.py:

def copy_grads_to_fp32(self, fp16_net, fp32_weights):
    """Copy gradients from fp16 model to fp32 weight copy."""
    for fp32_param, fp16_param in zip(fp32_weights,
                                      fp16_net.parameters()):
        if fp16_param.grad is not None:
            if fp32_param.grad is None:
                # Lazily allocate the fp32 grad buffer on first use.
                fp32_param.grad = fp32_param.data.new(
                    fp32_param.size())
            fp32_param.grad.copy_(fp16_param.grad)

The grad shapes of fp32_param and fp16_param did not match, which made the copy fail: with torch 1.5, fp16_net.parameters() returned the weight tensor at one point and the bias tensor at another, so zip paired mismatched parameters. I was speechless. Upgrading to torch 1.6 made it run normally 🙀
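A hypothetical minimal repro of that failure mode (my own sketch, not code from mmcv): if the two parameter sequences arrive in different orders, zip pairs a weight with a bias, and copy_ raises the same kind of size-mismatch RuntimeError quoted above.

import torch

# Hypothetical fp32 master copies in declaration order: weight, then bias.
fp32_weights = [torch.zeros(4, 3), torch.zeros(4)]

# fp16 parameters coming back in the opposite order (the torch 1.5
# behavior described above): bias first, then weight.
fp16_params = [torch.zeros(4, dtype=torch.half),
               torch.zeros(4, 3, dtype=torch.half)]
for p in fp16_params:
    p.grad = torch.ones_like(p)

for fp32_param, fp16_param in zip(fp32_weights, fp16_params):
    if fp32_param.grad is None:
        fp32_param.grad = fp32_param.data.new(fp32_param.size())
    # Mismatched pairing: copying a (4,) grad into a (4, 3) buffer raises
    # "RuntimeError: The size of tensor a ... must match the size of tensor b ..."
    fp32_param.grad.copy_(fp16_param.grad)

A defensive assert fp32_param.shape == fp16_param.shape at the top of the loop would surface the mis-ordering with the offending shapes instead of failing deep inside copy_.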
