RuntimeError: Trying to backward through the graph a second time #1379

@FrancescoSaverioZuppichini

Description

Describe the Issue
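GradientCumulativeOptimizerHook doesn't work: with the hook enabled, training fails at loss.backward() in after_train_iter with "RuntimeError: Trying to backward through the graph a second time".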

Reproduction

  1. What command, code, or script did you run?
    Add GradientCumulativeOptimizerHook to the custom_hooks list in your *_config.py file:

    custom_hooks = [
        dict(type="GradientCumulativeOptimizerHook", cumulative_iters=4),
    ]
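
For comparison, here is a sketch of a different way to register the hook. It rests on an assumption this report does not confirm: that mmcv 1.x builds its optimizer hook from `optimizer_config` and falls back to the plain `OptimizerHook` when no `type` is given. If so, an entry in `custom_hooks` would run alongside that default hook, and both would call `loss.backward()` on the same loss in the same iteration.

```python
# Assumption: the runner builds its single optimizer hook from this dict,
# so the gradient-accumulating hook replaces the default OptimizerHook
# instead of running next to it.
optimizer_config = dict(
    type="GradientCumulativeOptimizerHook",
    cumulative_iters=4,  # same value as in the reproduction above
)

# Assumption: no second optimizer hook is registered through custom_hooks.
custom_hooks = []
```

Whether this avoids the error would need to be checked against the installed mmcv and mmdet versions.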

Output

2021-09-27 14:21:26,889 - mmdet - WARNING - GradientCumulativeOptimizerHook may slightly decrease performance if the model has BatchNorm layers.
Traceback (most recent call last):
  File "/home/zuppif/integration-object-detection/playground.py", line 81, in <module>
    main(Args(config_file, cfg_options=options))
  File "/home/zuppif/integration-object-detection/src/train.py", line 185, in main
    train_detector(
  File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/mmdet/apis/train.py", line 174, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/mmcv/runner/epoch_based_runner.py", line 51, in train
    self.call_hook('after_train_iter')
  File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/mmcv/runner/hooks/optimizer.py", line 115, in after_train_iter
    loss.backward()
  File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/torch/autograd/__init__.py", line 147, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
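
The same RuntimeError can be reproduced outside mmcv with a few lines of plain PyTorch. This is a minimal sketch, not the hook's code, and `backward_twice` is a hypothetical helper name. It shows that calling `backward()` twice on one loss fails as soon as the graph holds saved intermediate values, which is what would happen if two hooks each ran backward on the same loss.

```python
import torch


def backward_twice(retain_graph=False):
    """Call backward() twice on one loss; return the error text, or None."""
    x = torch.ones(3, requires_grad=True)
    loss = (x * x).sum()  # the multiplication saves its inputs for backward

    # First backward pass; saved values are freed unless retain_graph=True.
    loss.backward(retain_graph=retain_graph)
    try:
        # Second backward pass over the same graph.
        loss.backward()
    except RuntimeError as err:
        return str(err)
    return None


print(backward_twice())                   # error text from the second call
print(backward_twice(retain_graph=True))  # no error is raised in this case
```

Passing `retain_graph=True` only silences the symptom. If two hooks are both running backward, gradients would still be accumulated twice per iteration, so the duplicate call itself is the thing to remove.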

Environment

  1. Please run python -c "from mmcv.utils import collect_env; print(collect_env())"
{'sys.platform': 'linux', 'Python': '3.9.5 (default, Jun  4 2021, 12:28:51) [GCC 7.5.0]', 'CUDA available': True, 'GPU 0,1,2': 'GeForce GTX 1080 Ti', 'CUDA_HOME': '/usr/local/cuda', 'NVCC': 'Build cuda_11.2.r11.2/compiler.29373293_0', 'GCC': 'gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0', 'PyTorch': '1.9.1+cu102', 'PyTorch compiling details': 'PyTorch built with:\n  - GCC 7.3\n  - C++ Version: 201402\n  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications\n  - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)\n  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n  - NNPACK is enabled\n  - CPU capability usage: AVX2\n  - CUDA Runtime 10.2\n  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70\n  - CuDNN 7.6.5\n  - Magma 2.5.2\n  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.1, 
USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, \n', 'TorchVision': '0.10.1+cu102', 'OpenCV': '4.5.3', 'MMCV': '1.3.13', 'MMCV Compiler': 'GCC 9.3', 'MMCV CUDA Compiler': '11.2'}
