
Multiple optimizers + half precision + skip first optimization step Bug #7792

Closed
matyushinleonid opened this issue Jun 1, 2021 · 2 comments
Labels: bug (Something isn't working), help wanted (Open to be worked on)

🐛 Bug

When the first batch is skipped (by returning None from training_step) in a setup with multiple optimizers (different model parameters belong to different optimizers) and half precision, the following error appears (the full traceback is in the Colab linked below):

/usr/local/lib/python3.7/dist-packages/torch/cuda/amp/grad_scaler.py in step(self, optimizer, *args, **kwargs)
    335             self.unscale_(optimizer)
    336 
--> 337         assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were recorded for this optimizer."
    338 
    339         retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)

AssertionError: No inf checks were recorded for this optimizer.

Note that

  • There is no error when I skip any other batch (not the first).
  • There is no error when I skip the first batch with a single optimizer.
  • There is no error when I skip the first batch in full precision.

So all three conditions (multiple optimizers, half precision, first batch) are necessary to reproduce this bug (see the sketch below).

Please reproduce using the BoringModel

https://colab.research.google.com/drive/1a4XCOSumDxy2B3ywu6TrIgx5j1COcSfu?usp=sharing
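
The Colab notebook above is the authoritative reproduction. For readers without access to it, here is a rough self-contained sketch of the same setup, reconstructed only from the description in this issue (the class and attribute names, e.g. TwoOptimizerModel, layer_a, layer_b, are illustrative and not taken from the notebook):

    import torch
    from torch.utils.data import DataLoader, Dataset
    import pytorch_lightning as pl


    class RandomDataset(Dataset):
        def __len__(self):
            return 64

        def __getitem__(self, idx):
            return torch.randn(32)


    class TwoOptimizerModel(pl.LightningModule):
        """Two disjoint parameter groups, each with its own optimizer."""

        def __init__(self):
            super().__init__()
            self.layer_a = torch.nn.Linear(32, 2)
            self.layer_b = torch.nn.Linear(32, 2)

        def training_step(self, batch, batch_idx, optimizer_idx):
            # Returning None here skips the very first optimization step,
            # which is what triggers the assertion under precision=16.
            if batch_idx == 0:
                return None
            layer = self.layer_a if optimizer_idx == 0 else self.layer_b
            return layer(batch).sum()

        def configure_optimizers(self):
            opt_a = torch.optim.SGD(self.layer_a.parameters(), lr=0.1)
            opt_b = torch.optim.SGD(self.layer_b.parameters(), lr=0.1)
            return [opt_a, opt_b]


    trainer = pl.Trainer(gpus=1, precision=16, max_epochs=1)
    trainer.fit(TwoOptimizerModel(), DataLoader(RandomDataset(), batch_size=8))

Running this on a GPU should hit the assertion above on the first optimizer step; removing any one of the three conditions should make it pass.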

Expected behavior

Users should be able to skip an optimization step whenever they want.

Environment

  • CUDA:
    • GPU:
      • Tesla K80
    • available: True
    • version: 10.1
  • Packages:
    • numpy: 1.19.5
    • pyTorch_debug: False
    • pyTorch_version: 1.8.1+cu101
    • pytorch-lightning: 1.3.3
    • tqdm: 4.41.1
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.7.10
    • version: #1 SMP Tue Apr 20 19:55:43 PDT 2021
yifuwang (Contributor) commented Jun 1, 2021

Seems like the same problem as #4524.

carmocca (Contributor) commented Jun 1, 2021

Correct, closing as duplicate.

This is briefly mentioned in the first note of https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html#training-step
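
For anyone hitting this in the meantime, one possible workaround is sketched below. This is an illustrative pattern, not the approach described in the linked note: instead of returning None for the step to be skipped, return the loss multiplied by zero. The backward pass then still runs, so the GradScaler records inf checks for every optimizer, while the all-zero gradients make the update a no-op (with plain SGD, no momentum or weight decay). Names reuse the illustrative sketch above, not the Colab code:

    def training_step(self, batch, batch_idx, optimizer_idx):
        layer = self.layer_a if optimizer_idx == 0 else self.layer_b
        loss = layer(batch).sum()
        if batch_idx == 0:
            # Zero-scale instead of returning None: backward still runs,
            # so the grad scaler records its inf checks, but the gradients
            # are all zeros and the parameter update is a no-op.
            return loss * 0
        return loss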
