auto_lr_find does not work if there is a BackboneFinetuning callback #14674

Open
@ejm714

🐛 Bug

auto_lr_find does not properly restore the model for training if there is a BackboneFinetuning callback.

To Reproduce

Specify a BackboneFinetuning callback, set auto_lr_find to True, and then run tune and fit.

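from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import BackboneFinetuning

# model, train_data, and val_data are assumed to be defined elsewhere
trainer = Trainer(
    auto_lr_find=True,
    callbacks=[BackboneFinetuning()],
)
trainer.tune(model, train_dataloaders=train_data, val_dataloaders=val_data)  # runs the LR finder
trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)  # raises KeyError: 0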

This yields the following error:

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/callbacks/finetuning.py in on_fit_start(self, trainer, pl_module)
    103             for opt_idx, optimizer in enumerate(trainer.optimizers):
    104                 param_groups = self._apply_mapping_to_param_groups(
--> 105                     self._internal_optimizer_metadata[opt_idx], named_parameters
    106                 )
    107                 optimizer.param_groups = param_groups

KeyError: 0

See notebook example: https://colab.research.google.com/drive/1ajrSRge90RM8Rlcwk0HyEosLLpOpyvg-

Expected behavior

After auto_lr_find runs, the model should be reset and the found learning rate should be used for training.

Environment

See bottom cell of colab notebook.

Additional context

I think the culprit is that on_fit_start on BackboneFinetuning now calls the on_fit_start method of BaseFinetuning, which then assumes that training is being resumed from a checkpoint.
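The suspected failure mode can be modeled without Lightning. The sketch below is a simplified stand-in, not the library's code: `ToyFinetuning`, its `load_state_dict`, and the `_restarting` flag are assumptions made for illustration; only `on_fit_start` and `_internal_optimizer_metadata` are taken from the traceback above. It also assumes that the LR finder reloads callback state that contains no optimizer metadata.

```python
class ToyFinetuning:
    """Simplified stand-in for a finetuning callback (hypothetical, not Lightning code)."""

    def __init__(self):
        self._internal_optimizer_metadata = {}
        self._restarting = False  # assumed flag, named for illustration

    def load_state_dict(self, state_dict):
        # Assumption: any state reload marks the callback as "restarting",
        # even when the reloaded state holds no optimizer metadata.
        self._restarting = True
        self._internal_optimizer_metadata = dict(state_dict)

    def on_fit_start(self, optimizers):
        if self._restarting:
            for opt_idx, optimizer in enumerate(optimizers):
                # Same kind of lookup as the failing line in the traceback.
                optimizer["param_groups"] = self._internal_optimizer_metadata[opt_idx]
            self._restarting = False


def reproduce():
    """Return the missing key if the lookup fails, else None."""
    callback = ToyFinetuning()
    callback.load_state_dict({})  # assumed: state saved before any fit is empty
    try:
        callback.on_fit_start([{"param_groups": []}])
    except KeyError as err:
        return err.args[0]
    return None
```

Under these assumptions, `reproduce()` hits the same `KeyError` on optimizer index 0, which matches the reported error.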

It looks like the bug got introduced in this PR: 07635d0#diff-ac96be7ba54bac4d7dc79ee012a211498fb97689e37026fe8a1b06a359079224R410

The fix will need to support both the finetuning callbacks when training is resumed and auto lr find when there is a backbone finetuning callback on the model.
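One way to picture the two requirements is a guard that restores param groups only when metadata was actually saved. This is a hypothetical sketch on a toy class, not Lightning's implementation and not a proposed patch; `GuardedFinetuning`, `run`, and the `_restarting` flag are illustrative names.

```python
class GuardedFinetuning:
    """Toy callback that only treats a state reload as a resume if metadata exists."""

    def __init__(self):
        self._internal_optimizer_metadata = {}
        self._restarting = False  # assumed flag, named for illustration

    def load_state_dict(self, state_dict):
        self._internal_optimizer_metadata = dict(state_dict)
        # Empty metadata (assumed to be the LR-finder case) is not a resume.
        self._restarting = bool(self._internal_optimizer_metadata)

    def on_fit_start(self, optimizers):
        if not self._restarting:
            return
        for opt_idx, optimizer in enumerate(optimizers):
            optimizer["param_groups"] = self._internal_optimizer_metadata[opt_idx]
        self._restarting = False


def run(state_dict):
    """Reload state, start a fit, and return the first optimizer's param groups."""
    callback = GuardedFinetuning()
    callback.load_state_dict(state_dict)
    optimizers = [{"param_groups": ["initial"]}]
    callback.on_fit_start(optimizers)
    return optimizers[0]["param_groups"]
```

In this toy model, `run({})` leaves the optimizer untouched, while `run({0: ["restored"]})` restores the saved param groups, so both the LR-finder path and the resume path are covered.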

cc @akihironitta @Borda @rohitgr7
