
Add wait_for_first_optimizer_step mode #737

Merged: 4 commits into pfnet:master on Aug 31, 2023

Conversation

linshokaku (Member)

This adds a mode that delays the execution of scheduler.step until optimizer.step has run. It prevents the PyTorch warning that occurs when scheduler.step is executed before optimizer.step, a situation that can arise when GradScaler skips optimizer.step on the first iteration.
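
For context, a minimal sketch of the failure mode being avoided, using plain PyTorch (the model, data, and hyperparameters are placeholders): GradScaler.step skips the underlying optimizer.step whenever the scaled gradients contain infs or NaNs, which is common on the very first iteration while the loss scale is still being calibrated, so an unconditional scheduler.step then fires the warning.

    import torch

    model = torch.nn.Linear(4, 2).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)
    scaler = torch.cuda.amp.GradScaler()

    x = torch.randn(8, 4, device="cuda")
    with torch.autocast("cuda"):
        loss = model(x).sum()
    scaler.scale(loss).backward()

    # If the scaled gradients contain inf/NaN, scaler.step() skips the real
    # optimizer.step(), leaving optimizer._step_count at 0 ...
    scaler.step(optimizer)
    scaler.update()

    # ... in which case calling scheduler.step() here emits:
    # UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`.
    scheduler.step()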

@kmaehashi kmaehashi self-assigned this Aug 10, 2023
@kmaehashi kmaehashi added the cat:enhancement New feature or request label Aug 10, 2023
kmaehashi (Member) left a comment

LGTM except for nits!

pytorch_pfn_extras/training/extensions/lr_scheduler.py (3 review threads, outdated and resolved)
kmaehashi (Member)

For the record: optimizer._step_count is a private API, but I take it as relatively stable, as it has been there since the oldest PyTorch version we support (v1.10.0):
https://github.com/pytorch/pytorch/blob/v1.10.0/torch/optim/lr_scheduler.py#L127-L128
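
For reference, the check at the linked lines boils down to the following (paraphrased, not verbatim PyTorch source): the base scheduler wraps optimizer.step with a counting hook at construction time and warns when step() runs before that counter has ever been incremented.

    # Paraphrased from torch/optim/lr_scheduler.py (v1.10.0):
    # _LRScheduler.__init__ wraps optimizer.step so that each call increments
    # optimizer._step_count; _LRScheduler.step() then checks the counter.
    if self.optimizer._step_count < 1:
        warnings.warn(
            "Detected call of `lr_scheduler.step()` before `optimizer.step()`. ...",
            UserWarning,
        )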

kmaehashi (Member) left a comment

Thanks! Could you add tests as well?

Reviewed code in pytorch_pfn_extras/training/extensions/lr_scheduler.py:

    self.scheduler.optimizer._step_count < 1
    and self.wait_for_first_optimizer_step

    self.wait_for_first_optimizer_step
    and hasattr(self.scheduler.optimizer.step, "_with_counter")
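
Taken together, the two fragments suggest a gate of roughly the following shape. This is a sketch reconstructed from the review context; the class and method names around the condition are assumptions, not the actual extension code:

    class LRSchedulerExtension:  # hypothetical name, for illustration only
        def __init__(self, scheduler, wait_for_first_optimizer_step=False):
            self.scheduler = scheduler
            self.wait_for_first_optimizer_step = wait_for_first_optimizer_step

        def step(self):
            # Skip scheduler.step() until optimizer.step() has run at least
            # once. The hasattr check guards schedulers whose optimizer was
            # never wrapped with PyTorch's counting hook (see the discussion
            # below about ReduceLROnPlateau).
            if (
                self.wait_for_first_optimizer_step
                and hasattr(self.scheduler.optimizer.step, "_with_counter")
                and self.scheduler.optimizer._step_count < 1
            ):
                return
            self.scheduler.step()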
kmaehashi (Member) left a comment

A very minor nit: I think relying on the existence of private APIs is fragile, because this feature will silently stop working if PyTorch renames this attribute. I guess this check is not essential and can be removed?

linshokaku (Member, Author) left a comment

When PyTorch's LRScheduler base class is used, the _step_count attribute and the counting hook are added at initialization time. However, ReduceLROnPlateau does not inherit from this base class, so accessing these attributes raises an error. It is therefore appropriate to check for the existence of the _with_counter attribute, following the code in PyTorch's lr_scheduler that checks _step_count.
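
A quick illustration of the asymmetry described above, as observed on the PyTorch versions this PR targets (v1.10 and later, where ReduceLROnPlateau does not inherit from the scheduler base class):

    import torch

    model = torch.nn.Linear(2, 1)

    # _LRScheduler subclasses wrap optimizer.step with a counting hook at init:
    opt1 = torch.optim.SGD(model.parameters(), lr=0.1)
    torch.optim.lr_scheduler.StepLR(opt1, step_size=1)
    print(hasattr(opt1.step, "_with_counter"))  # True
    print(opt1._step_count)                     # 0

    # ReduceLROnPlateau does no such wrapping, so neither attribute exists and
    # accessing opt2._step_count directly would raise AttributeError:
    opt2 = torch.optim.SGD(model.parameters(), lr=0.1)
    torch.optim.lr_scheduler.ReduceLROnPlateau(opt2)
    print(hasattr(opt2.step, "_with_counter"))  # False
    print(hasattr(opt2, "_step_count"))         # False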

kmaehashi (Member) left a comment

Ah got it, thanks! I'd prefer this to be commented somewhere, but it can be done separately 😄

linshokaku (Member, Author) left a comment

Since this is a rather detailed matter and not essential to the functionality of the code, would it be possible to include a link to this PR comment as a comment in the code instead?

@takagi takagi added this to the v0.7.2 milestone Aug 28, 2023
kmaehashi (Member)

/test mini

@kmaehashi kmaehashi merged commit f9d28d2 into pfnet:master Aug 31, 2023
5 checks passed