
Stepwise LR scheduler #20211

Open · wants to merge 38 commits into base: master

Conversation

@01AbhiSingh (Contributor) commented Aug 18, 2024

What does this PR do?

Fixes #17544

Hi @awaelchli, can you please verify the changes I made? If they are correct, I will also take up and correct any failing tests.
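
For context, a minimal sketch (module and hyperparameters are illustrative, assuming the behavior discussed in #17544, not code from this PR) of a step-interval scheduler whose frequency does not divide the number of batches per epoch, so stepping must carry across epoch boundaries:

import torch
import torch.nn.functional as F
from lightning.pytorch import LightningModule


class StepwiseSchedulerModule(LightningModule):
    """Illustrative module: scheduler steps every 5 optimizer steps."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=0.1)
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
        return {
            "optimizer": optimizer,
            "lr_scheduler": {
                "scheduler": scheduler,
                "interval": "step",  # step per optimizer step, not per epoch
                "frequency": 5,  # with e.g. 7 batches/epoch, this crosses epoch boundaries
            },
        }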

Before submitting
  • Was this discussed/agreed via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet list:

Reviewer checklist
- [ ] Is this pull request ready for review? (if not, please submit in draft mode)
- [ ] Check that all items from **Before submitting** are resolved
- [ ] Make sure the title is self-explanatory and the description concisely explains the PR
- [ ] Add labels and milestones (and optionally projects) to the PR so it can be classified

📚 Documentation preview 📚: https://pytorch-lightning--20211.org.readthedocs.build/en/20211/

@github-actions bot added the pl (Generic label for PyTorch Lightning package) label Aug 18, 2024
codecov bot commented Aug 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79%. Comparing base (ea59e40) to head (337c1c2).

❗ The number of reports uploaded differs between BASE (ea59e40) and HEAD (337c1c2).

HEAD has 102 fewer uploads than BASE:

| Flag | BASE (ea59e40) | HEAD (337c1c2) |
| --- | --- | --- |
| cpu | 48 | 24 |
| lightning_fabric | 7 | 0 |
| pytest | 26 | 0 |
| python3.9 | 12 | 6 |
| lightning | 37 | 18 |
| python3.10 | 6 | 3 |
| python3.11 | 12 | 6 |
| python3.12.7 | 18 | 9 |
| gpu | 2 | 0 |
Additional details and impacted files
@@            Coverage Diff            @@
##           master   #20211     +/-   ##
=========================================
- Coverage      88%      79%     -9%     
=========================================
  Files         267      264      -3     
  Lines       23380    23325     -55     
=========================================
- Hits        20481    18366   -2115     
- Misses       2899     4959   +2060     

@01AbhiSingh (Contributor, Author)

Hi @Borda, do I need to make any changes to the PR?

@lantiga (Collaborator) commented Oct 7, 2024

This looks good, thank you for the contribution @01AbhiSingh

Ideally we could add a test to verify the behavior described in #17544. The current test suite doesn't detect this change, which is usually a sign of insufficient coverage. Would you be willing to contribute such a test?

@01AbhiSingh (Contributor, Author)

Yes, sure let me look into it.

@01AbhiSingh (Contributor, Author)

Hi @lantiga, do you want a new test written from scratch, or should I make the changes in a preexisting file? All the tests currently pass, so I can't tell which existing test would need to change; if the changes belong in a preexisting file, it would be very helpful if you could point out the test I need to modify.

@lantiga (Collaborator) commented Nov 12, 2024

Hey @01AbhiSingh, sorry for the wait.

You can take inspiration from:

def test_lr_scheduler_epoch_step_frequency(mocked_sched, check_val_every_n_epoch, tmp_path):

and add a new test where scheduling goes across epoch boundaries. Maybe @falckt can help too?
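
As a hedged sketch, such a cross-epoch test could look like this (StepwiseSchedulerModule is the illustrative module sketched earlier in this thread, not repository code; the exact count assumes recent PyTorch, whose scheduler constructor itself calls step() once, and the fixed stepping behavior):

from unittest import mock

import torch
from torch.utils.data import DataLoader, TensorDataset
from lightning.pytorch import Trainer


def test_lr_scheduler_step_across_epoch_boundaries(tmp_path):
    # 21 samples with batch_size=3 -> 7 optimizer steps per epoch
    x, y = torch.randn(21, 32), torch.randn(21, 2)
    train_loader = DataLoader(TensorDataset(x, y), batch_size=3)
    model = StepwiseSchedulerModule()

    with mock.patch("torch.optim.lr_scheduler.StepLR.step") as mocked_sched:
        trainer = Trainer(
            default_root_dir=tmp_path,
            max_epochs=2,
            logger=False,
            enable_progress_bar=False,
        )
        trainer.fit(model, train_loader)

    # 14 optimizer steps with frequency=5 -> the scheduler fires at global
    # steps 5 and 10, i.e. once inside epoch 0 and once past the epoch
    # boundary in epoch 1, plus the one call StepLR.__init__ makes itself
    assert mocked_sched.call_count == 2 + 1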

@01AbhiSingh (Contributor, Author)

Done, please check.

@lantiga (Collaborator) commented Dec 11, 2024

Hey @01AbhiSingh can you import LightningModule here?

https://github.com/Lightning-AI/pytorch-lightning/pull/20211/files#diff-3c3f104dbdd06271c9e6e6d4fdf61398458148412401dd55a9bac1e9b5f913a8R19

Change:

from lightning.pytorch import Trainer

to

from lightning.pytorch import Trainer, LightningModule

this should fix the failing test

@01AbhiSingh (Contributor, Author)

Yeah, my bad. I forgot to add it even after seeing it. Done, please check.

@01AbhiSingh (Contributor, Author)

https://github.com/Lightning-AI/pytorch-lightning/actions/runs/12291356552/job/34299991507?pr=20211#:~:text=FAILED%20utilities/test_data.py%3A%3Atest_update_dataloader_typerror_custom_exception%20%2D%20AssertionError%3A%20Regex%20pattern%20did%20not%20match.

This is the test that is currently failing.

def train_dataloader(self):
    # Create a simple dataset for testing: 21 samples -> 7 batches of size 3
    x = torch.randn(21, 32)
    y = torch.randn(21, 2)
    return DataLoader(TensorDataset(x, y), batch_size=3)

Should I add this and try to run the test again?

@lantiga (Collaborator) commented Dec 12, 2024

Go for it : )

You can also run this kind of test locally with `pytest tests/tests_pytorch/<test_file>.py::<name_of_test>` to make things quicker on your end. This test in particular can be run on any machine (and you can use Lightning Studios for free if you want to run on GPUs, of course).

@01AbhiSingh (Contributor, Author) commented Dec 12, 2024

> Go for it : )
>
> You can also run this kind of test locally with `pytest tests/tests_pytorch/<test_file>.py::<name_of_test>` to make things quicker on your end. This test in particular can be run on any machine (and you can use Lightning Studios for free if you want to run on GPUs, of course).

I actually tried to run the test locally with the method you suggested, but this error keeps showing up: `ERROR: file or directory not found: tests/tests_pytorch/test_optimizers.py`. Anyway, I am trying to solve this problem in my local env.

Edit: I've solved this problem and will update the PR only once it's running perfectly in my local environment. Thanks :)

Another edit 😝: updated the PR, please check.

@01AbhiSingh (Contributor, Author)

The test passes in my local environment but not in the CI for the PR in the repo.

@mergify bot added the `has conflicts` label Feb 3, 2025
@01AbhiSingh (Contributor, Author)

I think this time it is all done. Can you please check once? @lantiga

@lantiga (Collaborator) left a review comment

Looks good, added a couple of comments

trainer.fit(model)

# Debug print statements
print(f"Mocked scheduler step calls: {mocked_sched.call_count}")
@lantiga (Collaborator):

Please remove the debug statements; I'd convert them to asserts that compare the values with the expected ones.
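
For example, a possible conversion (a sketch, keeping the `expected_steps` value the test already computes):

# Instead of printing the call count, assert it against the expected value
assert mocked_sched.call_count == expected_steps, (
    f"expected {expected_steps} scheduler step calls, got {mocked_sched.call_count}"
)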

def training_step(self, batch, batch_idx):
    # Add print statement to track batch index and global step
    if hasattr(self, 'trainer'):
        print(f"Batch idx: {batch_idx}, Global step: {self.trainer.global_step}")
@lantiga (Collaborator):

Print statements in tests are not super helpful; just use asserts so the test will break if we don't get the expected value here.
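
A hedged sketch of that conversion (assumes automatic optimization with one optimizer step per batch and the 7-batches-per-epoch dataloader used in this test; `F` is `torch.nn.functional` and `self.layer` is from the illustrative module above):

def training_step(self, batch, batch_idx):
    # global_step counts completed optimizer steps, so before this batch's
    # optimizer step it should equal current_epoch * batches_per_epoch + batch_idx
    assert self.trainer.global_step == self.current_epoch * 7 + batch_idx
    x, y = batch
    return F.mse_loss(self.layer(x), y)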


# Assert that the scheduler was called the expected number of times
# Allow for a small difference due to environment or rounding discrepancies
assert abs(mocked_sched.call_count - expected_steps) <= 1, (
@lantiga (Collaborator):

I'm not sure why there should be rounding discrepancies. Shouldn't this be fully deterministic?
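
For reference, with the numbers used in this thread the expectation is fully deterministic (a sketch; any extra call from the scheduler constructor would be a fixed offset, not a rounding error):

batches_per_epoch, max_epochs, frequency = 7, 2, 5
total_optimizer_steps = batches_per_epoch * max_epochs  # 14
expected_steps = total_optimizer_steps // frequency  # fires at steps 5 and 10 -> 2
assert expected_steps == 2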

@01AbhiSingh (Contributor, Author):

Actually, the test was passing in my local environment but not in the CI/CD pipeline for some reason, and I forgot to change it afterwards. Let me correct it ASAP.

@mergify bot removed the `has conflicts` label Feb 3, 2025
Labels
pl (Generic label for PyTorch Lightning package) · waiting on author (Waiting on user action, correction, or update)
3 participants