
Conversation

@littlebullGit
Contributor

@littlebullGit littlebullGit commented Jan 28, 2026

When resuming from a checkpoint with `reload_dataloaders_every_n_epochs`, the dataloader was not reloaded at the correct epoch. This was because `setup_data()` overwrote `_last_train_dl_reload_epoch` with the current epoch during checkpoint restoration, losing the record of when the dataloader was actually last reloaded.

The fix:

  1. Save `_last_train_dl_reload_epoch` in the checkpoint state.
  2. Restore `_last_train_dl_reload_epoch` from the checkpoint on load.
  3. Only update `_last_train_dl_reload_epoch` when actually reloading the dataloader or during initial setup (not when resuming).

This ensures `_should_reload_train_dl` returns the correct value after resuming from a checkpoint.

Backward compatible: old checkpoints without this key default to `float('-inf')`, which triggers a reload (the safest behavior).
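The fix above can be sketched in a few lines. This is a toy model of the behavior described in this PR, not the actual Lightning source; names such as `FitLoopSketch`, `reload_every_n`, and `reload_train_dl` are hypothetical stand-ins:

```python
class FitLoopSketch:
    """Toy model of the loop state this PR persists across checkpoints."""

    def __init__(self, reload_every_n: int):
        self.reload_every_n = reload_every_n
        self.epoch = 0
        # float("-inf") forces a reload on first use; it is also the
        # fallback for old checkpoints that lack the new key.
        self._last_train_dl_reload_epoch = float("-inf")

    @property
    def _should_reload_train_dl(self) -> bool:
        n = self.reload_every_n
        return bool(n) and self.epoch - self._last_train_dl_reload_epoch >= n

    def reload_train_dl(self) -> None:
        # (rebuild the dataloader here) -- record the reload epoch only
        # when actually reloading, so checkpoint restoration cannot
        # clobber it (fix step 3)
        self._last_train_dl_reload_epoch = self.epoch

    def state_dict(self) -> dict:
        # fix step 1: persist the last-reload epoch in the checkpoint
        return {"_last_train_dl_reload_epoch": self._last_train_dl_reload_epoch}

    def load_state_dict(self, state: dict) -> None:
        # fix step 2: restore it on resume; a missing key falls back to
        # -inf, which triggers a reload (safest behavior)
        self._last_train_dl_reload_epoch = state.get(
            "_last_train_dl_reload_epoch", float("-inf")
        )
```

For example, resuming at epoch 5 with `reload_every_n=3` and a last reload at epoch 3 correctly waits until epoch 6 to reload, instead of treating the resume epoch as the last reload epoch as before the fix.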

Fixes #21492


📚 Documentation preview 📚: https://pytorch-lightning--21514.org.readthedocs.build/en/21514/

@github-actions github-actions bot added the pl Generic label for PyTorch Lightning package label Jan 28, 2026
@littlebullGit littlebullGit force-pushed the fix/21492-dataloader-reload-checkpoint branch from 5c24d70 to 6afeb53 on January 28, 2026 at 02:04
@codecov

codecov bot commented Jan 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79%. Comparing base (0c2025d) to head (267371d).
✅ All tests successful. No failed tests found.

❗ There is a different number of reports uploaded between BASE (0c2025d) and HEAD (267371d). Click for more details.

HEAD has 726 fewer uploads than BASE
Flag               BASE (0c2025d)   HEAD (267371d)
cpu                198              33
python             18               3
lightning_fabric   54               0
pytest             99               0
python3.12         54               9
python3.13         18               3
lightning          90               15
python3.11         36               6
python3.12.7       54               9
python3.10         18               3
pytorch2.8         18               6
pytorch_lightning  54               18
pytest-full        99               33
pytorch2.5.1       9                3
pytorch2.7         9                3
pytorch2.1         18               6
pytorch2.3         9                3
pytorch2.2.2       9                3
pytorch2.6         9                3
pytorch2.4.1       9                3
pytorch2.9         9                3
Additional details and impacted files
@@            Coverage Diff            @@
##           master   #21514     +/-   ##
=========================================
- Coverage      87%      79%     -8%     
=========================================
  Files         270      267      -3     
  Lines       24071    24021     -50     
=========================================
- Hits        20867    18965   -1902     
- Misses       3204     5056   +1852     


Labels

pl Generic label for PyTorch Lightning package

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Dataloader reload bug when loading from checkpoint

1 participant