`trainer.fit_loop.setup_data()` does not refresh train dataset in `LightningModule` #17327
You can do
This solution is unsatisfactory since I want to (a) avoid reloading every epoch and (b) be able to reload irregularly and on command. I also think you should re-add the bug tag, since the functionality of
This is "working as expected" given the current design of If you make that The easiest way to change this would be to add a
An idea for this part: you could still set

```python
def train_dataloader(self):
    if condition:
        # recreate
        self.train_dl = DataLoader(
            self.train_data,
        )
    return self.train_dl
```
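The snippet above can be modeled without Lightning at all. The following framework-free sketch shows the condition-gated rebuild pattern: the loader is cached and only rebuilt when a flag is set, which addresses both "avoid reloading every epoch" and "reload on command". The names `RebuildableLoader` and `needs_rebuild` are hypothetical, and `list(...)` stands in for constructing a `DataLoader`.

```python
class RebuildableLoader:
    """Framework-free sketch of the condition-gated rebuild pattern.

    Hypothetical names; `_build()` stands in for `DataLoader(self.train_data)`
    and `needs_rebuild` stands in for whatever `condition` checks.
    """

    def __init__(self, train_data):
        self.train_data = train_data
        self.needs_rebuild = True  # flip this externally when the dataset changes
        self._train_dl = None

    def _build(self):
        # stand-in for DataLoader(self.train_data, ...)
        return list(self.train_data)

    def train_dataloader(self):
        # rebuild only on demand; otherwise return the cached loader
        if self.needs_rebuild or self._train_dl is None:
            self._train_dl = self._build()
            self.needs_rebuild = False
        return self._train_dl
```

Repeated calls return the same cached object until `needs_rebuild` is set again, so per-epoch invocations of `train_dataloader()` stay cheap.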
👍 to this issue.
Adding to this discussion: I also have a custom callback that was using the `reset_xyz_dataloader` methods, which I'm migrating to newer Lightning (right now I'm on 2.1.3).

The `on_fit_start` part is working: the training and validation dataloaders are changed and reloaded, and I receive batches coming from the newly created dataset correctly. But test is not working. Debugging, I see that the test loop's `setup_data()` is called properly and the datamodule is loaded with the new data (`_combined_loader` contains a reference to the correct dataset), but the batch I receive in `test_step` comes from the original test dataset, not the updated one. Any idea on this? It seems like a bug to me, but I can't work out what's causing it.
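A framework-free illustration of one way this symptom can arise (this is an assumption about the cause, not confirmed Lightning internals): if the loop holds on to an iterator created before the reload, rebuilding the underlying loader has no effect on batches already being drawn.

```python
# Stand-in for building a DataLoader over `dataset` and then iterating it.
def make_iterator(dataset):
    return iter(list(dataset))

old_dataset = ["old-batch-1", "old-batch-2"]
new_dataset = ["new-batch-1", "new-batch-2"]

# The loop creates its iterator once, before the reload happens.
batches = make_iterator(old_dataset)

# Reloading rebuilds the loader, but the already-created iterator is untouched.
reloaded = make_iterator(new_dataset)

first_seen = next(batches)    # the stale iterator still yields old data
first_fresh = next(reloaded)  # only the fresh iterator sees the new dataset
```

If something like this is happening, `_combined_loader` pointing at the correct dataset while `test_step` receives stale batches would be exactly the observable outcome.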
Bug description

PR #16726 replaces the `reset_*_dataloader()` method calls with the respective `Loop.setup_data()` calls. This is also mentioned in the migration guide.

However, on versions <= 1.9, calling `reset_train_dataloader()` would reinstantiate the dataloader from a `LightningModule`'s `train_dataloader()` method. This behaviour is now gone.

My specific use case is that I need to update the dataset of my model during training. I then use `on_train_epoch_end()` or a similar hook to call `reset_train_dataloader()`, to have the updated dataset in the next training epoch. I posted a minimal example below. You can run this example on both v1.9 and v2.0 to see the exact difference: v1.9 runs without problems, whereas v2.0 fails the second assertion in `training_step()`. I tested it on a fresh conda env install of both versions using Python 3.10.

In case I am using the wrong loop to call `setup_data()` or am using the new interface incorrectly, please let me know. In that case I would also recommend providing some more hints in the migration guide or on PR #16726, since the current advice is not exactly clear (i.e. which loops are "top level"?).

What version are you seeing the problem on?

2.0+
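The reporter's minimal example did not survive this excerpt. As a framework-free model of the difference described (all names are hypothetical stand-ins, not Lightning internals): on <= 1.9, `reset_train_dataloader()` re-invokes the module's `train_dataloader()`, while the reported 2.0 behaviour is that the loader built at the start of fitting keeps being used.

```python
class ToyModule:
    """Hypothetical stand-in for the LightningModule in the report."""

    def __init__(self):
        self.dataset = [0]

    def train_dataloader(self):
        # stand-in for DataLoader(self.dataset)
        return list(self.dataset)

module = ToyModule()
cached_loader = module.train_dataloader()  # built once when fit() starts

module.dataset.append(1)  # dataset updated in on_train_epoch_end()

# <= 1.9 behaviour: reset_train_dataloader() calls train_dataloader() again,
# so the next epoch sees the updated dataset.
rebuilt_loader = module.train_dataloader()

# Reported 2.0 behaviour: training keeps consuming the originally built
# loader, so the update never reaches training_step().
```

Here `rebuilt_loader` contains the updated dataset while `cached_loader` does not, which matches the failing second assertion described above.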
How to reproduce the bug
Error messages and logs
Environment
Current environment
More info
No response
cc @Borda @justusschock @awaelchli @carmocca