Issues: Lightning-AI/pytorch-lightning

Issues list

StreamingDataset not working in multi-gpu environement
Labels: bug, repro needed
#20140 opened Jul 30, 2024 by davidpicard

training time increase epoch by epoch
Labels: bug, help wanted, performance, repro needed, ver: 2.2.x
#20076 opened Jul 12, 2024 by Eric-Lin-CVTE

Dataloader with >0 workers when using DDP causes a crash
Labels: bug, data handling, repro needed, ver: 2.2.x
#20054 opened Jul 5, 2024 by alexanderswerdlow

trainer.test() with given checkpoint logs last epoch instead of checkpoint epoch
Labels: bug, help wanted, repro needed
#20052 opened Jul 5, 2024 by markussteindl

The training process will stop unexpectedly
Labels: bug, needs triage, repro needed
#19920 opened May 30, 2024 by 5huanghuai

MisconfigurationException
Labels: bug, repro needed
#19516 opened Feb 23, 2024 by moghadas76

PermissionError with ModelCheckpoints
Labels: bug, callback: model checkpoint, repro needed
#19397 opened Feb 2, 2024 by aaprasad

Deepspeed Stage 3 crashes Lightning trainer
Labels: bug, repro needed, strategy: deepspeed, ver: 2.1.x
#19096 opened Nov 30, 2023 by m-harmonic

BatchSizeFinder throws KeyError: 'limit_eval_batches'
Labels: bug, duplicate, help wanted, repro needed, tuner, ver: 2.1.x
#18985 opened Nov 10, 2023 by drusmanbashir

DDP + static graph can result in garbage data returned by all_gather
Labels: 3rd party, bug, repro needed, ver: 2.0.x
#18872 opened Oct 26, 2023 by mooninrain

LightningModule.to_torchscript() does not transfer check_inputs to correct device
Labels: bug, good first issue, repro needed, ver: 2.0.x
#18824 opened Oct 19, 2023 by pfeatherstone

manual_backward and .backward() have different behaviour.
Labels: bug, repro needed, ver: 2.0.x
#18740 opened Oct 6, 2023 by roedoejet

Model trained with Deepspeed stage 3 shape not match when loading
Labels: bug, repro needed, strategy: deepspeed, ver: 2.0.x
#18648 opened Sep 26, 2023 by yinweisu

CombinedLoader takes a long time when num_workers > 0
Labels: bug, help wanted, performance, repro needed, ver: 2.0.x
#18584 opened Sep 19, 2023 by johnathanchiu

Can't run the pytorch lightning program packaged with pyinstaller.
Labels: 3rd party, bug, help wanted, repro needed, ver: 1.9.x
#18492 opened Sep 6, 2023 by laogonggong847

Model parameters don't get updated after upgrading from 1.1.4 to 2.0.7
Labels: bug, repro needed, ver: 2.0.x, ver: 2.1.x
#18346 opened Aug 20, 2023 by yqin-falling-stars

load_from_checkpoint Right After fit Got FileNotFound Error
Labels: bug, repro needed, ver: 1.9.x
#18328 opened Aug 16, 2023 by donglihe-hub

Incorrect batch progress saved in checkpoint at every_n_train_steps
Labels: bug, help wanted, loops, repro needed, ver: 1.9.x, ver: 2.1.x
#18060 opened Jul 11, 2023 by shuaitang5

Running out of memory when resuming the training from a checkpoint
Labels: bug, checkpointing, performance, repro needed, ver: 2.0.x
#18059 opened Jul 11, 2023 by RJPenic

RuntimeError: CUDA error: unspecified launch failure
Labels: bug, repro needed, ver: 2.0.x
#18039 opened Jul 10, 2023 by Hanminghao

self.log(.., on_epoch=True) runs extremely slow
Labels: bug, logging, performance, repro needed, ver: 2.0.x
#17988 opened Jul 4, 2023 by LinWeizheDragon

Expected all tensors to be on the same device
Labels: bug, repro needed, ver: 2.0.x
#17851 opened Jun 16, 2023 by whatisslove11

IsADirectoryError: [Errno 21] Is a directory: '/content'
Labels: bug, repro needed, ver: 2.1.x, ver: 2.2.x
#17730 opened May 31, 2023 by rashidasohail