Issues: Lightning-AI/pytorch-lightning

Confusing recommendation to use sync_dist=True even with TorchMetrics
Labels: bug, help wanted, logging, ver: 2.2.x
#20153 opened Aug 2, 2024 by srprca

How to use Webdataset in DDP setting? ValueError: you need to add an explicit nodesplitter to your input pipeline for multi-node training
Labels: docs, help wanted, ver: 2.2.x
#20149 opened Aug 1, 2024 by cgebbe

Support restoring callbacks' status when predicting
Labels: feature, help wanted
#20137 opened Jul 29, 2024 by zihaozou

OptimizerLRScheduler typing does not fit examples
Labels: bug, example, help wanted, ver: 2.2.x
#20106 opened Jul 19, 2024 by MalteEbner

training time increase epoch by epoch
Labels: bug, help wanted, performance, repro needed, ver: 2.2.x
#20076 opened Jul 12, 2024 by Eric-Lin-CVTE

Using Stochastic Weight Averaging (SWA) and LearningRateFinder simultaneously can cause issues
Labels: bug, callback: swa, help wanted, ver: 2.2.x
#20070 opened Jul 10, 2024 by liuzeyu6

enable loading universal checkpointing checkpoint in DeepSpeedStrategy
Labels: feature, help wanted, strategy: deepspeed
#20065 opened Jul 9, 2024 by zhoubay

trainer.test() with given checkpoint logs last epoch instead of checkpoint epoch
Labels: bug, help wanted, repro needed
#20052 opened Jul 5, 2024 by markussteindl

ModelCheckpoint could not find key in returned metrics
Labels: bug, callback: model checkpoint, help wanted, ver: 2.1.x
#20046 opened Jul 4, 2024 by TheAeryan

[Fabric Lightning] Named barriers
Labels: distributed, feature, help wanted
#20027 opened Jun 28, 2024 by tesslerc

Add truncated backpropagation through time (TBPTT) example
Labels: docs, help wanted
#19985 opened Jun 17, 2024 by svnv-svsv-jm

Another profiling tool is already active
Labels: bug, help wanted, profiler, ver: 2.2.x
#19983 opened Jun 17, 2024 by zhaohm14

Documentation: writing custom samplers compatible with multi GPU training
Labels: docs, help wanted
#19964 opened Jun 10, 2024 by fteufel

Returning num_replicas=world_size when using distributed sampler in ddp
Labels: distributed, duplicate, feature, help wanted, strategy: ddp
#19961 opened Jun 9, 2024 by arjunagarwal899

KeyboardInterrupt raises an exception which results in a zero exit code
Labels: bug, environment: slurm, help wanted, ver: 2.0.x, ver: 2.1.x, ver: 2.2.x
#19916 opened May 29, 2024 by amarckal

Apply the ignore of the save_hyperparameters function to args
Labels: feature, good first issue, help wanted
#19761 opened Apr 11, 2024 by doveppp

How to use BackboneFinetuning callback?
Labels: callback: finetuning, docs, help wanted
#19711 opened Mar 28, 2024 by Antoine101

neptune.ai logger produces lots of errors when logging "training/epoch"
Labels: bug, help wanted, logger: neptune
#19679 opened Mar 20, 2024 by simon-ging

configure_model is incompatible with the BaseFinetuning behavior when fitting
Labels: bug
#19658 opened Mar 16, 2024 by GdoongMathew

Docs don't render LaTeX formulas
Labels: docs, good first issue, help wanted
#19633 opened Mar 15, 2024 by zichunxx

Does Trainer(devices=1) use all CPUs?
Labels: good first issue, help wanted, question, ver: 2.2.x
#19595 opened Mar 7, 2024 by MaximilienLC

Validation runs only for one iteration when restarting from checkpoint mid-epoch, wrongly reporting validation loss
Labels: bug, help wanted, loops
#19549 opened Feb 29, 2024 by pimdh

batch_sampler.batch_size is None with deepspeed and DataLoader(batch_size=None)
Labels: bug

Insert the **profiler_kwargs from pytorchprofiler into usual args
Labels: docs, help wanted, profiler
#19410 opened Feb 5, 2024 by gardiens

Potential off by 1 error when resuming training of mid-epoch checkpoint
Labels: bug, help wanted, loops, ver: 2.1.x
#19367 opened Jan 29, 2024 by ivnle