Issues: Lightning-AI/pytorch-lightning
Issues labeled strategy: fsdp (Fully Sharded Data Parallel)

FSDP Fails with floating nn.Parameter
Labels: bug, duplicate, strategy: fsdp, ver: 2.2.x
#20138 opened Jul 30, 2024 by schopra8

Use new state-dict APIs in FSDPStrategy
Labels: feature, refactor, strategy: fsdp
#20060 opened Jul 8, 2024 by awaelchli

PyTorch Lightning FSDP takes more memory than PyTorch FSDP
Labels: question, strategy: fsdp
#19721 opened Apr 1, 2024 by anandhperumal

Address FSDP + manual optimization
Labels: bug, strategy: fsdp, ver: 2.2.x
#19685 opened Mar 22, 2024 by awaelchli

Introduce sharded checkpointing for NO_SHARD FSDP
Labels: feature, strategy: fsdp
#19671 opened Mar 19, 2024 by ps-stability

FSDP hybrid shard should checkpoint in a single node
Labels: checkpointing, feature, strategy: fsdp

FSDP checkpointing uses deprecated APIs with PyTorch 2.2
Labels: bug, checkpointing, strategy: fsdp

Support gradient clipping by norm with FSDP
Labels: feature, strategy: fsdp

Trainer.validate() after Trainer.fit() not working with FSDP and auto_wrap_policy
Labels: bug

Support saving and loading remote paths with FSDP
Labels: checkpointing, feature, help wanted, strategy: fsdp
#18786 opened Oct 12, 2023 by schmidt-ai

[TPU] Add Trainer support for PyTorch XLA FSDP
Labels: fabric, feature, has conflicts, pl, strategy: fsdp, strategy: xla

Investigate FSDP + CPU Offload performance in Trainer
Labels: performance, strategy: fsdp, ver: 2.1.x

2x slower training speed with FSDP when switching from lightning 1.9 to 2.0
Labels: bug, performance, strategy: fsdp, ver: 2.0.x
#18028 opened Jul 8, 2023 by anthonyhu

Feature request: FSDP native strategy for TPUs
Labels: accelerator: tpu, feature, strategy: fsdp

Support model size calculation for FSDP & DeepSpeed (stage-3)
Labels: feature, strategy: deepspeed, strategy: fsdp
#10291 opened Nov 1, 2021 by rohitgr7