Issues: Lightning-AI/pytorch-lightning

enable loading universal checkpointing checkpoint in DeepSpeedStrategy
Labels: feature, help wanted, strategy: deepspeed
#20065 opened Jul 9, 2024 by zhoubay

When calling trainer.test() train_dataloader is also validated, which makes no sense
Labels: bug, strategy: deepspeed
#19745 opened Apr 8, 2024 by asusdisciple

batch_sampler.batch_size is None with deepspeed and DataLoader(batch_size=None)
Labels: bug

Add additional parameters to the DeepSpeedStrategy
Labels: feature, strategy: deepspeed
#19278 opened Jan 12, 2024 by keunwoochoi

Deepspeed vs DDP
Labels: bug, strategy: deepspeed, ver: 2.1.x
#19246 opened Jan 8, 2024 by jpatel-bdai

Deepspeed Stage 3 crashes Lightning trainer
Labels: bug, repro needed, strategy: deepspeed, ver: 2.1.x
#19096 opened Nov 30, 2023 by m-harmonic

Unable to change checkpoint in on_save_checkpoint with Deepspeed
Labels: bug, checkpointing, strategy: deepspeed, ver: 2.0.x

Deepspeed activation Partitioning
Labels: docs, help wanted, strategy: deepspeed
#18732 opened Oct 6, 2023 by LogicBaron

Model trained with Deepspeed stage 3 shape not match when loading
Labels: bug, repro needed, strategy: deepspeed, ver: 2.0.x
#18648 opened Sep 26, 2023 by yinweisu

Device error when loading from checkpoint for testing with deepspeed
Labels: bug, strategy: deepspeed, ver: 2.0.x
#18478 opened Sep 4, 2023 by dionman

Memory Leak when instantiating Fabric multiple times
Labels: bug, fabric, performance, strategy: deepspeed, ver: 2.0.x
#18356 opened Aug 21, 2023 by vkakerbeck

CUDA OOM with DeepSpeed ZeRO Stage 3 Offload
Labels: bug, strategy: deepspeed, ver: 2.1.x
#18134 opened Jul 21, 2023 by rggs

Any plans in adding support for DeepSpeedHybridEngine
Labels: feature, strategy: deepspeed
#17682 opened May 23, 2023 by ShaojieJiang

--hf_deepspeed_save flag to use Hugging Face Deepspeed logic and no configure_optimizers if optimizer/scheduler defined
Labels: feature, strategy: deepspeed
#17673 opened May 21, 2023 by jamesharrisivi

Update deepspeed activation checkpointing docs
Labels: docs, help wanted, strategy: deepspeed
#17621 opened May 12, 2023 by avivbrokman

Error using lightning 2.0 when i use deepspeed and torch.compile both
Labels: 3rd party, bug, strategy: deepspeed, ver: 2.1.x
#17549 opened May 3, 2023 by yw0nam

Refactor the DeepSpeed strategy config management
Labels: fabric, pl, refactor, strategy: deepspeed

deepspeed strategy can't save checkpoint, TypeError: cannot pickle torch._C._distributed_c10d.ProcessGroup object
Labels: 3rd party, bug, repro needed, strategy: deepspeed, ver: 2.0.x
#17369 opened Apr 13, 2023 by dmitrymailk

Deepspeed stage 3 crashing with student + teacher
Labels: question, strategy: deepspeed, ver: 2.0.x
#17319 opened Apr 10, 2023 by andrasiani

when using huggingface pretrained model with multi-gpu, model parameters were duplicate for every gpu in ram
Labels: 3rd party, question, strategy: deepspeed, waiting on author
#17043 opened Mar 12, 2023 by linyubupa

Integration with DeepSpeed and PyG: expected scalar type Float but found Half
Labels: 3rd party, bug, strategy: deepspeed
#16793 opened Feb 17, 2023 by adm995

Examples for OPT, just like benchmark on minGPT
Labels: example, feature, strategy: deepspeed
#16671 opened Feb 7, 2023 by young-chao

mixed precision with Deepspeed
Labels: bug, help wanted, strategy: deepspeed
#15168 opened Oct 18, 2022 by wangleiofficial

trainer.validate() will not load optimizer properly, different behavior from trainer.fit()
Labels: strategy: deepspeed
#14993 opened Oct 4, 2022 by MattYoon