Pull requests: deepspeedai/DeepSpeed
- XPU use stock pytorch instead of Intel Extension for PyTorch (#7877, opened Feb 27, 2026 by delock)
- Fix import deepspeed crash on PyTorch v2.3 + Python 3.12 (#7875, opened Feb 26, 2026 by tohtana)
- Fix Evoformer arch filtering consistency for mixed targets (#7863) (#7872, opened Feb 24, 2026 by tohtana)
- fix: correct DistributedAttention output shape and pad uneven sequence lengths (#7842) (#7868, opened Feb 22, 2026 by harshang03, Draft)
- fix: keep fp32-pinned parameters out of the bf16 cast path in ZeRO-3 (#7747) (#7867, opened Feb 22, 2026 by harshang03, Draft)
- Revert "fix: remove premature MPI environment variable check in OpenMPIRunner" (#7864, opened Feb 21, 2026 by mikloorbi-sys, Draft)
- Fix global .cuh ignore and enforce tracked CUDA headers (#7858, opened Feb 18, 2026 by harshang03, Draft)
- Fix ZeRO legacy grad-hook crash when next_functions is missing (#7857, opened Feb 17, 2026 by harshang03, Draft)
- Reject non-finite fp16 loss_scale across config and ZeRO paths (#7856, opened Feb 17, 2026 by harshang03, Draft)
- Fix zero/division safety gaps in utility and inference paths (#7855, opened Feb 17, 2026 by harshang03, Draft)
- Fix count_used_parameters_in_backward crash on PyTorch < 2.3 (#7756) (#7849, opened Feb 12, 2026 by harshang03, Draft)
- [BUG] Fix: Fix gradient norm calculation and dynamic shape blocking in PP+ZeRO1 collective communication (#7847, opened Feb 12, 2026 by Thinksky5124)
- Fix no-grad grad-fn lookup in ZeRO hook counting on PyTorch 2.3 (#7830) (#7841, opened Feb 10, 2026 by tohtana)
- Fix bf16 dtype mismatch in ZeRO-3 with zero_quantized_weights (#7792, opened Jan 18, 2026 by juyterman1000)
- Fix Muon optimizer conflict with gradient clipping in ZeRO 1/2 (#7776, opened Jan 12, 2026 by fy817)
- Fix: ZenFlow Adam integration for updated PyTorch backward flow (#7759) (#7771, opened Jan 11, 2026 by Antlera)
- Introduce all_reduce_hook to support gradient aggregation across replica groups (#7764, opened Jan 7, 2026 by zhengchenyu)
- feat: add parameter-level precision control for BF16 training (#7750, opened Dec 30, 2025 by nathon-lee)
- Fix Muon optimizer checkpoint resume with bf16 mode (#7748, opened Dec 28, 2025 by yurekami)