Pulse · deepspeedai/DeepSpeed · GitHub

July 19, 2025 – August 19, 2025

Overview

35 Active pull requests

41 Active issues

2 Releases published by 1 person

v0.17.3 Patch Release
published Jul 28, 2025
v0.17.4 Patch Release
published Jul 31, 2025

30 Pull requests merged by 17 people

Add index to HPU devices
#7497 merged Aug 19, 2025
Reduce performance impact of compiler.enable decorator
#7498 merged Aug 18, 2025
Fix DeepCompile for PyTorch v2.8
#7496 merged Aug 18, 2025
Fix invalid f-strings
#7457 merged Aug 16, 2025
Add Zenflow code for Stage 1 & 2
#7391 merged Aug 15, 2025
fix xpu device_id AttributeError issue
#7488 merged Aug 15, 2025
Enable forked PRs
#7486 merged Aug 14, 2025
Fix pre-compile on cpu-only machines
#7168 merged Aug 12, 2025
[TiledFusedLogitsLoss] support inference
#7477 merged Aug 11, 2025
[UlyssesSPDataLoaderAdapter] fix iterator reset
#7472 merged Aug 11, 2025
Modal CI
#7289 merged Aug 11, 2025
fix deepspeed --venv_script
#7469 merged Aug 11, 2025
Fix cpu CI
#7481 merged Aug 11, 2025
Add blog for ZenFlow
#7463 merged Aug 10, 2025
add --bind_cores_to_rank to zero offload tutorial
#7474 merged Aug 8, 2025
fix #7188
#7371 merged Aug 4, 2025
Fix all-gather duplicate params and wrong dtype
#7462 merged Aug 3, 2025
fix issues raised by Coverity scans
#7431 merged Aug 2, 2025
Add getter APIs for TP/PP/DP ranks in DeepSpeedEngine
#7427 merged Aug 1, 2025
Update README.md
#7465 merged Aug 1, 2025
Update version.txt after v0.17.4 release
#7460 merged Jul 31, 2025
TiledFusedLogitsLoss bug fix
#7459 merged Jul 31, 2025
adding TiledFusedLogitsLoss
#7437 merged Jul 30, 2025
Fix: UnboundLocalError for variable 'dim' about issue
#7449 merged Jul 28, 2025
Update version.txt after 0.17.3 release.
#7455 merged Jul 28, 2025
Fix: Adapt Llama injection policy for newer transformers versions
#7443 merged Jul 26, 2025
Remove additional unused tests (human-eval)
#7445 merged Jul 24, 2025
[ALST] fix typo in the url part2
#7446 merged Jul 23, 2025
[ALST] fix typo in the url
#7444 merged Jul 23, 2025
Remove unused yaml test configurations and update README
#7441 merged Jul 22, 2025

5 Pull requests opened by 5 people

[AMD][ROCm] Improve support of AMD
#7448 opened Jul 24, 2025
Support Muon Optimizer
#7454 opened Jul 28, 2025
Add EXAONE 4.0 model support for DeepSpeed inference v2 @
#7456 opened Jul 29, 2025
Add world-size getter in Engine
#7479 opened Aug 9, 2025
DeepCompile ZeRO-3: robust allgather for uneven shards; fix profiling…
#7489 opened Aug 15, 2025

16 Issues closed by 9 people

[REQUEST]
#7476 closed Aug 12, 2025
[BUG] Qwen3 MoE 30B-A3B training stuck
#7461 closed Aug 6, 2025
Error when installing deepspeed with pip (Not sure if this is a bug or not)
#7358 closed Aug 5, 2025
[BUG] ModuleNotFoundError: No module named 'bitsandbytes.mlu_setup'
#7392 closed Aug 5, 2025
nv-torch-nightly-v100 CI test failure
#7195 closed Aug 5, 2025
nv-ds-chat CI test failure
#7213 closed Aug 5, 2025
File not found error with pip install
#7451 closed Aug 5, 2025
[BUG] When using 'overlap_comm:True' with 'contiguous_gradients:True', grad_norm is NaN
#7188 closed Aug 4, 2025
[BUG] Memory leak when using adam_offload and save_checkpoint
#7370 closed Aug 1, 2025
[BUG] UnboundLocalError in ZeroLinear.backward when training only bias parameters (e.g., Bias-tune fine-tuning)
#7435 closed Jul 29, 2025
[BUG] deepspeed v0.17.1 takes too long printing many duplicate logs before starting training if num_workers of dataloader > 0
#7411 closed Jul 28, 2025
Gradient not accumulated across nodes
#7419 closed Jul 27, 2025
[BUG]DeepSpeed compatibility issues
#7432 closed Jul 25, 2025
[REQUEST] Communication Logging Raw Data and Fine Grained Adjustments
#7403 closed Jul 23, 2025
DeepSpeed async_io requires libaio-0.3.112 or newer, breaks on libaio-0.3.111 (e.g., Fedora/EL9)
#7346 closed Jul 23, 2025
RuntimeError: Distributed package doesn't have NCCL built in on windows when training with huggingface transformers
#7425 closed Jul 23, 2025

25 Issues opened by 24 people

[BUG] Accuracy fluctuation with tensor parallel on different card number
#7500 opened Aug 20, 2025
[BUG] DeepSpeed (v0.15.4 to v0.16.9) ZeRO-3 training performance is slow compared with v0.13.1
#7499 opened Aug 19, 2025
[BUG] How to drop some batches entirely to avoid calculating backpropagation while still updating the model for the rest
#7495 opened Aug 16, 2025
[REQUEST] Add automatic logging of parallelism and ZeRO config to WandbMonitor
#7494 opened Aug 16, 2025
[BUG] No backpropagation after micro-batch-id ≥ 3 with MPI backend on Jetson Orin AGX
#7492 opened Aug 15, 2025
Model saved from deepspeed and accelerate cannot be loaded or is incomplete
#7490 opened Aug 15, 2025
[BUG] Cuda failure 700 when use deepcompile with zero stage 3
#7487 opened Aug 14, 2025
[BUG] UlyssesSPDataLoaderAdapter returns duplicate data
#7484 opened Aug 12, 2025
[BUG] FlopsProfiler will hit error when sequence parallel enabled
#7483 opened Aug 12, 2025
[BUG] GPU OOM when finetune Qwen2.5-14B with ZeRO2+offload on 4xA100 40G cards
#7482 opened Aug 11, 2025
[REQUEST]
#7480 opened Aug 9, 2025
[REQUEST] Auto-Tuning CPU Core Binding for DeepSpeed&ZenFlow
#7478 opened Aug 9, 2025
Does the open-source code for FastPersist include the last two optimizations mentioned in the paper, "parallelizing checkpoint writes over DP ranks and pipelining checkpoint writes"?
#7475 opened Aug 8, 2025
[BUG] Abnormal loss in deepspeed v0.17.2 + Ulysses training, not decreasing
#7473 opened Aug 8, 2025
[BUG] Why are the checkpoints saved during deepspeed-zero0 training larger than the safetensors of the original base model?
#7471 opened Aug 7, 2025
[BUG]deepspeed>=v0.17.3 caused an error in megatron's `initialize_model_parallel`
#7470 opened Aug 7, 2025
Failing to build with DS_BUILD_OPS=1 due to missing nccl.h file
#7468 opened Aug 6, 2025
nv-torch-nightly-v100 CI test failure
#7467 opened Aug 6, 2025
[BUG] Guard check fails after deep-compiling a model that calls tensor.expand()
#7466 opened Aug 4, 2025
[QUESTION/HELP] ZERO3 get weight participate in loss
#7464 opened Aug 1, 2025
[BUG] ZeRO-3 partition does not work in Ulysses SP tutorial
#7458 opened Jul 30, 2025
[REQUEST] Add support for EXAONE 4.0 models
#7453 opened Jul 26, 2025
[REQUEST] Defer detection of op builder compatibility until build time
#7452 opened Jul 25, 2025
[BUG] Resolving OOM Issues in Concurrent Distributed Inference of 111B Teacher Model and Distributed Training of 8B Student Model on Multi-Node H200 GPUs
#7450 opened Jul 24, 2025
[REQUEST] Add support for non-__dict__ outputs such as MinkowskiEngine SparseTensor in ZeRO Stage 3 (DeepSpeed v0.9.2)
#7442 opened Jul 22, 2025

20 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Some question of gradient accumulation
#7439 commented on Jul 23, 2025 • 0 new comments
How to properly use tensor_parallel while applying also Zero Stage 3
#7389 commented on Jul 23, 2025 • 0 new comments
[BUG] Ulysses DistributedAttention silently produces incorrect output when #GPUs does not divide global sequence length
#7384 commented on Jul 24, 2025 • 0 new comments
[BUG] AttributeError: 'Linear' object has no attribute 'ds_grads_remaining'
#7203 commented on Jul 24, 2025 • 0 new comments
Gradient of the loss w.r.t sharded parameters
#7237 commented on Jul 25, 2025 • 0 new comments
GPUUtil-0 remains 0 during the process loading a 72B model
#6970 commented on Jul 31, 2025 • 0 new comments
[BUG] Receiving CUDA error: invalid argument using pytorch 2.7 with deepspeed 0.16.4 with Cuda 12.8
#7150 commented on Aug 1, 2025 • 0 new comments
[BUG]RuntimeError: disagreement between rank0 and rank1: rank0:
#5799 commented on Aug 1, 2025 • 0 new comments
[BUG] - Multiple 5090s failing on deepspeed.initialize()
#7261 commented on Aug 1, 2025 • 0 new comments
Issue with DeepSpeed Inference - Multiple Processes for Model Loading and Memory Allocation
#4052 commented on Aug 3, 2025 • 0 new comments
[BUG] zero2 and zero3 has different behavior using the same hyperparameter to train a large model
#4298 commented on Aug 4, 2025 • 0 new comments
[BUG] Memory is enough for training by using zero-3, but OOM occurred after enabling DeepCompile
#7434 commented on Aug 6, 2025 • 0 new comments
[REQUEST] Partial weight load on demand
#4719 commented on Aug 6, 2025 • 0 new comments
[ERROR] [launch.py:321:sigkill_handler] exits with return code = -11
#5690 commented on Aug 8, 2025 • 0 new comments
nv-nightly CI test failure
#7140 commented on Aug 20, 2025 • 0 new comments
Enable python 3.11 and 3.12 tests
#7007 commented on Aug 11, 2025 • 0 new comments
Update Domino for Llama3
#7084 commented on Aug 11, 2025 • 0 new comments
gather output layout support for column parallel
#7181 commented on Jul 22, 2025 • 0 new comments
Create COMMITTERS_RESPONSIBILITY.md
#7300 commented on Aug 12, 2025 • 0 new comments
Support DeepSpeed offload and reload states with ZeRO1 and ZeRO2
#7421 commented on Aug 19, 2025 • 0 new comments