Pulse · deepspeedai/DeepSpeed · GitHub

June 17, 2025 – June 24, 2025

Overview

14 Active pull requests

14 Active issues

13 Pull requests merged by 8 people

fix(inference): Add missing dtype attribute to ParameterBase setter
#7378 merged Jun 23, 2025
Add support for ws=1 scenario
#7379 merged Jun 23, 2025
Fix dtype mismatch in TestParamPartitioningSkipInit
#7377 merged Jun 23, 2025
fix wandb.log() call by removing sync kwarg
#7383 merged Jun 23, 2025
Fix release of IPG buffer
#7376 merged Jun 22, 2025
Update latest news with DeepNVMe
#7375 merged Jun 20, 2025
Relax tolerances for FP8 unit test only for ROCm + FP16
#7373 merged Jun 20, 2025
Flops profiler support for F.interpolate
#7353 merged Jun 20, 2025
add Arctic Long Sequence Training paper reference
#7372 merged Jun 20, 2025
Enable torch.autocast with ZeRO
#6993 merged Jun 19, 2025
sequence parallel default dtype
#7364 merged Jun 19, 2025
Fix(scheduler): WarmupLR inherits optimizer lr when not specified
#7360 merged Jun 19, 2025
Restore real inputs for recompilation
#7356 merged Jun 19, 2025

1 Pull request opened by 1 person

fix #7188
#7371 opened Jun 19, 2025

9 Issues closed by 5 people

[REQUEST] Support for XLA/TPU
#6901 closed Jun 24, 2025
[BUG] AttributeError: 'UnembedParameter' object has no attribute 'dtype'
#7260 closed Jun 23, 2025
[BUG] WandbMonitor log() invocation broken with wandb 0.20.0
#7381 closed Jun 23, 2025
[BUG] Training
#7319 closed Jun 20, 2025
nv-sd CI test failure
#7310 closed Jun 20, 2025
[BUG] FLOPS compute **FAILS** for `F.interpolate` when using `scale_factor`
#4504 closed Jun 20, 2025
Bug when using optimizer and WarmupLR together
#7303 closed Jun 19, 2025
[BUG] Universal checkpoint conversion - "Cannot find layer_01* files in there"
#5776 closed Jun 17, 2025
[BUG] No `universal_checkpoint_info` in the Accelerate+Deepspeed Checkpoint
#5430 closed Jun 17, 2025

5 Issues opened by 5 people

[BUG] Ulysses DistributedAttention silently produces incorrect output when #GPUs does not divide global sequence length
#7384 opened Jun 23, 2025
[BUG] deepspeed v0.17.1 can't run well on NPU platform!
#7380 opened Jun 23, 2025
[BUG] Memory leak when using adam_offload and save_checkpoint
#7370 opened Jun 19, 2025
[BUG] init_inference loads qwen3-32b model very slow but train model loads it quickly
#7369 opened Jun 18, 2025
FastPersist micro-benchmarks test results are inconsistent with expectations
#7368 opened Jun 18, 2025

12 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

HF2UCP: Converting a `pytorch_model.bin` or `.safetensors` checkpoint to UCP
#7212 commented on Jun 23, 2025 • 4 new comments
[BUG] Qwen3: model loading failed when using meta device
#7275 commented on Jun 18, 2025 • 0 new comments
Functorch support: RuntimeError: In order to use an autograd.Function with functorch transforms
#7323 commented on Jun 19, 2025 • 0 new comments
AssertionError: no sync context manager is incompatible with gradientpartitioning logic of ZeRo stage 3
#6793 commented on Jun 20, 2025 • 0 new comments
Error when installing deepspeed with pip (Not sure if this is a bug or not)
#7358 commented on Jun 23, 2025 • 0 new comments
[BUG] DeepCompile in ZeRO-1 fails to do the forward pass
#7229 commented on Jun 23, 2025 • 0 new comments
nv-torch-nightly-v100 CI test failure
#7195 commented on Jun 24, 2025 • 0 new comments
nv-nightly CI test failure
#7140 commented on Jun 24, 2025 • 0 new comments
[BUG] Receiving CUDA error: invalid argument using pytorch 2.7 with deepspeed 0.16.4 with Cuda 12.8
#7150 commented on Jun 24, 2025 • 0 new comments
[BUG] Universal Checkpoint Conversion: Resumed Training Behaves as If Model Initialized from Scratch
#6691 commented on Jun 24, 2025 • 0 new comments
Avoid graph break by enabling compile of record module
#7362 commented on Jun 23, 2025 • 0 new comments
Fix ZeRO stage 1 and add stage 2 support with DeepCompile
#7366 commented on Jun 23, 2025 • 0 new comments