Insights: deepspeedai/DeepSpeed
Overview
2 Releases published by 1 person
-
v0.17.3 Patch Release
published
Jul 28, 2025 -
v0.17.4 Patch Release
published
Jul 31, 2025
30 Pull requests merged by 17 people
-
Add index to HPU devices
#7497 merged
Aug 19, 2025 -
Reduce performance impact of compiler.enable decorator
#7498 merged
Aug 18, 2025 -
Fix DeepCompile for PyTorch v2.8
#7496 merged
Aug 18, 2025 -
Fix invalid f-strings
#7457 merged
Aug 16, 2025 -
Add Zenflow code for Stage 1 & 2
#7391 merged
Aug 15, 2025 -
fix xpu device_id AttributeError issue
#7488 merged
Aug 15, 2025 -
Enable forked PRs
#7486 merged
Aug 14, 2025 -
Fix pre-compile on cpu-only machines
#7168 merged
Aug 12, 2025 -
[TiledFusedLogitsLoss] support inference
#7477 merged
Aug 11, 2025 -
[UlyssesSPDataLoaderAdapter] fix iterator reset
#7472 merged
Aug 11, 2025 -
Modal CI
#7289 merged
Aug 11, 2025 -
fix deepspeed --venv_script
#7469 merged
Aug 11, 2025 -
Fix cpu CI
#7481 merged
Aug 11, 2025 -
Add blog for ZenFlow
#7463 merged
Aug 10, 2025 -
add --bind_cores_to_rank to zero offload tutorial
#7474 merged
Aug 8, 2025 -
fix #7188
#7371 merged
Aug 4, 2025 -
Fix all-gather duplicate params and wrong dtype
#7462 merged
Aug 3, 2025 -
fix issues raised by Coverity scans
#7431 merged
Aug 2, 2025 -
Add getter APIs for TP/PP/DP ranks in DeepSpeedEngine
#7427 merged
Aug 1, 2025 -
Update README.md
#7465 merged
Aug 1, 2025 -
Update version.txt after v0.17.4 release
#7460 merged
Jul 31, 2025 -
TiledFusedLogitsLoss bug fix
#7459 merged
Jul 31, 2025 -
adding TiledFusedLogitsLoss
#7437 merged
Jul 30, 2025 -
Fix: UnboundLocalError for variable 'dim' about issue
#7449 merged
Jul 28, 2025 -
Update version.txt after 0.17.3 release.
#7455 merged
Jul 28, 2025 -
Fix: Adapt Llama injection policy for newer transformers versions
#7443 merged
Jul 26, 2025 -
Remove additional unused tests (human-eval)
#7445 merged
Jul 24, 2025 -
[ALST] fix typo in the url part2
#7446 merged
Jul 23, 2025 -
[ALST] fix typo in the url
#7444 merged
Jul 23, 2025 -
Remove unused yaml test configurations and update README
#7441 merged
Jul 22, 2025
5 Pull requests opened by 5 people
-
[AMD][ROCm] Improve support of AMD
#7448 opened
Jul 24, 2025 -
Support Muon Optimizer
#7454 opened
Jul 28, 2025 -
Add EXAONE 4.0 model support for DeepSpeed inference v2 @
#7456 opened
Jul 29, 2025 -
Add world-size getter in Engine
#7479 opened
Aug 9, 2025 -
DeepCompile ZeRO-3: robust allgather for uneven shards; fix profiling…
#7489 opened
Aug 15, 2025
16 Issues closed by 9 people
-
[REQUEST]
#7476 closed
Aug 12, 2025 -
[BUG] Qwen3 MoE 30B-A3B training stuck
#7461 closed
Aug 6, 2025 -
Error when installing deepspeed with pip (Not sure if this is a bug or not)
#7358 closed
Aug 5, 2025 -
[BUG] ModuleNotFoundError: No module named 'bitsandbytes.mlu_setup'
#7392 closed
Aug 5, 2025 -
nv-torch-nightly-v100 CI test failure
#7195 closed
Aug 5, 2025 -
nv-ds-chat CI test failure
#7213 closed
Aug 5, 2025 -
File not found error with pip install
#7451 closed
Aug 5, 2025 -
[BUG]when use 'overlap_comm:True' w/ 'contiguous_gradients:True', grad_norm is NaN
#7188 closed
Aug 4, 2025 -
[BUG] Memory leak when using adam_offload and save_checkpoint
#7370 closed
Aug 1, 2025 -
Gradient not accumulated across nodes
#7419 closed
Jul 27, 2025 -
[BUG]DeepSpeed compatibility issues
#7432 closed
Jul 25, 2025 -
[REQUEST] Communcation Logging Raw Data and Fine Grained Adjustments
#7403 closed
Jul 23, 2025 -
DeepSpeed async_io requires libaio-0.3.112 or newer, breaks on libaio-0.3.111 (e.g., Fedora/EL9)
#7346 closed
Jul 23, 2025
25 Issues opened by 24 people
-
[BUG] Accuracy fluctuation with tensor parallel on different card number
#7500 opened
Aug 20, 2025 -
[BUG]Deepspeed (v0.15.4 ~v0.16.9) Zero3 training performance is slow,compare than v0.13.1
#7499 opened
Aug 19, 2025 -
[REQUEST] Add automatic logging of parallelism and ZeRO config to WandbMonitor
#7494 opened
Aug 16, 2025 -
[BUG] No backpropagation after micro-batch-id ≥ 3 with MPI backend on Jetson Orin AGX
#7492 opened
Aug 15, 2025 -
Model saved from deepspeed and accelerate cannot be loaded or incomeplete
#7490 opened
Aug 15, 2025 -
[BUG] Cuda failure 700 when use deepcompile with zero stage 3
#7487 opened
Aug 14, 2025 -
[BUG] UlyssesSPDataLoaderAdapter returns duplicate data
#7484 opened
Aug 12, 2025 -
[BUG] FlopsProfiler will hit error when sequence parallel enabled
#7483 opened
Aug 12, 2025 -
[BUG] GPU OOM when finetune Qwen2.5-14B with ZeRO2+offload on 4xA100 40G cards
#7482 opened
Aug 11, 2025 -
[REQUEST]
#7480 opened
Aug 9, 2025 -
[REQUEST] Auto-Tuning CPU Core Binding for DeepSpeed&ZenFlow
#7478 opened
Aug 9, 2025 -
[BUG] Abnormal loss in deepspeed v0.17.2 + ulysess training, not decreasing.
#7473 opened
Aug 8, 2025 -
[BUG]deepspeed>=v0.17.3 caused an error in megatron's `initialize_model_parallel`
#7470 opened
Aug 7, 2025 -
Failing to build with DS_BUILD_OPS=1 due to missing nccl.h file
#7468 opened
Aug 6, 2025 -
nv-torch-nightly-v100 CI test failure
#7467 opened
Aug 6, 2025 -
[BUG] Guard check fails after deep-compiling a model that calls tensor.expand()
#7466 opened
Aug 4, 2025 -
[QUESTION/HELP] ZERO3 get weight participate in loss
#7464 opened
Aug 1, 2025 -
[BUG] ZeRO-3 partition does not work in Ulysses SP tutorial
#7458 opened
Jul 30, 2025 -
[REQUEST] Add support for EXAONE 4.0 models
#7453 opened
Jul 26, 2025 -
[REQUEST] Defer detection of op builder compatibility until build time
#7452 opened
Jul 25, 2025
20 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Some question of gradient accumulation
#7439 commented on
Jul 23, 2025 • 0 new comments -
How to properly use tensor_parallel while applying also Zero Stage 3
#7389 commented on
Jul 23, 2025 • 0 new comments -
[BUG] Ulysses DistributedAttention silently produces incorrect output when #GPUs does not divide global sequence length
#7384 commented on
Jul 24, 2025 • 0 new comments -
[BUG] AttributeError: 'Linear' object has no attribute 'ds_grads_remaining'
#7203 commented on
Jul 24, 2025 • 0 new comments -
Gradient of the loss w.r.t sharded parameters
#7237 commented on
Jul 25, 2025 • 0 new comments -
GPUUtil-0 remains 0 during the process loading a 72B model
#6970 commented on
Jul 31, 2025 • 0 new comments -
[BUG] Receiving CUDA error: invalid argument using pytorch 2.7 with deepspeed 0.16.4 with Cuda 12.8
#7150 commented on
Aug 1, 2025 • 0 new comments -
[BUG]RuntimeError: disagreement between rank0 and rank1: rank0:
#5799 commented on
Aug 1, 2025 • 0 new comments -
[BUG] - Multiple 5090s failing on deepspeed.initialize()
#7261 commented on
Aug 1, 2025 • 0 new comments -
Issue with DeepSpeed Inference - Multiple Processes for Model Loading and Memory Allocation
#4052 commented on
Aug 3, 2025 • 0 new comments -
[BUG] zero2 and zero3 has different behavior using the same hyperparameter to train a large model
#4298 commented on
Aug 4, 2025 • 0 new comments -
[BUG] Memory is enough for training by using zero-3, but OOM occurred after enabling DeepCompile
#7434 commented on
Aug 6, 2025 • 0 new comments -
[REQUEST] Partial weight load on demand
#4719 commented on
Aug 6, 2025 • 0 new comments -
[ERROR] [launch.py:321:sigkill_handler] exits with return code = -11
#5690 commented on
Aug 8, 2025 • 0 new comments -
nv-nightly CI test failure
#7140 commented on
Aug 20, 2025 • 0 new comments -
Enable python 3.11 and 3.12 tests
#7007 commented on
Aug 11, 2025 • 0 new comments -
Update Domino for Llama3
#7084 commented on
Aug 11, 2025 • 0 new comments -
gather output layout support for column parallel
#7181 commented on
Jul 22, 2025 • 0 new comments -
Create COMMITTERS_RESPONSIBILITY.md
#7300 commented on
Aug 12, 2025 • 0 new comments -
Support DeepSpeed offload and reload states with ZeRO1 and ZeRO2
#7421 commented on
Aug 19, 2025 • 0 new comments