add --bind_cores_to_rank to zero offload tutorial #7474

Merged: 2 commits, Aug 8, 2025

@delock (Collaborator) commented Aug 8, 2025

In ZeRO offload, a significant share of step time is spent in CPUAdam, which runs on the CPU. Passing `--bind_cores_to_rank` to the deepspeed launch command therefore helps improve ZeRO offload performance. This PR adds the flag to the ZeRO offload tutorial to increase user awareness.
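As a rough illustration of what binding cores to ranks means, the sketch below splits a host's cores into disjoint, contiguous ranges, one per local rank, so that each rank's CPU-side work stays on its own cores. This is not DeepSpeed's implementation: `partition_cores` is a hypothetical helper, and the even contiguous split is an assumption made for illustration.

```python
def partition_cores(num_cores: int, num_ranks: int) -> list[range]:
    """Split cores 0..num_cores-1 into contiguous, disjoint ranges, one per rank.

    Hypothetical helper for illustration only; not part of DeepSpeed.
    """
    if num_ranks <= 0 or num_cores < num_ranks:
        raise ValueError("need at least one core per rank")
    per_rank = num_cores // num_ranks  # leftover cores, if any, stay unassigned
    return [range(r * per_rank, (r + 1) * per_rank) for r in range(num_ranks)]


# Setup from the benchmark in this PR: 128 CPU cores shared by 2 ranks.
slices = partition_cores(128, 2)
for rank, cores in enumerate(slices):
    print(f"rank {rank}: cores {cores.start}-{cores.stop - 1}")
```

In practice the launcher does the binding: the flag goes before the training script, as in `deepspeed --bind_cores_to_rank train.py ...`, where `train.py` is a placeholder name.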

For Qwen2.5-3B fine-tuning on 2 A100-40B cards, on a host with 128 CPU cores, the average step time is as follows, a nearly 1.3x performance improvement:
without `--bind_cores_to_rank`: 3084.44 ms per step
with `--bind_cores_to_rank`: 2383.16 ms per step
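The quoted speedup can be checked directly from the two step times. The snippet below is a small arithmetic sketch using only the numbers reported above; the variable names are illustrative.

```python
baseline_ms = 3084.44  # average step time without --bind_cores_to_rank
bound_ms = 2383.16     # average step time with --bind_cores_to_rank

speedup = baseline_ms / bound_ms
time_saved_pct = (baseline_ms - bound_ms) / baseline_ms * 100

print(f"speedup: {speedup:.2f}x, step time reduced by {time_saved_pct:.1f}%")
# prints "speedup: 1.29x, step time reduced by 22.7%"
```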

@delock delock requested review from loadams and tjruwase as code owners August 8, 2025 06:58
@hwchen2017 hwchen2017 merged commit f03d416 into master Aug 8, 2025
2 checks passed
@hwchen2017 hwchen2017 deleted the gma/zero_offload_doc branch August 8, 2025 17:34
LYMDLUT pushed a commit to LYMDLUT/DeepSpeed that referenced this pull request Aug 20, 2025
Co-authored-by: Olatunji Ruwase <tjruwase@gmail.com>
Signed-off-by: lym <letusgo126@126.com>