Add NPU FusedAdam support #4343
Conversation
* origin/master: (48 commits)
  Fix autotune to support Triton 2.1 (microsoft#4340)
  Fix skipped inference tests (microsoft#4336)
  Suppress noise (microsoft#4310)
  Fix a bug in the implementation of dequantization for inference (microsoft#3433)
  DS-Chat BLOOM: Fix Attention mask (microsoft#4338)
  clear redundant timers (microsoft#4308)
  Add release version checking (microsoft#4328)
  Fix Zero3 contiguous grads, reduce scatter false accuracy issue (microsoft#4321)
  Clean up modeling code (microsoft#4320)
  Handle empty parameter groups (microsoft#4277)
  Update README.md (microsoft#4316)
  README update (microsoft#4303)
  Update release and bump patch versioning flow (microsoft#4286)
  added a bert-model check for triton (microsoft#4266)
  ZeRO-Inference v2 release
  bump to 0.10.4
  Update index.md (microsoft#4297)
  fix user args parsing of string with spaces on runner (microsoft#4265)
  ZeRO-Inference refresh (microsoft#4197)
  AMD Kernel Compatibility Fixes (microsoft#3180)
  ...
@tjruwase @jeffra @RezaYazdaniAminabadi @cmikeh2 Sorry to bother you, but could you review this PR?
* origin/master:
  Allow multiple inference engines in single script (microsoft#4384)
  adds triton flash attention2 kernel (microsoft#4337)
  Fix llama meta tensor loading in AutoTP and kernel injected inference (microsoft#3608)
  Fix min torch version (microsoft#4375)
  Fix multinode runner to properly append to PDSH_SSH_ARGS_APPEND (microsoft#4373)
  add the missing method (microsoft#4363)
  Openfold fix (microsoft#4368)
  deepspeed4science japanese blog (microsoft#4369)
  deepspeed4science chinese blog (microsoft#4366)
  Enable workflow dispatch on Torch 1.10 CI tests (microsoft#4361)
  Update conda env to have max pydantic version (microsoft#4362)
  add deepspeed4science blog link (microsoft#4364)
  added check to avoid undefined behavior when the input_id length is greater than max_tokens (microsoft#4349)
  Add the policy to run llama model from the official repo (microsoft#4313)
  fix deepspeed4science links (microsoft#4358)
  DeepSpeed4Science (microsoft#4357)
  Support InternLM (microsoft#4137)
  Pass base_dir to model files can be loaded for auto-tp/meta-tensor. (microsoft#4348)
@tjruwase Good day. This PR is approved and ready to be merged. Could you retrigger this workflow and merge it? Thanks :-)
Sorry for the delay; however, there seems to be a formatting issue. Please take a look.
Resolve format checking errors
Co-authored-by: Hz, Ji <hzji210@gmail.com>
As long as we fix these two blank lines, the format-check error should be resolved.
@CurryRice233, it is best to use this guide for formatting issues: https://github.com/microsoft/DeepSpeed/blob/master/CONTRIBUTING.md#prerequisites
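For anyone hitting a similar failure: the prerequisites guide sets up pre-commit hooks that enforce PEP 8-style spacing, and stray blank lines are a common trip-up. Below is a hypothetical illustration (not the actual diff from this PR; the function names are made up) of the kind of spacing issue such a check flags:

```python
# Hypothetical illustration of a blank-line style violation of the kind a
# pre-commit/flake8 check flags. pycodestyle rule E303 ("too many blank
# lines") allows at most two consecutive blank lines between top-level
# definitions.
import torch


def scale_grad(g: torch.Tensor, factor: float) -> torch.Tensor:
    """Dummy helper, present only to demonstrate the spacing rule."""
    return g * factor



def apply_update(p: torch.Tensor, g: torch.Tensor, lr: float = 1e-3) -> None:
    # Three blank lines precede this definition -> E303; deleting one
    # blank line makes the check pass.
    p.sub_(lr * g)
```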
Co-authored-by: Hz, Ji <hzji210@gmail.com>
Thank you, new skill acquired 😉. By the way, could you retrigger this workflow again?
@tjruwase Hi, could you retrigger this workflow again and merge it? Thanks 😀
* add npu support dtypes
* add npu fused_adam support
* add license
* Update accelerator/npu_accelerator.py
* Update op_builder/npu/fused_adam.py

---------

Co-authored-by: jializheng <jializheng@huawei.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Hz, Ji <hzji210@gmail.com>
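For context on what this change adds, here is a minimal sketch of the two pieces involved: an accelerator that reports which dtypes the NPU supports, and the eager-mode Adam math that a fused kernel performs in a single launch. This is an illustration only, not the PR's actual accelerator/npu_accelerator.py or op_builder/npu/fused_adam.py code; the class name, the dtype list, and the use of plain torch ops are assumptions.

```python
# Minimal sketch (NOT this PR's actual code) of the two pieces the PR adds:
# (1) an accelerator reporting which dtypes the NPU supports, and
# (2) a FusedAdam-style parameter update, written here with plain torch
#     ops so it runs anywhere; a real builder would dispatch to a fused
#     NPU kernel instead of this eager per-tensor loop.
from typing import List

import torch


class NPUAcceleratorSketch:
    """Hypothetical stand-in for an NPU accelerator class."""

    def supported_dtypes(self) -> List[torch.dtype]:
        # Assumption: the NPU backend handles fp32, fp16, and bf16.
        return [torch.float, torch.half, torch.bfloat16]


def adam_step(params, grads, exp_avgs, exp_avg_sqs, step: int,
              lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> None:
    """Eager reference for the update a fused Adam kernel performs."""
    bc1 = 1 - beta1 ** step  # bias corrections
    bc2 = 1 - beta2 ** step
    for p, g, m, v in zip(params, grads, exp_avgs, exp_avg_sqs):
        m.mul_(beta1).add_(g, alpha=1 - beta1)           # first moment
        v.mul_(beta2).addcmul_(g, g, value=1 - beta2)    # second moment
        denom = (v / bc2).sqrt_().add_(eps)              # sqrt(v_hat) + eps
        p.addcdiv_(m / bc1, denom, value=-lr)            # p -= lr * m_hat / denom
```

The point of the "fused" variant is that this per-tensor loop collapses into a single kernel launch across all parameters, which is what the new NPU op builder wires up.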
Add NPU FusedAdam support.