[Feature] ms_deform_attn performance optimization #2616

Wickyzheng · 2023-02-23T02:38:51Z

Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

Motivation

The motivation of the PR is to improve the performance of ms_deform_attn on Cambricon MLU backend, especially for small channel cases.

Modification

MLU src code
Modify MLU src code of ms_deform_attn_mlu_kernel in directory mmcv/ops/csrc/common/mlu/ms_deform_attn_mlu_kernel.mlu.
PyTorch adaptation
Modify ms_deform_attn for PyTorch in mmcv/ops/csrc/pytorch/mlu/ms_deform_attn_mlu.cpp.

BC-breaking (Optional)

Does the modification introduce changes that break the backward-compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

Checklist

Before PR:

I have read and followed the workflow indicated in the CONTRIBUTING.md to create this PR.
Pre-commit or linting tools indicated in CONTRIBUTING.md are used to fix the potential lint issues.
Bug fixes are covered by unit tests, the case that causes the bug should be added in the unit tests.
New functionalities are covered by complete unit tests. If not, please add more unit test to ensure the correctness.
The documentation has been modified accordingly, including docstring or example tutorials.

After PR:

If the modification has potential influence on downstream or other related projects, this PR should be tested with some of those projects, like MMDet or MMCls.
CLA has been signed and all committers have signed the CLA in this PR.

grimoire

LGTM

* ms_opt_v2 * ms_opt_v2_1 * optimize MultiScaleDeformableAttention ops for MLU * ms_opt_v2_1 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 --------- Co-authored-by: dongchengwei <dongchengwei@cambricon.com>

* [Feature] Support Voxelization with cambricon MLU device (#2500) * [Feature] Support hard_voxelize with cambricon MLU backend * [Feature](bangc-ops): add voxelization op * [Feature](bangc-ops): add voxelization op * [Feature](bangc-ops): add voxelization op * [Feature](bangc-ops): add voxelization op * [Feature](bangc-ops): add voxelization op * [Feature](bangc-ops): add voxelization op * [Feature](bangc-ops): add voxelization op * [Feature](bangc-ops): add voxelization op * [Enhance] Optimize the performace of ms_deform_attn for MLU device (#2510) * ms_opt * ms_opt * ms_opt * ms_opt * ms_opt * [Feature] ms_deform_attn performance optimization * [Feature] ms_deform_attn performance optimization * [Feature] ms_deform_attn performance optimization * [Feature] Support ball_query with cambricon MLU backend and mlu-ops library. (#2520) * [Feature] Support ball_query with cambricon MLU backend and mlu-ops library. * [Fix] update operator data layout setting. * [Fix] add cxx compile option to avoid symbol conflict. * [Fix] fix lint errors. * [Fix] update ops.md with info of ball_query support by MLU backend. * [Feature] Fix typo. * [Fix] Remove print. * [Fix] get mlu-ops from MMCV_MLU_OPS_PATH env. * [Fix] update MMCV_MLU_OPS_PATH check logic. * [Fix] update error info when failed to download mlu-ops. * [Fix] check mlu-ops version matching info in mmcv. * [Fix] revise wrong filename. * [Fix] remove f.close and re. * [Docs] Steps to compile mmcv-full on MLU machine (#2571) * [Docs] Steps to compile mmcv-full on MLU machine * [Docs] Adjust paragraph order * Update docs/zh_cn/get_started/build.md Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> * Update docs/zh_cn/get_started/build.md Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> * Update docs/en/get_started/build.md Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> * Update docs/en/get_started/build.md Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> * [Docs] Modify the format --------- Co-authored-by: budefei <budefei@cambricon.com> Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> * [Fix] Fix tensor descriptor setting in MLU ball_query. (#2579) * [Feature] Add MLU support for Sparse Convolution op (#2589) * [Feature] Add sparse convolution MLU API * [Feature] update cpp code style * end-of-file * delete libext.a * code style * update ops.md --------- Co-authored-by: budefei <budefei@cambricon.com> * [Enhancement] Replace the implementation of deform_roi_pool with mlu-ops (#2598) * [Feature] Replace the implementation of deform_roi_pool with mlu-ops * [Feature] Modify code --------- Co-authored-by: budefei <budefei@cambricon.com> * [Enhancement] ms_deform_attn performance optimization (#2616) * ms_opt_v2 * ms_opt_v2_1 * optimize MultiScaleDeformableAttention ops for MLU * ms_opt_v2_1 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 --------- Co-authored-by: dongchengwei <dongchengwei@cambricon.com> * [Feature] Support NmsRotated with cambricon MLU backend (#2643) * [Feature] Support NmsRotated with cambricon MLU backend * [Feature] remove foolproofs in nms_rotated_mlu.cpp * [Feature] fix lint in test_nms_rotated.py * [Feature] fix kMLU not found in nms_rotated.cpp * [Feature] modify mlu support in nms.py * [Feature] modify nms_rotated support in ops.md * [Feature] modify ops/nms.py * [Enhance] Add a default value for MMCV_MLU_ARGS (#2688) * add mlu_args * add mlu_args * Modify the code --------- Co-authored-by: budefei <budefei@cambricon.com> * [Enhance] Ignore mlu-ops files (#2691) Co-authored-by: budefei <budefei@cambricon.com> --------- Co-authored-by: ZShaopeng <108382403+ZShaopeng@users.noreply.github.com> Co-authored-by: BinZheng <38182684+Wickyzheng@users.noreply.github.com> Co-authored-by: liuduanhui <103939338+DanieeelLiu@users.noreply.github.com> Co-authored-by: budefei <budefei@cambricon.com> Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> Co-authored-by: duzekun <108381389+duzekunKTH@users.noreply.github.com> Co-authored-by: dongchengwei <dongchengwei@cambricon.com> Co-authored-by: liuyuan1-v <125547457+liuyuan1-v@users.noreply.github.com>

* ms_opt_v2 * ms_opt_v2_1 * optimize MultiScaleDeformableAttention ops for MLU * ms_opt_v2_1 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 --------- Co-authored-by: dongchengwei <dongchengwei@cambricon.com>

* [Feature] Support Voxelization with cambricon MLU device (open-mmlab#2500) * [Feature] Support hard_voxelize with cambricon MLU backend * [Feature](bangc-ops): add voxelization op * [Feature](bangc-ops): add voxelization op * [Feature](bangc-ops): add voxelization op * [Feature](bangc-ops): add voxelization op * [Feature](bangc-ops): add voxelization op * [Feature](bangc-ops): add voxelization op * [Feature](bangc-ops): add voxelization op * [Feature](bangc-ops): add voxelization op * [Enhance] Optimize the performace of ms_deform_attn for MLU device (open-mmlab#2510) * ms_opt * ms_opt * ms_opt * ms_opt * ms_opt * [Feature] ms_deform_attn performance optimization * [Feature] ms_deform_attn performance optimization * [Feature] ms_deform_attn performance optimization * [Feature] Support ball_query with cambricon MLU backend and mlu-ops library. (open-mmlab#2520) * [Feature] Support ball_query with cambricon MLU backend and mlu-ops library. * [Fix] update operator data layout setting. * [Fix] add cxx compile option to avoid symbol conflict. * [Fix] fix lint errors. * [Fix] update ops.md with info of ball_query support by MLU backend. * [Feature] Fix typo. * [Fix] Remove print. * [Fix] get mlu-ops from MMCV_MLU_OPS_PATH env. * [Fix] update MMCV_MLU_OPS_PATH check logic. * [Fix] update error info when failed to download mlu-ops. * [Fix] check mlu-ops version matching info in mmcv. * [Fix] revise wrong filename. * [Fix] remove f.close and re. * [Docs] Steps to compile mmcv-full on MLU machine (open-mmlab#2571) * [Docs] Steps to compile mmcv-full on MLU machine * [Docs] Adjust paragraph order * Update docs/zh_cn/get_started/build.md Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> * Update docs/zh_cn/get_started/build.md Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> * Update docs/en/get_started/build.md Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> * Update docs/en/get_started/build.md Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> * [Docs] Modify the format --------- Co-authored-by: budefei <budefei@cambricon.com> Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> * [Fix] Fix tensor descriptor setting in MLU ball_query. (open-mmlab#2579) * [Feature] Add MLU support for Sparse Convolution op (open-mmlab#2589) * [Feature] Add sparse convolution MLU API * [Feature] update cpp code style * end-of-file * delete libext.a * code style * update ops.md --------- Co-authored-by: budefei <budefei@cambricon.com> * [Enhancement] Replace the implementation of deform_roi_pool with mlu-ops (open-mmlab#2598) * [Feature] Replace the implementation of deform_roi_pool with mlu-ops * [Feature] Modify code --------- Co-authored-by: budefei <budefei@cambricon.com> * [Enhancement] ms_deform_attn performance optimization (open-mmlab#2616) * ms_opt_v2 * ms_opt_v2_1 * optimize MultiScaleDeformableAttention ops for MLU * ms_opt_v2_1 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 --------- Co-authored-by: dongchengwei <dongchengwei@cambricon.com> * [Feature] Support NmsRotated with cambricon MLU backend (open-mmlab#2643) * [Feature] Support NmsRotated with cambricon MLU backend * [Feature] remove foolproofs in nms_rotated_mlu.cpp * [Feature] fix lint in test_nms_rotated.py * [Feature] fix kMLU not found in nms_rotated.cpp * [Feature] modify mlu support in nms.py * [Feature] modify nms_rotated support in ops.md * [Feature] modify ops/nms.py * [Enhance] Add a default value for MMCV_MLU_ARGS (open-mmlab#2688) * add mlu_args * add mlu_args * Modify the code --------- Co-authored-by: budefei <budefei@cambricon.com> * [Enhance] Ignore mlu-ops files (open-mmlab#2691) Co-authored-by: budefei <budefei@cambricon.com> --------- Co-authored-by: ZShaopeng <108382403+ZShaopeng@users.noreply.github.com> Co-authored-by: BinZheng <38182684+Wickyzheng@users.noreply.github.com> Co-authored-by: liuduanhui <103939338+DanieeelLiu@users.noreply.github.com> Co-authored-by: budefei <budefei@cambricon.com> Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> Co-authored-by: duzekun <108381389+duzekunKTH@users.noreply.github.com> Co-authored-by: dongchengwei <dongchengwei@cambricon.com> Co-authored-by: liuyuan1-v <125547457+liuyuan1-v@users.noreply.github.com>

* ms_opt_v2 * ms_opt_v2_1 * optimize MultiScaleDeformableAttention ops for MLU * ms_opt_v2_1 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 * [Feature] ms_deform_attn performance optimization V2 --------- Co-authored-by: dongchengwei <dongchengwei@cambricon.com>

Wickyzheng and others added 4 commits February 21, 2023 01:59

ms_opt_v2

87ac51e

ms_opt_v2_1

aab3d3f

optimize MultiScaleDeformableAttention ops for MLU

010f500

ms_opt_v2_1

6b1867e

mm-assistant bot assigned zhouzaida Feb 23, 2023

Wickyzheng added 7 commits February 23, 2023 06:06

[Feature] ms_deform_attn performance optimization V2

e8cf394

[Feature] ms_deform_attn performance optimization V2

bc38c1c

[Feature] ms_deform_attn performance optimization V2

cd22634

[Feature] ms_deform_attn performance optimization V2

a2c86ab

[Feature] ms_deform_attn performance optimization V2

cae11bb

[Feature] ms_deform_attn performance optimization V2

694e4a5

[Feature] ms_deform_attn performance optimization V2

7851bdc

zhouzaida requested a review from grimoire March 1, 2023 02:57

grimoire approved these changes Mar 2, 2023

View reviewed changes

zhouzaida merged commit 5f12229 into open-mmlab:master Mar 3, 2023

defei-coder mentioned this pull request Mar 27, 2023

Pick MLU modifications from master (1.x) to main (2.x) #2704

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] ms_deform_attn performance optimization #2616

[Feature] ms_deform_attn performance optimization #2616

Wickyzheng commented Feb 23, 2023

grimoire left a comment

[Feature] ms_deform_attn performance optimization #2616

[Feature] ms_deform_attn performance optimization #2616

Conversation

Wickyzheng commented Feb 23, 2023

Motivation

Modification

BC-breaking (Optional)

Use cases (Optional)

Checklist

grimoire left a comment

Choose a reason for hiding this comment