
[CodeCamp #18] add ops bbox_overlaps #2477

Merged (8 commits into open-mmlab:master, Jan 30, 2023)

Conversation

@enemy1205 (Contributor) commented Dec 7, 2022

Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

Motivation

Add a CPU (C++) implementation of the `bbox_overlaps` op.

CPU version: (benchmark screenshot, 2023-01-14)

Pure Python version: (benchmark screenshot, 2023-01-14)
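For context, `bbox_overlaps` computes the IoU (or IoF) between two sets of boxes. A minimal pure-Python sketch of the aligned-IoU case is below; it is illustrative only, not the mmcv implementation, and it omits the non-aligned and `iof` modes.

```python
def aligned_iou(boxes1, boxes2, offset=0):
    """Illustrative aligned IoU for boxes in (x1, y1, x2, y2) format.

    Pairs box i of boxes1 with box i of boxes2, as in the aligned=True
    path of bbox_overlaps. Not the actual mmcv kernel.
    """
    ious = []
    for (ax1, ay1, ax2, ay2), (bx1, by1, bx2, by2) in zip(boxes1, boxes2):
        # Intersection rectangle (clamped at zero when boxes do not overlap)
        iw = max(min(ax2, bx2) - max(ax1, bx1) + offset, 0)
        ih = max(min(ay2, by2) - max(ay1, by1) + offset, 0)
        inter = iw * ih
        area_a = (ax2 - ax1 + offset) * (ay2 - ay1 + offset)
        area_b = (bx2 - bx1 + offset) * (by2 - by1 + offset)
        union = area_a + area_b - inter
        ious.append(inter / union if union > 0 else 0.0)
    return ious
```

For the box pair used in the unit test below, `[1, 1, 3, 3]` vs `[2, 2, 4, 4]`, the intersection is 1 and the union is 7, so the expected IoU is 1/7.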

Modification

Adds a C++ CPU kernel for `bbox_overlaps` (`mmcv/ops/csrc/pytorch/cpu/bbox_overlaps_cpu.cpp`) and dispatches to it from `mmcv/ops/bbox.py`.

BC-breaking (Optional)

No

Use cases (Optional)

Passes tests/test_ops/test_bbox.py

Checklist

Before PR:

  • I have read and followed the workflow indicated in the CONTRIBUTING.md to create this PR.
  • Pre-commit or linting tools indicated in CONTRIBUTING.md are used to fix the potential lint issues.
  • Bug fixes are covered by unit tests, the case that causes the bug should be added in the unit tests.
  • New functionalities are covered by complete unit tests. If not, please add more unit test to ensure the correctness.
  • The documentation has been modified accordingly, including docstring or example tutorials.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with some of those projects, like MMDet or MMCls.
  • CLA has been signed and all committers have signed the CLA in this PR.

@CLAassistant commented Dec 7, 2022

CLA assistant check
All committers have signed the CLA.

@grimoire (Member) commented Dec 8, 2022

Since we already have a pure PyTorch implementation, could you provide a benchmark for the speed-up of the new op?

@zhouzaida zhouzaida changed the title [CodeCamp#18] add ops bbox_overlaps [CodeCamp #18] add ops bbox_overlaps Dec 8, 2022
@enemy1205 (Contributor, Author) replied:

Since we already have a pure PyTorch implementation, could you provide a benchmark for the speed-up of the new op?

I tested it using pytest-benchmark as follows.
First, in tests/test_ops/test_bbox.py: since the two tests call the same function and only the parameters differ, I commented out the half-precision test so it would not be averaged in.

    @pytest.mark.parametrize('device', [
        'cpu',
        pytest.param(
            'cuda',
            marks=pytest.mark.skipif(
                not IS_CUDA_AVAILABLE, reason='requires CUDA support')),
        pytest.param(
            'mlu',
            marks=pytest.mark.skipif(
                not IS_MLU_AVAILABLE, reason='requires MLU support')),
        pytest.param(
            'mps',
            marks=pytest.mark.skipif(
                not IS_MPS_AVAILABLE, reason='requires MPS support'))
    ])
    def test_bbox_overlaps_float(self, device, benchmark):
        benchmark(self._test_bbox_overlaps, device, dtype=torch.float)

    # @pytest.mark.parametrize('device', [
    #     pytest.param(
    #         'cuda',
    #         marks=pytest.mark.skipif(
    #             not IS_CUDA_AVAILABLE, reason='requires CUDA support')),
    #     pytest.param(
    #         'mlu',
    #         marks=pytest.mark.skipif(
    #             not IS_MLU_AVAILABLE, reason='requires MLU support'))
    # ])
    # def test_bbox_overlaps_half(self, device,benchmark):
    #     benchmark(self._test_bbox_overlaps,device, dtype=torch.half)

Then, in mmcv/ops/bbox.py, at the end of the function, I switched between the C++ and pure PyTorch implementations by commenting out one or the other in turn:

    mode_dict = {'iou': 0, 'iof': 1}
    assert mode in mode_dict.keys()
    mode_flag = mode_dict[mode]
    # Either the boxes are empty or the length of boxes' last dimension is 4
    assert (bboxes1.size(-1) == 4 or bboxes1.size(0) == 0)
    assert (bboxes2.size(-1) == 4 or bboxes2.size(0) == 0)
    assert offset == 1 or offset == 0

    rows = bboxes1.size(0)
    cols = bboxes2.size(0)
    if aligned:
        assert rows == cols
        ious = bboxes1.new_zeros(rows)
    else:
        ious = bboxes1.new_zeros((rows, cols))

    if rows * cols == 0:
        return ious
    
    # Pure Pytorch
    # return _bbox_overlaps_cpu(
    #         bboxes1, bboxes2, mode=mode, aligned=aligned, offset=offset)
    
    # C++ version
    ext_module.bbox_overlaps(
        bboxes1, bboxes2, ious, mode=mode_flag, aligned=aligned, offset=offset)
    return ious
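For reference, the vectorized computation performed by the pure-PyTorch path (`_bbox_overlaps_cpu` above) can be sketched as follows. NumPy is used here instead of torch for a self-contained illustration; the function name and details are illustrative, not the actual mmcv code.

```python
import numpy as np

def vectorized_aligned_iou(bboxes1, bboxes2, offset=0):
    """Illustrative vectorized aligned IoU (NumPy stand-in for the
    pure-PyTorch path). Not the actual mmcv implementation."""
    b1 = np.asarray(bboxes1, dtype=np.float64)
    b2 = np.asarray(bboxes2, dtype=np.float64)
    # Element-wise intersection corners: row i of b1 with row i of b2
    lt = np.maximum(b1[:, :2], b2[:, :2])  # left-top
    rb = np.minimum(b1[:, 2:], b2[:, 2:])  # right-bottom
    wh = np.clip(rb - lt + offset, 0, None)
    inter = wh[:, 0] * wh[:, 1]
    area1 = (b1[:, 2] - b1[:, 0] + offset) * (b1[:, 3] - b1[:, 1] + offset)
    area2 = (b2[:, 2] - b2[:, 0] + offset) * (b2[:, 3] - b2[:, 1] + offset)
    union = area1 + area2 - inter
    # Guard against division by zero for degenerate boxes
    return inter / np.maximum(union, np.finfo(np.float64).eps)
```

The whole batch is processed with a handful of array operations, which is why this style scales well as the number of boxes grows.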

By executing pytest tests/test_ops/test_bbox.py, the results are as follows (benchmark screenshot): the top row is the C++ version and the bottom row the pure PyTorch version, so there is some speed-up.


    void bbox_overlaps_cpu(const Tensor boxes1, const Tensor boxes2, Tensor ious,
                           const int mode, const bool aligned, const int offset) {
      bbox_overlaps_cpu_kernel<float>(boxes1, boxes2, ious, mode, aligned, offset);
    }

A Member left a review comment on this code:

Assert the input and output datatype here.

@grimoire (Member) commented Dec 9, 2022

The data size in the unit test is relatively small. Have you benchmarked it on larger data (1000+ boxes)?

@enemy1205 (Contributor, Author) replied:

As you say, PyTorch handles large amounts of data far more efficiently than a C++ loop.
Test code:

        b1 = torch.tensor([[1.0 + i, 1.0 + i, 3.0 + i, 3.0 + i]
                           for i in range(100000)]).to(device).type(dtype)
        b2 = torch.tensor([[2.0 + i, 2.0 + i, 4.0 + i, 4.0 + i]
                           for i in range(100000)]).to(device).type(dtype)
        should_output = np.array([1 / 7] * 100000)
        out = bbox_overlaps(b1, b2, aligned=True)
        assert np.allclose(out.cpu().numpy(), should_output, 1e-2)

Elapsed time:

// Pure C++ loop
------------------------------------------------------ benchmark: 1 tests ------------------------------------------------------
Name (time in ms)                      Min       Max      Mean   StdDev    Median      IQR  Outliers     OPS  Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------
test_bbox_overlaps_float[cpu]     367.5079  456.5294  405.8569  36.7101  396.9816  60.0164       2;0  2.4639       5           1
--------------------------------------------------------------------------------------------------------------------------------
// C++  with openmp (#pragma omp parallel for)
------------------------------------------------------ benchmark: 1 tests ------------------------------------------------------
Name (time in ms)                      Min       Max      Mean   StdDev    Median      IQR  Outliers     OPS  Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------
test_bbox_overlaps_float[cpu]     368.4876  410.3432  381.1380  16.7261  375.3134  14.5139       1;1  2.6237       5           1
--------------------------------------------------------------------------------------------------------------------------------
// Pure Pytorch
------------------------------------------------------ benchmark: 1 tests ------------------------------------------------------
Name (time in ms)                      Min       Max      Mean   StdDev    Median      IQR  Outliers     OPS  Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------
test_bbox_overlaps_float[cpu]     191.3801  229.2963  204.5748  14.8945  200.2914  17.5259       1;0  4.8882       5           1
--------------------------------------------------------------------------------------------------------------------------------

So in practice, for large inputs the pure PyTorch version may indeed be more efficient.
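That conclusion can be checked with a quick self-contained micro-benchmark. The sketch below compares a per-box Python loop against a vectorized computation on the same 100000-box data as above, using NumPy in place of torch for portability; absolute times depend on hardware, so no expected timings are claimed.

```python
import timeit

import numpy as np

# Same synthetic data as the test above: box i is shifted by i,
# and every aligned pair has IoU 1/7.
n = 100000
b1 = np.stack([np.arange(n) + v for v in (1.0, 1.0, 3.0, 3.0)], axis=1)
b2 = np.stack([np.arange(n) + v for v in (2.0, 2.0, 4.0, 4.0)], axis=1)

def iou_loop(a, b):
    """Scalar loop over box pairs, analogous to the C++ kernel's structure."""
    out = np.empty(len(a))
    for i in range(len(a)):
        iw = max(min(a[i, 2], b[i, 2]) - max(a[i, 0], b[i, 0]), 0.0)
        ih = max(min(a[i, 3], b[i, 3]) - max(a[i, 1], b[i, 1]), 0.0)
        inter = iw * ih
        union = ((a[i, 2] - a[i, 0]) * (a[i, 3] - a[i, 1])
                 + (b[i, 2] - b[i, 0]) * (b[i, 3] - b[i, 1]) - inter)
        out[i] = inter / union
    return out

def iou_vec(a, b):
    """Vectorized equivalent, analogous to the pure PyTorch path."""
    wh = np.clip(np.minimum(a[:, 2:], b[:, 2:])
                 - np.maximum(a[:, :2], b[:, :2]), 0, None)
    inter = wh[:, 0] * wh[:, 1]
    union = ((a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
             + (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1]) - inter)
    return inter / union

t_loop = timeit.timeit(lambda: iou_loop(b1, b2), number=1)
t_vec = timeit.timeit(lambda: iou_vec(b1, b2), number=1)
print(f"loop: {t_loop:.3f}s  vectorized: {t_vec:.3f}s")
```

Both functions produce identical IoUs; the difference is purely how the batch is traversed, which mirrors the C++-loop-vs-PyTorch comparison in the benchmark above.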

mmcv/ops/bbox.py (review thread, outdated, resolved)
@grimoire (Member) left a review:

LGTM

mmcv/ops/csrc/pytorch/cpu/bbox_overlaps_cpu.cpp (review thread, outdated, resolved)
@zhouzaida zhouzaida merged commit 422816e into open-mmlab:master Jan 30, 2023
zhouzaida pushed a commit to zhouzaida/mmcv that referenced this pull request Mar 20, 2023
* add ops bbox_overlaps

* format code

* Return the pytorch version

* Intermediate modification

* Solve problems in parameter passing

* revise bug

* "add test case"
zhouzaida pushed a commit to zhouzaida/mmcv that referenced this pull request Mar 20, 2023 (same commit messages as above)
zhouzaida pushed a commit that referenced this pull request Mar 20, 2023 (same commit messages as above)
CokeDong pushed a commit to CokeDong/mmcv that referenced this pull request Apr 24, 2023 (same commit messages as above)
tyomj pushed a commit to tyomj/mmcv that referenced this pull request May 8, 2023 (same commit messages as above)
fwenguang pushed a commit to fwenguang/mmcv that referenced this pull request Dec 19, 2023 (same commit messages as above)
5 participants