
@codeflash-ai codeflash-ai bot commented Oct 29, 2025

📄 14% (0.14x) speedup for get_average_bounding_box in inference/core/workflows/core_steps/fusion/detections_consensus/v1.py

⏱️ Runtime: 2.61 milliseconds → 2.29 milliseconds (best of 403 runs)

📝 Explanation and details

The optimized code replaces np.mean() with a manual calculation using np.add.reduce() followed by division. This achieves a 14% speedup by eliminating the overhead of NumPy's mean function.

Key optimization:

  • Changed np.mean(detections.xyxy, axis=0) to np.add.reduce(detections.xyxy, axis=0) / len(detections)
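A minimal sketch of the change, assuming the function has roughly the shape implied by the generated tests (an early return for empty input, then a column-wise average). The two function bodies are illustrative and not copied from v1.py, and `FakeDetections` is a hypothetical stand-in for the real detections object.

```python
import numpy as np


class FakeDetections:
    """Hypothetical stand-in exposing only .xyxy and __len__."""

    def __init__(self, xyxy):
        self.xyxy = np.asarray(xyxy, dtype=float)

    def __len__(self):
        return len(self.xyxy)


def average_box_mean(detections):
    # Assumed shape of the original: early return, then np.mean over rows.
    if len(detections) == 0:
        return (0.0, 0.0, 0.0, 0.0)
    return tuple(np.mean(detections.xyxy, axis=0))


def average_box_reduce(detections):
    # Assumed shape of the optimized version: explicit sum, then division.
    if len(detections) == 0:
        return (0.0, 0.0, 0.0, 0.0)
    return tuple(np.add.reduce(detections.xyxy, axis=0) / len(detections))


boxes = FakeDetections([[0, 0, 10, 10], [10, 10, 20, 20]])
print(average_box_mean(boxes))
print(average_box_reduce(boxes))
```

Both variants compute the same sum-then-divide for float64 input, so the results should agree for ordinary box coordinates.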

Why this is faster:
np.mean() is a general-purpose wrapper: before summing, it runs Python-level bookkeeping such as argument handling, dtype checks, and counting the elements along the axis. (It does not skip NaN values; that is what np.nanmean() does.) Using np.add.reduce() to sum along the axis and dividing by the length manually skips most of that bookkeeping and performs only the arithmetic needed for averaging.
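On the NaN point specifically, a quick check shows that `np.mean` propagates NaN in the same way as the manual sum-and-divide, while `np.nanmean` is the function that actually skips NaN. The input mirrors the `test_nan_values` case from the generated tests; the variable names are illustrative.

```python
import numpy as np

xyxy = np.array([[np.nan, 2.0, 3.0, 4.0],
                 [1.0, np.nan, 3.0, 4.0]])

# Both of these propagate NaN through the affected columns.
mean_result = np.mean(xyxy, axis=0)
reduce_result = np.add.reduce(xyxy, axis=0) / len(xyxy)

# np.nanmean ignores NaN entries, so it gives a different answer.
nanmean_result = np.nanmean(xyxy, axis=0)

print(mean_result)
print(reduce_result)
print(nanmean_result)
```

So the speedup does not come from dropping NaN handling; the replacement keeps the same NaN behavior for float inputs.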

Performance characteristics:

  • Small inputs (1-3 boxes): 49-61% faster, because the fixed per-call overhead dominates the runtime
  • Large inputs (1000 boxes): 3-8% faster, because the work that scales with input size dominates the fixed overhead
  • Empty detections: ~3-4% difference, which is likely measurement noise, since the early return is hit before the changed line
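To see how the gap changes with input size on a given machine, a rough `timeit` comparison along these lines can help. It is only a sketch: `time_both` is a hypothetical helper, absolute numbers will differ from those reported here, and it times both list and array inputs because the generated tests pass plain Python lists, whose conversion to an array is included in the measured time.

```python
import timeit

import numpy as np


def time_both(xyxy, number=1_000):
    """Return (seconds for np.mean, seconds for add.reduce / n) over `number` calls."""
    n = len(xyxy)
    t_mean = timeit.timeit(lambda: np.mean(xyxy, axis=0), number=number)
    t_reduce = timeit.timeit(lambda: np.add.reduce(xyxy, axis=0) / n, number=number)
    return t_mean, t_reduce


rng = np.random.default_rng(0)
timings = {}
for n_boxes in (1, 3, 1000):
    as_array = rng.random((n_boxes, 4))
    as_list = as_array.tolist()  # same data as nested Python lists
    timings[n_boxes] = {
        "array": time_both(as_array),
        "list": time_both(as_list),
    }

for n_boxes, row in timings.items():
    for kind, (t_mean, t_reduce) in row.items():
        print(f"{n_boxes:>5} boxes ({kind:>5}): mean={t_mean:.4f}s  reduce={t_reduce:.4f}s")
```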

This optimization is most effective for computer vision workloads where bounding box averaging runs on small to medium-sized detection sets, since that is where the measured saving is largest.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 73 Passed |
| 🌀 Generated Regression Tests | 36 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| workflows/unit_tests/core_steps/fusion/test_detections_consensus.py::test_get_average_bounding_box_when_multiple_elements_provided | 27.4μs | 18.3μs | 49.3%✅ |
| workflows/unit_tests/core_steps/fusion/test_detections_consensus.py::test_get_average_bounding_box_when_single_element_provided | 27.2μs | 18.1μs | 50.1%✅ |

🌀 Generated Regression Tests and Runtime
import pytest
from inference.core.workflows.core_steps.fusion.detections_consensus.v1 import \
    get_average_bounding_box


# --- Helper class for test inputs ---
class Detections:
    # Simple wrapper to mimic detections object with .xyxy attribute
    def __init__(self, xyxy):
        self.xyxy = xyxy
    def __len__(self):
        return len(self.xyxy)

# --- Unit tests ---

# ----------- BASIC TEST CASES -----------

def test_single_detection():
    # One box: output should be the same box
    det = Detections([[1, 2, 3, 4]])
    codeflash_output = get_average_bounding_box(det) # 25.3μs -> 15.6μs (61.5% faster)

def test_two_identical_boxes():
    # Two identical boxes: output should be the same box
    det = Detections([[1, 2, 3, 4], [1, 2, 3, 4]])
    codeflash_output = get_average_bounding_box(det) # 22.4μs -> 14.9μs (50.1% faster)

def test_two_different_boxes():
    # Two boxes: output should be the coordinate-wise mean
    det = Detections([[0, 0, 10, 10], [10, 10, 20, 20]])
    codeflash_output = get_average_bounding_box(det); avg = codeflash_output # 21.6μs -> 13.6μs (58.6% faster)

def test_three_boxes():
    # Three boxes: output should be the coordinate-wise mean
    det = Detections([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
    codeflash_output = get_average_bounding_box(det); avg = codeflash_output # 21.6μs -> 13.4μs (60.4% faster)

def test_float_coordinates():
    # Boxes with float coordinates
    det = Detections([[1.5, 2.5, 3.5, 4.5], [2.5, 3.5, 4.5, 5.5]])
    codeflash_output = get_average_bounding_box(det); avg = codeflash_output # 20.8μs -> 13.2μs (57.1% faster)

# ----------- EDGE TEST CASES -----------

def test_empty_detections():
    # No detections: output should be all zeros
    det = Detections([])
    codeflash_output = get_average_bounding_box(det) # 665ns -> 637ns (4.40% faster)

def test_zero_area_boxes():
    # Boxes where x_min == x_max and y_min == y_max
    det = Detections([[1, 2, 1, 2], [3, 4, 3, 4]])
    codeflash_output = get_average_bounding_box(det); avg = codeflash_output # 25.5μs -> 16.3μs (56.4% faster)

def test_negative_coordinates():
    # Boxes with negative coordinates
    det = Detections([[-1, -1, 1, 1], [-2, -2, 2, 2]])
    codeflash_output = get_average_bounding_box(det); avg = codeflash_output # 22.3μs -> 14.2μs (57.4% faster)

def test_mixed_sign_coordinates():
    # Boxes with mixed positive and negative coordinates
    det = Detections([[-10, 10, 10, -10], [10, -10, -10, 10]])
    codeflash_output = get_average_bounding_box(det); avg = codeflash_output # 21.4μs -> 13.7μs (56.9% faster)

def test_large_integer_coordinates():
    # Boxes with very large integer coordinates
    det = Detections([[1000000, 2000000, 3000000, 4000000], [4000000, 3000000, 2000000, 1000000]])
    codeflash_output = get_average_bounding_box(det); avg = codeflash_output # 20.9μs -> 13.4μs (55.8% faster)

def test_minimal_float_difference():
    # Boxes with minimal float difference
    det = Detections([[1.000001, 2.000001, 3.000001, 4.000001], [1.000002, 2.000002, 3.000002, 4.000002]])
    codeflash_output = get_average_bounding_box(det); avg = codeflash_output # 20.8μs -> 13.3μs (57.0% faster)

def test_non_square_boxes():
    # Boxes with width != height
    det = Detections([[0, 0, 10, 5], [0, 0, 20, 10]])
    codeflash_output = get_average_bounding_box(det); avg = codeflash_output # 21.6μs -> 13.9μs (55.8% faster)

def test_extreme_values():
    # Boxes with extreme float values
    det = Detections([[float('inf'), float('-inf'), float('inf'), float('-inf')], [1.0, 2.0, 3.0, 4.0]])
    codeflash_output = get_average_bounding_box(det); avg = codeflash_output # 20.2μs -> 13.0μs (55.6% faster)

def test_nan_values():
    # Boxes with NaN values
    import math
    det = Detections([[float('nan'), 2.0, 3.0, 4.0], [1.0, float('nan'), 3.0, 4.0]])
    codeflash_output = get_average_bounding_box(det); avg = codeflash_output # 19.9μs -> 12.8μs (55.5% faster)

# ----------- LARGE SCALE TEST CASES -----------

def test_many_identical_boxes():
    # 1000 identical boxes: output should be the same box
    box = [1, 2, 3, 4]
    det = Detections([box for _ in range(1000)])
    codeflash_output = get_average_bounding_box(det); avg = codeflash_output # 206μs -> 194μs (6.13% faster)

def test_many_varied_boxes():
    # 1000 boxes with increasing coordinates
    det = Detections([[i, i+1, i+2, i+3] for i in range(1000)])
    codeflash_output = get_average_bounding_box(det); avg = codeflash_output # 213μs -> 198μs (7.81% faster)

def test_large_float_boxes():
    # 1000 boxes with large float values
    det = Detections([[float(i)*1e6, float(i+1)*1e6, float(i+2)*1e6, float(i+3)*1e6] for i in range(1000)])
    codeflash_output = get_average_bounding_box(det); avg = codeflash_output # 206μs -> 193μs (6.76% faster)

def test_performance_on_large_input():
    # Test that function runs quickly for 1000 boxes
    import time
    det = Detections([[i, i+1, i+2, i+3] for i in range(1000)])
    start = time.time()
    codeflash_output = get_average_bounding_box(det); avg = codeflash_output # 210μs -> 195μs (7.66% faster)
    duration = time.time() - start

def test_large_random_boxes():
    # 1000 boxes with random coordinates
    import random
    random.seed(42)
    boxes = []
    for _ in range(1000):
        x1 = random.uniform(-1000, 1000)
        y1 = random.uniform(-1000, 1000)
        x2 = x1 + random.uniform(0, 100)
        y2 = y1 + random.uniform(0, 100)
        boxes.append([x1, y1, x2, y2])
    det = Detections(boxes)
    codeflash_output = get_average_bounding_box(det); avg = codeflash_output # 198μs -> 187μs (5.85% faster)
    for v in avg:
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import List, Tuple

# imports
import pytest
from inference.core.workflows.core_steps.fusion.detections_consensus.v1 import \
    get_average_bounding_box


class Detections:
    """
    Minimal Detections class for testing.
    Stores bounding boxes in xyxy format: (x_min, y_min, x_max, y_max)
    """
    def __init__(self, xyxy: List[Tuple[float, float, float, float]]):
        self.xyxy = xyxy

    def __len__(self):
        return len(self.xyxy)

# ------------------ UNIT TESTS ------------------

# ---- Basic Test Cases ----

def test_single_box():
    # One box, result should be the same box
    detections = Detections([(1.0, 2.0, 3.0, 4.0)])
    codeflash_output = get_average_bounding_box(detections) # 25.6μs -> 17.1μs (50.2% faster)

def test_two_identical_boxes():
    # Two identical boxes, average should be the same as the box
    detections = Detections([(5.0, 5.0, 10.0, 10.0), (5.0, 5.0, 10.0, 10.0)])
    codeflash_output = get_average_bounding_box(detections) # 23.2μs -> 15.0μs (54.6% faster)

def test_two_different_boxes():
    # Two different boxes, average should be the mean of each coordinate
    detections = Detections([(0.0, 0.0, 10.0, 10.0), (10.0, 10.0, 20.0, 20.0)])
    expected = (5.0, 5.0, 15.0, 15.0)
    codeflash_output = get_average_bounding_box(detections) # 21.0μs -> 13.5μs (54.9% faster)

def test_three_boxes():
    # Three boxes, average should be the mean of each coordinate
    detections = Detections([
        (0.0, 0.0, 10.0, 10.0),
        (10.0, 10.0, 20.0, 20.0),
        (5.0, 5.0, 15.0, 15.0)
    ])
    expected = (
        (0.0 + 10.0 + 5.0) / 3,
        (0.0 + 10.0 + 5.0) / 3,
        (10.0 + 20.0 + 15.0) / 3,
        (10.0 + 20.0 + 15.0) / 3,
    )
    codeflash_output = get_average_bounding_box(detections) # 20.8μs -> 13.6μs (53.3% faster)

# ---- Edge Test Cases ----

def test_empty_detections():
    # No detections, should return (0.0, 0.0, 0.0, 0.0)
    detections = Detections([])
    codeflash_output = get_average_bounding_box(detections) # 663ns -> 644ns (2.95% faster)

def test_negative_coordinates():
    # Boxes with negative coordinates
    detections = Detections([(-1.0, -2.0, -3.0, -4.0), (1.0, 2.0, 3.0, 4.0)])
    expected = (0.0, 0.0, 0.0, 0.0)
    codeflash_output = get_average_bounding_box(detections) # 23.7μs -> 15.3μs (54.9% faster)

def test_mixed_sign_coordinates():
    # Boxes with mixed positive and negative coordinates
    detections = Detections([(-10.0, 10.0, 20.0, -20.0), (10.0, -10.0, -20.0, 20.0)])
    expected = (
        (-10.0 + 10.0) / 2,
        (10.0 + -10.0) / 2,
        (20.0 + -20.0) / 2,
        (-20.0 + 20.0) / 2
    )
    codeflash_output = get_average_bounding_box(detections) # 20.9μs -> 13.3μs (56.9% faster)

def test_zero_area_boxes():
    # Boxes with zero area (x_min == x_max, y_min == y_max)
    detections = Detections([
        (5.0, 5.0, 5.0, 5.0),
        (10.0, 10.0, 10.0, 10.0)
    ])
    expected = ((5.0 + 10.0)/2, (5.0 + 10.0)/2, (5.0 + 10.0)/2, (5.0 + 10.0)/2)
    codeflash_output = get_average_bounding_box(detections) # 20.7μs -> 13.1μs (57.9% faster)

def test_float_precision():
    # Boxes with float coordinates, check precision
    detections = Detections([
        (0.1, 0.2, 0.3, 0.4),
        (0.5, 0.6, 0.7, 0.8)
    ])
    expected = (
        (0.1 + 0.5) / 2,
        (0.2 + 0.6) / 2,
        (0.3 + 0.7) / 2,
        (0.4 + 0.8) / 2
    )
    codeflash_output = get_average_bounding_box(detections); result = codeflash_output # 19.8μs -> 12.5μs (58.2% faster)

def test_large_coordinates():
    # Boxes with very large coordinates
    detections = Detections([
        (1e9, 2e9, 3e9, 4e9),
        (1e9, 2e9, 3e9, 4e9)
    ])
    expected = (1e9, 2e9, 3e9, 4e9)
    codeflash_output = get_average_bounding_box(detections) # 19.6μs -> 12.6μs (55.1% faster)

def test_small_coordinates():
    # Boxes with very small coordinates
    detections = Detections([
        (1e-9, 2e-9, 3e-9, 4e-9),
        (1e-9, 2e-9, 3e-9, 4e-9)
    ])
    expected = (1e-9, 2e-9, 3e-9, 4e-9)
    codeflash_output = get_average_bounding_box(detections) # 19.3μs -> 12.5μs (53.9% faster)

def test_non_integer_length():
    # Boxes with fractional values, odd number of boxes
    detections = Detections([
        (0.0, 0.0, 1.0, 1.0),
        (2.0, 2.0, 3.0, 3.0),
        (4.0, 4.0, 5.0, 5.0)
    ])
    expected = (
        (0.0 + 2.0 + 4.0)/3,
        (0.0 + 2.0 + 4.0)/3,
        (1.0 + 3.0 + 5.0)/3,
        (1.0 + 3.0 + 5.0)/3
    )
    codeflash_output = get_average_bounding_box(detections) # 20.1μs -> 12.7μs (57.6% faster)

# ---- Large Scale Test Cases ----

def test_many_boxes_identical():
    # Many identical boxes, average should be the same as the box
    box = (1.0, 2.0, 3.0, 4.0)
    detections = Detections([box] * 1000)
    codeflash_output = get_average_bounding_box(detections) # 190μs -> 184μs (3.40% faster)

def test_many_boxes_varied():
    # Many boxes with varied coordinates, average should be correct
    boxes = [(float(i), float(i+1), float(i+2), float(i+3)) for i in range(1000)]
    detections = Detections(boxes)
    n = 1000
    sum_x_min = sum(float(i) for i in range(n))
    sum_y_min = sum(float(i+1) for i in range(n))
    sum_x_max = sum(float(i+2) for i in range(n))
    sum_y_max = sum(float(i+3) for i in range(n))
    expected = (
        sum_x_min / n,
        sum_y_min / n,
        sum_x_max / n,
        sum_y_max / n,
    )
    codeflash_output = get_average_bounding_box(detections); result = codeflash_output # 197μs -> 187μs (5.43% faster)

def test_large_scale_negative_and_positive():
    # Large number of boxes with both negative and positive coordinates
    boxes = [(-i, i, i*2, -i*2) for i in range(1, 1001)]
    detections = Detections(boxes)
    n = 1000
    sum_x_min = sum(-i for i in range(1, n+1))
    sum_y_min = sum(i for i in range(1, n+1))
    sum_x_max = sum(i*2 for i in range(1, n+1))
    sum_y_max = sum(-i*2 for i in range(1, n+1))
    expected = (
        sum_x_min / n,
        sum_y_min / n,
        sum_x_max / n,
        sum_y_max / n,
    )
    codeflash_output = get_average_bounding_box(detections); result = codeflash_output # 210μs -> 196μs (7.22% faster)

def test_large_scale_float_precision():
    # Large number of boxes with small float values
    boxes = [(i * 1e-6, i * 2e-6, i * 3e-6, i * 4e-6) for i in range(1000)]
    detections = Detections(boxes)
    n = 1000
    sum_x_min = sum(i * 1e-6 for i in range(n))
    sum_y_min = sum(i * 2e-6 for i in range(n))
    sum_x_max = sum(i * 3e-6 for i in range(n))
    sum_y_max = sum(i * 4e-6 for i in range(n))
    expected = (
        sum_x_min / n,
        sum_y_min / n,
        sum_x_max / n,
        sum_y_max / n,
    )
    codeflash_output = get_average_bounding_box(detections); result = codeflash_output # 194μs -> 184μs (5.40% faster)

def test_large_scale_zero_area():
    # Large number of zero-area boxes
    boxes = [(i, i, i, i) for i in range(1000)]
    detections = Detections(boxes)
    n = 1000
    avg = sum(i for i in range(n)) / n
    expected = (avg, avg, avg, avg)
    codeflash_output = get_average_bounding_box(detections); result = codeflash_output # 205μs -> 192μs (6.27% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
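
How Codeflash compares outputs internally is not shown in this report. As a sketch of one way such an equivalence check could be written, the hypothetical helper `outputs_match` below compares the two averaging variants on a few inputs taken from the generated tests, treating NaN as equal to NaN and matching infinities as equal.

```python
import numpy as np


def outputs_match(original, optimized, rtol=1e-9, atol=0.0):
    """Hypothetical helper: True if two box tuples agree, with NaN == NaN."""
    return bool(
        np.allclose(
            np.asarray(original, dtype=float),
            np.asarray(optimized, dtype=float),
            rtol=rtol,
            atol=atol,
            equal_nan=True,
        )
    )


cases = [
    [[1, 2, 3, 4]],
    [[0, 0, 10, 10], [10, 10, 20, 20]],
    [[float("inf"), float("-inf"), float("inf"), float("-inf")], [1.0, 2.0, 3.0, 4.0]],
    [[float("nan"), 2.0, 3.0, 4.0], [1.0, float("nan"), 3.0, 4.0]],
]

results = []
for xyxy in cases:
    via_mean = tuple(np.mean(xyxy, axis=0))
    via_reduce = tuple(np.add.reduce(xyxy, axis=0) / len(xyxy))
    results.append(outputs_match(via_mean, via_reduce))

print(results)
```

A plain `==` on the tuples would report a mismatch for the NaN case, which is why the comparison goes through `np.allclose` with `equal_nan=True`.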

To edit these changes, run git checkout codeflash/optimize-get_average_bounding_box-mhbv90i8 and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 29, 2025 10:42
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Oct 29, 2025