
@codeflash-ai codeflash-ai bot commented Oct 29, 2025

📄 88% (0.88x) speedup for calculate_least_squares_polygon in inference/core/workflows/core_steps/transformations/dynamic_zones/v1.py

⏱️ Runtime : 70.7 milliseconds → 37.7 milliseconds (best of 47 runs)

📝 Explanation and details

The optimized code achieves an 88% speedup (70.7 ms down to 37.7 ms) through three key improvements:

1. Efficient Distance Calculations

  • Replaced np.linalg.norm(contour - point, axis=1) with np.sqrt(np.einsum('ij,ij->i', contour - point, contour - point)) in find_closest_index
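  • Einstein summation (einsum) fuses the squaring and the row-wise sum into a single pass, so it skips the temporary squared array that np.linalg.norm allocates. As written, contour - point is still evaluated twice; computing it once and reusing it would save one more allocation.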
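
A minimal sketch of the two distance computations side by side. The PR does not show the body of find_closest_index, so the function names, signatures, and sample data below are illustrative assumptions rather than the code in v1.py:

```python
import numpy as np


def closest_index_norm(contour, point):
    # Baseline: per-row Euclidean distance via np.linalg.norm.
    return int(np.argmin(np.linalg.norm(contour - point, axis=1)))


def closest_index_einsum(contour, point):
    # Compute the difference once, then let einsum square and sum each row.
    diff = contour - point
    return int(np.argmin(np.sqrt(np.einsum("ij,ij->i", diff, diff))))


# Hypothetical sample data: a three-point contour and one query point.
contour = np.array([[0.0, 0.0], [5.0, 10.0], [10.0, 0.0]])
point = np.array([6.0, 9.0])

idx_norm = closest_index_norm(contour, point)
idx_einsum = closest_index_einsum(contour, point)
```

Both helpers should agree on the index. Since the square root is monotonic, dropping it would not change the argmin; that is a possible further simplification, not something this PR claims to do.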

2. Precomputed Index Caching

  • Calculates closest contour indices for all polygon vertices upfront and stores them in closest_indices array
  • Eliminates redundant find_closest_index calls during segment processing
  • Provides better memory locality when accessing precomputed values
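
The caching idea can be sketched as follows. This assumes each polygon segment runs from vertex i to vertex i+1 with wrap-around, which is a guess about how the segment loop in v1.py is organized; segment_index_pairs is a hypothetical helper, and this find_closest_index is a stand-in that compares squared distances:

```python
import numpy as np


def find_closest_index(contour, point):
    # Stand-in lookup: index of the contour point nearest to `point`
    # (squared distances are enough for the argmin).
    diff = contour - point
    return int(np.argmin(np.einsum("ij,ij->i", diff, diff)))


def segment_index_pairs(contour, polygon):
    # One lookup per polygon vertex, done up front.
    closest_indices = np.array(
        [find_closest_index(contour, vertex) for vertex in polygon]
    )
    # Each segment (i, i+1) reuses the cached indices instead of
    # calling find_closest_index again for both of its endpoints.
    next_indices = np.roll(closest_indices, -1)
    return list(zip(closest_indices.tolist(), next_indices.tolist()))


# Hypothetical data: a square contour sampled at corners and edge midpoints.
contour = np.array(
    [[0, 0], [5, 0], [10, 0], [10, 5], [10, 10], [5, 10], [0, 10], [0, 5]],
    dtype=float,
)
polygon = np.array([[0, 0], [10, 0], [10, 10], [0, 10]], dtype=float)
pairs = segment_index_pairs(contour, polygon)
```

With M polygon vertices this performs M lookups in total, whereas looking up both endpoints inside the segment loop would perform 2M.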

3. Optimized Matrix Construction

  • In least_squares_line, replaces np.vstack([x, np.ones_like(x)]).T with direct array filling using np.empty() and column assignment
  • Avoids creating intermediate arrays (ones_like, vstack) and transpose operations
  • Reduces memory allocations and improves cache performance
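
A short sketch of the two ways to build the design matrix for a line fit y = m*x + c. Whether least_squares_line solves the system with np.linalg.lstsq is an assumption made here for illustration; the helper names and sample data are hypothetical:

```python
import numpy as np


def design_matrix_vstack(x):
    # Original style: stack x and a ones vector, then transpose.
    return np.vstack([x, np.ones_like(x)]).T


def design_matrix_empty(x):
    # Optimized style: allocate once and fill the two columns in place.
    A = np.empty((x.shape[0], 2), dtype=np.float64)
    A[:, 0] = x
    A[:, 1] = 1.0
    return A


# Hypothetical sample: points lying exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

A = design_matrix_empty(x)
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]
```

Both constructions yield the same values. The in-place version returns a C-contiguous array, while the transposed vstack result is a non-contiguous view, which may also matter to the solver that consumes it.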

Test Case Performance
The optimizations are particularly effective for:

  • Large-scale scenarios (1000+ points): 95% speedup
  • Circle/polygon approximations: 65-70% speedup
  • Medium complexity cases (100-500 points): 40-50% speedup
  • Small cases still benefit: 15-22% speedup
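
The percentages in this report appear to be computed as original / optimized - 1. That formula is inferred from the listed timings, not stated by Codeflash, so treat the sketch below as a consistency check only:

```python
def speedup_percent(original, optimized):
    # Relative speedup: how much faster the optimized run is, in percent.
    return (original / optimized - 1.0) * 100.0


# Timings copied from this report (milliseconds).
overall = speedup_percent(70.7, 37.7)  # total runtime
large = speedup_percent(60.7, 31.1)    # 1000-point performance test
```

The overall figure comes out near 87.5% and the large case near 95.2%. The small gap to the reported 95.5% is presumably because the report works from unrounded timings.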

The improvements scale well with input size: each polygon vertex's closest contour index is found once, in a single O(n) scan of the contour, instead of being recomputed during segment processing.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 5 Passed |
| 🌀 Generated Regression Tests | 20 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| workflows/unit_tests/core_steps/transformations/test_dynamic_zones.py::test_calculate_least_squares_polygon | 264μs | 208μs | 26.9%✅ |
| workflows/unit_tests/core_steps/transformations/test_dynamic_zones.py::test_calculate_least_squares_polygon_with_midpoint_fraction | 233μs | 187μs | 24.8%✅ |

🌀 Generated Regression Tests and Runtime
import numpy as np
# imports
import pytest  # used for our unit tests
from inference.core.workflows.core_steps.transformations.dynamic_zones.v1 import \
    calculate_least_squares_polygon

# unit tests

# ------------------ BASIC TEST CASES ------------------


def test_triangle_contour_and_polygon():
    # Test with a triangle contour and polygon
    contour = np.array([[0,0],[5,10],[10,0],[0,0]])
    polygon = np.array([[0,0],[5,10],[10,0]])
    codeflash_output = calculate_least_squares_polygon(contour, polygon); result = codeflash_output # 222μs -> 187μs (18.5% faster)
    expected = np.array([[0,0],[5,10],[10,0]])



def test_contour_with_two_points():
    # Contour with only two points, should return None for intersections
    contour = np.array([[0,0],[10,0]])
    polygon = np.array([[0,0],[10,0]])
    codeflash_output = calculate_least_squares_polygon(contour, polygon); result = codeflash_output # 192μs -> 167μs (15.0% faster)

def test_polygon_with_one_point():
    # Polygon with one point, should return array with one nan intersection
    contour = np.array([[0,0],[1,1],[2,2]])
    polygon = np.array([[1,1]])
    codeflash_output = calculate_least_squares_polygon(contour, polygon); result = codeflash_output # 49.9μs -> 46.4μs (7.57% faster)

def test_contour_with_colinear_points():
    # Contour points are colinear, lines are parallel, intersections should be nan
    contour = np.array([[0,0],[5,0],[10,0]])
    polygon = np.array([[0,0],[5,0],[10,0]])
    codeflash_output = calculate_least_squares_polygon(contour, polygon); result = codeflash_output # 186μs -> 160μs (16.3% faster)




def test_degenerate_polygon_all_same_point():
    # Polygon where all points are the same
    contour = np.array([[0,0],[1,1],[2,2]])
    polygon = np.array([[1,1],[1,1],[1,1]])
    codeflash_output = calculate_least_squares_polygon(contour, polygon); result = codeflash_output # 88.7μs -> 75.0μs (18.4% faster)

def test_empty_contour():
    # Empty contour, should fail gracefully
    contour = np.array([])
    polygon = np.array([[0,0],[1,1],[2,2]])
    with pytest.raises(ValueError):
        codeflash_output = calculate_least_squares_polygon(contour, polygon); _ = codeflash_output # 17.0μs -> 14.6μs (16.1% faster)


def test_large_circle_contour_and_polygon():
    # Large circle contour, polygon is regular octagon inscribed in circle
    N = 500
    theta = np.linspace(0, 2*np.pi, N, endpoint=False)
    contour = np.stack([50*np.cos(theta)+100, 50*np.sin(theta)+100], axis=-1)
    # Octagon vertices
    oct_theta = np.linspace(0, 2*np.pi, 8, endpoint=False)
    polygon = np.stack([50*np.cos(oct_theta)+100, 50*np.sin(oct_theta)+100], axis=-1)
    codeflash_output = calculate_least_squares_polygon(contour, polygon); result = codeflash_output # 500μs -> 338μs (47.8% faster)
    # Should be close to octagon vertices
    expected = np.round(polygon).astype(int)

def test_large_random_contour_and_polygon():
    # Large random contour, polygon is random subset of contour points
    np.random.seed(42)
    contour = np.random.randint(0, 1000, size=(900,2))
    idxs = np.sort(np.random.choice(np.arange(900), size=10, replace=False))
    polygon = contour[idxs]
    codeflash_output = calculate_least_squares_polygon(contour, polygon); result = codeflash_output # 734μs -> 437μs (67.8% faster)
    # Should be close to polygon points
    expected = np.round(polygon).astype(int)

def test_large_scale_midpoint_fraction():
    # Large contour and polygon, with midpoint_fraction < 1
    N = 800
    contour = np.array([[i, 2*i] for i in range(N)])
    polygon = np.array([[0,0],[N//2, N],[N-1,2*(N-1)]])
    codeflash_output = calculate_least_squares_polygon(contour, polygon, midpoint_fraction=0.5); result = codeflash_output # 286μs -> 204μs (40.0% faster)
    # Should be close to polygon points
    expected = np.array([[0,0],[N//2,N],[N-1,2*(N-1)]])

def test_large_polygon_many_vertices():
    # Large polygon, contour is circle, polygon is 100-point regular polygon
    N = 500
    contour_theta = np.linspace(0, 2*np.pi, N, endpoint=False)
    contour = np.stack([100*np.cos(contour_theta)+200, 100*np.sin(contour_theta)+200], axis=-1)
    M = 100
    poly_theta = np.linspace(0, 2*np.pi, M, endpoint=False)
    polygon = np.stack([100*np.cos(poly_theta)+200, 100*np.sin(poly_theta)+200], axis=-1)
    codeflash_output = calculate_least_squares_polygon(contour, polygon); result = codeflash_output # 4.56ms -> 2.68ms (70.0% faster)
    expected = np.round(polygon).astype(int)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import numpy as np
# imports
import pytest
from inference.core.workflows.core_steps.transformations.dynamic_zones.v1 import \
    calculate_least_squares_polygon

# unit tests

# ---------------- BASIC TEST CASES ----------------


def test_triangle_contour_triangle_polygon():
    # Triangle contour and polygon
    contour = np.array([
        [0, 0], [5, 10], [10, 0], [0, 0]
    ])
    polygon = np.array([
        [0, 0], [5, 10], [10, 0]
    ])
    codeflash_output = calculate_least_squares_polygon(contour, polygon); result = codeflash_output # 223μs -> 189μs (18.4% faster)
    expected = np.array([
        [0, 0], [5, 10], [10, 0]
    ])



def test_polygon_with_two_points():
    # Polygon with only two points (degenerate)
    contour = np.array([
        [0, 0], [10, 0], [10, 10], [0, 10], [0, 0]
    ])
    polygon = np.array([
        [0, 0], [10, 10]
    ])
    codeflash_output = calculate_least_squares_polygon(contour, polygon); result = codeflash_output # 187μs -> 163μs (14.8% faster)


def test_contour_with_single_point():
    # Contour with only one point
    contour = np.array([
        [5, 5]
    ])
    polygon = np.array([
        [5, 5], [10, 10], [10, 5]
    ])
    codeflash_output = calculate_least_squares_polygon(contour, polygon); result = codeflash_output # 88.0μs -> 74.2μs (18.6% faster)

def test_empty_contour():
    # Empty contour
    contour = np.empty((0, 2))
    polygon = np.array([
        [0, 0], [10, 0], [10, 10]
    ])
    with pytest.raises(ValueError):
        calculate_least_squares_polygon(contour, polygon) # 28.2μs -> 23.1μs (21.8% faster)


def test_non_integer_coordinates():
    # Test with float coordinates
    contour = np.array([
        [0.5, 0.5], [0.5, 10.5], [10.5, 10.5], [10.5, 0.5], [0.5, 0.5]
    ])
    polygon = np.array([
        [0.5, 0.5], [0.5, 10.5], [10.5, 10.5], [10.5, 0.5]
    ])
    codeflash_output = calculate_least_squares_polygon(contour, polygon); result = codeflash_output # 250μs -> 204μs (22.2% faster)
    expected = np.array([
        [0, 0], [0, 10], [10, 10], [10, 0]
    ])

def test_polygon_colinear_points():
    # Polygon with all points colinear
    contour = np.array([
        [0, 0], [5, 0], [10, 0]
    ])
    polygon = np.array([
        [0, 0], [5, 0], [10, 0]
    ])
    codeflash_output = calculate_least_squares_polygon(contour, polygon); result = codeflash_output # 182μs -> 153μs (18.9% faster)

# ---------------- LARGE SCALE TEST CASES ----------------

def test_large_circle_contour_polygon():
    # Large circle contour, regular polygon approximation
    N = 1000
    theta = np.linspace(0, 2 * np.pi, N, endpoint=False)
    contour = np.stack([50 + 40 * np.cos(theta), 50 + 40 * np.sin(theta)], axis=1)
    # Regular octagon inscribed in circle
    M = 8
    theta_poly = np.linspace(0, 2 * np.pi, M, endpoint=False)
    polygon = np.stack([50 + 40 * np.cos(theta_poly), 50 + 40 * np.sin(theta_poly)], axis=1)
    codeflash_output = calculate_least_squares_polygon(contour, polygon); result = codeflash_output # 612μs -> 371μs (64.9% faster)
    # Should be close to the polygon vertices
    for pt in result:
        pass

def test_large_scale_midpoint_fraction():
    # Large contour, test midpoint_fraction
    N = 500
    contour = np.zeros((N, 2))
    contour[:, 0] = np.linspace(0, 100, N)
    contour[:, 1] = np.linspace(0, 100, N)
    polygon = np.array([
        [0, 0], [100, 0], [100, 100], [0, 100]
    ])
    # Use only central 50% of each segment
    codeflash_output = calculate_least_squares_polygon(contour, polygon, midpoint_fraction=0.5); result = codeflash_output # 290μs -> 215μs (34.8% faster)
    # Should be close to the corners
    for pt in result:
        pass

def test_large_polygon_large_contour():
    # Large polygon and contour
    N = 200
    contour = np.zeros((N, 2))
    contour[:, 0] = np.linspace(0, 100, N)
    contour[:, 1] = np.linspace(0, 100, N)
    # Polygon is a regular 20-gon
    M = 20
    theta_poly = np.linspace(0, 2 * np.pi, M, endpoint=False)
    polygon = np.stack([50 + 40 * np.cos(theta_poly), 50 + 40 * np.sin(theta_poly)], axis=1)
    codeflash_output = calculate_least_squares_polygon(contour, polygon); result = codeflash_output # 765μs -> 520μs (47.0% faster)

def test_large_scale_performance():
    # Performance test: contour and polygon with 1000 points each
    N = 1000
    contour = np.zeros((N, 2))
    contour[:, 0] = np.linspace(0, 100, N)
    contour[:, 1] = np.linspace(0, 100, N)
    polygon = np.zeros((N, 2))
    polygon[:, 0] = np.linspace(0, 100, N)
    polygon[:, 1] = np.linspace(0, 100, N)
    codeflash_output = calculate_least_squares_polygon(contour, polygon); result = codeflash_output # 60.7ms -> 31.1ms (95.5% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-calculate_least_squares_polygon-mhc04ip1 and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 29, 2025 12:58
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Oct 29, 2025

@misrasaurabh1 misrasaurabh1 left a comment

Can remove the inner function that's unused, but the optimization seems sound.
