@codeflash-ai codeflash-ai bot commented Oct 29, 2025

📄 33% (0.33x) speedup for PathDeviationAnalyticsBlockV2._compute_distance in inference/core/workflows/core_steps/analytics/path_deviation/v2.py

⏱️ Runtime : 531 microseconds → 399 microseconds (best of 5 runs)
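
As a quick check on the headline number, the sketch below recomputes the speedup from the two reported runtimes. It assumes the percentage is defined as (original − optimized) / optimized; that definition is an assumption, not something this page states, and `speedup_percent` is a hypothetical helper name.

```python
def speedup_percent(original_us: float, optimized_us: float) -> float:
    """Relative speedup in percent, assuming (original - optimized) / optimized."""
    return (original_us - optimized_us) / optimized_us * 100.0


# Runtimes reported above, in microseconds
overall = speedup_percent(531, 399)
print(f"overall speedup: {overall:.1f}%")
```

Under that assumption the two runtimes give roughly 33%, which matches the title of this PR.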

📝 Explanation and details

The optimization replaces the manual Euclidean distance calculation np.sqrt(np.sum((point1 - point2) ** 2)) with np.linalg.norm(point1 - point2), achieving a 33% speedup (531 μs → 399 μs).

Key optimization:

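  • NumPy's linalg.norm() is faster than the manual sqrt/sum approach here. For a 1-D vector with default arguments it computes the squared norm as a single dot product (which may dispatch to BLAS), so it skips the intermediate array that (point1 - point2) ** 2 creates before np.sum() reduces it. The difference point1 - point2 is still allocated in both versions.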

Why this works:

  • The manual approach allocates a temporary array for the squared differences and then reduces it with np.sum(), which costs an extra allocation and an extra NumPy call
  • np.linalg.norm() computes the L2 norm in compiled code with fewer intermediate steps
  • For small vectors (typical 2D/3D points in path analysis), per-call overhead dominates the runtime, so removing one temporary and one call is likely where most of the gain comes from
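
To make the comparison concrete, here is a minimal sketch that times both formulations on a single 2D point pair. The function names `euclidean_manual` and `euclidean_norm` are hypothetical and are not taken from v2.py; absolute timings depend on the machine and the NumPy build, so only the relative difference is meaningful.

```python
import timeit

import numpy as np


def euclidean_manual(p1: np.ndarray, p2: np.ndarray) -> float:
    # Original formulation: square, sum, then square root
    return np.sqrt(np.sum((p1 - p2) ** 2))


def euclidean_norm(p1: np.ndarray, p2: np.ndarray) -> float:
    # Replacement formulation: L2 norm of the difference vector
    return np.linalg.norm(p1 - p2)


a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])

manual = euclidean_manual(a, b)
via_norm = euclidean_norm(a, b)
print(manual, via_norm)

# Time many calls, since a single call is far too short to measure reliably
n_calls = 20_000
t_manual = timeit.timeit(lambda: euclidean_manual(a, b), number=n_calls)
t_norm = timeit.timeit(lambda: euclidean_norm(a, b), number=n_calls)
print(f"manual: {t_manual / n_calls * 1e6:.2f} us/call")
print(f"norm:   {t_norm / n_calls * 1e6:.2f} us/call")
```

Both functions return the 3-4-5 triangle distance of 5.0 for this input. The timing output is not shown because it varies between environments.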

Test case performance:

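  • Tests that actually compute distances improve by roughly 24-51%
  • Tests that exit before any distance is computed (empty paths, negative indices, cached values) stay within measurement noise, from about 10% slower to 13% faster
  • Particularly effective for the core use cases: 2D/3D point comparisons in path deviation analysis
  • Maintains the same results and exception behavior across the 59 generated regression tests
  • Gains hold for both single-point comparisons and multi-point path calculations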
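
The equivalence claim can be spot-checked outside the test suite. The sketch below compares the two formulations on random 2D, 3D, and 5D points and on integer inputs; the helper names are hypothetical, and the comparison uses a floating-point tolerance because a dot product and a sum of squares are not guaranteed to round identically.

```python
import numpy as np


def euclidean_manual(p1: np.ndarray, p2: np.ndarray) -> float:
    # Original formulation quoted in this PR
    return np.sqrt(np.sum((p1 - p2) ** 2))


def euclidean_norm(p1: np.ndarray, p2: np.ndarray) -> float:
    # Replacement formulation quoted in this PR
    return np.linalg.norm(p1 - p2)


rng = np.random.default_rng(0)  # fixed seed so the check is repeatable
max_abs_diff = 0.0
for dim in (2, 3, 5):
    for _ in range(1000):
        p1 = rng.uniform(-1000.0, 1000.0, size=dim)
        p2 = rng.uniform(-1000.0, 1000.0, size=dim)
        d_manual = euclidean_manual(p1, p2)
        d_norm = euclidean_norm(p1, p2)
        assert np.isclose(d_manual, d_norm), (p1, p2, d_manual, d_norm)
        max_abs_diff = max(max_abs_diff, abs(d_manual - d_norm))

# Integer inputs, as used by several generated tests
int_manual = euclidean_manual(np.array([0, 0]), np.array([3, 4]))
int_norm = euclidean_norm(np.array([0, 0]), np.array([3, 4]))
print(int_manual, int_norm, max_abs_diff)
```

A check like this complements the generated regression tests but does not replace them, since it only exercises the distance helper and not the recursive path comparison.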

The optimization preserves all functionality while leveraging NumPy's optimized linear algebra routines for better performance.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 59 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import numpy as np
# imports
import pytest  # used for our unit tests
from inference.core.workflows.core_steps.analytics.path_deviation.v2 import \
    PathDeviationAnalyticsBlockV2

# unit tests

@pytest.fixture
def block():
    # Fixture to provide a fresh instance of the class for each test
    return PathDeviationAnalyticsBlockV2()

# --- Basic Test Cases ---

def test_identical_single_point_paths(block):
    # Both paths are identical, single point
    path1 = np.array([[0, 0]])
    path2 = np.array([[0, 0]])
    dist_matrix = np.full((1, 1), -1.0)
    # The distance should be zero
    codeflash_output = block._compute_distance(dist_matrix, 0, 0, path1, path2); result = codeflash_output # 22.4μs -> 14.8μs (51.1% faster)

def test_different_single_point_paths(block):
    # Both paths are single points, but different
    path1 = np.array([[0, 0]])
    path2 = np.array([[3, 4]])
    dist_matrix = np.full((1, 1), -1.0)
    # The distance should be the Euclidean distance (5.0)
    codeflash_output = block._compute_distance(dist_matrix, 0, 0, path1, path2); result = codeflash_output # 16.9μs -> 12.2μs (38.4% faster)

def test_two_point_paths_identical(block):
    # Both paths have two identical points
    path1 = np.array([[1, 2], [3, 4]])
    path2 = np.array([[1, 2], [3, 4]])
    dist_matrix = np.full((2, 2), -1.0)
    # The max deviation should be 0.0
    codeflash_output = block._compute_distance(dist_matrix, 1, 1, path1, path2); result = codeflash_output # 30.7μs -> 24.8μs (23.9% faster)

def test_two_point_paths_different(block):
    # Each path has two points, second point differs
    path1 = np.array([[0, 0], [1, 1]])
    path2 = np.array([[0, 0], [2, 2]])
    dist_matrix = np.full((2, 2), -1.0)
    # The last points (1, 1) and (2, 2) are sqrt(1^2 + 1^2) = sqrt(2) = 1.414... apart
    codeflash_output = block._compute_distance(dist_matrix, 1, 1, path1, path2); result = codeflash_output # 29.2μs -> 22.8μs (28.4% faster)

def test_three_point_paths(block):
    # Both paths have three points, with a deviation at the last point
    path1 = np.array([[0, 0], [1, 1], [2, 2]])
    path2 = np.array([[0, 0], [1, 1], [3, 2]])
    dist_matrix = np.full((3, 3), -1.0)
    # The max deviation should be 1.0 at last point
    codeflash_output = block._compute_distance(dist_matrix, 2, 2, path1, path2); result = codeflash_output # 51.0μs -> 36.3μs (40.7% faster)

# --- Edge Test Cases ---

def test_empty_paths(block):
    # Edge case: empty paths
    path1 = np.empty((0, 2))
    path2 = np.empty((0, 2))
    dist_matrix = np.full((0, 0), -1.0)
    # Should raise IndexError due to empty access
    with pytest.raises(IndexError):
        block._compute_distance(dist_matrix, 0, 0, path1, path2) # 2.90μs -> 2.96μs (1.76% slower)

def test_one_empty_one_nonempty_path(block):
    # One path is empty, one is not
    path1 = np.empty((0, 2))
    path2 = np.array([[1, 1]])
    dist_matrix = np.full((0, 1), -1.0)
    with pytest.raises(IndexError):
        block._compute_distance(dist_matrix, 0, 0, path1, path2) # 2.26μs -> 2.51μs (10.1% slower)

def test_negative_indices(block):
    # Negative indices should result in inf
    path1 = np.array([[0, 0]])
    path2 = np.array([[0, 0]])
    dist_matrix = np.full((1, 1), -1.0)
    codeflash_output = block._compute_distance(dist_matrix, -1, -1, path1, path2); result = codeflash_output # 3.54μs -> 3.35μs (5.66% faster)

def test_non_2d_points(block):
    # Points with more than 2 dimensions
    path1 = np.array([[1, 2, 3], [4, 5, 6]])
    path2 = np.array([[1, 2, 3], [7, 8, 9]])
    dist_matrix = np.full((2, 2), -1.0)
    # Last point deviation sqrt((4-7)^2 + (5-8)^2 + (6-9)^2) = sqrt(27) = ~5.196
    codeflash_output = block._compute_distance(dist_matrix, 1, 1, path1, path2); result = codeflash_output # 36.4μs -> 29.0μs (25.5% faster)

def test_non_square_dist_matrix(block):
    # Non-square dist_matrix, path lengths differ
    path1 = np.array([[0, 0], [1, 1], [2, 2]])
    path2 = np.array([[0, 0], [1, 1]])
    dist_matrix = np.full((3, 2), -1.0)
    codeflash_output = block._compute_distance(dist_matrix, 2, 1, path1, path2); result = codeflash_output # 37.4μs -> 27.8μs (34.4% faster)
    # Should be max deviation between path1[2] and path2[1] and previous steps
    expected = max(
        min(
            block._euclidean_distance(path1[1], path2[1]),
            block._euclidean_distance(path1[1], path2[0]),
            block._euclidean_distance(path1[2], path2[0]),
        ),
        block._euclidean_distance(path1[2], path2[1])
    )

def test_large_deviation_at_start(block):
    # Large deviation at start, small at end
    path1 = np.array([[100, 100], [1, 1]])
    path2 = np.array([[0, 0], [1, 1]])
    dist_matrix = np.full((2, 2), -1.0)
    codeflash_output = block._compute_distance(dist_matrix, 1, 1, path1, path2); result = codeflash_output # 29.6μs -> 22.3μs (32.7% faster)

# --- Large Scale Test Cases ---






#------------------------------------------------
import numpy as np
# imports
import pytest  # used for our unit tests
from inference.core.workflows.core_steps.analytics.path_deviation.v2 import \
    PathDeviationAnalyticsBlockV2


# unit tests
class TestComputeDistance:
    # Helper to create a fresh dist_matrix for given sizes
    def make_dist_matrix(self, m, n):
        # All entries initialized to -1 (uncomputed)
        return np.full((m, n), -1.0)
    
    # Helper to create paths from list of tuples
    def make_path(self, points):
        return np.array(points, dtype=float)
    
    # BASIC TEST CASES

    def test_identical_single_point_paths(self):
        # Both paths are the same single point
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(0, 0)])
        path2 = self.make_path([(0, 0)])
        dist_matrix = self.make_dist_matrix(1, 1)
        codeflash_output = block._compute_distance(dist_matrix, 0, 0, path1, path2); result = codeflash_output # 24.6μs -> 17.9μs (37.8% faster)

    def test_different_single_point_paths(self):
        # Both paths are single points, but different
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(0, 0)])
        path2 = self.make_path([(3, 4)])
        dist_matrix = self.make_dist_matrix(1, 1)
        codeflash_output = block._compute_distance(dist_matrix, 0, 0, path1, path2); result = codeflash_output # 16.0μs -> 11.3μs (41.6% faster)

    def test_two_point_paths_identical(self):
        # Both paths have two points, identical
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(1, 2), (3, 4)])
        path2 = self.make_path([(1, 2), (3, 4)])
        dist_matrix = self.make_dist_matrix(2, 2)
        codeflash_output = block._compute_distance(dist_matrix, 1, 1, path1, path2); result = codeflash_output # 30.1μs -> 21.4μs (40.8% faster)

    def test_two_point_paths_different(self):
        # Both paths have two points, but different
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(0, 0), (2, 0)])
        path2 = self.make_path([(0, 1), (2, 1)])
        dist_matrix = self.make_dist_matrix(2, 2)
        codeflash_output = block._compute_distance(dist_matrix, 1, 1, path1, path2); result = codeflash_output # 26.2μs -> 20.3μs (28.9% faster)

    def test_three_point_paths_partial_overlap(self):
        # Paths overlap at start, diverge at end
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(0, 0), (1, 1), (2, 2)])
        path2 = self.make_path([(0, 0), (1, 2), (2, 4)])
        dist_matrix = self.make_dist_matrix(3, 3)
        codeflash_output = block._compute_distance(dist_matrix, 2, 2, path1, path2); result = codeflash_output # 45.4μs -> 33.9μs (33.8% faster)

    # EDGE TEST CASES

    def test_empty_paths(self):
        # Both paths are empty: indexing (0, 0) should raise IndexError
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([])
        path2 = self.make_path([])
        dist_matrix = self.make_dist_matrix(0, 0)
        # There's no valid i, j, but let's check that requesting (0,0) raises
        with pytest.raises(IndexError):
            block._compute_distance(dist_matrix, 0, 0, path1, path2) # 2.92μs -> 2.93μs (0.205% slower)

    def test_path1_empty(self):
        # path1 is empty, path2 has points
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([])
        path2 = self.make_path([(1, 1)])
        dist_matrix = self.make_dist_matrix(0, 1)
        with pytest.raises(IndexError):
            block._compute_distance(dist_matrix, 0, 0, path1, path2) # 2.37μs -> 2.10μs (12.8% faster)

    def test_path2_empty(self):
        # path2 is empty, path1 has points
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(1, 1)])
        path2 = self.make_path([])
        dist_matrix = self.make_dist_matrix(1, 0)
        with pytest.raises(IndexError):
            block._compute_distance(dist_matrix, 0, 0, path1, path2) # 2.29μs -> 2.24μs (2.05% faster)

    def test_negative_indices(self):
        # Negative indices: should return inf according to function
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(0, 0)])
        path2 = self.make_path([(0, 0)])
        dist_matrix = self.make_dist_matrix(1, 1)
        codeflash_output = block._compute_distance(dist_matrix, -1, -1, path1, path2); result = codeflash_output # 3.55μs -> 3.21μs (10.7% faster)

    def test_non_integer_indices(self):
        # Non-integer indices: should raise index error
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(0, 0)])
        path2 = self.make_path([(0, 0)])
        dist_matrix = self.make_dist_matrix(1, 1)
        with pytest.raises(TypeError):
            block._compute_distance(dist_matrix, 0.5, 0, path1, path2)

    def test_high_dimensional_points(self):
        # Points in higher dimensions (e.g., 3D)
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(1, 2, 3), (4, 5, 6)])
        path2 = self.make_path([(1, 2, 3), (7, 8, 9)])
        dist_matrix = self.make_dist_matrix(2, 2)
        codeflash_output = block._compute_distance(dist_matrix, 1, 1, path1, path2); result = codeflash_output # 38.5μs -> 28.5μs (34.9% faster)

    def test_non_square_matrix(self):
        # dist_matrix is not square; paths of different lengths
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(0, 0), (1, 1), (2, 2)])
        path2 = self.make_path([(0, 0), (1, 2)])
        dist_matrix = self.make_dist_matrix(3, 2)
        codeflash_output = block._compute_distance(dist_matrix, 2, 1, path1, path2); result = codeflash_output # 34.7μs -> 26.3μs (32.0% faster)

    # LARGE SCALE TEST CASES

    def test_large_paths(self):
        # Large paths with 1000 points each, all identical
        block = PathDeviationAnalyticsBlockV2()
        points = [(i, i) for i in range(1000)]
        path1 = self.make_path(points)
        path2 = self.make_path(points)
        dist_matrix = self.make_dist_matrix(1000, 1000)
        codeflash_output = block._compute_distance(dist_matrix, 999, 999, path1, path2); result = codeflash_output

    def test_large_paths_max_deviation(self):
        # Large paths, but path2 is offset by 10 units in y
        block = PathDeviationAnalyticsBlockV2()
        points1 = [(i, i) for i in range(1000)]
        points2 = [(i, i + 10) for i in range(1000)]
        path1 = self.make_path(points1)
        path2 = self.make_path(points2)
        dist_matrix = self.make_dist_matrix(1000, 1000)
        codeflash_output = block._compute_distance(dist_matrix, 999, 999, path1, path2); result = codeflash_output

    def test_large_paths_partial_overlap(self):
        # Large paths, path2 is reversed
        block = PathDeviationAnalyticsBlockV2()
        points1 = [(i, i) for i in range(1000)]
        points2 = [(999 - i, 999 - i) for i in range(1000)]
        path1 = self.make_path(points1)
        path2 = self.make_path(points2)
        dist_matrix = self.make_dist_matrix(1000, 1000)
        codeflash_output = block._compute_distance(dist_matrix, 999, 999, path1, path2); result = codeflash_output
        # Largest deviation is at the endpoints: (999,999) vs (0,0): sqrt(999^2 + 999^2)
        expected = np.sqrt(999**2 + 999**2)

    def test_large_paths_non_matching_lengths(self):
        # Large paths, but different lengths
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(i, 0) for i in range(1000)])
        path2 = self.make_path([(i, 0) for i in range(500)])
        dist_matrix = self.make_dist_matrix(1000, 500)
        codeflash_output = block._compute_distance(dist_matrix, 999, 499, path1, path2); result = codeflash_output

    def test_large_paths_high_dimensional(self):
        # Large paths in 5D space
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(i, i+1, i+2, i+3, i+4) for i in range(1000)])
        path2 = self.make_path([(i, i+1, i+2, i+3, i+4) for i in range(1000)])
        dist_matrix = self.make_dist_matrix(1000, 1000)
        codeflash_output = block._compute_distance(dist_matrix, 999, 999, path1, path2); result = codeflash_output

    # Additional edge: test for caching in dist_matrix
    def test_caching_in_dist_matrix(self):
        # Ensure that repeated calls do not recompute
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(0, 0), (1, 1)])
        path2 = self.make_path([(0, 0), (2, 2)])
        dist_matrix = self.make_dist_matrix(2, 2)
        # First call computes and stores
        codeflash_output = block._compute_distance(dist_matrix, 1, 1, path1, path2); result1 = codeflash_output # 41.3μs -> 29.7μs (38.9% faster)
        # Manually set the matrix to a different value
        dist_matrix[1, 1] = 42.0
        # Second call should return cached value, not recompute
        codeflash_output = block._compute_distance(dist_matrix, 1, 1, path1, path2); result2 = codeflash_output # 499ns -> 495ns (0.808% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
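
The tests above treat dist_matrix as a memo table pre-filled with -1, expect inf for negative indices, and expect a cached cell to be returned unchanged. The sketch below is one recursion consistent with that behavior, written in the style of a discrete Fréchet distance. It is an illustration inferred from the tests, not the code in v2.py; `frechet_cell` and `UNCOMPUTED` are hypothetical names, and plain lists with `math.dist` stand in for the NumPy arrays.

```python
import math

UNCOMPUTED = -1.0  # sentinel the tests use for cells that are not yet filled


def frechet_cell(dist, i, j, path1, path2):
    """Return the memoized value for cell (i, j), computing it if needed."""
    if dist[i][j] > UNCOMPUTED:
        return dist[i][j]  # cached: return without recomputing
    if i == 0 and j == 0:
        dist[i][j] = math.dist(path1[0], path2[0])
    elif i > 0 and j == 0:
        dist[i][j] = max(
            frechet_cell(dist, i - 1, 0, path1, path2),
            math.dist(path1[i], path2[0]),
        )
    elif i == 0 and j > 0:
        dist[i][j] = max(
            frechet_cell(dist, 0, j - 1, path1, path2),
            math.dist(path1[0], path2[j]),
        )
    elif i > 0 and j > 0:
        best_previous = min(
            frechet_cell(dist, i - 1, j, path1, path2),
            frechet_cell(dist, i - 1, j - 1, path1, path2),
            frechet_cell(dist, i, j - 1, path1, path2),
        )
        dist[i][j] = max(best_previous, math.dist(path1[i], path2[j]))
    else:
        dist[i][j] = math.inf  # negative indices
    return dist[i][j]


path1 = [(0, 0), (1, 1), (2, 2)]
path2 = [(0, 0), (1, 1)]
dist = [[UNCOMPUTED] * len(path2) for _ in path1]
result = frechet_cell(dist, 2, 1, path1, path2)

dist[2][1] = 42.0  # overwrite the cache, as test_caching_in_dist_matrix does
cached = frechet_cell(dist, 2, 1, path1, path2)
print(result, cached)
```

In a recursion of this shape the point-to-point distance is evaluated once per cell, which would explain why a faster distance helper shows up as a gain in every test that fills the matrix.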

To edit these changes, run git checkout codeflash/optimize-PathDeviationAnalyticsBlockV2._compute_distance-mhby7lcc and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 29, 2025 12:05
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Oct 29, 2025