⚡️ Speed up method `PathDeviationAnalyticsBlockV2._compute_distance` by 33% #635

codeflash-ai · 2025-10-29T12:05:04Z

📄 33% (0.33x) speedup for `PathDeviationAnalyticsBlockV2._compute_distance` in `inference/core/workflows/core_steps/analytics/path_deviation/v2.py`

⏱️ Runtime : 531 microseconds → 399 microseconds (best of 5 runs)
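As a quick check on the headline figure, the percentage appears to be the time saved relative to the optimized runtime. The snippet below is a sketch under that assumption: the formula is inferred from the reported numbers rather than taken from Codeflash documentation, and `speedup_percent` is a hypothetical helper name.

```python
def speedup_percent(original_us: float, optimized_us: float) -> float:
    """Relative speedup in percent, assuming (original - optimized) / optimized."""
    return (original_us - optimized_us) / optimized_us * 100.0


# Runtimes reported above: 531 microseconds -> 399 microseconds
headline = speedup_percent(531, 399)
print(f"{headline:.1f}%")  # rounds to the 33% quoted in the title
```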

📝 Explanation and details

The optimization replaces the manual Euclidean distance calculation `np.sqrt(np.sum((point1 - point2) ** 2))` with `np.linalg.norm(point1 - point2)`, achieving a **33% speedup** (531 microseconds → 399 microseconds).

**Key optimization:**

- **NumPy's `linalg.norm()` is faster here** than the manual sqrt/sum approach: it replaces three separate NumPy calls (`** 2`, `np.sum()`, `np.sqrt()`) with one, and it avoids the temporary array that `(point1 - point2) ** 2` allocates before `np.sum()` runs. Depending on the NumPy build, the underlying reduction may also dispatch to BLAS.

**Why this works:**

- The manual approach creates a temporary array for the squared differences and then sums it, requiring multiple memory operations and extra per-call overhead
- `np.linalg.norm()` computes the L2 norm in compiled code, eliminating the squared-differences temporary (the `point1 - point2` difference is still computed in both versions)
- For small vectors (typical 2D/3D points in path analysis), fixed per-call overhead dominates the runtime, so removing calls is particularly effective

**Test case performance:**

- Tests that compute at least one distance run roughly 24-51% faster
- Tests that raise or return early (empty paths, negative indices, cached cells) never reach the distance calculation and stay within measurement noise (about 10% slower to 13% faster)
- Particularly effective for the core use cases: 2D/3D point comparisons in path deviation analysis
- All 59 generated regression tests pass, so return values and exception behavior match the original on those inputs
- The gain appears in both single-point comparisons and multi-point path calculations

The optimization preserves functionality on the tested inputs while leveraging NumPy's linear algebra routines for better performance.
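To make the change concrete, here is a minimal sketch of the two formulations side by side. The function names `euclidean_manual`, `euclidean_norm`, and `compare_timings` are hypothetical and only stand in for the old and new distance expressions quoted above; the actual method in `v2.py` is not reproduced here. Timings depend on the machine and NumPy build, so the benchmark helper reports raw seconds and no specific numbers are claimed.

```python
import timeit

import numpy as np


def euclidean_manual(point1: np.ndarray, point2: np.ndarray) -> float:
    # Original formulation: subtract, square, sum, then take the square root.
    return float(np.sqrt(np.sum((point1 - point2) ** 2)))


def euclidean_norm(point1: np.ndarray, point2: np.ndarray) -> float:
    # Optimized formulation: the default np.linalg.norm of a 1-D array is the L2 norm.
    return float(np.linalg.norm(point1 - point2))


def compare_timings(number: int = 100_000) -> dict:
    # Hypothetical micro-benchmark helper: total seconds for `number` calls of each version.
    p1, p2 = np.array([0.0, 0.0]), np.array([3.0, 4.0])
    return {
        "manual": timeit.timeit(lambda: euclidean_manual(p1, p2), number=number),
        "norm": timeit.timeit(lambda: euclidean_norm(p1, p2), number=number),
    }


a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])
# Both formulations agree on the 3-4-5 triangle used in the generated tests.
assert np.isclose(euclidean_manual(a, b), 5.0)
assert np.isclose(euclidean_norm(a, b), 5.0)
```

Calling `compare_timings()` locally gives a rough before/after comparison. The two versions should agree to within floating point tolerance, which is why the checks use `np.isclose` rather than exact equality.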

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 59 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime

import numpy as np
# imports
import pytest  # used for our unit tests
from inference.core.workflows.core_steps.analytics.path_deviation.v2 import \
    PathDeviationAnalyticsBlockV2

# unit tests

@pytest.fixture
def block():
    # Fixture to provide a fresh instance of the class for each test
    return PathDeviationAnalyticsBlockV2()

# --- Basic Test Cases ---

def test_identical_single_point_paths(block):
    # Both paths are identical, single point
    path1 = np.array([[0, 0]])
    path2 = np.array([[0, 0]])
    dist_matrix = np.full((1, 1), -1.0)
    # The distance should be zero
    codeflash_output = block._compute_distance(dist_matrix, 0, 0, path1, path2); result = codeflash_output # 22.4μs -> 14.8μs (51.1% faster)

def test_different_single_point_paths(block):
    # Both paths are single points, but different
    path1 = np.array([[0, 0]])
    path2 = np.array([[3, 4]])
    dist_matrix = np.full((1, 1), -1.0)
    # The distance should be the Euclidean distance (5.0)
    codeflash_output = block._compute_distance(dist_matrix, 0, 0, path1, path2); result = codeflash_output # 16.9μs -> 12.2μs (38.4% faster)

def test_two_point_paths_identical(block):
    # Both paths have two identical points
    path1 = np.array([[1, 2], [3, 4]])
    path2 = np.array([[1, 2], [3, 4]])
    dist_matrix = np.full((2, 2), -1.0)
    # The max deviation should be 0.0
    codeflash_output = block._compute_distance(dist_matrix, 1, 1, path1, path2); result = codeflash_output # 30.7μs -> 24.8μs (23.9% faster)

def test_two_point_paths_different(block):
    # Each path has two points, second point differs
    path1 = np.array([[0, 0], [1, 1]])
    path2 = np.array([[0, 0], [2, 2]])
    dist_matrix = np.full((2, 2), -1.0)
    # The last points differ by (1, 1), so the expected distance is sqrt(1^2 + 1^2) = sqrt(2) = 1.414...
    codeflash_output = block._compute_distance(dist_matrix, 1, 1, path1, path2); result = codeflash_output # 29.2μs -> 22.8μs (28.4% faster)

def test_three_point_paths(block):
    # Both paths have three points, with a deviation at the last point
    path1 = np.array([[0, 0], [1, 1], [2, 2]])
    path2 = np.array([[0, 0], [1, 1], [3, 2]])
    dist_matrix = np.full((3, 3), -1.0)
    # The max deviation should be 1.0 at last point
    codeflash_output = block._compute_distance(dist_matrix, 2, 2, path1, path2); result = codeflash_output # 51.0μs -> 36.3μs (40.7% faster)

# --- Edge Test Cases ---

def test_empty_paths(block):
    # Edge case: empty paths
    path1 = np.empty((0, 2))
    path2 = np.empty((0, 2))
    dist_matrix = np.full((0, 0), -1.0)
    # Should raise IndexError due to empty access
    with pytest.raises(IndexError):
        block._compute_distance(dist_matrix, 0, 0, path1, path2) # 2.90μs -> 2.96μs (1.76% slower)

def test_one_empty_one_nonempty_path(block):
    # One path is empty, one is not
    path1 = np.empty((0, 2))
    path2 = np.array([[1, 1]])
    dist_matrix = np.full((0, 1), -1.0)
    with pytest.raises(IndexError):
        block._compute_distance(dist_matrix, 0, 0, path1, path2) # 2.26μs -> 2.51μs (10.1% slower)

def test_negative_indices(block):
    # Negative indices should result in inf
    path1 = np.array([[0, 0]])
    path2 = np.array([[0, 0]])
    dist_matrix = np.full((1, 1), -1.0)
    codeflash_output = block._compute_distance(dist_matrix, -1, -1, path1, path2); result = codeflash_output # 3.54μs -> 3.35μs (5.66% faster)

def test_non_2d_points(block):
    # Points with more than 2 dimensions
    path1 = np.array([[1, 2, 3], [4, 5, 6]])
    path2 = np.array([[1, 2, 3], [7, 8, 9]])
    dist_matrix = np.full((2, 2), -1.0)
    # Last point deviation sqrt((4-7)^2 + (5-8)^2 + (6-9)^2) = sqrt(27) = ~5.196
    codeflash_output = block._compute_distance(dist_matrix, 1, 1, path1, path2); result = codeflash_output # 36.4μs -> 29.0μs (25.5% faster)

def test_non_square_dist_matrix(block):
    # Non-square dist_matrix, path lengths differ
    path1 = np.array([[0, 0], [1, 1], [2, 2]])
    path2 = np.array([[0, 0], [1, 1]])
    dist_matrix = np.full((3, 2), -1.0)
    codeflash_output = block._compute_distance(dist_matrix, 2, 1, path1, path2); result = codeflash_output # 37.4μs -> 27.8μs (34.4% faster)
    # Should be max deviation between path1[2] and path2[1] and previous steps
    expected = max(
        min(
            block._euclidean_distance(path1[1], path2[1]),
            block._euclidean_distance(path1[1], path2[0]),
            block._euclidean_distance(path1[2], path2[0]),
        ),
        block._euclidean_distance(path1[2], path2[1])
    )

def test_large_deviation_at_start(block):
    # Large deviation at start, small at end
    path1 = np.array([[100, 100], [1, 1]])
    path2 = np.array([[0, 0], [1, 1]])
    dist_matrix = np.full((2, 2), -1.0)
    codeflash_output = block._compute_distance(dist_matrix, 1, 1, path1, path2); result = codeflash_output # 29.6μs -> 22.3μs (32.7% faster)

#------------------------------------------------
import numpy as np
# imports
import pytest  # used for our unit tests
from inference.core.workflows.core_steps.analytics.path_deviation.v2 import \
    PathDeviationAnalyticsBlockV2


# unit tests
class TestComputeDistance:
    # Helper to create a fresh dist_matrix for given sizes
    def make_dist_matrix(self, m, n):
        # All entries initialized to -1 (uncomputed)
        return np.full((m, n), -1.0)
    
    # Helper to create paths from list of tuples
    def make_path(self, points):
        return np.array(points, dtype=float)
    
    # BASIC TEST CASES

    def test_identical_single_point_paths(self):
        # Both paths are the same single point
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(0, 0)])
        path2 = self.make_path([(0, 0)])
        dist_matrix = self.make_dist_matrix(1, 1)
        codeflash_output = block._compute_distance(dist_matrix, 0, 0, path1, path2); result = codeflash_output # 24.6μs -> 17.9μs (37.8% faster)

    def test_different_single_point_paths(self):
        # Both paths are single points, but different
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(0, 0)])
        path2 = self.make_path([(3, 4)])
        dist_matrix = self.make_dist_matrix(1, 1)
        codeflash_output = block._compute_distance(dist_matrix, 0, 0, path1, path2); result = codeflash_output # 16.0μs -> 11.3μs (41.6% faster)

    def test_two_point_paths_identical(self):
        # Both paths have two points, identical
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(1, 2), (3, 4)])
        path2 = self.make_path([(1, 2), (3, 4)])
        dist_matrix = self.make_dist_matrix(2, 2)
        codeflash_output = block._compute_distance(dist_matrix, 1, 1, path1, path2); result = codeflash_output # 30.1μs -> 21.4μs (40.8% faster)

    def test_two_point_paths_different(self):
        # Both paths have two points, but different
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(0, 0), (2, 0)])
        path2 = self.make_path([(0, 1), (2, 1)])
        dist_matrix = self.make_dist_matrix(2, 2)
        codeflash_output = block._compute_distance(dist_matrix, 1, 1, path1, path2); result = codeflash_output # 26.2μs -> 20.3μs (28.9% faster)

    def test_three_point_paths_partial_overlap(self):
        # Paths overlap at start, diverge at end
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(0, 0), (1, 1), (2, 2)])
        path2 = self.make_path([(0, 0), (1, 2), (2, 4)])
        dist_matrix = self.make_dist_matrix(3, 3)
        codeflash_output = block._compute_distance(dist_matrix, 2, 2, path1, path2); result = codeflash_output # 45.4μs -> 33.9μs (33.8% faster)

    # EDGE TEST CASES

    def test_empty_paths(self):
        # Both paths are empty: there is no valid (i, j), so indexing should raise
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([])
        path2 = self.make_path([])
        dist_matrix = self.make_dist_matrix(0, 0)
        # Requesting (0, 0) on the empty matrix should raise IndexError
        with pytest.raises(IndexError):
            block._compute_distance(dist_matrix, 0, 0, path1, path2) # 2.92μs -> 2.93μs (0.205% slower)

    def test_path1_empty(self):
        # path1 is empty, path2 has points
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([])
        path2 = self.make_path([(1, 1)])
        dist_matrix = self.make_dist_matrix(0, 1)
        with pytest.raises(IndexError):
            block._compute_distance(dist_matrix, 0, 0, path1, path2) # 2.37μs -> 2.10μs (12.8% faster)

    def test_path2_empty(self):
        # path2 is empty, path1 has points
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(1, 1)])
        path2 = self.make_path([])
        dist_matrix = self.make_dist_matrix(1, 0)
        with pytest.raises(IndexError):
            block._compute_distance(dist_matrix, 0, 0, path1, path2) # 2.29μs -> 2.24μs (2.05% faster)

    def test_negative_indices(self):
        # Negative indices: should return inf according to function
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(0, 0)])
        path2 = self.make_path([(0, 0)])
        dist_matrix = self.make_dist_matrix(1, 1)
        codeflash_output = block._compute_distance(dist_matrix, -1, -1, path1, path2); result = codeflash_output # 3.55μs -> 3.21μs (10.7% faster)

    def test_non_integer_indices(self):
        # Non-integer indices: NumPy rejects a float array index with IndexError
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(0, 0)])
        path2 = self.make_path([(0, 0)])
        dist_matrix = self.make_dist_matrix(1, 1)
        with pytest.raises(IndexError):
            block._compute_distance(dist_matrix, 0.5, 0, path1, path2)

    def test_high_dimensional_points(self):
        # Points in higher dimensions (e.g., 3D)
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(1, 2, 3), (4, 5, 6)])
        path2 = self.make_path([(1, 2, 3), (7, 8, 9)])
        dist_matrix = self.make_dist_matrix(2, 2)
        codeflash_output = block._compute_distance(dist_matrix, 1, 1, path1, path2); result = codeflash_output # 38.5μs -> 28.5μs (34.9% faster)

    def test_non_square_matrix(self):
        # dist_matrix is not square; paths of different lengths
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(0, 0), (1, 1), (2, 2)])
        path2 = self.make_path([(0, 0), (1, 2)])
        dist_matrix = self.make_dist_matrix(3, 2)
        codeflash_output = block._compute_distance(dist_matrix, 2, 1, path1, path2); result = codeflash_output # 34.7μs -> 26.3μs (32.0% faster)

    # LARGE SCALE TEST CASES

    def test_large_paths(self):
        # Large paths with 1000 points each, all identical
        block = PathDeviationAnalyticsBlockV2()
        points = [(i, i) for i in range(1000)]
        path1 = self.make_path(points)
        path2 = self.make_path(points)
        dist_matrix = self.make_dist_matrix(1000, 1000)
        codeflash_output = block._compute_distance(dist_matrix, 999, 999, path1, path2); result = codeflash_output

    def test_large_paths_max_deviation(self):
        # Large paths, but path2 is offset by 10 units in y
        block = PathDeviationAnalyticsBlockV2()
        points1 = [(i, i) for i in range(1000)]
        points2 = [(i, i + 10) for i in range(1000)]
        path1 = self.make_path(points1)
        path2 = self.make_path(points2)
        dist_matrix = self.make_dist_matrix(1000, 1000)
        codeflash_output = block._compute_distance(dist_matrix, 999, 999, path1, path2); result = codeflash_output

    def test_large_paths_partial_overlap(self):
        # Large paths, path2 is reversed
        block = PathDeviationAnalyticsBlockV2()
        points1 = [(i, i) for i in range(1000)]
        points2 = [(999 - i, 999 - i) for i in range(1000)]
        path1 = self.make_path(points1)
        path2 = self.make_path(points2)
        dist_matrix = self.make_dist_matrix(1000, 1000)
        codeflash_output = block._compute_distance(dist_matrix, 999, 999, path1, path2); result = codeflash_output
        # Largest deviation is at the endpoints: (999,999) vs (0,0): sqrt(999^2 + 999^2)
        expected = np.sqrt(999**2 + 999**2)

    def test_large_paths_non_matching_lengths(self):
        # Large paths, but different lengths
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(i, 0) for i in range(1000)])
        path2 = self.make_path([(i, 0) for i in range(500)])
        dist_matrix = self.make_dist_matrix(1000, 500)
        codeflash_output = block._compute_distance(dist_matrix, 999, 499, path1, path2); result = codeflash_output

    def test_large_paths_high_dimensional(self):
        # Large paths in 5D space
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(i, i+1, i+2, i+3, i+4) for i in range(1000)])
        path2 = self.make_path([(i, i+1, i+2, i+3, i+4) for i in range(1000)])
        dist_matrix = self.make_dist_matrix(1000, 1000)
        codeflash_output = block._compute_distance(dist_matrix, 999, 999, path1, path2); result = codeflash_output

    # Additional edge: test for caching in dist_matrix
    def test_caching_in_dist_matrix(self):
        # Ensure that repeated calls do not recompute
        block = PathDeviationAnalyticsBlockV2()
        path1 = self.make_path([(0, 0), (1, 1)])
        path2 = self.make_path([(0, 0), (2, 2)])
        dist_matrix = self.make_dist_matrix(2, 2)
        # First call computes and stores
        codeflash_output = block._compute_distance(dist_matrix, 1, 1, path1, path2); result1 = codeflash_output # 41.3μs -> 29.7μs (38.9% faster)
        # Manually set the matrix to a different value
        dist_matrix[1, 1] = 42.0
        # Second call should return cached value, not recompute
        codeflash_output = block._compute_distance(dist_matrix, 1, 1, path1, path2); result2 = codeflash_output # 499ns -> 495ns (0.808% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
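
The tests above rely on a few behaviours: `-1.0` marks an uncomputed cell, a cell that is already filled is returned without recomputation, and negative indices yield `inf`. The following stand-alone sketch shows a memoized recursion with those properties; its max/min structure resembles the discrete Fréchet distance recurrence. It is an illustration inferred from the tests, not the code in `v2.py`, and the names `compute_distance` and `euclidean` are hypothetical.

```python
import numpy as np


def euclidean(p: np.ndarray, q: np.ndarray) -> float:
    # L2 distance between two points of equal dimension.
    return float(np.linalg.norm(p - q))


def compute_distance(dist_matrix, i, j, path1, path2):
    # A value above -1 means the cell was already computed: return it as-is.
    if dist_matrix[i, j] > -1:
        return dist_matrix[i, j]
    if i == 0 and j == 0:
        dist_matrix[i, j] = euclidean(path1[0], path2[0])
    elif i > 0 and j == 0:
        dist_matrix[i, j] = max(
            compute_distance(dist_matrix, i - 1, 0, path1, path2),
            euclidean(path1[i], path2[0]),
        )
    elif i == 0 and j > 0:
        dist_matrix[i, j] = max(
            compute_distance(dist_matrix, 0, j - 1, path1, path2),
            euclidean(path1[0], path2[j]),
        )
    elif i > 0 and j > 0:
        dist_matrix[i, j] = max(
            min(
                compute_distance(dist_matrix, i - 1, j, path1, path2),
                compute_distance(dist_matrix, i - 1, j - 1, path1, path2),
                compute_distance(dist_matrix, i, j - 1, path1, path2),
            ),
            euclidean(path1[i], path2[j]),
        )
    else:
        # Negative indices fall through to here.
        dist_matrix[i, j] = float("inf")
    return dist_matrix[i, j]


path1 = np.array([[0.0, 0.0], [1.0, 1.0]])
path2 = np.array([[0.0, 0.0], [2.0, 2.0]])
matrix = np.full((2, 2), -1.0)
first = compute_distance(matrix, 1, 1, path1, path2)

# Overwrite the cached cell: the next call returns the stored value unchanged.
matrix[1, 1] = 42.0
second = compute_distance(matrix, 1, 1, path1, path2)
```

In this sketch every cell below and to the left of `(i, j)` is filled at most once, so the distance helper is called once per visited cell. That is why a cheaper per-call distance translates almost directly into a faster overall run on the small paths timed above.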

To edit these changes, `git checkout codeflash/optimize-PathDeviationAnalyticsBlockV2._compute_distance-mhby7lcc` and push.


codeflash-ai bot requested a review from mashraf-222 October 29, 2025 12:05

codeflash-ai bot added the `⚡️ codeflash` (Optimization PR opened by Codeflash AI) and `🎯 Quality: High` (Optimization Quality according to Codeflash) labels Oct 29, 2025

misrasaurabh1 approved these changes Oct 29, 2025
