Conversation

@codeflash-ai codeflash-ai bot commented Oct 29, 2025

📄 8% (0.08x) speedup for RelativeStaticCropBlockV1.run in inference/core/workflows/core_steps/transformations/relative_static_crop/v1.py

⏱️ Runtime: 1.38 milliseconds → 1.28 milliseconds (best of 278 runs)

📝 Explanation and details

The optimized code achieves a 7% speedup through several micro-optimizations in the take_static_crop function:

Key optimizations (a sketch of these changes follows the list):

  1. Reduced attribute access: Caches image.numpy_image.shape in a local variable, avoiding repeated property lookups (from 3 accesses to 1).

  2. Integer division optimization: Replaces floating-point division (width / 2) with integer division (crop_width // 2) for half-width/half-height calculations, which is faster for integral results.

  3. Deferred UUID generation: Moves the expensive uuid4() call after the empty crop check, avoiding UUID generation for invalid crops that return None.

  4. Better variable naming: Uses more descriptive names like crop_x_center and half_width that make the computation clearer.
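
Below is a minimal, self-contained sketch of what these three changes look like in a take_static_crop-style helper. It is an illustration only, not the verbatim function from relative_static_crop/v1.py: the signature (it takes a bare numpy array), the boundary clamping, and the crop-identifier format are simplified assumptions.

```python
from typing import Optional, Tuple
from uuid import uuid4

import numpy as np


def take_static_crop_sketch(
    image: np.ndarray,
    x_center: float,
    y_center: float,
    width: float,
    height: float,
) -> Optional[Tuple[str, np.ndarray, int, int]]:
    """Return (crop_id, cropped_pixels, offset_x, offset_y), or None for an empty crop."""
    # (1) Read the shape once into locals instead of repeated attribute lookups.
    img_h, img_w = image.shape[:2]

    crop_width = round(img_w * width)
    crop_height = round(img_h * height)
    crop_x_center = round(img_w * x_center)
    crop_y_center = round(img_h * y_center)

    # (2) Integer division: the crop extents are already ints, so // avoids
    # a float round-trip compared with dividing by 2.0.
    half_width = crop_width // 2
    half_height = crop_height // 2

    # Boundary handling is simplified here relative to the real function.
    x_min = max(0, crop_x_center - half_width)
    y_min = max(0, crop_y_center - half_height)
    x_max = min(img_w, x_min + crop_width)
    y_max = min(img_h, y_min + crop_height)

    cropped = image[y_min:y_max, x_min:x_max]
    if cropped.size == 0:
        # (3) Bail out before generating a UUID, so invalid or empty crops
        # never pay the uuid4() cost.
        return None

    crop_id = f"crop.{uuid4()}"  # identifier format is illustrative
    return crop_id, cropped, x_min, y_min
```

For example, `take_static_crop_sketch(np.zeros((10, 10, 3), dtype=np.uint8), x_center=2.0, y_center=2.0, width=0.2, height=0.2)` returns None without ever calling uuid4(), which is where the largest measured gains came from.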

Performance characteristics from tests:

  • Best gains on invalid crops: 13-35% faster when crops are empty or out-of-bounds due to avoiding UUID generation
  • Solid gains on large batches: 10% faster on 100-image batches, 5% on mixed valid/invalid batches
  • Modest gains on regular crops: 1-3% faster on typical center/corner crops
  • Large images: 2-3% improvement on high-resolution images

The optimizations are most effective when processing batches with many invalid crops or when UUID generation overhead becomes significant relative to the cropping computation.
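
A rough way to see why the deferred UUID matters is to time the UUID call in isolation. The snippet below is a generic illustration, not the project's benchmark, and the absolute numbers will vary by machine.

```python
# Illustrative micro-benchmark: cost of generating a UUID per invalid crop
# versus skipping it entirely.
import timeit
from uuid import uuid4

def invalid_crop_eager():
    crop_id = f"crop.{uuid4()}"  # UUID built even though the crop is empty
    return None

def invalid_crop_deferred():
    return None  # emptiness detected first, so uuid4() is never called

eager = timeit.timeit(invalid_crop_eager, number=100_000)
deferred = timeit.timeit(invalid_crop_deferred, number=100_000)
print(f"eager: {eager:.3f}s, deferred: {deferred:.3f}s")
```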

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 70 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from copy import copy
from dataclasses import dataclass, replace
from typing import Any, List, Optional
from uuid import uuid4

import numpy as np
# imports
import pytest
from inference.core.workflows.core_steps.transformations.relative_static_crop.v1 import \
    RelativeStaticCropBlockV1

# --- Minimal stubs for dependencies to make 'run' testable ---

@dataclass
class OriginCoordinatesSystem:
    left_top_x: int
    left_top_y: int
    origin_width: int
    origin_height: int

@dataclass
class ImageParentMetadata:
    parent_id: str
    origin_coordinates: OriginCoordinatesSystem

@dataclass
class VideoMetadata:
    video_identifier: str

class WorkflowImageData:
    def __init__(
        self,
        parent_metadata: ImageParentMetadata,
        workflow_root_ancestor_metadata: Optional[ImageParentMetadata] = None,
        image_reference: Optional[str] = None,
        base64_image: Optional[str] = None,
        numpy_image: Optional[np.ndarray] = None,
        video_metadata: Optional[VideoMetadata] = None,
    ):
        if not base64_image and numpy_image is None and not image_reference:
            raise ValueError("Could not initialise empty `WorkflowImageData`.")
        self.parent_metadata = parent_metadata
        self.workflow_root_ancestor_metadata = (
            workflow_root_ancestor_metadata
            if workflow_root_ancestor_metadata
            else self.parent_metadata
        )
        self.image_reference = image_reference
        self.base64_image = base64_image
        self.numpy_image = numpy_image
        self._video_metadata = video_metadata

    @classmethod
    def create_crop(
        cls,
        origin_image_data: "WorkflowImageData",
        crop_identifier: str,
        cropped_image: np.ndarray,
        offset_x: int,
        offset_y: int,
        preserve_video_metadata: bool = False,
    ) -> "WorkflowImageData":
        parent_metadata = ImageParentMetadata(
            parent_id=crop_identifier,
            origin_coordinates=OriginCoordinatesSystem(
                left_top_x=offset_x,
                left_top_y=offset_y,
                origin_width=origin_image_data.numpy_image.shape[1],
                origin_height=origin_image_data.numpy_image.shape[0],
            ),
        )
        workflow_root_ancestor_coordinates = replace(
            origin_image_data.workflow_root_ancestor_metadata.origin_coordinates,
            left_top_x=origin_image_data.workflow_root_ancestor_metadata.origin_coordinates.left_top_x
            + offset_x,
            left_top_y=origin_image_data.workflow_root_ancestor_metadata.origin_coordinates.left_top_y
            + offset_y,
        )
        workflow_root_ancestor_metadata = ImageParentMetadata(
            parent_id=origin_image_data.workflow_root_ancestor_metadata.parent_id,
            origin_coordinates=workflow_root_ancestor_coordinates,
        )
        video_metadata = None
        if preserve_video_metadata and origin_image_data._video_metadata is not None:
            video_metadata = copy(origin_image_data._video_metadata)
            video_metadata.video_identifier = (
                f"{video_metadata.video_identifier} | crop: {crop_identifier}"
            )
        return WorkflowImageData(
            parent_metadata=parent_metadata,
            workflow_root_ancestor_metadata=workflow_root_ancestor_metadata,
            numpy_image=cropped_image,
            video_metadata=video_metadata,
        )
from inference.core.workflows.core_steps.transformations.relative_static_crop.v1 import \
    RelativeStaticCropBlockV1

# --- Unit Tests ---

# Helper to create a synthetic WorkflowImageData
def make_image(width, height, value=1, parent_id="root", left_top_x=0, left_top_y=0):
    arr = np.full((height, width, 3), value, dtype=np.uint8)
    coords = OriginCoordinatesSystem(left_top_x=left_top_x, left_top_y=left_top_y, origin_width=width, origin_height=height)
    meta = ImageParentMetadata(parent_id=parent_id, origin_coordinates=coords)
    return WorkflowImageData(parent_metadata=meta, numpy_image=arr)

@pytest.fixture
def block():
    return RelativeStaticCropBlockV1()

# --- 1. Basic Test Cases ---

def test_basic_center_crop(block):
    # Crop a 10x10 image at center with width=0.5, height=0.5 (should yield 5x5 crop)
    img = make_image(10, 10)
    codeflash_output = block.run([img], x_center=0.5, y_center=0.5, width=0.5, height=0.5); result = codeflash_output # 23.0μs -> 22.6μs (1.58% faster)
    crop = result[0]["crops"]

def test_basic_full_crop(block):
    # Crop the entire image (width=1, height=1)
    img = make_image(8, 6)
    codeflash_output = block.run([img], x_center=0.5, y_center=0.5, width=1.0, height=1.0); result = codeflash_output # 20.6μs -> 20.3μs (1.36% faster)
    crop = result[0]["crops"]

def test_basic_multiple_images(block):
    # Crop two images at different centers
    img1 = make_image(10, 10, value=5)
    img2 = make_image(12, 8, value=7)
    codeflash_output = block.run([img1, img2], x_center=0.5, y_center=0.5, width=0.5, height=0.5); result = codeflash_output # 28.0μs -> 27.4μs (2.14% faster)
    crop1 = result[0]["crops"]
    crop2 = result[1]["crops"]

# --- 2. Edge Test Cases ---

def test_zero_size_crop(block):
    # width or height = 0 should return None
    img = make_image(10, 10)
    codeflash_output = block.run([img], x_center=0.5, y_center=0.5, width=0.0, height=0.5); result = codeflash_output # 4.78μs -> 4.01μs (19.2% faster)
    codeflash_output = block.run([img], x_center=0.5, y_center=0.5, width=0.5, height=0.0); result = codeflash_output # 2.13μs -> 1.78μs (19.7% faster)

def test_crop_outside_bounds(block):
    # Crop completely outside image (center at -0.5)
    img = make_image(10, 10)
    codeflash_output = block.run([img], x_center=-0.5, y_center=-0.5, width=0.2, height=0.2); result = codeflash_output # 19.8μs -> 19.9μs (0.071% slower)

def test_crop_partially_outside(block):
    # Crop partially outside (center near edge)
    img = make_image(10, 10)
    codeflash_output = block.run([img], x_center=0.95, y_center=0.95, width=0.2, height=0.2); result = codeflash_output # 19.0μs -> 18.6μs (2.03% faster)
    crop = result[0]["crops"]

def test_crop_minimal_image(block):
    # 1x1 image, crop center with full size
    img = make_image(1, 1)
    codeflash_output = block.run([img], x_center=0.5, y_center=0.5, width=1.0, height=1.0); result = codeflash_output # 18.9μs -> 18.5μs (2.21% faster)
    crop = result[0]["crops"]

def test_crop_non_square_image(block):
    img = make_image(20, 10)
    codeflash_output = block.run([img], x_center=0.5, y_center=0.5, width=0.5, height=0.5); result = codeflash_output # 18.2μs -> 18.4μs (1.06% slower)
    crop = result[0]["crops"]

def test_crop_with_video_metadata(block):
    # Check that video metadata is preserved and updated
    img = make_image(10, 10)
    video_meta = VideoMetadata(video_identifier="vid123")
    img._video_metadata = video_meta
    codeflash_output = block.run([img], x_center=0.5, y_center=0.5, width=0.5, height=0.5); result = codeflash_output # 26.2μs -> 25.9μs (1.01% faster)
    crop = result[0]["crops"]

def test_crop_invalid_input_raises():
    # Should raise ValueError if image is not initialised
    coords = OriginCoordinatesSystem(0, 0, 1, 1)
    meta = ImageParentMetadata(parent_id="root", origin_coordinates=coords)
    with pytest.raises(ValueError):
        WorkflowImageData(parent_metadata=meta)

# --- 3. Large Scale Test Cases ---

def test_large_batch(block):
    # Test cropping a batch of 100 images
    images = [make_image(20, 20, value=i) for i in range(100)]
    codeflash_output = block.run(images, x_center=0.5, y_center=0.5, width=0.5, height=0.5); result = codeflash_output # 637μs -> 577μs (10.3% faster)
    for i, res in enumerate(result):
        crop = res["crops"]

def test_large_image_crop(block):
    # Crop a large image (1000x500)
    img = make_image(1000, 500)
    codeflash_output = block.run([img], x_center=0.5, y_center=0.5, width=0.8, height=0.8); result = codeflash_output # 24.7μs -> 24.8μs (0.580% slower)
    crop = result[0]["crops"]

def test_large_image_edge_crop(block):
    # Crop near the edge of a large image
    img = make_image(999, 999)
    codeflash_output = block.run([img], x_center=0.99, y_center=0.99, width=0.2, height=0.2); result = codeflash_output # 27.5μs -> 27.5μs (0.073% faster)
    crop = result[0]["crops"]
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from copy import copy
from dataclasses import dataclass, replace
from typing import List, Optional
from uuid import uuid4

import numpy as np
# imports
import pytest
from inference.core.workflows.core_steps.transformations.relative_static_crop.v1 import \
    RelativeStaticCropBlockV1

# --- Minimal stubs for required classes ---

@dataclass
class OriginCoordinatesSystem:
    left_top_x: int
    left_top_y: int
    origin_width: int
    origin_height: int

@dataclass
class ImageParentMetadata:
    parent_id: str
    origin_coordinates: OriginCoordinatesSystem

@dataclass
class VideoMetadata:
    video_identifier: str

class WorkflowImageData:
    def __init__(
        self,
        parent_metadata: ImageParentMetadata,
        workflow_root_ancestor_metadata: Optional[ImageParentMetadata] = None,
        image_reference: Optional[str] = None,
        base64_image: Optional[str] = None,
        numpy_image: Optional[np.ndarray] = None,
        video_metadata: Optional[VideoMetadata] = None,
    ):
        if not base64_image and numpy_image is None and not image_reference:
            raise ValueError("Could not initialise empty `WorkflowImageData`.")
        self._parent_metadata = parent_metadata
        self._workflow_root_ancestor_metadata = (
            workflow_root_ancestor_metadata
            if workflow_root_ancestor_metadata
            else self._parent_metadata
        )
        self._image_reference = image_reference
        self._base64_image = base64_image
        self.numpy_image = numpy_image
        self._video_metadata = video_metadata

    @property
    def workflow_root_ancestor_metadata(self):
        return self._workflow_root_ancestor_metadata

    @classmethod
    def create_crop(
        cls,
        origin_image_data: "WorkflowImageData",
        crop_identifier: str,
        cropped_image: np.ndarray,
        offset_x: int,
        offset_y: int,
        preserve_video_metadata: bool = False,
    ) -> "WorkflowImageData":
        parent_metadata = ImageParentMetadata(
            parent_id=crop_identifier,
            origin_coordinates=OriginCoordinatesSystem(
                left_top_x=offset_x,
                left_top_y=offset_y,
                origin_width=origin_image_data.numpy_image.shape[1],
                origin_height=origin_image_data.numpy_image.shape[0],
            ),
        )
        workflow_root_ancestor_coordinates = replace(
            origin_image_data.workflow_root_ancestor_metadata.origin_coordinates,
            left_top_x=origin_image_data.workflow_root_ancestor_metadata.origin_coordinates.left_top_x
            + offset_x,
            left_top_y=origin_image_data.workflow_root_ancestor_metadata.origin_coordinates.left_top_y
            + offset_y,
        )
        workflow_root_ancestor_metadata = ImageParentMetadata(
            parent_id=origin_image_data.workflow_root_ancestor_metadata.parent_id,
            origin_coordinates=workflow_root_ancestor_coordinates,
        )
        video_metadata = None
        if preserve_video_metadata and origin_image_data._video_metadata is not None:
            video_metadata = copy(origin_image_data._video_metadata)
            video_metadata.video_identifier = (
                f"{video_metadata.video_identifier} | crop: {crop_identifier}"
            )
        return WorkflowImageData(
            parent_metadata=parent_metadata,
            workflow_root_ancestor_metadata=workflow_root_ancestor_metadata,
            numpy_image=cropped_image,
            video_metadata=video_metadata,
        )
from inference.core.workflows.core_steps.transformations.relative_static_crop.v1 import \
    RelativeStaticCropBlockV1

# --- Helper for generating test images ---

def make_image(shape=(10, 10), fill=1, parent_id="root", offset_x=0, offset_y=0, video_id=None):
    arr = np.full(shape, fill, dtype=np.uint8)
    coords = OriginCoordinatesSystem(
        left_top_x=offset_x,
        left_top_y=offset_y,
        origin_width=shape[1],
        origin_height=shape[0],
    )
    parent_metadata = ImageParentMetadata(parent_id=parent_id, origin_coordinates=coords)
    video_metadata = VideoMetadata(video_id) if video_id else None
    return WorkflowImageData(parent_metadata, None, numpy_image=arr, video_metadata=video_metadata)

# --- Unit tests ---

@pytest.fixture
def block():
    return RelativeStaticCropBlockV1()

# --- 1. Basic Test Cases ---

def test_center_crop_returns_expected_shape(block):
    # Crop the center 50% of a 10x10 image
    img = make_image(shape=(10, 10), fill=2)
    codeflash_output = block.run([img], x_center=0.5, y_center=0.5, width=0.5, height=0.5); result = codeflash_output # 19.8μs -> 20.0μs (0.727% slower)
    crop = result[0]['crops']

def test_crop_top_left_corner(block):
    # Crop the top left 20% of a 10x10 image
    img = make_image(shape=(10, 10), fill=3)
    codeflash_output = block.run([img], x_center=0.1, y_center=0.1, width=0.2, height=0.2); result = codeflash_output # 18.4μs -> 18.5μs (0.724% slower)
    crop = result[0]['crops']

def test_crop_bottom_right_corner(block):
    # Crop the bottom right 30% of a 10x10 image
    img = make_image(shape=(10, 10), fill=4)
    codeflash_output = block.run([img], x_center=0.9, y_center=0.9, width=0.3, height=0.3); result = codeflash_output # 18.7μs -> 18.1μs (3.24% faster)
    crop = result[0]['crops']

def test_multiple_images_basic(block):
    # Crop multiple images in batch
    img1 = make_image(shape=(10, 10), fill=5)
    img2 = make_image(shape=(10, 10), fill=6)
    codeflash_output = block.run([img1, img2], x_center=0.5, y_center=0.5, width=0.4, height=0.4); result = codeflash_output # 26.6μs -> 26.5μs (0.279% faster)
    crop1 = result[0]['crops']
    crop2 = result[1]['crops']

# --- 2. Edge Test Cases ---

def test_crop_entire_image(block):
    # Crop the entire image
    img = make_image(shape=(8, 8), fill=7)
    codeflash_output = block.run([img], x_center=0.5, y_center=0.5, width=1.0, height=1.0); result = codeflash_output # 17.4μs -> 17.4μs (0.011% slower)
    crop = result[0]['crops']

def test_crop_zero_width_height_returns_none(block):
    # Crop with zero width/height should return None
    img = make_image(shape=(10, 10), fill=8)
    codeflash_output = block.run([img], x_center=0.5, y_center=0.5, width=0.0, height=0.0); result = codeflash_output # 4.74μs -> 4.17μs (13.6% faster)
    crop = result[0]['crops']

def test_crop_completely_outside_image_returns_none(block):
    # Crop far outside image bounds
    img = make_image(shape=(10, 10), fill=9)
    codeflash_output = block.run([img], x_center=2.0, y_center=2.0, width=0.2, height=0.2); result = codeflash_output # 4.64μs -> 3.99μs (16.2% faster)
    crop = result[0]['crops']

def test_crop_partial_outside_image(block):
    # Crop partially outside image should return only the valid part
    img = make_image(shape=(10, 10), fill=10)
    # This crop will start at -2, -2 and end at 3,3 (so only 0:3,0:3 is valid)
    codeflash_output = block.run([img], x_center=0.0, y_center=0.0, width=0.5, height=0.5); result = codeflash_output # 4.79μs -> 4.10μs (16.8% faster)
    crop = result[0]['crops']

def test_crop_minimum_size(block):
    # Crop a single pixel
    img = make_image(shape=(10, 10), fill=11)
    codeflash_output = block.run([img], x_center=0.5, y_center=0.5, width=0.1, height=0.1); result = codeflash_output # 21.4μs -> 21.1μs (1.71% faster)
    crop = result[0]['crops']

def test_crop_non_square_image(block):
    # Crop a non-square image
    img = make_image(shape=(20, 10), fill=12)
    codeflash_output = block.run([img], x_center=0.5, y_center=0.5, width=0.5, height=0.5); result = codeflash_output # 19.8μs -> 19.7μs (0.512% faster)
    crop = result[0]['crops']

def test_crop_with_video_metadata_preserved(block):
    # Check that video metadata is preserved and updated
    img = make_image(shape=(10, 10), fill=13, video_id="vid1")
    codeflash_output = block.run([img], x_center=0.5, y_center=0.5, width=0.5, height=0.5); result = codeflash_output # 26.4μs -> 26.7μs (1.13% slower)
    crop = result[0]['crops']

def test_crop_with_negative_center(block):
    # Negative center should result in crop outside image and None
    img = make_image(shape=(10, 10), fill=14)
    codeflash_output = block.run([img], x_center=-0.5, y_center=-0.5, width=0.2, height=0.2); result = codeflash_output # 19.0μs -> 19.5μs (2.44% slower)
    crop = result[0]['crops']

def test_crop_with_large_width_height(block):
    # Large width/height should crop only what's inside image
    img = make_image(shape=(10, 10), fill=15)
    codeflash_output = block.run([img], x_center=0.5, y_center=0.5, width=2.0, height=2.0); result = codeflash_output # 18.9μs -> 18.9μs (0.053% slower)
    crop = result[0]['crops']

def test_crop_on_image_with_nonzero_offset(block):
    # Test that offset metadata is correctly updated
    img = make_image(shape=(10, 10), fill=16, parent_id="root", offset_x=5, offset_y=7)
    codeflash_output = block.run([img], x_center=0.5, y_center=0.5, width=0.5, height=0.5); result = codeflash_output # 18.9μs -> 18.4μs (2.41% faster)
    crop = result[0]['crops']

# --- 3. Large Scale Test Cases ---

def test_large_batch_performance(block):
    # Test with a batch of 500 images
    images = [make_image(shape=(20, 20), fill=i) for i in range(500)]
    codeflash_output = block.run(images, x_center=0.5, y_center=0.5, width=0.4, height=0.4); result = codeflash_output
    for i, r in enumerate(result):
        crop = r['crops']

def test_large_image_crop(block):
    # Crop a large image
    img = make_image(shape=(1000, 1000), fill=17)
    codeflash_output = block.run([img], x_center=0.5, y_center=0.5, width=0.6, height=0.6); result = codeflash_output # 29.5μs -> 28.7μs (2.56% faster)
    crop = result[0]['crops']

def test_large_batch_with_some_invalid(block):
    # Batch with some images that will return None
    images = [make_image(shape=(10, 10), fill=18) for _ in range(10)]
    images += [make_image(shape=(10, 10), fill=19) for _ in range(10)]
    # The batch call below uses a valid centered crop for all 20 images;
    # images 10-19 are then re-run individually with out-of-bounds crops
    codeflash_output = block.run(images, x_center=0.5, y_center=0.5, width=0.5, height=0.5); result = codeflash_output # 140μs -> 134μs (5.17% faster)
    for i in range(10):
        pass
    for i in range(10, 20):
        # Use crop outside bounds
        codeflash_output = block.run([images[i]], x_center=2.0, y_center=2.0, width=0.2, height=0.2); result_invalid = codeflash_output # 14.8μs -> 11.6μs (28.1% faster)

def test_large_batch_all_none(block):
    # All crops are outside bounds, should all be None
    images = [make_image(shape=(10, 10), fill=20) for _ in range(100)]
    codeflash_output = block.run(images, x_center=2.0, y_center=2.0, width=0.2, height=0.2); result = codeflash_output # 88.0μs -> 64.8μs (35.8% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, `git checkout codeflash/optimize-RelativeStaticCropBlockV1.run-mhbw8pry` and push.

Codeflash

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 29, 2025 11:10
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Oct 29, 2025