
Conversation


@codeflash-ai codeflash-ai bot commented Oct 29, 2025

📄 31% (0.31x) speedup for `BlurVisualizationBlockV1.getAnnotator` in `inference/core/workflows/core_steps/visualizations/blur/v1.py`

⏱️ Runtime : 1.26 milliseconds → 955 microseconds (best of 293 runs)

📝 Explanation and details

The optimization achieves a **31% speedup** through three key micro-optimizations in the `getAnnotator` method:

**Key optimizations:**

1. **Simplified cache key generation**: Replaced `"_".join(map(str, [kernel_size]))` with a direct `str(kernel_size)`. Since there is only one parameter, the join over a mapped single-element list was unnecessary overhead.

2. **Reduced attribute lookups**: Cached `self.annotatorCache` in a local variable `annotatorCache` to avoid repeated attribute lookups on `self`.

3. **Optimized cache lookup pattern**: Replaced the `key not in dict` check followed by a `dict[key] = ...` assignment with a single `dict.get(key)` followed by conditional assignment. This reduces hash-table lookups from 2-3 to 1-2 per cache access; a sketch of the resulting pattern follows this list.
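A minimal sketch of the optimized lookup pattern, assuming the `annotatorCache` attribute and `getAnnotator` signature referenced elsewhere in this report; the real block carries additional configuration, so treat this as an illustration of the pattern rather than the exact source:

```python
import supervision as sv  # sv.BlurAnnotator accepts a kernel_size argument

class BlurVisualizationBlockV1:
    def __init__(self):
        self.annotatorCache = {}

    def getAnnotator(self, kernel_size: int) -> sv.annotators.base.BaseAnnotator:
        key = str(kernel_size)                # direct str() instead of "_".join(map(str, [...]))
        annotatorCache = self.annotatorCache  # local alias: one attribute lookup per call
        annotator = annotatorCache.get(key)   # single hash lookup on a cache hit
        if annotator is None:                 # cache miss: build and store the annotator once
            annotator = sv.BlurAnnotator(kernel_size=kernel_size)
            annotatorCache[key] = annotator
        return annotator
```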

**Performance characteristics:**

- **Cache hits** show the largest improvements (40-76% faster in tests), as the optimized lookup pattern is most beneficial when items are already cached
- **Cache misses** still see solid gains (20-40% faster) from the simplified key generation and reduced attribute access
- The optimizations are particularly effective for repeated calls with the same `kernel_size`, which is a common usage pattern in visualization workflows
- Performance gains are consistent across different key types (integers, floats, strings, objects), because none of the optimizations depend on the key's type

These micro-optimizations compound because `getAnnotator` is typically called many times during batch processing of visualizations.
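As a rough illustration of the key-generation saving in isolation, a hypothetical micro-benchmark (absolute numbers will vary by machine and Python version):

```python
import timeit

# Time one million constructions of the cache key, old format vs. new.
old_key = timeit.timeit('"_".join(map(str, [5]))', number=1_000_000)
new_key = timeit.timeit("str(5)", number=1_000_000)
print(f"join/map key: {old_key:.3f}s, direct str key: {new_key:.3f}s")
```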

Correctness verification report:

| Test                          | Status        |
|-------------------------------|---------------|
| ⚙️ Existing Unit Tests        | 🔘 None Found |
| 🌀 Generated Regression Tests | 2184 Passed   |
| ⏪ Replay Tests               | 🔘 None Found |
| 🔎 Concolic Coverage Tests    | 🔘 None Found |
| 📊 Tests Coverage             | 100.0%        |
🌀 Generated Regression Tests and Runtime

```python
from abc import ABC

# imports
import pytest
from inference.core.workflows.core_steps.visualizations.blur.v1 import \
    BlurVisualizationBlockV1

# --- Minimal stubs for dependencies to enable testing ---

# Simulate the base annotator class
class BaseAnnotator:
    pass

# Simulate the BlurAnnotator, storing kernel_size for verification
class BlurAnnotator(BaseAnnotator):
    def __init__(self, kernel_size):
        self.kernel_size = kernel_size

# Simulate the supervision module and its structure
class sv:
    class annotators:
        class base:
            BaseAnnotator = BaseAnnotator
    BlurAnnotator = BlurAnnotator

# Simulate VisualizationBlock base class
class VisualizationBlock:
    def __init__(self, *args, **kwargs):
        pass

class PredictionsVisualizationBlock(VisualizationBlock, ABC):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
from inference.core.workflows.core_steps.visualizations.blur.v1 import \
    BlurVisualizationBlockV1


# --- Unit tests ---
class TestGetAnnotator:
    # --------- 1. Basic Test Cases ---------

    def test_basic_positive_integer(self):
        """Test with a typical positive kernel_size."""
        block = BlurVisualizationBlockV1()
        codeflash_output = block.getAnnotator(5); annotator = codeflash_output # 1.59μs -> 1.27μs (25.3% faster)

    def test_basic_zero(self):
        """Test with kernel_size=0 (edge of valid integer range)."""
        block = BlurVisualizationBlockV1()
        codeflash_output = block.getAnnotator(0); annotator = codeflash_output # 1.64μs -> 1.24μs (32.6% faster)

    def test_basic_multiple_calls_same_kernel(self):
        """Test that repeated calls with the same kernel_size return the same object (cache)."""
        block = BlurVisualizationBlockV1()
        codeflash_output = block.getAnnotator(3); annotator1 = codeflash_output # 1.62μs -> 1.22μs (33.2% faster)
        codeflash_output = block.getAnnotator(3); annotator2 = codeflash_output # 658ns -> 464ns (41.8% faster)

    def test_basic_multiple_calls_different_kernel(self):
        """Test that calls with different kernel_size return different objects."""
        block = BlurVisualizationBlockV1()
        codeflash_output = block.getAnnotator(2); annotator1 = codeflash_output # 1.60μs -> 1.24μs (29.3% faster)
        codeflash_output = block.getAnnotator(4); annotator2 = codeflash_output # 930ns -> 777ns (19.7% faster)

    # --------- 2. Edge Test Cases ---------

    def test_negative_kernel_size(self):
        """Test with a negative kernel_size (should be accepted as per current implementation)."""
        block = BlurVisualizationBlockV1()
        codeflash_output = block.getAnnotator(-1); annotator = codeflash_output # 1.64μs -> 1.29μs (27.2% faster)

    def test_large_kernel_size(self):
        """Test with a large kernel_size value."""
        block = BlurVisualizationBlockV1()
        large_value = 999
        codeflash_output = block.getAnnotator(large_value); annotator = codeflash_output # 1.64μs -> 1.24μs (32.4% faster)

    def test_non_integer_kernel_size(self):
        """Test with a float kernel_size (should still work as per key string conversion)."""
        block = BlurVisualizationBlockV1()
        codeflash_output = block.getAnnotator(2.5); annotator = codeflash_output # 2.96μs -> 2.44μs (21.6% faster)

    def test_string_kernel_size(self):
        """Test with a string kernel_size (should work, but not type safe)."""
        block = BlurVisualizationBlockV1()
        codeflash_output = block.getAnnotator("7"); annotator = codeflash_output # 1.66μs -> 1.15μs (44.5% faster)

    def test_tuple_kernel_size(self):
        """Test with a tuple as kernel_size (should work due to str conversion)."""
        block = BlurVisualizationBlockV1()
        codeflash_output = block.getAnnotator((1, 2)); annotator = codeflash_output # 2.60μs -> 2.15μs (21.2% faster)

    def test_none_kernel_size(self):
        """Test with None as kernel_size (should work as key is str(None))."""
        block = BlurVisualizationBlockV1()
        codeflash_output = block.getAnnotator(None); annotator = codeflash_output # 1.74μs -> 1.29μs (34.4% faster)

    def test_object_kernel_size(self):
        """Test with an object as kernel_size (should work, str(object))."""
        block = BlurVisualizationBlockV1()
        obj = object()
        codeflash_output = block.getAnnotator(obj); annotator = codeflash_output # 2.82μs -> 2.53μs (11.6% faster)

    # --------- 3. Large Scale Test Cases ---------

    def test_cache_with_many_kernel_sizes(self):
        """Test cache behavior with many different kernel_sizes."""
        block = BlurVisualizationBlockV1()
        max_size = 500  # Under 1000 per instructions
        annotators = []
        for i in range(max_size):
            annotators.append(block.getAnnotator(i)) # 279μs -> 211μs (31.8% faster)
        # All annotators should be unique objects
        kernel_sizes = set(a.kernel_size for a in annotators)

    def test_cache_reuse_large(self):
        """Test that cache reuses objects for repeated kernel_sizes in large scale."""
        block = BlurVisualizationBlockV1()
        for i in range(100):
            codeflash_output = block.getAnnotator(i); a1 = codeflash_output # 60.1μs -> 46.8μs (28.6% faster)
            codeflash_output = block.getAnnotator(i); a2 = codeflash_output # 37.5μs -> 23.5μs (59.9% faster)

    def test_cache_memory_efficiency(self):
        """Test that cache does not grow for repeated kernel_sizes."""
        block = BlurVisualizationBlockV1()
        for i in range(100):
            block.getAnnotator(i) # 59.1μs -> 45.5μs (30.1% faster)
        initial_cache_size = len(block.annotatorCache)
        # Repeated calls with same kernel_sizes
        for i in range(100):
            block.getAnnotator(i) # 35.2μs -> 22.1μs (59.4% faster)

    def test_large_non_integer_keys(self):
        """Test cache with many non-integer kernel_sizes."""
        block = BlurVisualizationBlockV1()
        keys = [f"ks_{i}" for i in range(500)]
        for k in keys:
            codeflash_output = block.getAnnotator(k); annotator = codeflash_output # 267μs -> 205μs (30.5% faster)

    def test_large_float_kernel_sizes(self):
        """Test cache with many float kernel_sizes."""
        block = BlurVisualizationBlockV1()
        for i in range(500):
            k = i + 0.5
            codeflash_output = block.getAnnotator(k); annotator = codeflash_output # 352μs -> 277μs (27.1% faster)

    # --------- Additional Edge Cases ---------

    def test_cache_key_collision(self):
        """Test that different types with same string representation collide in cache."""
        block = BlurVisualizationBlockV1()
        codeflash_output = block.getAnnotator(1); annotator_int = codeflash_output # 1.89μs -> 1.48μs (28.2% faster)
        codeflash_output = block.getAnnotator("1"); annotator_str = codeflash_output # 580ns -> 393ns (47.6% faster)
        # str(1) == "1", so the int and the string map to the same cache key;
        # the first call determines which annotator is stored for that key.
        block2 = BlurVisualizationBlockV1()
        codeflash_output = block2.getAnnotator(1); a1 = codeflash_output # 743ns -> 598ns (24.2% faster)
        codeflash_output = block2.getAnnotator("1"); a2 = codeflash_output # 351ns -> 199ns (76.4% faster)

    def test_cache_key_uniqueness_tuple_vs_str(self):
        """Test that tuple and its string representation are different keys."""
        block = BlurVisualizationBlockV1()
        codeflash_output = block.getAnnotator((1, 2)); a_tuple = codeflash_output # 2.52μs -> 2.20μs (14.9% faster)
        codeflash_output = block.getAnnotator("(1, 2)"); a_str = codeflash_output # 566ns -> 387ns (46.3% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from abc import ABC

# imports
import pytest
from inference.core.workflows.core_steps.visualizations.blur.v1 import \
    BlurVisualizationBlockV1

# --- Minimal stubs for dependencies to allow testing ---
# These are required because the actual 'supervision' library and its classes are not available.
# They are minimal, deterministic, and do not mock any external state.

class BaseAnnotator:
    def __init__(self, kernel_size):
        self.kernel_size = kernel_size

class BlurAnnotator(BaseAnnotator):
    pass

# Simulate the 'supervision' package structure
class sv:
    class annotators:
        class base:
            BaseAnnotator = BaseAnnotator
        BlurAnnotator = BlurAnnotator

# Minimal VisualizationBlock and PredictionsVisualizationBlock for inheritance
class VisualizationBlock:
    def __init__(self, *args, **kwargs):
        pass


class PredictionsVisualizationBlock(VisualizationBlock, ABC):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
from inference.core.workflows.core_steps.visualizations.blur.v1 import \
    BlurVisualizationBlockV1

# --- Unit tests ---

# ----------- BASIC TEST CASES -----------

def test_basic_single_call_returns_blur_annotator():
    """Test that getAnnotator returns a BlurAnnotator instance with correct kernel_size."""
    block = BlurVisualizationBlockV1()
    codeflash_output = block.getAnnotator(3); annotator = codeflash_output # 1.89μs -> 1.36μs (38.5% faster)

def test_basic_multiple_calls_same_kernel_size_returns_same_instance():
    """Test that repeated calls with the same kernel_size return the same instance (cache works)."""
    block = BlurVisualizationBlockV1()
    codeflash_output = block.getAnnotator(5); annotator1 = codeflash_output # 1.74μs -> 1.27μs (36.4% faster)
    codeflash_output = block.getAnnotator(5); annotator2 = codeflash_output # 685ns -> 453ns (51.2% faster)

def test_basic_multiple_calls_different_kernel_sizes_return_different_instances():
    """Test that calls with different kernel_size values return different instances."""
    block = BlurVisualizationBlockV1()
    codeflash_output = block.getAnnotator(2); annotator1 = codeflash_output # 1.69μs -> 1.24μs (36.6% faster)
    codeflash_output = block.getAnnotator(4); annotator2 = codeflash_output # 905ns -> 736ns (23.0% faster)

# ----------- EDGE TEST CASES -----------

def test_edge_kernel_size_zero():
    """Test behavior when kernel_size is zero (edge case)."""
    block = BlurVisualizationBlockV1()
    codeflash_output = block.getAnnotator(0); annotator = codeflash_output # 1.64μs -> 1.21μs (35.3% faster)

def test_edge_kernel_size_negative():
    """Test behavior when kernel_size is negative (edge case)."""
    block = BlurVisualizationBlockV1()
    codeflash_output = block.getAnnotator(-1); annotator = codeflash_output # 1.80μs -> 1.35μs (33.3% faster)

def test_edge_kernel_size_large_int():
    """Test behavior with a very large kernel_size."""
    block = BlurVisualizationBlockV1()
    large_kernel = 10**6
    codeflash_output = block.getAnnotator(large_kernel); annotator = codeflash_output # 1.78μs -> 1.31μs (36.0% faster)

def test_edge_kernel_size_float():
    """Test behavior when kernel_size is a float (should cache separately from int)."""
    block = BlurVisualizationBlockV1()
    codeflash_output = block.getAnnotator(3); annotator1 = codeflash_output # 1.73μs -> 1.26μs (37.7% faster)
    codeflash_output = block.getAnnotator(3.0); annotator2 = codeflash_output # 1.45μs -> 1.22μs (18.5% faster)

def test_edge_kernel_size_non_integer_string():
    """Test behavior when kernel_size is a string (should cache separately)."""
    block = BlurVisualizationBlockV1()
    codeflash_output = block.getAnnotator("7"); annotator1 = codeflash_output # 1.54μs -> 1.09μs (40.6% faster)
    codeflash_output = block.getAnnotator(7); annotator2 = codeflash_output # 748ns -> 525ns (42.5% faster)

def test_edge_cache_independence_between_instances():
    """Test that cache is not shared between different BlurVisualizationBlockV1 instances."""
    block1 = BlurVisualizationBlockV1()
    block2 = BlurVisualizationBlockV1()
    codeflash_output = block1.getAnnotator(11); annotator1 = codeflash_output # 1.63μs -> 1.18μs (39.0% faster)
    codeflash_output = block2.getAnnotator(11); annotator2 = codeflash_output # 848ns -> 688ns (23.3% faster)

def test_edge_cache_key_uniqueness():
    """Test that cache keys are unique for different values and types."""
    block = BlurVisualizationBlockV1()
    codeflash_output = block.getAnnotator(8); annotator_int = codeflash_output # 1.55μs -> 1.16μs (33.7% faster)
    codeflash_output = block.getAnnotator(8.0); annotator_float = codeflash_output # 1.41μs -> 1.17μs (20.4% faster)
    codeflash_output = block.getAnnotator("8"); annotator_str = codeflash_output # 484ns -> 317ns (52.7% faster)

# ----------- LARGE SCALE TEST CASES -----------


def test_large_scale_cache_memory_efficiency():
    """Test that the cache does not grow with repeated calls for same kernel_size."""
    block = BlurVisualizationBlockV1()
    for _ in range(100):
        block.getAnnotator(42) # 37.1μs -> 24.1μs (53.6% faster)
    key = "_".join(map(str, [42]))  # expected cache key; for a single value this equals str(42)


def test_large_scale_cache_keys_are_strings():
    """Test that all cache keys are strings and unique for each kernel_size."""
    block = BlurVisualizationBlockV1()
    n = 100
    for i in range(n):
        block.getAnnotator(i) # 60.4μs -> 47.1μs (28.2% faster)
    for key in block.annotatorCache.keys():
        pass

# ----------- ADDITIONAL EDGE CASES -----------

def test_edge_kernel_size_tuple():
    """Test behavior when kernel_size is a tuple (should cache separately)."""
    block = BlurVisualizationBlockV1()
    codeflash_output = block.getAnnotator((1,2)); annotator1 = codeflash_output # 2.75μs -> 2.17μs (26.9% faster)
    codeflash_output = block.getAnnotator("1_2"); annotator2 = codeflash_output # 805ns -> 682ns (18.0% faster)

def test_edge_kernel_size_none():
    """Test behavior when kernel_size is None."""
    block = BlurVisualizationBlockV1()
    codeflash_output = block.getAnnotator(None); annotator = codeflash_output # 1.71μs -> 1.28μs (33.2% faster)

def test_edge_kernel_size_bool():
    """Test behavior when kernel_size is a boolean."""
    block = BlurVisualizationBlockV1()
    codeflash_output = block.getAnnotator(True); annotator_true = codeflash_output # 1.73μs -> 1.29μs (33.4% faster)
    codeflash_output = block.getAnnotator(False); annotator_false = codeflash_output # 815ns -> 697ns (16.9% faster)

def test_edge_kernel_size_object():
    """Test behavior when kernel_size is a custom object."""
    class Dummy:
        def __str__(self):
            return "dummy"
    block = BlurVisualizationBlockV1()
    dummy = Dummy()
    codeflash_output = block.getAnnotator(dummy); annotator = codeflash_output # 2.08μs -> 1.70μs (22.1% faster)

# ----------- DETERMINISM TEST -----------

def test_determinism_repeated_runs():
    """Test that repeated runs with the same input yield the same output instance."""
    block = BlurVisualizationBlockV1()
    for _ in range(5):
        codeflash_output = block.getAnnotator(99); annotator = codeflash_output # 3.29μs -> 2.45μs (34.5% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

To edit these changes `git checkout codeflash/optimize-BlurVisualizationBlockV1.getAnnotator-mhbu0u4v` and push.

Codeflash

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 29, 2025 10:07
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Oct 29, 2025