⚡️ Speed up method `InferenceModelsDepthAnythingV2Adapter.predict` by 403% in PR #1959 (`feature/the-great-unification-of-inference`) by codeflash-ai[bot] · Pull Request #1973 · roboflow/inference

codeflash-ai · 2026-02-04T22:37:29Z

⚡️ This pull request contains optimizations for PR #1959

If you approve this dependent PR, these changes will be merged into the original PR branch feature/the-great-unification-of-inference.

This PR will be automatically closed if the original PR is merged.

📄 403% (4.03x) speedup for `InferenceModelsDepthAnythingV2Adapter.predict` in `inference/models/depth_anything_v2/depth_anything_v2_inference_models.py`

⏱️ Runtime : 218 milliseconds → 43.4 milliseconds (best of 21 runs)

📝 Explanation and details

The optimization replaces matplotlib's colormap application with OpenCV's cv2.applyColorMap, achieving a 4x speedup (403% faster, from 218ms to 43.4ms).

Key Changes

What was changed:

Removed matplotlib.pyplot import and plt.get_cmap("viridis") call
Added cv2 import
Replaced the line colored_depth = (cmap(depth_for_viz)[:, :, :3] * 255).astype(np.uint8) with two OpenCV operations:
- cv2.applyColorMap(depth_for_viz, cv2.COLORMAP_VIRIDIS)
- cv2.cvtColor(colored_depth, cv2.COLOR_BGR2RGB) (to convert OpenCV's BGR output to RGB)

Why This is Faster

Original bottleneck: The line profiler shows that matplotlib's colormap application consumed 84% of total runtime (200ms out of 238ms). Matplotlib's cmap(array) applies the colormap through Python-level iteration and normalization, which is extremely slow for per-pixel operations.

OpenCV's advantage: cv2.applyColorMap is a compiled C++ function optimized for image processing. It directly maps uint8 values to color values using a pre-computed lookup table, avoiding Python overhead entirely. The additional cvtColor operation is also highly optimized and adds minimal overhead (4.1ms, only 7.1% of new total time).

Performance breakdown: In the optimized version, the colormap operations now take only ~20ms combined (26.7% + 7.1%), compared to 200ms previously—a 10x improvement in the visualization step itself.

Impact on Workloads

This optimization is particularly valuable for:

High-throughput depth estimation pipelines where the predict method is called repeatedly
Real-time applications that need fast depth map visualization
Large-scale batch processing - as shown in the test with 600x600 images where runtime improved from 4.48ms to 840μs (434% faster)

The optimization preserves all functionality including normalization behavior and output format. All test cases show either speedups or minimal variations within measurement noise, confirming correctness while delivering consistent performance improvements across different input sizes and depth value ranges.

✅ Correctness verification report:

Test	Status
⏪ Replay Tests	🔘 None Found
⚙️ Existing Unit Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 10 Passed
📊 Tests Coverage	100.0%

🌀 Click to see Generated Regression Tests

from unittest.mock import (
    patch,
)  # used to patch AutoModel.from_pretrained so tests are deterministic

import matplotlib.pyplot as plt  # used to recompute expected colored image in assertions
import numpy as np  # used to construct inputs and compute expected outputs

# imports
import pytest  # used for our unit tests
import torch  # used to construct tensors returned by the mocked model call
from inference.core.workflows.execution_engine.entities.base import (
    ImageParentMetadata,
    WorkflowImageData,
)
from inference.models.depth_anything_v2.depth_anything_v2_inference_models import (
    InferenceModelsDepthAnythingV2Adapter,
)


# Helper to create a callable model that returns a single-element tuple (predictions_tensor,)
def make_model_returning_tensor(tensor: torch.Tensor):
    # Return a simple callable (lambda) which when called returns (tensor,)
    return lambda inputs: (tensor,)


def test_basic_predict_normalization_and_image_generation():
    # Create a small numpy input (structure isn't used by our stubbed model, but pass for signature)
    input_np = np.zeros((2, 2), dtype=np.uint8)

    # Construct a predictions tensor with a known range [1.0, 4.0] shaped (2,2)
    predictions = torch.tensor([[1.0, 2.0], [3.0, 4.0]], dtype=torch.float32)

    # Patch the AutoModel.from_pretrained used in adapter's __init__ to return a callable model
    module_path = (
        "inference.models.depth_anything_v2.depth_anything_v2_inference_models"
    )
    attr_to_patch = f"{module_path}.AutoModel.from_pretrained"
    with patch(attr_to_patch) as mock_from_pretrained:
        # Make the adapter use a model that returns our predictions tensor
        mock_from_pretrained.return_value = make_model_returning_tensor(predictions)

        # Instantiate adapter (will call patched from_pretrained)
        adapter = InferenceModelsDepthAnythingV2Adapter(model_id="dummy-model")

    # Call predict with our input
    codeflash_output = adapter.predict(input_np)
    result_tuple = codeflash_output  # 233μs -> 249μs (6.27% slower)
    result = result_tuple[0]

    # Check normalized_depth correctness: expected = (predictions - 1) / (4 - 1)
    expected_normalized = (predictions.numpy() - 1.0) / 3.0
    returned_normalized = result["normalized_depth"]
    image_obj = result["image"]

    # Recompute expected colored image exactly the same way adapter does
    depth_for_viz = (returned_normalized * 255.0).astype(np.uint8)
    cmap = plt.get_cmap("viridis")
    expected_colored = (cmap(depth_for_viz)[:, :, :3] * 255).astype(np.uint8)
    returned_colored = image_obj.numpy_image


def test_predict_raises_on_constant_depth_map():
    input_np = np.zeros((4, 4), dtype=np.uint8)

    # Create a constant tensor where all values are 5.0 -> min == max
    constant_tensor = torch.full((4, 4), 5.0, dtype=torch.float32)

    module_path = (
        "inference.models.depth_anything_v2.depth_anything_v2_inference_models"
    )
    attr_to_patch = f"{module_path}.AutoModel.from_pretrained"
    with patch(attr_to_patch) as mock_from_pretrained:
        mock_from_pretrained.return_value = make_model_returning_tensor(constant_tensor)
        adapter = InferenceModelsDepthAnythingV2Adapter(model_id="dummy-model")

    # Expect a ValueError because normalization is impossible when min == max
    with pytest.raises(ValueError) as excinfo:
        adapter.predict(input_np)  # 28.1μs -> 27.8μs (1.23% faster)


def test_predict_handles_negative_values_and_normalizes_correctly():
    input_np = np.zeros((3, 3), dtype=np.uint8)

    # Predictions include negative and positive values
    predictions = torch.tensor(
        [[-10.0, 0.0, 10.0], [-5.0, 5.0, 15.0], [20.0, -20.0, 2.5]], dtype=torch.float32
    )

    module_path = (
        "inference.models.depth_anything_v2.depth_anything_v2_inference_models"
    )
    attr_to_patch = f"{module_path}.AutoModel.from_pretrained"
    with patch(attr_to_patch) as mock_from_pretrained:
        mock_from_pretrained.return_value = make_model_returning_tensor(predictions)
        adapter = InferenceModelsDepthAnythingV2Adapter(model_id="dummy-model")

    codeflash_output = adapter.predict(input_np)
    result_tuple = codeflash_output  # 234μs -> 251μs (7.03% slower)
    result = result_tuple[0]

    returned_normalized = result["normalized_depth"]
    # Manual normalization: (x - min) / (max - min)
    p = predictions.numpy()
    expected = (p - p.min()) / (p.max() - p.min())


def test_large_scale_predict_performs_on_large_but_safe_tensor():
    input_np = np.zeros((600, 600), dtype=np.uint8)  # input placeholder

    # Create a large predictions tensor with a varying gradient across the image
    # 600x600 float32: ~1.44MB in memory, safely under 100MB limit
    h, w = 600, 600
    # Create a gradient so min != max
    y = np.linspace(0.0, 1.0, h, dtype=np.float32)[:, None]
    x = np.linspace(0.0, 1.0, w, dtype=np.float32)[None, :]
    gradient = torch.from_numpy((x + y).astype(np.float32))  # shape (h,w)
    # Ensure some spread beyond simple small values
    gradient = gradient * 100.0 - 50.0  # now ranges roughly [-50, 150]

    module_path = (
        "inference.models.depth_anything_v2.depth_anything_v2_inference_models"
    )
    attr_to_patch = f"{module_path}.AutoModel.from_pretrained"
    with patch(attr_to_patch) as mock_from_pretrained:
        mock_from_pretrained.return_value = make_model_returning_tensor(gradient)
        adapter = InferenceModelsDepthAnythingV2Adapter(model_id="dummy-model")

    # Call predict and ensure it completes and returns expected keys and shapes
    codeflash_output = adapter.predict(input_np)
    result_tuple = codeflash_output  # 4.48ms -> 840μs (434% faster)
    result = result_tuple[0]
    nd = result["normalized_depth"]

    # Check image properties
    image_obj = result["image"]
    img = image_obj.numpy_image


def test_parent_metadata_parent_id_is_present_and_non_empty():
    input_np = np.zeros((2, 2), dtype=np.uint8)
    predictions = torch.tensor([[0.0, 1.0], [2.0, 3.0]], dtype=torch.float32)

    module_path = (
        "inference.models.depth_anything_v2.depth_anything_v2_inference_models"
    )
    attr_to_patch = f"{module_path}.AutoModel.from_pretrained"
    with patch(attr_to_patch) as mock_from_pretrained:
        mock_from_pretrained.return_value = make_model_returning_tensor(predictions)
        adapter = InferenceModelsDepthAnythingV2Adapter(model_id="dummy-model")

    codeflash_output = adapter.predict(input_np)
    result_tuple = codeflash_output  # 284μs -> 249μs (14.0% faster)
    image_obj = result_tuple[0]["image"]
    parent_meta = image_obj.parent_metadata


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-pr1959-2026-02-04T22.37.23 and push.

The optimization replaces matplotlib's colormap application with OpenCV's `cv2.applyColorMap`, achieving a **4x speedup** (403% faster, from 218ms to 43.4ms). ## Key Changes **What was changed:** - Removed `matplotlib.pyplot` import and `plt.get_cmap("viridis")` call - Added `cv2` import - Replaced the line `colored_depth = (cmap(depth_for_viz)[:, :, :3] * 255).astype(np.uint8)` with two OpenCV operations: - `cv2.applyColorMap(depth_for_viz, cv2.COLORMAP_VIRIDIS)` - `cv2.cvtColor(colored_depth, cv2.COLOR_BGR2RGB)` (to convert OpenCV's BGR output to RGB) ## Why This is Faster **Original bottleneck:** The line profiler shows that matplotlib's colormap application consumed **84% of total runtime** (200ms out of 238ms). Matplotlib's `cmap(array)` applies the colormap through Python-level iteration and normalization, which is extremely slow for per-pixel operations. **OpenCV's advantage:** `cv2.applyColorMap` is a compiled C++ function optimized for image processing. It directly maps uint8 values to color values using a pre-computed lookup table, avoiding Python overhead entirely. The additional `cvtColor` operation is also highly optimized and adds minimal overhead (4.1ms, only 7.1% of new total time). **Performance breakdown:** In the optimized version, the colormap operations now take only ~20ms combined (26.7% + 7.1%), compared to 200ms previously—a **10x improvement** in the visualization step itself. ## Impact on Workloads This optimization is particularly valuable for: - **High-throughput depth estimation pipelines** where the `predict` method is called repeatedly - **Real-time applications** that need fast depth map visualization - **Large-scale batch processing** - as shown in the test with 600x600 images where runtime improved from 4.48ms to 840μs (434% faster) The optimization preserves all functionality including normalization behavior and output format. All test cases show either speedups or minimal variations within measurement noise, confirming correctness while delivering consistent performance improvements across different input sizes and depth value ranges.

aseembits93 · 2026-02-04T22:41:35Z

inference/models/depth_anything_v2/depth_anything_v2_inference_models.py

+
+        # Convert numpy array to WorkflowImageData


Suggested change

# Convert numpy array to WorkflowImageData

codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Feb 4, 2026

codeflash-ai bot requested review from PawelPeczek-Roboflow, grzegorz-roboflow, hansent, probicheaux and yeldarby as code owners February 4, 2026 22:37

codeflash-ai bot added the 🎯 Quality: High Optimization Quality according to codeflash label Feb 4, 2026

codeflash-ai bot mentioned this pull request Feb 4, 2026

inference 1.0 RC1 #1959

Open

7 tasks

aseembits93 reviewed Feb 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

⚡️ Speed up method `InferenceModelsDepthAnythingV2Adapter.predict` by 403% in PR #1959 (`feature/the-great-unification-of-inference`)#1973

⚡️ Speed up method `InferenceModelsDepthAnythingV2Adapter.predict` by 403% in PR #1959 (`feature/the-great-unification-of-inference`)#1973
codeflash-ai[bot] wants to merge 1 commit intofeature/the-great-unification-of-inferencefrom
codeflash/optimize-pr1959-2026-02-04T22.37.23

codeflash-ai bot commented Feb 4, 2026

Uh oh!

aseembits93 Feb 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

codeflash-ai bot commented Feb 4, 2026

⚡️ This pull request contains optimizations for PR #1959

📄 403% (4.03x) speedup for InferenceModelsDepthAnythingV2Adapter.predict in inference/models/depth_anything_v2/depth_anything_v2_inference_models.py

📝 Explanation and details

Key Changes

Why This is Faster

Impact on Workloads

Uh oh!

aseembits93 Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

📄 403% (4.03x) speedup for `InferenceModelsDepthAnythingV2Adapter.predict` in `inference/models/depth_anything_v2/depth_anything_v2_inference_models.py`