Conversation

codeflash-ai bot commented Oct 29, 2025

📄 156% (2.56x) speedup for create_model_info_response in litellm/proxy/utils.py

⏱️ Runtime : 125 milliseconds → 48.7 milliseconds (best of 79 runs)

📝 Explanation and details

The optimization introduces a fast-path lookup in the get_all_fallbacks function that dramatically improves performance for exact model matches.

Key Optimization:

  • Added a preprocessing loop that checks for direct dictionary key matches before calling the expensive get_fallback_model_group function
  • When a fallback config item is a dictionary and its first key exactly matches the requested model, the function immediately returns the corresponding fallback list
  • Only falls back to the original get_fallback_model_group logic for complex cases (generic fallbacks, stripped matches, string entries)

Why This Works:
The line profiler shows that get_fallback_model_group was consuming 99.9% of execution time, and the fast path cuts its measured time from 634ms to 251ms. This function performs expensive pattern matching, string operations, and list manipulations; the fast-path optimization bypasses it entirely for the common case of exact model-name matches.
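
A minimal sketch of the fast-path idea follows. It is illustrative only and does not reproduce the exact litellm diff; the fallback structure mirrors the test fixtures further down (a list whose items are either `{"model-group": ["fallback", ...]}` dicts or plain strings).

```python
from typing import List, Optional, Union

def get_all_fallbacks_fast_path(
    model: str,
    fallbacks: List[Union[dict, str]],
) -> Optional[List[str]]:
    """Illustrative sketch of the fast path, not the actual litellm code."""
    # Fast path: if a dict entry's first key exactly matches the requested
    # model, return its fallback list immediately and skip the expensive
    # pattern-matching lookup.
    for item in fallbacks:
        if isinstance(item, dict):
            first_key = next(iter(item), None)
            if first_key == model:
                return item[first_key]
    # Slow path: generic "*" fallbacks, stripped-name matches, and string
    # entries would still go through the original get_fallback_model_group
    # logic (omitted here).
    return None
```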

Performance Impact by Test Case:

  • Exact matches (like test_large_scale_many_fallbacks): 22,770% speedup - the fast-path completely avoids the expensive lookup
  • Generic fallbacks (like test_large_scale_generic_fallback_used): 2.95% speedup - still needs the full lookup logic
  • String-based fallbacks: 31% slower - the preprocessing loop adds overhead when it can't help

Import reorganization in create_model_info_response moves the import to module level, eliminating repeated import overhead during function calls.
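
A hedged illustration of that pattern is below; the specific litellm import that was hoisted is not named in this summary, so a standard-library import stands in for it.

```python
import time  # stand-in for whichever import was moved to module level

# Before: the import statement re-executes on every call, a small but
# repeated cost when the endpoint is hit frequently.
def created_at_before() -> int:
    import time
    return int(time.time())

# After: the name is bound once when the module loads; each call skips
# that work.
def created_at_after() -> int:
    return int(time.time())
```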

The optimization is most effective for applications with large fallback configurations where models have direct dictionary mappings, which appears to be the common production use case based on the test results.
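
For concreteness, here are the two kinds of fallback shapes exercised by the generated tests below; only the first benefits from the new fast path.

```python
# Exact-key dict entries hit the fast path: the first key matches the
# requested model, so the fallback list is returned immediately.
fast_path_friendly = [
    {"gpt-4": ["gpt-3.5-turbo", "gpt-3"]},
]

# Wildcard and plain-string entries still require the full
# get_fallback_model_group lookup.
full_lookup_needed = [
    {"*": ["generic-fallback"]},
    "gpt-3.5-turbo",
]
```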

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 298 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import pytest
from fastapi import HTTPException
from litellm.proxy.utils import create_model_info_response

# --- Mocks and constants needed for testing ---

DEFAULT_MODEL_CREATED_AT_TIME = 1672531200  # Example default timestamp

# Minimal mock for Router class
class Router:
    def __init__(
        self, fallbacks=None, context_window_fallbacks=None, content_policy_fallbacks=None
    ):
        self.fallbacks = fallbacks if fallbacks is not None else []
        self.context_window_fallbacks = context_window_fallbacks if context_window_fallbacks is not None else []
        self.content_policy_fallbacks = content_policy_fallbacks if content_policy_fallbacks is not None else []
from litellm.proxy.utils import create_model_info_response

# --- Unit Tests ---

# ---- BASIC TEST CASES ----

def test_basic_no_metadata():
    """Basic: Response without metadata, just id and provider."""
    codeflash_output = create_model_info_response("gpt-4", "openai"); result = codeflash_output # 2.00μs -> 1.91μs (4.61% faster)

def test_basic_with_metadata_and_no_router():
    """Basic: Metadata requested, but no router provided (should fallback to empty list)."""
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True); result = codeflash_output # 3.04μs -> 2.93μs (3.82% faster)

def test_basic_with_metadata_and_router_general():
    """Basic: Metadata with router and general fallback."""
    router = Router(fallbacks=[{"gpt-4": ["gpt-3.5-turbo", "gpt-3"]}])
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True, llm_router=router); result = codeflash_output # 5.42μs -> 4.45μs (21.6% faster)

def test_basic_with_metadata_and_router_context_window():
    """Basic: Metadata with router and context_window fallback."""
    router = Router(context_window_fallbacks=[{"gpt-4": ["gpt-3.5-turbo"]}])
    codeflash_output = create_model_info_response(
        "gpt-4", "openai", include_metadata=True, fallback_type="context_window", llm_router=router
    ); result = codeflash_output # 5.53μs -> 4.69μs (17.9% faster)

def test_basic_with_metadata_and_router_content_policy():
    """Basic: Metadata with router and content_policy fallback."""
    router = Router(content_policy_fallbacks=[{"gpt-4": ["gpt-3.5-turbo", "gpt-3"]}])
    codeflash_output = create_model_info_response(
        "gpt-4", "openai", include_metadata=True, fallback_type="content_policy", llm_router=router
    ); result = codeflash_output # 5.42μs -> 4.69μs (15.6% faster)

# ---- EDGE TEST CASES ----

def test_edge_invalid_fallback_type():
    """Edge: Invalid fallback_type should raise HTTPException."""
    router = Router()
    with pytest.raises(HTTPException) as excinfo:
        create_model_info_response(
            "gpt-4", "openai", include_metadata=True, fallback_type="not_a_type", llm_router=router
        ) # 7.45μs -> 7.27μs (2.55% faster)

def test_edge_empty_model_id_and_provider():
    """Edge: Empty strings for model_id and provider."""
    codeflash_output = create_model_info_response("", "", include_metadata=False); result = codeflash_output # 2.34μs -> 2.32μs (0.689% faster)

def test_edge_none_model_id_and_provider():
    """Edge: None values for model_id and provider."""
    codeflash_output = create_model_info_response(None, None); result = codeflash_output # 2.06μs -> 1.94μs (6.03% faster)

def test_edge_router_with_no_matching_fallback():
    """Edge: Router has fallback configs, but no match for model_id."""
    router = Router(fallbacks=[{"other-model": ["gpt-3.5-turbo"]}])
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True, llm_router=router); result = codeflash_output # 44.1μs -> 42.8μs (3.05% faster)

def test_edge_router_with_generic_fallback():
    """Edge: Router has a generic fallback '*'."""
    router = Router(fallbacks=[{"*": ["gpt-3.5-turbo", "gpt-3"]}])
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True, llm_router=router); result = codeflash_output # 38.8μs -> 38.5μs (0.860% faster)

def test_edge_router_with_string_fallback():
    """Edge: Router fallback config is a string."""
    router = Router(fallbacks=["gpt-3.5-turbo"])
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True, llm_router=router); result = codeflash_output # 4.87μs -> 5.10μs (4.57% slower)

def test_edge_router_with_mixed_fallback_types():
    """Edge: Router fallback config mixes dict and string."""
    router = Router(fallbacks=[{"gpt-4": ["gpt-3.5-turbo"]}, "gpt-3"])
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True, llm_router=router); result = codeflash_output # 5.16μs -> 4.35μs (18.6% faster)

def test_edge_router_with_empty_fallbacks():
    """Edge: Router provided but fallback config is empty list."""
    router = Router(fallbacks=[])
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True, llm_router=router); result = codeflash_output # 3.54μs -> 3.74μs (5.48% slower)

def test_edge_router_with_none_fallbacks():
    """Edge: Router provided but fallback config is None."""
    router = Router(fallbacks=None)
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True, llm_router=router); result = codeflash_output # 3.56μs -> 3.59μs (1.03% slower)

def test_edge_metadata_with_no_router_and_non_general_type():
    """Edge: Metadata requested with non-general fallback_type and no router."""
    codeflash_output = create_model_info_response(
        "gpt-4", "openai", include_metadata=True, fallback_type="context_window", llm_router=None
    ); result = codeflash_output # 3.42μs -> 3.29μs (3.89% faster)

# ---- LARGE SCALE TEST CASES ----

def test_large_scale_many_fallbacks():
    """Large scale: Router with 1000 fallback entries, only one matches."""
    fallback_list = [{"model-{}".format(i): ["fallback-{}".format(i)]} for i in range(999)]
    # Add a generic fallback at the end
    fallback_list.append({"*": ["generic-fallback"]})
    router = Router(fallbacks=fallback_list)
    codeflash_output = create_model_info_response("model-500", "provider", include_metadata=True, llm_router=router); result = codeflash_output # 14.8ms -> 64.6μs (22770% faster)

def test_large_scale_generic_fallback_used():
    """Large scale: Router with 1000 fallback entries, no match, uses generic fallback."""
    fallback_list = [{"model-{}".format(i): ["fallback-{}".format(i)]} for i in range(999)]
    fallback_list.append({"*": ["generic-fallback"]})
    router = Router(fallbacks=fallback_list)
    codeflash_output = create_model_info_response("unknown-model", "provider", include_metadata=True, llm_router=router); result = codeflash_output # 29.7ms -> 28.8ms (2.95% faster)

def test_large_scale_empty_fallbacks():
    """Large scale: Router with empty fallback list."""
    router = Router(fallbacks=[])
    codeflash_output = create_model_info_response("model-1", "provider", include_metadata=True, llm_router=router); result = codeflash_output # 4.12μs -> 4.06μs (1.58% faster)

def test_large_scale_all_string_fallbacks():
    """Large scale: Router with 1000 string fallback entries (should use first string fallback)."""
    fallback_list = ["model-{}".format(i) for i in range(1000)]
    router = Router(fallbacks=fallback_list)
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True, llm_router=router); result = codeflash_output # 68.2μs -> 98.9μs (31.1% slower)

def test_large_scale_context_window_fallbacks():
    """Large scale: Router with 1000 context_window fallback entries."""
    fallback_list = [{"model-{}".format(i): ["fallback-{}".format(i)]} for i in range(1000)]
    router = Router(context_window_fallbacks=fallback_list)
    codeflash_output = create_model_info_response(
        "model-999", "provider", include_metadata=True, fallback_type="context_window", llm_router=router
    ); result = codeflash_output # 29.7ms -> 120μs (24463% faster)

def test_large_scale_content_policy_fallbacks():
    """Large scale: Router with 1000 content_policy fallback entries."""
    fallback_list = [{"model-{}".format(i): ["fallback-{}".format(i)]} for i in range(1000)]
    router = Router(content_policy_fallbacks=fallback_list)
    codeflash_output = create_model_info_response(
        "model-123", "provider", include_metadata=True, fallback_type="content_policy", llm_router=router
    ); result = codeflash_output # 3.66ms -> 20.5μs (17758% faster)

def test_large_scale_no_router():
    """Large scale: No router, include_metadata True, should always fallback to empty list."""
    codeflash_output = create_model_info_response("model-1", "provider", include_metadata=True, llm_router=None); result = codeflash_output # 3.73μs -> 3.38μs (10.5% faster)

# ---- DETERMINISM TEST ----

def test_determinism_multiple_calls_same_args():
    """Determinism: Multiple calls with same args return same result."""
    router = Router(fallbacks=[{"gpt-4": ["gpt-3.5-turbo"]}])
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True, llm_router=router); result1 = codeflash_output # 5.68μs -> 4.54μs (25.1% faster)
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True, llm_router=router); result2 = codeflash_output # 2.31μs -> 1.98μs (16.6% faster)

# ---- CLEANUP TEST ----

def test_metadata_not_included_when_flag_false():
    """Metadata: Should not include metadata if include_metadata is False."""
    router = Router(fallbacks=[{"gpt-4": ["gpt-3.5-turbo"]}])
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=False, llm_router=router); result = codeflash_output # 2.14μs -> 2.20μs (2.73% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import pytest
from litellm.proxy.utils import create_model_info_response

# --- Mocks and constants for isolated testing ---

# Constant as defined in litellm.constants
DEFAULT_MODEL_CREATED_AT_TIME = 1677610602

# --- Minimal mock for Router ---
class MockRouter:
    # Accepts fallback config lists for each type
    def __init__(
        self,
        fallbacks=None,
        context_window_fallbacks=None,
        content_policy_fallbacks=None,
    ):
        self.fallbacks = fallbacks if fallbacks is not None else []
        self.context_window_fallbacks = context_window_fallbacks if context_window_fallbacks is not None else []
        self.content_policy_fallbacks = content_policy_fallbacks if content_policy_fallbacks is not None else []

# --- HTTPException mock ---
class HTTPException(Exception):
    def __init__(self, status_code, detail):
        self.status_code = status_code
        self.detail = detail
        super().__init__(f"{status_code}: {detail}")
from litellm.proxy.utils import create_model_info_response

# --- Unit tests ---
# 1. Basic Test Cases

def test_basic_no_metadata():
    """Test basic response without metadata."""
    codeflash_output = create_model_info_response("gpt-4", "openai"); resp = codeflash_output # 1.89μs -> 2.00μs (5.94% slower)

def test_basic_with_metadata_and_no_router():
    """Test metadata included, but router is None (should fallback to empty list)."""
    codeflash_output = create_model_info_response("gpt-3.5-turbo", "openai", include_metadata=True); resp = codeflash_output # 3.24μs -> 3.30μs (1.70% slower)

def test_basic_with_metadata_and_router_no_fallbacks():
    """Test with router, but no fallbacks configured."""
    router = MockRouter()
    codeflash_output = create_model_info_response("gpt-3.5-turbo", "openai", include_metadata=True, llm_router=router); resp = codeflash_output # 3.52μs -> 3.50μs (0.486% faster)

def test_basic_with_metadata_and_router_with_fallbacks():
    """Test with router and a fallback configured for the model."""
    router = MockRouter(fallbacks=[{"gpt-3.5-turbo": ["gpt-3", "gpt-4"]}])
    codeflash_output = create_model_info_response("gpt-3.5-turbo", "openai", include_metadata=True, llm_router=router); resp = codeflash_output # 5.33μs -> 4.45μs (19.9% faster)

def test_basic_with_metadata_and_router_with_generic_fallback():
    """Test with router and a generic fallback configured."""
    router = MockRouter(fallbacks=[{"*": ["gpt-3", "gpt-4"]}])
    codeflash_output = create_model_info_response("unknown-model", "openai", include_metadata=True, llm_router=router); resp = codeflash_output # 43.9μs -> 43.4μs (1.10% faster)

def test_basic_with_metadata_and_router_with_stripped_fallback():
    """Test with router and a fallback for a stripped model group."""
    router = MockRouter(fallbacks=[{"gpt": ["gpt-3", "gpt-4"]}])
    codeflash_output = create_model_info_response("gpt-3.5-turbo", "openai", include_metadata=True, llm_router=router); resp = codeflash_output # 39.1μs -> 38.1μs (2.70% faster)

def test_basic_with_metadata_and_router_with_fallback_type_context_window():
    """Test with context_window fallback type."""
    router = MockRouter(context_window_fallbacks=[{"gpt-4": ["gpt-3.5-turbo"]}])
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True, fallback_type="context_window", llm_router=router); resp = codeflash_output # 5.46μs -> 4.64μs (17.8% faster)

def test_basic_with_metadata_and_router_with_fallback_type_content_policy():
    """Test with content_policy fallback type."""
    router = MockRouter(content_policy_fallbacks=[{"gpt-4": ["gpt-3.5-turbo"]}])
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True, fallback_type="content_policy", llm_router=router); resp = codeflash_output # 5.22μs -> 4.81μs (8.44% faster)

# 2. Edge Test Cases


def test_edge_empty_model_id_and_provider():
    """Test with empty model_id and provider."""
    codeflash_output = create_model_info_response("", "", include_metadata=False); resp = codeflash_output # 3.11μs -> 3.23μs (3.68% slower)

def test_edge_model_id_not_in_fallbacks():
    """Test with a model_id not present in any fallback config."""
    router = MockRouter(fallbacks=[{"gpt-3.5-turbo": ["gpt-3"]}])
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True, llm_router=router); resp = codeflash_output # 45.3μs -> 44.7μs (1.28% faster)

def test_edge_router_is_none_with_metadata():
    """Test with router=None and include_metadata=True."""
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True, llm_router=None); resp = codeflash_output # 3.40μs -> 3.44μs (1.08% slower)

def test_edge_fallbacks_as_strings():
    """Test when fallback config is a list of strings."""
    router = MockRouter(fallbacks=["gpt-3.5-turbo"])
    codeflash_output = create_model_info_response("gpt-3.5-turbo", "openai", include_metadata=True, llm_router=router); resp = codeflash_output # 4.96μs -> 5.43μs (8.69% slower)

def test_edge_fallbacks_is_empty_list():
    """Test when fallback config is an empty list."""
    router = MockRouter(fallbacks=[])
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True, llm_router=router); resp = codeflash_output # 3.65μs -> 3.64μs (0.247% faster)

def test_edge_fallbacks_is_none():
    """Test when fallback config is None."""
    router = MockRouter(fallbacks=None)
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True, llm_router=router); resp = codeflash_output # 3.44μs -> 3.56μs (3.23% slower)

def test_edge_model_id_is_none():
    """Test with model_id=None (should not crash, id will be None)."""
    codeflash_output = create_model_info_response(None, "openai", include_metadata=False); resp = codeflash_output # 2.28μs -> 2.45μs (7.10% slower)

def test_edge_provider_is_none():
    """Test with provider=None (should not crash, owned_by will be None)."""
    codeflash_output = create_model_info_response("gpt-4", None, include_metadata=False); resp = codeflash_output # 2.13μs -> 2.24μs (4.60% slower)

def test_edge_router_has_multiple_fallback_types():
    """Test router with multiple fallback types, ensure correct one is chosen."""
    router = MockRouter(
        fallbacks=[{"gpt-4": ["gpt-3.5-turbo"]}],
        context_window_fallbacks=[{"gpt-4": ["gpt-3"]}],
        content_policy_fallbacks=[{"gpt-4": ["gpt-2"]}],
    )
    # general
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True, fallback_type="general", llm_router=router); resp1 = codeflash_output # 5.68μs -> 4.76μs (19.5% faster)
    # context_window
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True, fallback_type="context_window", llm_router=router); resp2 = codeflash_output # 2.61μs -> 2.23μs (16.8% faster)
    # content_policy
    codeflash_output = create_model_info_response("gpt-4", "openai", include_metadata=True, fallback_type="content_policy", llm_router=router); resp3 = codeflash_output # 1.92μs -> 1.72μs (11.5% faster)

# 3. Large Scale Test Cases

def test_large_many_models_and_fallbacks():
    """Test with a router containing a large number of fallback configs."""
    # Create 500 fallback configs
    fallback_configs = []
    for i in range(500):
        fallback_configs.append({f"model-{i}": [f"model-{i-1}", f"model-{i-2}"]})
    router = MockRouter(fallbacks=fallback_configs)
    # Test for a model in the middle
    codeflash_output = create_model_info_response("model-250", "provider-x", include_metadata=True, llm_router=router); resp = codeflash_output # 7.46ms -> 35.6μs (20840% faster)
    # Test for a model at the end
    codeflash_output = create_model_info_response("model-499", "provider-x", include_metadata=True, llm_router=router); resp2 = codeflash_output # 14.9ms -> 60.5μs (24440% faster)
    # Test for a model not present
    codeflash_output = create_model_info_response("model-1000", "provider-x", include_metadata=True, llm_router=router); resp3 = codeflash_output # 14.9ms -> 14.5ms (2.93% faster)

def test_large_generic_fallback_only():
    """Test with large fallback config using only generic fallback."""
    router = MockRouter(fallbacks=[{"*": ["generic-1", "generic-2"]}])
    # Try 100 different models
    for i in range(100):
        codeflash_output = create_model_info_response(f"model-{i}", "provider-x", include_metadata=True, llm_router=router); resp = codeflash_output # 3.14ms -> 3.03ms (3.61% faster)

def test_large_many_context_window_fallbacks():
    """Test with large context_window_fallbacks."""
    context_window_fallbacks = []
    for i in range(200):
        context_window_fallbacks.append({f"model-{i}": [f"cw-{i-1}", f"cw-{i-2}"]})
    router = MockRouter(context_window_fallbacks=context_window_fallbacks)
    codeflash_output = create_model_info_response("model-150", "provider-x", include_metadata=True, fallback_type="context_window", llm_router=router); resp = codeflash_output # 4.44ms -> 23.0μs (19181% faster)

def test_large_models_with_stripped_fallback():
    """Test many models with a fallback that matches via stripped fallback_key."""
    router = MockRouter(fallbacks=[{"base-model": ["fallback-1", "fallback-2"]}])
    # All model ids containing 'base-model' should match stripped fallback
    for i in range(50):
        codeflash_output = create_model_info_response(f"base-model-v{i}", "provider-x", include_metadata=True, llm_router=router); resp = codeflash_output # 1.57ms -> 1.53ms (2.65% faster)

def test_large_models_with_no_fallbacks():
    """Test many models with no fallback config."""
    router = MockRouter(fallbacks=[])
    for i in range(100):
        codeflash_output = create_model_info_response(f"model-{i}", "provider-x", include_metadata=True, llm_router=router); resp = codeflash_output # 119μs -> 117μs (1.52% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-create_model_info_response-mhbrlg82` and push.

Codeflash

codeflash-ai bot requested a review from mashraf-222 on October 29, 2025 at 08:59
codeflash-ai bot added the labels ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) on Oct 29, 2025