Conversation

codeflash-ai bot commented Oct 30, 2025

📄 5% (0.05x) speedup for LiteLLMResponsesTransformationHandler._handle_raw_dict_response_item in litellm/completion_extras/litellm_responses_transformation/transformation.py

⏱️ Runtime: 497 microseconds → 472 microseconds (best of 58 runs)

📝 Explanation and details

The optimization achieves a **5% speedup** by moving expensive imports out of the hot path and consolidating conditional checks.

**Key optimizations:**

1. **Import-on-demand**: The `from litellm.types.utils import Choices, Message` statement is moved inside the conditional block where it's actually needed, rather than being executed on every function call. This eliminates unnecessary import overhead for cases that don't create Choice objects (like "reasoning" types or messages without output_text).

2. **Consolidated conditional checking**: The original code performed two separate operations, `isinstance(content_item, dict)` followed by `content_item.get("type")`, for every content item. The optimized version combines these into a single compound conditional, `isinstance(content_item, dict) and content_item.get("type") == "output_text"`, reducing redundant attribute lookups. Both changes appear in the sketch below.
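
A minimal sketch of the optimized shape (illustrative only; it follows the copy of the function reproduced in the tests below, and the exact litellm source may differ in details):

```python
from typing import Any, Dict, Optional, Tuple

def _handle_raw_dict_response_item(item: Dict[str, Any], index: int) -> Tuple[Optional[Any], int]:
    item_type = item.get("type")
    if item_type == "reasoning":
        # Early exit: the Choices/Message import below is never executed.
        return None, index
    if item_type == "message":
        for content_item in item.get("content", []):
            # Single compound check instead of nested isinstance + .get() steps.
            if isinstance(content_item, dict) and content_item.get("type") == "output_text":
                # Deferred import: only paid when a Choice is actually built.
                from litellm.types.utils import Choices, Message
                response_text = content_item.get("text", "")
                msg = Message(
                    role=item.get("role", "assistant"),
                    content=response_text if response_text else "",
                )
                return Choices(message=msg, finish_reason="stop", index=index), index + 1
    return None, index
```

A `{"type": "reasoning"}` item returns immediately, so the import is never reached on that path, which is exactly where the 216-302% gains below show up.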

**Performance benefits by test case:**

- **Best gains** on edge cases that exit early: reasoning types (216-302% faster), empty content lists (153% faster), and non-dict content items (133% faster) benefit most since they avoid the expensive import entirely.
- **Moderate gains** on large-scale tests with many non-output_text items (3.78-6.62% faster), due to the consolidated conditional checks reducing iteration overhead.
- **Minimal impact** on successful output_text cases (0.5-3% variance), since the import still occurs but is deferred until actually needed.

The optimization is particularly effective for workloads with many "reasoning" items, empty content, or large content lists with few actual output_text items, while maintaining identical behavior for all valid cases.
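
For intuition about why the early-exit paths gain the most, here is a hypothetical micro-benchmark (not part of the generated suite) showing that even a cached, already-loaded import has a per-call cost; `json` stands in for `litellm.types.utils`:

```python
import timeit

def per_call_import():
    # A function-local import of an already-loaded module still performs
    # a sys.modules lookup plus attribute binding on every call.
    from json import dumps, loads
    return dumps, loads

def no_import():
    # Baseline with no import work in the body.
    return None

n = 1_000_000
print("with per-call import:", timeit.timeit(per_call_import, number=n))
print("without import      :", timeit.timeit(no_import, number=n))
```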

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 40 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 4 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from typing import Any, Dict, Optional, Tuple

# imports
import pytest  # used for our unit tests
from litellm.completion_extras.litellm_responses_transformation.transformation import \
    LiteLLMResponsesTransformationHandler


# Minimal stubs for Choices and Message to allow testing
class Message:
    def __init__(self, role: str, content: str):
        self.role = role
        self.content = content

    def __eq__(self, other):
        return isinstance(other, Message) and self.role == other.role and self.content == other.content

class Choices:
    def __init__(self, message: Message, finish_reason: str, index: int):
        self.message = message
        self.finish_reason = finish_reason
        self.index = index

    def __eq__(self, other):
        return (
            isinstance(other, Choices)
            and self.message == other.message
            and self.finish_reason == other.finish_reason
            and self.index == other.index
        )

# function to test
def _handle_raw_dict_response_item(item: Dict[str, Any], index: int) -> Tuple[Optional[Any], int]:
    """
    Handle raw dict response items from Responses API (e.g., GPT-5 Codex format).

    Args:
        item: Raw dict response item with 'type' field
        index: Current choice index

    Returns:
        Tuple of (Choice object or None, updated index)
    """
    item_type = item.get("type")

    # Ignore reasoning items for now
    if item_type == "reasoning":
        return None, index

    # Handle message items with output_text content
    if item_type == "message":
        content_list = item.get("content", [])
        for content_item in content_list:
            if isinstance(content_item, dict):
                content_type = content_item.get("type")
                if content_type == "output_text":
                    response_text = content_item.get("text", "")
                    msg = Message(
                        role=item.get("role", "assistant"),
                        content=response_text if response_text else "",
                    )
                    choice = Choices(message=msg, finish_reason="stop", index=index)
                    return choice, index + 1

    # Unknown or unsupported type
    return None, index

#------------------------------------------------
from typing import Any, Dict, Optional, Tuple

# imports
import pytest  # used for our unit tests
from litellm.completion_extras.litellm_responses_transformation.transformation import \
    LiteLLMResponsesTransformationHandler


# Dummy Choices and Message classes for testing purposes
class Message:
    def __init__(self, role: str, content: str):
        self.role = role
        self.content = content

    def __eq__(self, other):
        return (
            isinstance(other, Message)
            and self.role == other.role
            and self.content == other.content
        )

    def __repr__(self):
        return f"Message(role={self.role!r}, content={self.content!r})"

class Choices:
    def __init__(self, message: Message, finish_reason: str, index: int):
        self.message = message
        self.finish_reason = finish_reason
        self.index = index

    def __eq__(self, other):
        return (
            isinstance(other, Choices)
            and self.message == other.message
            and self.finish_reason == other.finish_reason
            and self.index == other.index
        )

    def __repr__(self):
        return (
            f"Choices(message={self.message!r}, finish_reason={self.finish_reason!r}, index={self.index!r})"
        )

# unit tests

# --- BASIC TEST CASES ---
def test_message_with_output_text_minimal():
    """Basic: Message with output_text and minimal fields"""
    handler = LiteLLMResponsesTransformationHandler()
    item = {
        "type": "message",
        "content": [{"type": "output_text", "text": "Hello world!"}],
        "role": "assistant"
    }
    choice, new_index = handler._handle_raw_dict_response_item(item, 0) # 38.0μs -> 38.5μs (1.29% slower)
    expected_msg = Message(role="assistant", content="Hello world!")
    expected_choice = Choices(message=expected_msg, finish_reason="stop", index=0)
    assert choice is not None and new_index == 1
    assert (choice.message.role, choice.message.content) == (expected_msg.role, expected_msg.content)
    assert (choice.finish_reason, choice.index) == (expected_choice.finish_reason, expected_choice.index)

def test_message_with_output_text_missing_role():
    """Basic: Message with output_text and missing role (should default to 'assistant')"""
    handler = LiteLLMResponsesTransformationHandler()
    item = {
        "type": "message",
        "content": [{"type": "output_text", "text": "Hi!"}]
    }
    choice, new_index = handler._handle_raw_dict_response_item(item, 5) # 27.4μs -> 27.2μs (0.650% faster)
    expected_msg = Message(role="assistant", content="Hi!")
    expected_choice = Choices(message=expected_msg, finish_reason="stop", index=5)
    assert choice is not None and new_index == 6
    assert (choice.message.role, choice.message.content) == (expected_msg.role, expected_msg.content)
    assert (choice.finish_reason, choice.index) == (expected_choice.finish_reason, expected_choice.index)

def test_reasoning_type_ignored():
    """Basic: Reasoning type should be ignored and return None"""
    handler = LiteLLMResponsesTransformationHandler()
    item = {
        "type": "reasoning",
        "content": [{"type": "output_text", "text": "Should be ignored"}]
    }
    choice, new_index = handler._handle_raw_dict_response_item(item, 2) # 2.11μs -> 667ns (216% faster)
    assert choice is None and new_index == 2

def test_message_with_multiple_content_items():
    """Basic: Message with multiple content items, only first output_text is used"""
    handler = LiteLLMResponsesTransformationHandler()
    item = {
        "type": "message",
        "content": [
            {"type": "output_text", "text": "First"},
            {"type": "output_text", "text": "Second"},
            {"type": "other_type", "data": "Ignored"}
        ],
        "role": "assistant"
    }
    choice, new_index = handler._handle_raw_dict_response_item(item, 1) # 28.0μs -> 29.1μs (3.61% slower)
    expected_msg = Message(role="assistant", content="First")
    expected_choice = Choices(message=expected_msg, finish_reason="stop", index=1)
    assert choice is not None and new_index == 2
    assert (choice.message.role, choice.message.content) == (expected_msg.role, expected_msg.content)
    assert (choice.finish_reason, choice.index) == (expected_choice.finish_reason, expected_choice.index)

def test_message_with_no_output_text():
    """Basic: Message with no output_text in content should return None"""
    handler = LiteLLMResponsesTransformationHandler()
    item = {
        "type": "message",
        "content": [{"type": "other_type", "data": "foo"}],
        "role": "assistant"
    }
    choice, new_index = handler._handle_raw_dict_response_item(item, 3) # 2.61μs -> 1.20μs (118% faster)
    assert choice is None and new_index == 3


# --- EDGE TEST CASES ---
def test_message_with_empty_content_list():
    """Edge: Message with empty content list should return None"""
    handler = LiteLLMResponsesTransformationHandler()
    item = {
        "type": "message",
        "content": [],
        "role": "assistant"
    }
    choice, new_index = handler._handle_raw_dict_response_item(item, 0) # 2.25μs -> 887ns (153% faster)
    assert choice is None and new_index == 0

def test_message_with_content_not_a_list():
    """Edge: Message with content not a list (should default to empty)"""
    handler = LiteLLMResponsesTransformationHandler()
    item = {
        "type": "message",
        "content": None,
        "role": "assistant"
    }
    choice, new_index = handler._handle_raw_dict_response_item(item, 0)

def test_message_with_content_items_not_dict():
    """Edge: Message with content items not dict (should be ignored)"""
    handler = LiteLLMResponsesTransformationHandler()
    item = {
        "type": "message",
        "content": ["not_a_dict", 123, None],
        "role": "assistant"
    }
    choice, new_index = handler._handle_raw_dict_response_item(item, 0) # 3.72μs -> 1.60μs (133% faster)
    assert choice is None and new_index == 0

def test_message_with_output_text_missing_text_field():
    """Edge: Message with output_text missing text field (should default to empty string)"""
    handler = LiteLLMResponsesTransformationHandler()
    item = {
        "type": "message",
        "content": [{"type": "output_text"}],
        "role": "assistant"
    }
    choice, new_index = handler._handle_raw_dict_response_item(item, 0) # 37.1μs -> 37.4μs (0.849% slower)
    expected_msg = Message(role="assistant", content="")
    expected_choice = Choices(message=expected_msg, finish_reason="stop", index=0)
    assert choice is not None and new_index == 1
    assert (choice.message.role, choice.message.content) == (expected_msg.role, expected_msg.content)
    assert (choice.finish_reason, choice.index) == (expected_choice.finish_reason, expected_choice.index)

def test_message_with_output_text_empty_string():
    """Edge: Message with output_text with empty string"""
    handler = LiteLLMResponsesTransformationHandler()
    item = {
        "type": "message",
        "content": [{"type": "output_text", "text": ""}],
        "role": "assistant"
    }
    choice, new_index = handler._handle_raw_dict_response_item(item, 0) # 27.4μs -> 27.2μs (0.551% faster)
    expected_msg = Message(role="assistant", content="")
    expected_choice = Choices(message=expected_msg, finish_reason="stop", index=0)
    assert choice is not None and new_index == 1
    assert (choice.message.role, choice.message.content) == (expected_msg.role, expected_msg.content)
    assert (choice.finish_reason, choice.index) == (expected_choice.finish_reason, expected_choice.index)

def test_message_with_non_string_text_field():
    """Edge: Message with output_text where text is not a string (should coerce to str)"""
    handler = LiteLLMResponsesTransformationHandler()
    item = {
        "type": "message",
        "content": [{"type": "output_text", "text": 12345}],
        "role": "assistant"
    }
    # The function does not coerce to str, so it will use 12345 as is
    choice, new_index = handler._handle_raw_dict_response_item(item, 0)
    expected_msg = Message(role="assistant", content=12345)
    expected_choice = Choices(message=expected_msg, finish_reason="stop", index=0)

def test_message_with_role_none():
    """Edge: Message with role explicitly None (dict.get returns the stored None here, so the 'assistant' default is not applied by .get)"""
    handler = LiteLLMResponsesTransformationHandler()
    item = {
        "type": "message",
        "content": [{"type": "output_text", "text": "Role is None"}],
        "role": None
    }
    choice, new_index = handler._handle_raw_dict_response_item(item, 0) # 38.0μs -> 37.8μs (0.720% faster)
    assert choice is not None and new_index == 1
    assert choice.message.content == "Role is None"
    assert (choice.finish_reason, choice.index) == ("stop", 0)

def test_message_with_extra_fields():
    """Edge: Message with extra fields should not affect output"""
    handler = LiteLLMResponsesTransformationHandler()
    item = {
        "type": "message",
        "content": [{"type": "output_text", "text": "Extra"}],
        "role": "assistant",
        "extra_field": "should be ignored"
    }
    choice, new_index = handler._handle_raw_dict_response_item(item, 0) # 28.9μs -> 28.1μs (2.76% faster)
    expected_msg = Message(role="assistant", content="Extra")
    expected_choice = Choices(message=expected_msg, finish_reason="stop", index=0)
    assert choice is not None and new_index == 1
    assert (choice.message.role, choice.message.content) == (expected_msg.role, expected_msg.content)
    assert (choice.finish_reason, choice.index) == (expected_choice.finish_reason, expected_choice.index)

def test_message_with_output_text_and_other_types():
    """Edge: Message with output_text and other types, should use first output_text"""
    handler = LiteLLMResponsesTransformationHandler()
    item = {
        "type": "message",
        "content": [
            {"type": "other_type", "foo": "bar"},
            {"type": "output_text", "text": "Found!"},
            {"type": "output_text", "text": "Second one"}
        ],
        "role": "assistant"
    }
    choice, new_index = handler._handle_raw_dict_response_item(item, 10) # 26.5μs -> 26.7μs (0.798% slower)
    expected_msg = Message(role="assistant", content="Found!")
    expected_choice = Choices(message=expected_msg, finish_reason="stop", index=10)
    assert choice is not None and new_index == 11
    assert (choice.message.role, choice.message.content) == (expected_msg.role, expected_msg.content)
    assert (choice.finish_reason, choice.index) == (expected_choice.finish_reason, expected_choice.index)

# --- LARGE SCALE TEST CASES ---
def test_large_scale_many_output_texts():
    """Large scale: Message with 1000 output_text items, should return first one only"""
    handler = LiteLLMResponsesTransformationHandler()
    content_list = [{"type": "output_text", "text": f"text_{i}"} for i in range(1000)]
    item = {
        "type": "message",
        "content": content_list,
        "role": "assistant"
    }
    choice, new_index = handler._handle_raw_dict_response_item(item, 0) # 29.4μs -> 29.9μs (1.43% slower)
    expected_msg = Message(role="assistant", content="text_0")
    expected_choice = Choices(message=expected_msg, finish_reason="stop", index=0)
    assert choice is not None and new_index == 1
    assert (choice.message.role, choice.message.content) == (expected_msg.role, expected_msg.content)
    assert (choice.finish_reason, choice.index) == (expected_choice.finish_reason, expected_choice.index)

def test_large_scale_many_other_types_before_output_text():
    """Large scale: Message with 999 other types before one output_text at end"""
    handler = LiteLLMResponsesTransformationHandler()
    content_list = [{"type": "other_type", "foo": i} for i in range(999)]
    content_list.append({"type": "output_text", "text": "final_text"})
    item = {
        "type": "message",
        "content": content_list,
        "role": "assistant"
    }
    choice, new_index = handler._handle_raw_dict_response_item(item, 123) # 80.5μs -> 77.5μs (3.78% faster)
    expected_msg = Message(role="assistant", content="final_text")
    expected_choice = Choices(message=expected_msg, finish_reason="stop", index=123)
    assert choice is not None and new_index == 124
    assert (choice.message.role, choice.message.content) == (expected_msg.role, expected_msg.content)
    assert (choice.finish_reason, choice.index) == (expected_choice.finish_reason, expected_choice.index)

def test_large_scale_reasoning_type():
    """Large scale: Reasoning type with large content list should return None"""
    handler = LiteLLMResponsesTransformationHandler()
    content_list = [{"type": "output_text", "text": f"text_{i}"} for i in range(1000)]
    item = {
        "type": "reasoning",
        "content": content_list
    }
    choice, new_index = handler._handle_raw_dict_response_item(item, 999) # 2.54μs -> 631ns (302% faster)
    assert choice is None and new_index == 999


def test_large_scale_message_with_mixed_content():
    """Large scale: Message with 500 other_type, 1 output_text, 499 other_type"""
    handler = LiteLLMResponsesTransformationHandler()
    content_list = [{"type": "other_type", "foo": i} for i in range(500)]
    content_list.append({"type": "output_text", "text": "middle_text"})
    content_list.extend([{"type": "other_type", "foo": i} for i in range(501, 1000)])
    item = {
        "type": "message",
        "content": content_list,
        "role": "assistant"
    }
    choice, new_index = handler._handle_raw_dict_response_item(item, 0) # 61.7μs -> 57.8μs (6.62% faster)
    expected_msg = Message(role="assistant", content="middle_text")
    expected_choice = Choices(message=expected_msg, finish_reason="stop", index=0)
    assert choice is not None and new_index == 1
    assert (choice.message.role, choice.message.content) == (expected_msg.role, expected_msg.content)
    assert (choice.finish_reason, choice.index) == (expected_choice.finish_reason, expected_choice.index)
#------------------------------------------------
from litellm.completion_extras.litellm_responses_transformation.transformation import LiteLLMResponsesTransformationHandler

def test_LiteLLMResponsesTransformationHandler__handle_raw_dict_response_item():
    LiteLLMResponsesTransformationHandler._handle_raw_dict_response_item(LiteLLMResponsesTransformationHandler(), {'type': 'message'}, 0)

def test_LiteLLMResponsesTransformationHandler__handle_raw_dict_response_item_2():
    LiteLLMResponsesTransformationHandler._handle_raw_dict_response_item(LiteLLMResponsesTransformationHandler(), {'type': 'reasoning'}, 0)
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| codeflash_concolic_zbim32de/tmpoixl8vne/test_concolic_coverage.py::test_LiteLLMResponsesTransformationHandler__handle_raw_dict_response_item | 3.38μs | 1.03μs | 227%✅ |
| codeflash_concolic_zbim32de/tmpoixl8vne/test_concolic_coverage.py::test_LiteLLMResponsesTransformationHandler__handle_raw_dict_response_item_2 | 2.19μs | 642ns | 242%✅ |

To edit these changes, run `git checkout codeflash/optimize-LiteLLMResponsesTransformationHandler._handle_raw_dict_response_item-mhdlbeai` and push.


codeflash-ai bot requested a review from mashraf-222 October 30, 2025 15:39
codeflash-ai bot added the ⚡️ codeflash and 🎯 Quality: High labels Oct 30, 2025