@codeflash-ai codeflash-ai bot commented Oct 30, 2025

📄 11% (0.11x) speedup for BaseLLMHTTPHandler._add_stream_param_to_request_body in litellm/llms/custom_httpx/llm_http_handler.py

⏱️ Runtime : 45.3 microseconds → 41.0 microseconds (best of 351 runs)

📝 Explanation and details

The optimized code achieves a 10% speedup by eliminating unnecessary dictionary operations when fake_stream=True and the "stream" key doesn't exist in the data.

Key optimizations:

  1. Conditional dictionary copying: The original code always called data.copy() and pop("stream", None) when fake_stream=True. The optimized version first checks if "stream" exists in the data before performing these expensive operations. If the key doesn't exist, it simply returns the original dictionary unchanged.

  2. More efficient key deletion: When the "stream" key does exist, the optimization uses del new_data["stream"] instead of pop("stream", None), which is slightly faster since we already know the key exists.

  3. Removed explicit is True comparisons: Changed fake_stream is True to just fake_stream for cleaner, more Pythonic code.
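
To illustrate the trade-off behind the `del` vs. `pop` point above (a standalone illustration, not code from this PR):

```python
# pop() with a default tolerates a missing key
d = {"foo": "bar"}
d.pop("stream", None)   # no error; d is unchanged

# del is slightly faster when the key is known to exist, but raises otherwise
d2 = {"foo": "bar", "stream": True}
del d2["stream"]        # removes the key; d2 is now {"foo": "bar"}

try:
    del d["stream"]     # key absent -> KeyError
except KeyError:
    pass
```

This is why the optimization only switches to `del` after an explicit `"stream" in data` check.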

Performance impact by test case:

  • Best gains (28-45% faster): Cases where fake_stream=True but no "stream" key exists in the data
  • Moderate gains (7-14% faster): Cases where fake_stream=True and "stream" key needs to be removed
  • Large dictionary edge case: One test with 1000+ keys showed 483% improvement when avoiding unnecessary copying

The optimization is particularly effective for scenarios where fake_stream=True is frequently called on data dictionaries that don't contain a "stream" parameter, avoiding the overhead of dictionary copying and key removal operations entirely.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 84 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 4 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import pytest
from litellm.llms.custom_httpx.llm_http_handler import BaseLLMHTTPHandler


# Minimal stand-in for the provider config consumed by the handler
class BaseConfig:
    def __init__(self, supports_stream_param_in_request_body: bool):
        self.supports_stream_param_in_request_body = supports_stream_param_in_request_body

# unit tests

@pytest.fixture
def handler():
    """Fixture to create a handler instance for reuse."""
    return BaseLLMHTTPHandler()

# -----------------------------
# 1. Basic Test Cases
# -----------------------------

def test_stream_param_added_when_supported_and_not_fake_stream(handler):
    """Stream param should be added when supported and fake_stream is False."""
    data = {"foo": "bar"}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=False)  # 719ns -> 725ns (0.828% slower)
    assert result == {"foo": "bar", "stream": True}

def test_stream_param_not_added_when_not_supported_and_not_fake_stream(handler):
    """Stream param should NOT be added when not supported and fake_stream is False."""
    data = {"foo": "bar"}
    config = BaseConfig(supports_stream_param_in_request_body=False)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=False)  # 681ns -> 668ns (1.95% faster)
    assert result == {"foo": "bar"}

def test_stream_param_removed_when_fake_stream_true(handler):
    """Stream param should be removed if fake_stream is True."""
    data = {"foo": "bar", "stream": True}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=True)  # 881ns -> 818ns (7.70% faster)
    assert result == {"foo": "bar"}

def test_stream_param_removed_when_fake_stream_true_and_not_present(handler):
    """Should not fail if 'stream' is not present and fake_stream is True."""
    data = {"foo": "bar"}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=True)  # 785ns -> 577ns (36.0% faster)
    assert result == {"foo": "bar"}

def test_stream_param_added_overwrites_existing_value(handler):
    """Should overwrite existing 'stream' value if supported and not fake_stream."""
    data = {"foo": "bar", "stream": False}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=False)  # 667ns -> 677ns (1.48% slower)
    assert result == {"foo": "bar", "stream": True}

# -----------------------------
# 2. Edge Test Cases
# -----------------------------

def test_data_is_empty_dict(handler):
    """Should handle empty data dict gracefully."""
    data = {}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=False)  # 719ns -> 717ns (0.279% faster)
    assert result == {"stream": True}

def test_data_is_empty_dict_fake_stream(handler):
    """Should handle empty data dict with fake_stream gracefully."""
    data = {}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=True)  # 793ns -> 618ns (28.3% faster)
    assert result == {}

def test_stream_param_is_none(handler):
    """Should remove 'stream' even if value is None when fake_stream is True."""
    data = {"foo": "bar", "stream": None}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=True)  # 899ns -> 792ns (13.5% faster)
    assert result == {"foo": "bar"}

def test_stream_param_is_false_and_supported(handler):
    """Should set 'stream' to True even if originally False, if supported."""
    data = {"foo": "bar", "stream": False}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=False)  # 705ns -> 684ns (3.07% faster)
    assert result == {"foo": "bar", "stream": True}

def test_stream_param_is_false_and_not_supported(handler):
    """Should leave 'stream' untouched if not supported and not fake_stream."""
    data = {"foo": "bar", "stream": False}
    config = BaseConfig(supports_stream_param_in_request_body=False)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=False)  # 572ns -> 591ns (3.21% slower)
    assert result == {"foo": "bar", "stream": False}

def test_provider_config_supports_stream_param_is_none(handler):
    """Should not add 'stream' if supports_stream_param_in_request_body is None."""
    class DummyConfig:
        supports_stream_param_in_request_body = None
    data = {"foo": "bar"}
    config = DummyConfig()
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=False)  # 899ns -> 818ns (9.90% faster)
    assert result == {"foo": "bar"}

def test_provider_config_supports_stream_param_is_false(handler):
    """Should not add 'stream' if supports_stream_param_in_request_body is False."""
    data = {"foo": "bar"}
    config = BaseConfig(supports_stream_param_in_request_body=False)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=False)  # 642ns -> 568ns (13.0% faster)
    assert result == {"foo": "bar"}

def test_data_is_not_mutated_when_fake_stream_true(handler):
    """Original data dict should not be mutated when fake_stream is True."""
    data = {"foo": "bar", "stream": True}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    orig_data = data.copy()
    handler._add_stream_param_to_request_body(data, config, fake_stream=True)  # 805ns -> 810ns (0.617% slower)
    assert data == orig_data

def test_data_is_mutated_when_stream_param_added(handler):
    """Original data dict should be mutated when stream param is added."""
    data = {"foo": "bar"}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    handler._add_stream_param_to_request_body(data, config, fake_stream=False)  # 673ns -> 670ns (0.448% faster)
    assert data == {"foo": "bar", "stream": True}

def test_data_is_mutated_when_stream_param_overwritten(handler):
    """Original data dict should be mutated when stream param is overwritten."""
    data = {"foo": "bar", "stream": False}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    handler._add_stream_param_to_request_body(data, config, fake_stream=False)  # 740ns -> 689ns (7.40% faster)
    assert data == {"foo": "bar", "stream": True}

def test_data_is_not_mutated_when_stream_param_not_added(handler):
    """Original data dict should not be mutated when stream param is not added."""
    data = {"foo": "bar"}
    config = BaseConfig(supports_stream_param_in_request_body=False)
    orig_data = data.copy()
    handler._add_stream_param_to_request_body(data, config, fake_stream=False)  # 640ns -> 617ns (3.73% faster)
    assert data == orig_data

def test_data_is_not_mutated_when_stream_param_removed(handler):
    """Original data dict should not be mutated when stream param is removed."""
    data = {"foo": "bar", "stream": True}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    orig_data = data.copy()
    handler._add_stream_param_to_request_body(data, config, fake_stream=True)  # 832ns -> 824ns (0.971% faster)
    assert data == orig_data

# -----------------------------
# 3. Large Scale Test Cases
# -----------------------------

def test_large_data_dict_stream_added(handler):
    """Should add 'stream' to large data dict if supported."""
    data = {f"key_{i}": i for i in range(999)}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=False)  # 767ns -> 762ns (0.656% faster)
    assert result["stream"] is True
    for i in range(999):
        assert result[f"key_{i}"] == i

def test_large_data_dict_stream_removed(handler):
    """Should remove 'stream' from large data dict if fake_stream is True."""
    data = {f"key_{i}": i for i in range(999)}
    data["stream"] = True
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=True)  # 4.21μs -> 4.46μs (5.47% slower)
    assert "stream" not in result
    for i in range(999):
        assert result[f"key_{i}"] == i

def test_large_data_dict_stream_not_added_if_not_supported(handler):
    """Should NOT add 'stream' to large data dict if not supported."""
    data = {f"key_{i}": i for i in range(999)}
    config = BaseConfig(supports_stream_param_in_request_body=False)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=False)  # 650ns -> 639ns (1.72% faster)
    assert "stream" not in result
    for i in range(999):
        assert result[f"key_{i}"] == i

def test_large_data_dict_stream_overwrite(handler):
    """Should overwrite 'stream' value in large data dict if supported and not fake_stream."""
    data = {f"key_{i}": i for i in range(999)}
    data["stream"] = False
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=False)  # 751ns -> 770ns (2.47% slower)
    assert result["stream"] is True
    for i in range(999):
        assert result[f"key_{i}"] == i

def test_large_data_dict_stream_removed_even_if_none(handler):
    """Should remove 'stream' even if value is None in large data dict when fake_stream is True."""
    data = {f"key_{i}": i for i in range(999)}
    data["stream"] = None
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=True)  # 3.97μs -> 4.04μs (1.71% slower)
    assert "stream" not in result
    for i in range(999):
        assert result[f"key_{i}"] == i
#------------------------------------------------
import pytest
from litellm.llms.custom_httpx.llm_http_handler import BaseLLMHTTPHandler


# Minimal stand-in for the provider config consumed by the handler
class BaseConfig:
    def __init__(self, supports_stream_param_in_request_body: bool):
        self.supports_stream_param_in_request_body = supports_stream_param_in_request_body

# unit tests

@pytest.fixture
def handler():
    """Fixture to create the handler instance."""
    return BaseLLMHTTPHandler()

# 1. Basic Test Cases

def test_stream_param_added_when_supported_and_not_fake(handler):
    """Test that 'stream' param is added when supported and fake_stream is False."""
    data = {"foo": "bar"}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=False)  # 751ns -> 758ns (0.923% slower)
    assert result == {"foo": "bar", "stream": True}

def test_stream_param_not_added_when_not_supported(handler):
    """Test that 'stream' param is not added when not supported."""
    data = {"foo": "bar"}
    config = BaseConfig(supports_stream_param_in_request_body=False)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=False)  # 638ns -> 623ns (2.41% faster)
    assert result == {"foo": "bar"}

def test_stream_param_removed_when_fake_stream_true(handler):
    """Test that 'stream' param is removed when fake_stream is True."""
    data = {"foo": "bar", "stream": True}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data, config, fake_stream=True)  # 886ns -> 825ns (7.39% faster)
    assert result == {"foo": "bar"}

def test_stream_param_removed_when_fake_stream_true_and_not_present(handler):
    """Test that nothing breaks if 'stream' param is not present and fake_stream is True."""
    data = {"foo": "bar"}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data, config, fake_stream=True)  # 796ns -> 550ns (44.7% faster)
    assert result == {"foo": "bar"}

def test_stream_param_added_overwrites_existing_value(handler):
    """Test that 'stream' param is overwritten if already present and supported."""
    data = {"foo": "bar", "stream": False}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data, config, fake_stream=False)  # 729ns -> 743ns (1.88% slower)
    assert result == {"foo": "bar", "stream": True}

# 2. Edge Test Cases

def test_data_is_empty_dict(handler):
    """Test with empty data dict."""
    data = {}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=False)  # 744ns -> 756ns (1.59% slower)
    assert result == {"stream": True}

def test_data_is_empty_dict_fake_stream(handler):
    """Test with empty data dict and fake_stream True."""
    data = {}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=True)  # 790ns -> 630ns (25.4% faster)
    assert result == {}

def test_stream_param_is_none(handler):
    """Test with 'stream' param set to None; it should be overwritten to True."""
    data = {"stream": None, "foo": "bar"}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=False)  # 724ns -> 720ns (0.556% faster)
    assert result == {"stream": True, "foo": "bar"}

def test_stream_param_is_string(handler):
    """Test with 'stream' param set to a string; it should be overwritten to True."""
    data = {"stream": "yes", "foo": "bar"}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=False)  # 701ns -> 672ns (4.32% faster)
    assert result == {"stream": True, "foo": "bar"}

def test_provider_config_supports_stream_param_is_none(handler):
    """Test with provider_config.supports_stream_param_in_request_body = None."""
    data = {"foo": "bar"}
    config = BaseConfig(supports_stream_param_in_request_body=None)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=False)  # 629ns -> 633ns (0.632% slower)
    assert result == {"foo": "bar"}

def test_provider_config_supports_stream_param_is_false_and_fake_stream_true(handler):
    """Test with supports_stream_param_in_request_body False and fake_stream True."""
    data = {"foo": "bar", "stream": True}
    config = BaseConfig(supports_stream_param_in_request_body=False)
    result = handler._add_stream_param_to_request_body(data, config, fake_stream=True)  # 909ns -> 844ns (7.70% faster)
    assert result == {"foo": "bar"}

def test_provider_config_supports_stream_param_is_false_and_stream_not_in_data(handler):
    """Test with supports_stream_param_in_request_body False and no stream in data."""
    data = {"foo": "bar"}
    config = BaseConfig(supports_stream_param_in_request_body=False)
    result = handler._add_stream_param_to_request_body(data, config, fake_stream=True)  # 884ns -> 625ns (41.4% faster)
    assert result == {"foo": "bar"}

def test_data_is_not_mutated_when_fake_stream_true(handler):
    """Test that original data is not mutated when fake_stream is True."""
    data = {"foo": "bar", "stream": True}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    original_data = data.copy()
    handler._add_stream_param_to_request_body(data, config, fake_stream=True)  # 827ns -> 803ns (2.99% faster)
    assert data == original_data

def test_data_is_mutated_when_stream_param_added(handler):
    """Test that original data is mutated when stream param is added."""
    data = {"foo": "bar"}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    handler._add_stream_param_to_request_body(data, config, fake_stream=False)  # 695ns -> 671ns (3.58% faster)
    assert data == {"foo": "bar", "stream": True}

def test_data_is_not_mutated_when_stream_param_not_added(handler):
    """Test that original data is not mutated when stream param is not added."""
    data = {"foo": "bar"}
    config = BaseConfig(supports_stream_param_in_request_body=False)
    original_data = data.copy()
    handler._add_stream_param_to_request_body(data, config, fake_stream=False)  # 620ns -> 540ns (14.8% faster)
    assert data == original_data

def test_stream_param_removed_when_fake_stream_true_and_stream_is_false(handler):
    """Test that 'stream' param is removed even if its value is False when fake_stream is True."""
    data = {"foo": "bar", "stream": False}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data, config, fake_stream=True)  # 860ns -> 815ns (5.52% faster)
    assert result == {"foo": "bar"}

# 3. Large Scale Test Cases

def test_large_data_dict_with_stream_param_added(handler):
    """Test with large data dict and stream param added."""
    data = {f"key_{i}": i for i in range(1000)}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=False)  # 771ns -> 716ns (7.68% faster)
    assert result["stream"] is True
    for i in range(1000):
        assert result[f"key_{i}"] == i

def test_large_data_dict_with_stream_param_removed(handler):
    """Test with large data dict and stream param removed by fake_stream."""
    data = {f"key_{i}": i for i in range(1000)}
    data["stream"] = True
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data, config, fake_stream=True)  # 3.61μs -> 3.67μs (1.53% slower)
    assert "stream" not in result
    for i in range(1000):
        assert result[f"key_{i}"] == i

def test_large_data_dict_with_no_stream_param_and_fake_stream(handler):
    """Test with large data dict and no stream param, fake_stream True."""
    data = {f"key_{i}": i for i in range(1000)}
    config = BaseConfig(supports_stream_param_in_request_body=True)
    result = handler._add_stream_param_to_request_body(data, config, fake_stream=True)  # 3.47μs -> 596ns (483% faster)
    assert "stream" not in result
    for i in range(1000):
        assert result[f"key_{i}"] == i

def test_large_data_dict_with_stream_param_not_supported(handler):
    """Test with large data dict and stream param not supported."""
    data = {f"key_{i}": i for i in range(1000)}
    config = BaseConfig(supports_stream_param_in_request_body=False)
    result = handler._add_stream_param_to_request_body(data.copy(), config, fake_stream=False)  # 702ns -> 707ns (0.707% slower)
    assert "stream" not in result
    for i in range(1000):
        assert result[f"key_{i}"] == i
#------------------------------------------------
from litellm.llms.bedrock.chat.invoke_transformations.amazon_nova_transformation import AmazonInvokeNovaConfig
from litellm.llms.custom_httpx.llm_http_handler import BaseLLMHTTPHandler
from litellm.llms.triton.completion.transformation import TritonConfig

def test_BaseLLMHTTPHandler__add_stream_param_to_request_body():
    BaseLLMHTTPHandler._add_stream_param_to_request_body(BaseLLMHTTPHandler(), {}, AmazonInvokeNovaConfig(), True)

def test_BaseLLMHTTPHandler__add_stream_param_to_request_body_2():
    BaseLLMHTTPHandler._add_stream_param_to_request_body(BaseLLMHTTPHandler(), {}, TritonConfig(), False)
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| codeflash_concolic_zbim32de/tmp8v1k5mhx/test_concolic_coverage.py::test_BaseLLMHTTPHandler__add_stream_param_to_request_body | 633ns | 430ns | 47.2% ✅ |
| codeflash_concolic_zbim32de/tmp8v1k5mhx/test_concolic_coverage.py::test_BaseLLMHTTPHandler__add_stream_param_to_request_body_2 | 966ns | 1.10μs | -12.2% ⚠️ |

To edit these changes, `git checkout codeflash/optimize-BaseLLMHTTPHandler._add_stream_param_to_request_body-mhdq4lh8` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 30, 2025 17:54
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 30, 2025