
Conversation


codeflash-ai bot commented Oct 30, 2025

📄 30% (0.30x) speedup for PredibaseChatCompletion.output_parser in litellm/llms/predibase/chat/handler.py

⏱️ Runtime : 4.64 microseconds → 3.57 microseconds (best of 509 runs)

📝 Explanation and details

The optimized code achieves a **29% speedup** through three key improvements (plus one minor tweak):

**1. Eliminated expensive string reversal operations**: The original code used `generated_text[::-1].replace(token[::-1], "", 1)[::-1]` to remove tokens from the end, which creates multiple temporary strings. The optimized version uses simple slicing, `generated_text[:-len(token)]`, which is much more efficient.

**2. Moved `.strip()` outside the loop**: Instead of calling `generated_text.strip()` on every iteration when checking `startswith()`, the optimized code strips once before the loop, eliminating redundant whitespace removal.

**3. Replaced `.replace()` with slicing**: For start-token removal, `generated_text.replace(token, "", 1)` scans the entire string, while `generated_text[len(token):]` slices directly without searching.

**4. Minor optimization**: Changed `chat_template_tokens` from a list to a tuple, providing slight memory and iteration improvements.

The line profiler shows the most dramatic improvement on line 17 (end-token removal), dropping from 48,927ns to 17,568ns per hit - a 64% reduction. This optimization is particularly effective for text processing scenarios with tokens at string boundaries, as shown in the test cases involving `<|assistant|>`, `<|system|>`, and other ChatML tokens at the start/end of generated text.
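
The end-token trick is easy to check in isolation. Below is a minimal sketch contrasting the two approaches, assuming the tokens sit flush at the string boundaries (no surrounding whitespace); the token list mirrors the handler's, but the function names are illustrative, not the literal diff in this PR:

```python
# Minimal sketch of the optimization described above; names are illustrative.
CHAT_TEMPLATE_TOKENS = ("<|assistant|>", "<|system|>", "<|user|>", "<s>", "</s>")

def strip_tokens_slow(text: str) -> str:
    for token in CHAT_TEMPLATE_TOKENS:
        if text.strip().startswith(token):     # re-strips on every iteration
            text = text.replace(token, "", 1)  # scans the whole string for the match
        if text.endswith(token):
            # reverse / replace / reverse: builds three temporary strings
            text = text[::-1].replace(token[::-1], "", 1)[::-1]
    return text

def strip_tokens_fast(text: str) -> str:
    stripped = text.strip()                    # strip once, outside the loop
    for token in CHAT_TEMPLATE_TOKENS:
        if stripped.startswith(token):
            stripped = stripped[len(token):]   # slice: no search, no extra copies
        if stripped.endswith(token):
            stripped = stripped[:-len(token)]  # endswith() guarantees an exact suffix
    return stripped

sample = "<|assistant|>Hello, world!</s>"
assert strip_tokens_slow(sample) == strip_tokens_fast(sample) == "Hello, world!"
```

Note that `text[:-len(token)]` is equivalent to the reverse/replace/reverse dance only because the `endswith()` guard ensures the token's last occurrence is exactly the suffix being removed.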

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 32 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 4 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
```python
import pytest  # used for our unit tests
from litellm.llms.predibase.chat.handler import PredibaseChatCompletion

# unit tests

@pytest.fixture
def parser():
    # Fixture returning the bound output_parser method of a fresh instance
    return PredibaseChatCompletion().output_parser

# ------------------------------
# Large Scale Test Cases
# ------------------------------

def test_long_text_with_tokens(parser):
    # Test a long string with tokens at the start and end
    long_text = "<|assistant|>" + "A" * 995 + "<|assistant|>"
    expected = "A" * 995
    assert parser(long_text) == expected

def test_long_text_no_tokens(parser):
    # Test a long string with no tokens
    long_text = "B" * 1000
    assert parser(long_text) == long_text

def test_many_tokens_in_text(parser):
    # Test a string with many tokens scattered throughout: the start/end
    # <|assistant|> pair is stripped, and because the remainder then ends
    # with <|user|>, that trailing token is stripped as well
    text = "<|assistant|>" + ("Hello!<|user|>" * 100) + "<|assistant|>"
    expected = ("Hello!<|user|>" * 99) + "Hello!"
    assert parser(text) == expected

def test_all_tokens_start_and_end(parser):
    # Test a string that starts and ends with all tokens in succession.
    # The leading tokens appear in the same order the parser checks them,
    # so each removal exposes the next and all five are stripped; at the
    # end only the final </s> matches, so the other trailing tokens remain.
    text = "<|assistant|><|system|><|user|><s></s>Some message<|assistant|><|system|><|user|><s></s>"
    expected = "Some message<|assistant|><|system|><|user|><s>"
    assert parser(text) == expected

def test_large_scale_mixed_tokens(parser):
    # Test a string with tokens stacked at both boundaries: the leading run
    # is stripped token by token, but only the trailing </s> matches at the end
    tokens = ["<|assistant|>", "<|system|>", "<|user|>", "<s>", "</s>"]
    # Create a string with 990 'word' and 5 tokens at each boundary
    s = "".join(tokens) + " ".join(["word"] * 990) + "".join(tokens)
    expected = " ".join(["word"] * 990) + "<|assistant|><|system|><|user|><s>"
    assert parser(s) == expected

# ------------------------------
# Determinism and Robustness
# ------------------------------

def test_idempotency(parser):
    # Test that parsing twice does not change the output further
    s = "<|assistant|>Hello!<|assistant|>"
    once = parser(s)
    twice = parser(once)
    assert once == "Hello!"
    assert twice == once
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import pytest  # used for our unit tests
from litellm.llms.predibase.chat.handler import PredibaseChatCompletion

# unit tests

@pytest.fixture
def parser():
    # Fixture to create a parser instance for reuse
    return PredibaseChatCompletion().output_parser

# === BASIC TEST CASES ===

def test_basic_no_tokens(parser):
    # Test with a string that does not contain any tokens
    input_text = "Hello, how are you?"
    output = parser(input_text)
    assert output == input_text

def test_basic_start_token(parser):
    # Test with a string that starts with a token
    input_text = "<|assistant|>Hello, how are you?"
    output = parser(input_text)
    assert output == "Hello, how are you?"

def test_basic_end_token(parser):
    # Test with a string that ends with a token
    input_text = "Hello, how are you?<|assistant|>"
    output = parser(input_text)
    assert output == "Hello, how are you?"

def test_basic_start_and_end_token(parser):
    # Test with a string that starts and ends with a token
    input_text = "<|assistant|>Hello, how are you?<|assistant|>"
    output = parser(input_text)
    assert output == "Hello, how are you?"

def test_basic_multiple_tokens(parser):
    # Test with a string that has multiple tokens at start and end
    input_text = "<|system|><|assistant|>Hello, how are you?<|user|><|assistant|>"
    output = parser(input_text)
    # <|assistant|> is checked first (while the string still starts with
    # <|system|>), so only the trailing <|assistant|>, the leading <|system|>,
    # and the trailing <|user|> are removed; the inner <|assistant|> survives
    expected = "<|assistant|>Hello, how are you?"
    assert output == expected

# === EDGE TEST CASES ===

def test_edge_only_token(parser):
    # Test with a string that is exactly a token
    input_text = "<|user|>"
    output = parser(input_text)
    assert output == ""

def test_edge_token_with_whitespace(parser):
    # Test with a token surrounded by whitespace
    input_text = "   <|assistant|>   "
    output = parser(input_text)
    # the token is removed, the surrounding whitespace is preserved
    assert output == " " * 6

def test_edge_token_in_middle(parser):
    # Token appears in the middle, should not be removed
    input_text = "Hello <|assistant|> how are you?"
    output = parser(input_text)
    assert output == input_text

def test_edge_token_case_sensitivity(parser):
    # Token with different case should not be removed
    input_text = "<|Assistant|>Hello"
    output = parser(input_text)
    assert output == input_text

def test_edge_token_with_extra_characters(parser):
    # Exact boundary tokens are removed; the extra characters in between remain
    input_text = "<|assistant|>extraHello</s>"
    output = parser(input_text)
    assert output == "extraHello"

def test_edge_token_with_leading_and_trailing_whitespace(parser):
    # Token with whitespace before and after
    input_text = "   <|assistant|>Hello World</s>   "
    output = parser(input_text)
    # startswith() is checked on the stripped text, so the leading token goes;
    # endswith() is not, so the trailing </s> survives behind the whitespace
    assert output == "   Hello World</s>   "

def test_edge_empty_string(parser):
    # Test with empty string
    input_text = ""
    output = parser(input_text)
    assert output == ""

def test_edge_only_whitespace(parser):
    # Test with string of only whitespace
    input_text = "     "
    output = parser(input_text)
    assert output == input_text

def test_edge_token_substring(parser):
    # Token substring should not be removed
    input_text = "<|assistant|Hello|assistant|>"
    output = parser(input_text)
    assert output == input_text

def test_edge_multiple_different_tokens(parser):
    # Multiple different tokens at start and end
    input_text = "<|system|><|user|>Hi there!</s></s>"
    output = parser(input_text)
    # <|system|> is stripped first, exposing <|user|>, which is checked later
    # in the token order and also stripped; one trailing </s> is removed
    expected = "Hi there!</s>"
    assert output == expected

def test_edge_token_with_newline(parser):
    # Token at start/end with newlines
    input_text = "<|assistant|>\nHello World\n</s>"
    output = parser(input_text)
    assert output == "\nHello World\n"

# === LARGE SCALE TEST CASES ===

def test_large_scale_long_text_with_tokens(parser):
    # Large input with tokens at start and end
    text = "<|assistant|>" + "A" * 900 + "</s>"
    output = parser(text)
    assert output == "A" * 900

def test_large_scale_many_tokens(parser):
    # Input with many tokens: the boundary tokens are stripped, and the first
    # <|user|> of the middle run is also removed by the startswith() pass
    middle = "<|user|>" * 400
    text = "<|assistant|>" + middle + "</s>"
    output = parser(text)
    assert output == "<|user|>" * 399

def test_large_scale_token_at_start_only(parser):
    # Large text, token only at start
    text = "<s>" + "B" * 999
    output = parser(text)
    assert output == "B" * 999

def test_large_scale_token_at_end_only(parser):
    # Large text, token only at end
    text = "C" * 999 + "</s>"
    output = parser(text)
    assert output == "C" * 999

def test_large_scale_no_tokens(parser):
    # Large text, no tokens
    text = "D" * 1000
    output = parser(text)
    assert output == text

def test_large_scale_tokens_with_whitespace(parser):
    # Large text with tokens and whitespace
    text = "   <|assistant|>" + "E" * 998 + "</s>   "
    output = parser(text)
    # only the leading token is removed; the trailing whitespace shields </s>
    assert output == "   " + "E" * 998 + "</s>   "

def test_large_scale_tokens_with_newlines(parser):
    # Large text with tokens and newlines
    text = "<|system|>\n" + "F" * 995 + "\n</s>"
    output = parser(text)
    assert output == "\n" + "F" * 995 + "\n"

def test_large_scale_tokens_not_at_edges(parser):
    # Large text with tokens only in the middle
    text = "G" * 500 + "<|assistant|>" + "H" * 499
    output = parser(text)
    assert output == text

def test_large_scale_multiple_edge_tokens(parser):
    # Large text with multiple tokens at start/end
    text = "<|assistant|><s>" + "I" * 990 + "</s><|user|>"
    output = parser(text)
    # <|assistant|> and the trailing <|user|> go first; that exposes <s> at the
    # start and </s> at the end, which are checked later in the token order
    expected = "I" * 990
    assert output == expected
    # parsing is deterministic
    output2 = parser(text)
    assert output2 == output

def test_large_scale_token_substrings(parser):
    # Large text with substrings similar to tokens, should not be removed
    text = "<|assistant|extra>" + "J" * 990 + "<sExtra>"
    output = parser(text)
    assert output == text
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from litellm.llms.predibase.chat.handler import PredibaseChatCompletion

def test_PredibaseChatCompletion_output_parser():
    PredibaseChatCompletion.output_parser(PredibaseChatCompletion(), '<<s>')

def test_PredibaseChatCompletion_output_parser_2():
    PredibaseChatCompletion.output_parser(PredibaseChatCompletion(), '<s>')
```
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| codeflash_concolic_kt42dg31/tmpam36joxn/test_concolic_coverage.py::test_PredibaseChatCompletion_output_parser | 2.71μs | 1.88μs | 44.2% ✅ |
| codeflash_concolic_kt42dg31/tmpam36joxn/test_concolic_coverage.py::test_PredibaseChatCompletion_output_parser_2 | 1.93μs | 1.69μs | 14.2% ✅ |

To edit these changes, run `git checkout codeflash/optimize-PredibaseChatCompletion.output_parser-mhdbrz9j` and push.

codeflash-ai bot requested a review from mashraf-222 October 30, 2025 11:12
codeflash-ai bot added the ⚡️ codeflash and 🎯 Quality: High labels Oct 30, 2025