@codeflash-ai codeflash-ai bot commented Oct 29, 2025

📄 10% (0.10x) speedup for to_alpha_numeric in panel/chat/utils.py

⏱️ Runtime: 570 microseconds → 517 microseconds (best of 37 runs)

📝 Explanation and details

The optimization pre-compiles the regular expression pattern r"\W+" into a re.Pattern object stored as _non_alnum_pattern at module level, rather than passing the pattern string to re.sub on every function call.

Key change: Instead of calling re.sub(r"\W+", "", user), which has to resolve the pattern string to a compiled pattern on every call, the optimized version calls _non_alnum_pattern.sub("", user) on a pre-compiled pattern.

Why it's faster: Python's re module caches compiled patterns, so the original code does not fully recompile \W+ on each call, but every call still goes through the module-level re.sub wrapper and a cache lookup before any matching happens. The optimized version compiles the pattern once when the module loads and calls the compiled object's sub method directly, which removes that fixed per-call overhead from the hot path.
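A minimal sketch of the before and after forms, for readers who want to see the change side by side. The name _non_alnum_pattern comes from the description above; to_alpha_numeric_original is a hypothetical name for the pre-optimization body, and the trailing .lower() is an assumption inferred from the test comments rather than taken from the diff.

```python
import re

# Pattern compiled once at import time; the name follows the PR description.
_non_alnum_pattern = re.compile(r"\W+")


def to_alpha_numeric_original(user: str) -> str:
    # Hypothetical reconstruction of the pre-optimization body:
    # the pattern string is handed to re.sub on every call.
    return re.sub(r"\W+", "", user).lower()


def to_alpha_numeric(user: str) -> str:
    # Optimized form: call the compiled pattern's sub() directly.
    return _non_alnum_pattern.sub("", user).lower()


# Both forms should agree on every input, including the underscore case:
# "_" is a word character, so \W+ leaves it in place.
samples = ["User123", "John Doe!", "A_B_C", "", "Straße42"]
for s in samples:
    assert to_alpha_numeric(s) == to_alpha_numeric_original(s)
```

Because both functions apply the same pattern, the change should be behavior-preserving; only the route to the compiled pattern differs.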

Performance characteristics: The saving is a fixed per-call overhead (typically under 1μs in the annotated runs), so the relative speedup depends on input size. Most short inputs run roughly 20-50% faster, with the most significant gains on:

  • Small inputs (roughly 40-60% faster for single characters and short strings)
  • Simple patterns (roughly 35-55% faster for basic alphanumeric strings)
  • Edge cases like empty strings (about 68-73% faster)

The speedup is much smaller for large inputs (from about 0.1% to 11% faster), where regex execution time dominates the fixed per-call overhead. This optimization is therefore most effective for functions called frequently with small inputs, which is typical for username sanitization use cases.
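The size dependence described above can be checked locally with a small timeit sketch. This is an illustration under assumptions, not the Codeflash harness: with_re_sub, with_compiled, and per_call_ns are hypothetical helper names, the inputs are borrowed from the generated tests, and absolute numbers will vary with machine and Python version.

```python
import re
import timeit

_non_alnum_pattern = re.compile(r"\W+")


def with_re_sub(user: str) -> str:
    # Pattern string passed to the module-level function on each call.
    return re.sub(r"\W+", "", user).lower()


def with_compiled(user: str) -> str:
    # Pre-compiled pattern object reused across calls.
    return _non_alnum_pattern.sub("", user).lower()


def per_call_ns(func, arg: str, number: int = 2_000, repeat: int = 3) -> float:
    # Best-of-`repeat` average time per call, in nanoseconds.
    best = min(timeit.repeat(lambda: func(arg), number=number, repeat=repeat))
    return best / number * 1e9


short_input = "User_42!"
long_input = "A!b@C#1$2%3^" * 100  # 1200 characters

results = {
    label: (per_call_ns(with_re_sub, text), per_call_ns(with_compiled, text))
    for label, text in [("short", short_input), ("long", long_input)]
}

for label, (before, after) in results.items():
    print(f"{label}: {before:.0f} ns -> {after:.0f} ns per call")
```

If the per-call overhead is roughly constant, the absolute difference between the two columns should be similar for both inputs while the relative difference shrinks for the long one.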

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 80 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | ✅ 1 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

import re

# imports
import pytest  # used for our unit tests
from panel.chat.utils import to_alpha_numeric

# unit tests

# ------------------------
# Basic Test Cases
# ------------------------

def test_basic_alphanumeric():
    # Only alphanumeric, should just lowercase
    codeflash_output = to_alpha_numeric("User123") # 3.10μs -> 2.20μs (41.1% faster)
    codeflash_output = to_alpha_numeric("HELLOworld") # 1.05μs -> 766ns (37.5% faster)
    codeflash_output = to_alpha_numeric("abcDEF456") # 835ns -> 545ns (53.2% faster)

def test_basic_with_spaces_and_symbols():
    # Spaces and symbols should be removed; underscores are kept because \W does not match "_"
    codeflash_output = to_alpha_numeric("John Doe!") # 3.52μs -> 2.73μs (28.7% faster)
    codeflash_output = to_alpha_numeric("A_B_C") # 1.09μs -> 781ns (39.1% faster)
    codeflash_output = to_alpha_numeric("Python@3.9") # 1.56μs -> 1.30μs (19.6% faster)
    codeflash_output = to_alpha_numeric("foo-bar") # 1.06μs -> 869ns (22.3% faster)
    codeflash_output = to_alpha_numeric("hello.world") # 1.07μs -> 824ns (29.6% faster)

def test_basic_mixed_case_and_numbers():
    # Mixed case and numbers; symbols other than "_" are removed
    codeflash_output = to_alpha_numeric("User_42!") # 2.96μs -> 2.11μs (40.4% faster)
    codeflash_output = to_alpha_numeric("123abcDEF") # 1.14μs -> 821ns (38.9% faster)

# ------------------------
# Edge Test Cases
# ------------------------

def test_empty_string():
    # Empty string should return empty string
    codeflash_output = to_alpha_numeric("") # 1.93μs -> 1.15μs (67.8% faster)

def test_only_symbols():
    # Only symbols: the result is empty unless the input contains underscores, which \W does not match (so "___---" yields "___")
    codeflash_output = to_alpha_numeric("!@#$%^&*()") # 2.78μs -> 1.91μs (45.2% faster)
    codeflash_output = to_alpha_numeric("___---") # 1.42μs -> 954ns (48.6% faster)
    codeflash_output = to_alpha_numeric(" . ") # 795ns -> 493ns (61.3% faster)

def test_only_spaces():
    # Only spaces should return empty string
    codeflash_output = to_alpha_numeric("   ") # 2.41μs -> 1.59μs (51.2% faster)

def test_unicode_letters_and_digits():
    # Unicode letters and digits should be retained if they are alphanumeric
    codeflash_output = to_alpha_numeric("Café123") # 3.11μs -> 2.38μs (30.5% faster)
    codeflash_output = to_alpha_numeric("你好123") # 1.96μs -> 1.70μs (15.5% faster)
    codeflash_output = to_alpha_numeric("résumé") # 969ns -> 657ns (47.5% faster)
    codeflash_output = to_alpha_numeric("Straße42") # 1.03μs -> 683ns (50.4% faster)

def test_unicode_symbols():
    # Unicode symbols should be removed
    codeflash_output = to_alpha_numeric("hello★world") # 3.82μs -> 2.98μs (28.5% faster)
    codeflash_output = to_alpha_numeric("smile😊face") # 2.12μs -> 1.96μs (8.64% faster)

def test_leading_and_trailing_symbols():
    # Leading/trailing symbols/spaces removed
    codeflash_output = to_alpha_numeric("!user!") # 3.23μs -> 2.48μs (30.3% faster)
    codeflash_output = to_alpha_numeric("  user  ") # 1.26μs -> 933ns (35.2% faster)
    codeflash_output = to_alpha_numeric("$$user$$") # 1.02μs -> 740ns (37.4% faster)

def test_numbers_only():
    # Numbers only should be retained
    codeflash_output = to_alpha_numeric("123456") # 2.48μs -> 1.89μs (30.8% faster)

def test_mixed_non_ascii_alphanumerics():
    # Non-ASCII alphanumerics retained, symbols removed
    codeflash_output = to_alpha_numeric("éèêëēėę123") # 3.30μs -> 2.56μs (28.8% faster)
    codeflash_output = to_alpha_numeric("добрый123") # 1.26μs -> 936ns (34.3% faster)

def test_newlines_and_tabs():
    # Newlines and tabs are non-alphanumeric and should be removed
    codeflash_output = to_alpha_numeric("user\nname") # 3.36μs -> 2.67μs (26.0% faster)
    codeflash_output = to_alpha_numeric("user\tname") # 1.08μs -> 767ns (41.1% faster)

def test_only_one_character():
    # Single alphanumeric character
    codeflash_output = to_alpha_numeric("A") # 2.44μs -> 1.59μs (53.7% faster)
    codeflash_output = to_alpha_numeric("1") # 902ns -> 647ns (39.4% faster)
    # Single underscore: a word character, so \W+ leaves it in place
    codeflash_output = to_alpha_numeric("_") # 800ns -> 547ns (46.3% faster)

def test_long_repeating_symbols():
    # Multiple consecutive symbols should be removed (runs of underscores are kept)
    codeflash_output = to_alpha_numeric("user------name") # 3.42μs -> 2.64μs (29.5% faster)
    codeflash_output = to_alpha_numeric("user___name") # 1.19μs -> 933ns (28.0% faster)
    codeflash_output = to_alpha_numeric("user!!!name") # 932ns -> 730ns (27.7% faster)

def test_mixed_case_and_symbols():
    # Case is normalized, symbols removed
    codeflash_output = to_alpha_numeric("UsEr@#123") # 3.17μs -> 2.45μs (29.5% faster)

# ------------------------
# Large Scale Test Cases
# ------------------------

def test_large_input_all_alphanumeric():
    # Large input, all alphanumeric, should just lowercase
    s = "AbC123" * 150  # 900 characters
    expected = ("abc123" * 150)
    codeflash_output = to_alpha_numeric(s) # 11.5μs -> 10.8μs (7.11% faster)

def test_large_input_with_symbols():
    # Large input with symbols interleaved
    s = ("A!b@C#1$2%3^" * 100)  # 1200 characters, but only 600 alphanumerics
    expected = ("abc123" * 100)
    codeflash_output = to_alpha_numeric(s) # 54.5μs -> 54.4μs (0.145% faster)

def test_large_input_with_spaces():
    # Large input with spaces between every character
    s = " ".join("UserName123" * 80)  # 11*80=880 chars + 879 spaces
    expected = ("username123" * 80)
    codeflash_output = to_alpha_numeric(s) # 73.7μs -> 73.5μs (0.230% faster)

def test_large_input_only_symbols():
    # Large input, only symbols, should return empty string
    s = "!@#$%^&*" * 120  # 960 characters
    codeflash_output = to_alpha_numeric(s) # 9.00μs -> 8.15μs (10.4% faster)

def test_large_input_unicode():
    # Large input, unicode alphanumerics and symbols
    s = ("Straße★42😊" * 80)  # 10*80=800 chars
    expected = ("straße42" * 80)
    codeflash_output = to_alpha_numeric(s) # 31.7μs -> 31.3μs (1.27% faster)

def test_large_input_mixed():
    # Large input, mixture of everything
    s = ("  AbC123!@#\n" * 80)  # 12*80=960 chars
    expected = ("abc123" * 80)
    codeflash_output = to_alpha_numeric(s) # 18.8μs -> 18.0μs (4.24% faster)

# ------------------------
# Mutation-sensitive cases
# ------------------------

def test_mutation_sensitive_case_symbol_between_digits():
    # Underscores between digits are kept, since \W does not match "_"; a mutation that strips them changes the output
    codeflash_output = to_alpha_numeric("1_2_3") # 2.67μs -> 1.79μs (49.6% faster)

def test_mutation_sensitive_case_case_normalization():
    # If function fails to lowercase, this will catch it
    codeflash_output = to_alpha_numeric("ABCdef") # 2.58μs -> 1.91μs (35.2% faster)

def test_mutation_sensitive_case_symbol_at_end():
    # If function fails to remove trailing symbol
    codeflash_output = to_alpha_numeric("user!") # 2.97μs -> 2.21μs (34.4% faster)

def test_mutation_sensitive_case_symbol_at_start():
    # If function fails to remove leading symbol
    codeflash_output = to_alpha_numeric("!user") # 3.08μs -> 2.53μs (21.9% faster)

def test_mutation_sensitive_case_multiple_symbol_types():
    # If function fails to remove all symbol types (the underscore is kept by \W+)
    codeflash_output = to_alpha_numeric("u$e_r#n@a!m%e") # 4.00μs -> 3.16μs (26.6% faster)

def test_mutation_sensitive_case_non_ascii_symbol():
    # If function fails to remove non-ascii symbol
    codeflash_output = to_alpha_numeric("user☆name") # 4.12μs -> 3.17μs (30.0% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from __future__ import annotations

import re

# imports
import pytest  # used for our unit tests
from panel.chat.utils import to_alpha_numeric

# unit tests

# 1. Basic Test Cases

def test_basic_alpha_numeric_lowercase():
    # Should return the same lowercase string if already alphanumeric
    codeflash_output = to_alpha_numeric("john123") # 3.35μs -> 2.46μs (36.4% faster)

def test_basic_alpha_numeric_uppercase():
    # Should convert uppercase to lowercase
    codeflash_output = to_alpha_numeric("JOHN123") # 2.81μs -> 2.11μs (33.6% faster)

def test_basic_with_spaces():
    # Should remove spaces and lowercase
    codeflash_output = to_alpha_numeric("John Doe 123") # 3.76μs -> 2.98μs (26.4% faster)

def test_basic_with_underscore_and_dash():
    # Should remove dashes; underscores count as word characters, so \W+ leaves them in place
    codeflash_output = to_alpha_numeric("John_Doe-123") # 3.21μs -> 2.58μs (24.5% faster)

def test_basic_with_mixed_symbols():
    # Should remove all non-alphanumeric characters
    codeflash_output = to_alpha_numeric("J@o#h$n%1^2&3*") # 3.96μs -> 3.28μs (20.7% faster)

def test_basic_with_empty_string():
    # Should return empty string for empty input
    codeflash_output = to_alpha_numeric("") # 1.94μs -> 1.12μs (72.5% faster)

def test_basic_with_only_symbols():
    # Should return empty string for input with only symbols
    codeflash_output = to_alpha_numeric("!@#$%^&*()") # 2.77μs -> 2.03μs (36.7% faster)

def test_basic_with_only_spaces():
    # Should return empty string for input with only spaces
    codeflash_output = to_alpha_numeric("     ") # 2.59μs -> 1.75μs (48.3% faster)

def test_basic_with_numbers_only():
    # Should return the same string if only numbers
    codeflash_output = to_alpha_numeric("123456") # 2.61μs -> 1.90μs (37.2% faster)

def test_basic_with_letters_only():
    # Should return lowercase letters
    codeflash_output = to_alpha_numeric("AbCdEfG") # 2.60μs -> 1.93μs (35.0% faster)

# 2. Edge Test Cases

def test_edge_with_unicode_letters():
    # Should keep Unicode alphanumeric characters (letters and numbers)
    codeflash_output = to_alpha_numeric("JöhnDœ123") # 3.45μs -> 2.62μs (31.8% faster)

def test_edge_with_unicode_symbols():
    # Should remove Unicode symbols (e.g., emoji)
    codeflash_output = to_alpha_numeric("John😀Doe💡123") # 4.59μs -> 3.79μs (21.1% faster)

def test_edge_with_non_latin_alphanumerics():
    # Should keep non-Latin alphanumerics (Cyrillic, Greek)
    codeflash_output = to_alpha_numeric("Иван123Δelta") # 3.36μs -> 2.63μs (27.5% faster)

def test_edge_with_mixed_newlines_and_tabs():
    # Should remove newlines and tabs
    codeflash_output = to_alpha_numeric("John\nDoe\t123") # 3.75μs -> 3.00μs (25.1% faster)

def test_edge_with_control_characters():
    # Should remove control characters
    codeflash_output = to_alpha_numeric("John\x00Doe\x1F123") # 3.50μs -> 2.71μs (28.9% faster)

def test_edge_with_leading_trailing_symbols():
    # Should remove leading and trailing symbols
    codeflash_output = to_alpha_numeric("!@#JohnDoe123$%^") # 3.31μs -> 2.76μs (19.8% faster)

def test_edge_with_multiple_consecutive_symbols():
    # Should remove consecutive symbols (runs of underscores are kept)
    codeflash_output = to_alpha_numeric("J!!!o---h___n***123") # 3.72μs -> 3.07μs (21.3% faster)

def test_edge_with_long_string_of_symbols():
    # Should return empty string if only symbols, even if long
    codeflash_output = to_alpha_numeric("!@#$%^&*()" * 50) # 5.93μs -> 5.07μs (16.9% faster)

def test_edge_with_surrogate_pairs():
    # Should remove emoji outside the Basic Multilingual Plane (written as \U escapes)
    codeflash_output = to_alpha_numeric("John\U0001F600Doe\U0001F4A1") # 3.92μs -> 3.31μs (18.5% faster)

def test_edge_with_mixed_case_and_symbols():
    # Should lowercase and remove symbols other than "_"
    codeflash_output = to_alpha_numeric("JoHn_DoE-123!") # 3.54μs -> 2.68μs (32.1% faster)

# 3. Large Scale Test Cases

def test_large_scale_long_alphanumeric_string():
    # Should handle long alphanumeric string efficiently
    s = "JohnDoe123" * 100  # 1000+ chars
    expected = "johndoe123" * 100
    codeflash_output = to_alpha_numeric(s) # 12.2μs -> 11.5μs (6.07% faster)

def test_large_scale_long_string_with_symbols():
    # Should remove all symbols from a long string
    s = ("J@o#h$n%1^2&3*" * 100)
    expected = "john123" * 100
    codeflash_output = to_alpha_numeric(s) # 64.5μs -> 62.2μs (3.77% faster)

def test_large_scale_mixed_unicode_and_symbols():
    # Should keep Unicode letters/numbers and underscores, remove other symbols
    s = ("Jöhn_Dœ-123😀💡" * 50)
    expected = "jöhn_dœ123" * 50
    codeflash_output = to_alpha_numeric(s) # 24.6μs -> 23.6μs (4.06% faster)

def test_large_scale_only_symbols():
    # Should return empty string for large input of only symbols
    s = "!@#$%^&*()" * 100
    codeflash_output = to_alpha_numeric(s) # 9.29μs -> 8.63μs (7.64% faster)

def test_large_scale_alphanumeric_with_spaces():
    # Should remove spaces from a large string
    s = ("John Doe 123 " * 80)
    expected = "johndoe123" * 80
    codeflash_output = to_alpha_numeric(s) # 35.0μs -> 34.0μs (3.06% faster)

def test_large_scale_numbers_only():
    # Should return same string for large numbers-only input
    s = "1234567890" * 100
    expected = "1234567890" * 100
    codeflash_output = to_alpha_numeric(s) # 12.8μs -> 12.1μs (5.69% faster)

def test_large_scale_letters_only():
    # Should return lowercase letters for large input
    s = "AbCdEfG" * 100
    expected = "abcdefg" * 100
    codeflash_output = to_alpha_numeric(s) # 9.11μs -> 8.20μs (11.1% faster)

def test_large_scale_mixed_case_and_symbols():
    # Should lowercase and remove symbols other than "_" in large input
    s = ("JoHn_DoE-123!" * 90)
    expected = "john_doe123" * 90
    codeflash_output = to_alpha_numeric(s) # 32.3μs -> 31.6μs (2.19% faster)

# 4. Additional Edge Cases

def test_edge_with_none_input():
    # Should raise TypeError if input is not a string
    with pytest.raises(TypeError):
        to_alpha_numeric(None) # 2.69μs -> 1.90μs (41.6% faster)

def test_edge_with_integer_input():
    # Should raise TypeError if input is not a string
    with pytest.raises(TypeError):
        to_alpha_numeric(12345) # 2.62μs -> 1.91μs (37.2% faster)

def test_edge_with_list_input():
    # Should raise TypeError if input is not a string
    with pytest.raises(TypeError):
        to_alpha_numeric(["John", "Doe", "123"]) # 2.57μs -> 1.70μs (51.2% faster)

def test_edge_with_bytes_input():
    # Should raise TypeError if input is not a string
    with pytest.raises(TypeError):
        to_alpha_numeric(b"JohnDoe123") # 2.26μs -> 1.45μs (56.0% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from panel.chat.utils import to_alpha_numeric

def test_to_alpha_numeric():
    to_alpha_numeric('')
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| codeflash_concolic_qbtdmixy/tmpbthd3r9v/test_concolic_coverage.py::test_to_alpha_numeric | 2.75μs | 1.47μs | 87.3%✅ |

To edit these changes, run git checkout codeflash/optimize-to_alpha_numeric-mhc2abzv and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 29, 2025 13:59
@codeflash-ai codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels Oct 29, 2025