Skip to content

Conversation

@codeflash-ai
Copy link

@codeflash-ai codeflash-ai bot commented Oct 29, 2025

📄 48% (0.48x) speedup for compile_path in starlette/routing.py

⏱️ Runtime : 1.89 milliseconds 1.28 milliseconds (best of 5 runs)

📝 Explanation and details

The optimized version achieves a 48% speedup by addressing Python's most significant string concatenation performance bottleneck and reducing attribute lookup overhead in loops.

Key optimizations:

  1. List-based string building: Replaced repeated string concatenation (path_regex += ..., path_format += ...) with list accumulation (path_regex_parts.append(), path_format_parts.append()) followed by a single "".join() call. This eliminates the O(n²) behavior of string concatenation in Python, where each += creates a new string object.

  2. Localized attribute access: Cached PARAM_REGEX.finditer and re.escape as local variables to avoid repeated attribute lookups in the tight loop that processes path parameters.

  3. Streamlined final assembly: Consolidated the final string building into single join operations instead of multiple concatenations.

Performance impact by test case type:

  • Many parameters (50-999 params): 40-57% faster - benefits most from list-based concatenation
  • Single/few parameters: 19-34% faster - still benefits from reduced overhead
  • Static paths: 5-19% faster - minimal but consistent improvement
  • Host patterns: 23-35% faster - good gains from string building optimization

The optimization is particularly effective for paths with multiple parameters where the original code performed many string concatenations in the loop. Even simple cases benefit from the reduced attribute lookup overhead.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 36 Passed
⏪ Replay Tests 30 Passed
🔎 Concolic Coverage Tests 2 Passed
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests and Runtime
import re
from re import Pattern
from typing import Any

# imports
import pytest
from starlette.routing import compile_path


# Minimal Convertor and CONVERTOR_TYPES definitions for testability
class Convertor:
    def __init__(self, regex: str):
        self.regex = regex

class StringConvertor(Convertor):
    def __init__(self):
        super().__init__(r"[^/]+")

class IntConvertor(Convertor):
    def __init__(self):
        super().__init__(r"[0-9]+")
        
class PathConvertor(Convertor):
    def __init__(self):
        super().__init__(r".+")

class SlugConvertor(Convertor):
    def __init__(self):
        super().__init__(r"[-a-zA-Z0-9_]+")

CONVERTOR_TYPES = {
    "str": StringConvertor(),
    "int": IntConvertor(),
    "path": PathConvertor(),
    "slug": SlugConvertor(),
}


PARAM_REGEX = re.compile(r"{([a-zA-Z_][a-zA-Z0-9_]*)(:[a-zA-Z_][a-zA-Z0-9_]*)?}")
from starlette.routing import compile_path

# unit tests

# --------------- BASIC TEST CASES ---------------

def test_static_path():
    # Path with no parameters
    regex, fmt, convertors = compile_path("/about") # 6.15μs -> 5.67μs (8.46% faster)

def test_single_str_param():
    # Path with a single string parameter
    regex, fmt, convertors = compile_path("/user/{username}") # 10.3μs -> 8.68μs (19.0% faster)
    m = regex.match("/user/alice")

def test_single_int_param():
    # Path with a single int parameter
    regex, fmt, convertors = compile_path("/item/{id:int}") # 9.94μs -> 7.99μs (24.5% faster)
    m = regex.match("/item/123")


def test_path_convertor():
    # Path with a path convertor (matches slashes)
    regex, fmt, convertors = compile_path("/files/{filepath:path}") # 13.4μs -> 11.5μs (16.9% faster)
    m = regex.match("/files/a/b/c.txt")
    m2 = regex.match("/files/singlefile")


def test_host_pattern():
    # Host pattern (doesn't start with '/')
    regex, fmt, convertors = compile_path("{subdomain}.example.com") # 14.1μs -> 11.6μs (20.9% faster)
    m = regex.match("api.example.com")

def test_host_with_port():
    # Host pattern with port (should ignore port in regex)
    regex, fmt, convertors = compile_path("{subdomain}.example.com:8000") # 11.0μs -> 8.60μs (28.4% faster)
    m = regex.match("admin.example.com")

# --------------- EDGE TEST CASES ---------------

def test_empty_path():
    # Empty path should match only empty string
    regex, fmt, convertors = compile_path("") # 5.45μs -> 4.77μs (14.2% faster)

def test_only_slash():
    # Path is just "/"
    regex, fmt, convertors = compile_path("/") # 5.39μs -> 4.53μs (19.1% faster)

def test_unknown_convertor():
    # Unknown convertor type should raise AssertionError
    with pytest.raises(AssertionError):
        compile_path("/foo/{bar:unknown}") # 5.39μs -> 4.55μs (18.4% faster)

def test_duplicate_param_name():
    # Duplicate param names should raise ValueError
    with pytest.raises(ValueError) as excinfo:
        compile_path("/foo/{id:int}/bar/{id:str}") # 13.3μs -> 10.4μs (28.4% faster)


def test_param_at_start_and_end():
    # Param at start and end of path
    regex, fmt, convertors = compile_path("/{lang}/docs/{page}") # 15.4μs -> 12.7μs (21.2% faster)
    m = regex.match("/en/docs/index")

def test_adjacent_params():
    # Adjacent params, e.g. /{a:int}{b:int}
    regex, fmt, convertors = compile_path("/{a:int}{b:int}") # 12.3μs -> 9.64μs (27.6% faster)
    m = regex.match("/123456")

def test_param_with_literal_colon():
    # Path segment with literal colon after param
    regex, fmt, convertors = compile_path("/foo/{bar:int}:baz") # 9.88μs -> 8.23μs (20.1% faster)
    m = regex.match("/foo/42:baz")

def test_param_with_literal_braces():
    # Path with literal braces should not be parsed as params
    regex, fmt, convertors = compile_path("/foo/{{bar}}") # 9.62μs -> 8.29μs (16.1% faster)
    m = regex.match("/foo/{bar}")

def test_param_with_non_ascii():
    # Param values can be non-ASCII (for str/slug/path)
    regex, fmt, convertors = compile_path("/user/{name}") # 9.20μs -> 7.79μs (18.0% faster)
    m = regex.match("/user/éclair")


def test_many_params():
    # Path with many params (up to 1000)
    param_count = 100
    path = "/" + "/".join(f"{{p{i}:int}}" for i in range(param_count))
    regex, fmt, convertors = compile_path(path) # 139μs -> 92.2μs (51.1% faster)
    values = "/".join(str(i) for i in range(param_count))
    m = regex.match("/" + values)
    for i in range(param_count):
        pass

def test_long_static_path():
    # Very long static path (1000 chars)
    static = "/" + "a" * 999
    regex, fmt, convertors = compile_path(static) # 8.36μs -> 7.49μs (11.7% faster)

def test_long_path_with_params():
    # Long path with interleaved params and static segments
    segs = []
    fmt_segs = []
    convertor_dict = {}
    for i in range(50):
        segs.append(f"static{i}")
        segs.append(f"{{param{i}:int}}")
        fmt_segs.append(f"static{i}")
        fmt_segs.append(f"{{param{i}}}")
        convertor_dict[f"param{i}"] = IntConvertor()
    path = "/" + "/".join(segs)
    regex, fmt, convertors = compile_path(path) # 93.3μs -> 61.0μs (52.9% faster)
    vals = "/".join([f"static{i}/{i}" for i in range(50)])
    vals = "/" + vals.replace("/", "", 1)  # fix leading slash
    # Build matching string
    vals = "/" + "/".join(f"static{i}/{i}" for i in range(50))
    m = regex.match(vals)
    for i in range(50):
        pass

def test_large_host_pattern():
    # Large host pattern with many params
    host = ".".join(f"{{s{i}}}" for i in range(20)) + ".example.com"
    regex, fmt, convertors = compile_path(host) # 33.7μs -> 24.5μs (37.8% faster)
    test_host = ".".join(f"seg{i}" for i in range(20)) + ".example.com"
    m = regex.match(test_host)
    for i in range(20):
        pass


#------------------------------------------------
import re

# imports
import pytest
from starlette.routing import compile_path


# Minimal Convertor classes and CONVERTOR_TYPES for testing
class StringConvertor:
    regex = "[^/]+"
    def __repr__(self): return "StringConvertor()"

class IntConvertor:
    regex = "[0-9]+"
    def __repr__(self): return "IntConvertor()"

class SlugConvertor:
    regex = "[a-zA-Z0-9_-]+"
    def __repr__(self): return "SlugConvertor()"

CONVERTOR_TYPES = {
    "str": StringConvertor(),
    "int": IntConvertor(),
    "slug": SlugConvertor(),
}


PARAM_REGEX = re.compile("{([a-zA-Z_][a-zA-Z0-9_]*)(:[a-zA-Z_][a-zA-Z0-9_]*)?}")
from starlette.routing import compile_path

# unit tests

# ----------- Basic Test Cases -----------

def test_static_path_no_params():
    # No parameters, should return regex matching only the static path
    regex, fmt, conv = compile_path("/about") # 6.18μs -> 5.87μs (5.28% faster)

def test_single_str_param():
    # Single parameter with default (str) type
    regex, fmt, conv = compile_path("/user/{username}") # 12.5μs -> 10.0μs (24.5% faster)

def test_single_int_param():
    # Single parameter with explicit int type
    regex, fmt, conv = compile_path("/item/{id:int}") # 10.5μs -> 8.53μs (22.8% faster)


def test_host_with_param():
    # Host string with parameter
    regex, fmt, conv = compile_path("{subdomain}.example.com") # 15.3μs -> 12.0μs (27.5% faster)

def test_host_with_param_and_port():
    # Host string with parameter and port, port should be ignored in regex
    regex, fmt, conv = compile_path("{subdomain}.example.com:8080") # 11.5μs -> 8.65μs (33.0% faster)

# ----------- Edge Test Cases -----------

def test_unknown_convertor_type():
    # Unknown convertor type should raise AssertionError
    with pytest.raises(AssertionError):
        compile_path("/user/{username:unknown}") # 5.19μs -> 4.31μs (20.6% faster)

def test_duplicate_param_names():
    # Duplicate parameter names should raise ValueError
    with pytest.raises(ValueError) as excinfo:
        compile_path("/user/{id:int}/post/{id:str}") # 13.2μs -> 9.87μs (34.1% faster)

def test_multiple_duplicate_param_names():
    # Multiple duplicate parameter names should raise ValueError listing all
    with pytest.raises(ValueError) as excinfo:
        compile_path("/x/{a:int}/y/{a:str}/z/{b:int}/w/{b:str}") # 14.6μs -> 10.9μs (34.1% faster)


def test_path_with_no_leading_slash_is_host():
    # Path with no leading slash is a host
    regex, fmt, conv = compile_path("api.{domain}.com") # 14.3μs -> 11.6μs (23.6% faster)

def test_empty_path():
    # Empty path should match only the start and end
    regex, fmt, conv = compile_path("/") # 5.45μs -> 4.69μs (16.1% faster)

def test_param_at_start_and_end():
    # Param at start and end of path
    regex, fmt, conv = compile_path("/{first}/middle/{last}") # 13.1μs -> 10.3μs (26.5% faster)

def test_param_with_underscore_and_digits():
    # Param name with underscores and digits
    regex, fmt, conv = compile_path("/foo/{bar_123:int}") # 9.71μs -> 8.11μs (19.7% faster)

def test_param_with_default_type():
    # Param with omitted type should default to str
    regex, fmt, conv = compile_path("/foo/{bar}") # 9.17μs -> 7.46μs (22.9% faster)

# ----------- Large Scale Test Cases -----------

def test_many_params_in_path():
    # Path with many parameters (up to 50)
    path = "/" + "/".join(f"{{p{i}:int}}" for i in range(50))
    regex, fmt, conv = compile_path(path) # 72.9μs -> 51.9μs (40.4% faster)
    # Check regex pattern and format string
    expected_pattern = "^" + "".join(f"/(?P<p{i}>[0-9]+)" for i in range(50)) + "$"
    expected_fmt = "/" + "/".join(f"{{p{i}}}" for i in range(50))

def test_long_static_path():
    # Path with long static segments and a param at the end
    static = "/".join(["static"] * 100)
    path = f"/{static}/{{last:str}}"
    regex, fmt, conv = compile_path(path) # 11.8μs -> 9.29μs (27.5% faster)
    expected_pattern = "^" + "/".join(["static"] * 100) + "/(?P<last>[^/]+)$"
    expected_fmt = "/" + "/".join(["static"] * 100) + "/{last}"

def test_large_host_pattern():
    # Host pattern with multiple params
    path = "{a}.example.{b}.service.{c}.com"
    regex, fmt, conv = compile_path(path) # 15.5μs -> 11.5μs (35.2% faster)
    expected_pattern = (
        "^"
        "(?P<a>[^/]+)\\.example\\."
        "(?P<b>[^/]+)\\.service\\."
        "(?P<c>[^/]+)\\.com$"
    )


def test_path_with_maximum_allowed_elements():
    # Path with 999 params (limit just below 1000)
    path = "/" + "/".join(f"{{p{i}:str}}" for i in range(999))
    regex, fmt, conv = compile_path(path) # 1.21ms -> 769μs (57.2% faster)
    expected_pattern = "^" + "".join(f"/(?P<p{i}>[^/]+)" for i in range(999)) + "$"
    expected_fmt = "/" + "/".join(f"{{p{i}}}" for i in range(999))
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from starlette.routing import compile_path

def test_compile_path():
    compile_path('')

def test_compile_path_2():
    compile_path('/')
⏪ Replay Tests and Runtime
🔎 Concolic Coverage Tests and Runtime
Test File::Test Function Original ⏱️ Optimized ⏱️ Speedup
codeflash_concolic_xzaz2m9_/tmp1su6qt5x/test_concolic_coverage.py::test_compile_path 6.89μs 6.41μs 7.62%✅
codeflash_concolic_xzaz2m9_/tmp1su6qt5x/test_concolic_coverage.py::test_compile_path_2 5.38μs 4.74μs 13.5%✅

To edit these changes git checkout codeflash/optimize-compile_path-mhbltt21 and push.

Codeflash

The optimized version achieves a **48% speedup** by addressing Python's most significant string concatenation performance bottleneck and reducing attribute lookup overhead in loops.

**Key optimizations:**

1. **List-based string building**: Replaced repeated string concatenation (`path_regex += ...`, `path_format += ...`) with list accumulation (`path_regex_parts.append()`, `path_format_parts.append()`) followed by a single `"".join()` call. This eliminates the O(n²) behavior of string concatenation in Python, where each `+=` creates a new string object.

2. **Localized attribute access**: Cached `PARAM_REGEX.finditer` and `re.escape` as local variables to avoid repeated attribute lookups in the tight loop that processes path parameters.

3. **Streamlined final assembly**: Consolidated the final string building into single join operations instead of multiple concatenations.

**Performance impact by test case type:**
- **Many parameters** (50-999 params): 40-57% faster - benefits most from list-based concatenation
- **Single/few parameters**: 19-34% faster - still benefits from reduced overhead
- **Static paths**: 5-19% faster - minimal but consistent improvement
- **Host patterns**: 23-35% faster - good gains from string building optimization

The optimization is particularly effective for paths with multiple parameters where the original code performed many string concatenations in the loop. Even simple cases benefit from the reduced attribute lookup overhead.
@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 29, 2025 06:18
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Oct 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant