@codeflash-ai codeflash-ai bot commented Oct 29, 2025

📄 1,038% (10.38x) speedup for URL.remove_query_params in starlette/datastructures.py

⏱️ Runtime: 71.2 milliseconds → 6.26 milliseconds (best of 147 runs)
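The percentage figures in this report appear to follow the convention (original - optimized) / optimized. That is an inference from the numbers, not something the report states. A minimal check under that assumption, using the rounded runtimes shown above (`speedup_percent` is a hypothetical helper, not part of Codeflash or Starlette):

```python
def speedup_percent(original: float, optimized: float) -> float:
    """Relative speedup in percent, assuming (original - optimized) / optimized."""
    return (original - optimized) / optimized * 100.0


# Rounded headline runtimes in milliseconds, taken from the line above.
headline = speedup_percent(71.2, 6.26)
print(f"{headline:.1f}%")
```

With the rounded runtimes this lands near 1,037%, so the 1,038% in the title was presumably computed from unrounded timings.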

📝 Explanation and details

The optimized code achieves a roughly 1,038% speedup by replacing the most expensive operation in the original code: the MultiDict.pop() call made once per removed key in remove_query_params().

Key optimizations:

  1. Eliminated O(n*m) complexity in remove_query_params(): The original code called params.pop(key, None) for each of the m keys to remove, and each pop() ran a list comprehension that rebuilt the entire _list of n query parameters. This created quadratic behavior when removing many keys. The optimized version uses a single list comprehension with set-based membership testing (item[0] not in keys_set) to filter out unwanted keys in one pass.

  2. Fast set-based lookups: Converting the keys to remove into a set gives average-case O(1) membership tests. Testing each query parameter against the keys list directly would instead cost O(m) per parameter.

  3. Host header search optimization: Replaced the explicit loop with next() and a generator expression for finding the "host" header, enabling early termination without iterating through all headers.
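The first two points can be sketched with plain lists of (key, value) pairs. This is a simplified model under stated assumptions, not the Starlette source or the actual diff: `remove_params_repeated_pop` and `remove_params_single_pass` are hypothetical names, and the per-key rebuild stands in for what the explanation says MultiDict.pop() does.

```python
from urllib.parse import parse_qsl, urlencode


def remove_params_repeated_pop(query: str, keys: list[str]) -> str:
    """Model of the original approach: rebuild the pair list once per key."""
    pairs = parse_qsl(query, keep_blank_values=True)
    for key in keys:
        # Every iteration scans all remaining pairs, so total work is O(n * m).
        pairs = [(k, v) for k, v in pairs if k != key]
    return urlencode(pairs)


def remove_params_single_pass(query: str, keys: list[str]) -> str:
    """Model of the optimized approach: one pass with set membership tests."""
    keys_set = set(keys)  # average-case O(1) lookups
    pairs = parse_qsl(query, keep_blank_values=True)
    return urlencode([(k, v) for k, v in pairs if k not in keys_set])


query = "foo=1&foo=2&bar=3&baz="
print(remove_params_repeated_pop(query, ["foo", "baz"]))
print(remove_params_single_pass(query, ["foo", "baz"]))
```

Both functions drop every occurrence of a repeated key and keep the remaining pairs in their original order, which is the behavior the generated tests exercise.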

Performance impact by test case:

  • Small URLs (1-10 params): roughly 1-9% improvements due to reduced overhead; calls with an empty key list are within measurement noise
  • Medium URLs (50-100 keys removed from 100 params): roughly 110-260% improvements as quadratic behavior becomes noticeable
  • Large URLs (250-1000 keys removed from 500-1000 params): roughly 620-2930% improvements where the original O(n*m) complexity severely degrades performance

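The optimization is most effective when many query parameters are removed in a single call, because the gain scales with the number of keys removed rather than with URL length alone: removing one repeated key from a 1000-parameter URL improved by only 1.61%. Removing many keys at once can arise in web applications that filter or sanitize query strings.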
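To see why the gain depends on how many keys are removed, the two strategies can be compared by counting operations instead of timing them. The sketch below is a rough cost model with hypothetical function names; it assumes distinct keys named key0, key1, and so on, and it ignores the parsing and encoding work that dominates for small URLs.

```python
def comparisons_repeated_pop(n_params: int, m_keys: int) -> int:
    """Count key comparisons when each removed key triggers a full rescan."""
    remaining = [f"key{i}" for i in range(n_params)]
    comparisons = 0
    for j in range(m_keys):
        target = f"key{j}"
        comparisons += len(remaining)  # one comparison per remaining parameter
        remaining = [k for k in remaining if k != target]
    return comparisons


def lookups_single_pass(n_params: int, m_keys: int) -> int:
    """Count set lookups when all keys are filtered in one pass."""
    to_remove = {f"key{j}" for j in range(m_keys)}
    lookups = 0
    for i in range(n_params):
        lookups += 1  # one set membership test per parameter
        _ = f"key{i}" not in to_remove
    return lookups


for n, m in [(10, 1), (100, 50), (1000, 1000)]:
    print(n, m, comparisons_repeated_pop(n, m), lookups_single_pass(n, m))
```

When only one key is removed, both counts equal the number of parameters, which is consistent with the small gains reported for single-key removals. The gap widens as the number of removed keys grows.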

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 219 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 5 Passed |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
from urllib.parse import urlencode

# imports
import pytest
from starlette.datastructures import URL

# ------------------- UNIT TESTS -------------------

# Basic Test Cases

def test_remove_single_query_param():
    # Remove a single query param from a simple URL
    url = URL("https://example.com/path?foo=1&bar=2")
    codeflash_output = url.remove_query_params("foo"); result = codeflash_output # 27.8μs -> 26.6μs (4.51% faster)

def test_remove_multiple_query_params():
    # Remove multiple query params
    url = URL("https://example.com/path?foo=1&bar=2&baz=3")
    codeflash_output = url.remove_query_params(["foo", "baz"]); result = codeflash_output # 33.2μs -> 31.1μs (6.85% faster)

def test_remove_param_not_present():
    # Remove a param that doesn't exist (should be a no-op)
    url = URL("https://example.com/path?foo=1&bar=2")
    codeflash_output = url.remove_query_params("baz"); result = codeflash_output # 26.5μs -> 25.1μs (5.34% faster)

def test_remove_all_params():
    # Remove all params, should result in no query string
    url = URL("https://example.com/path?foo=1")
    codeflash_output = url.remove_query_params("foo"); result = codeflash_output # 25.1μs -> 23.4μs (7.64% faster)

def test_remove_from_url_without_query():
    # Remove from URL with no query string (should be a no-op)
    url = URL("https://example.com/path")
    codeflash_output = url.remove_query_params("foo"); result = codeflash_output # 20.6μs -> 20.0μs (2.87% faster)

def test_remove_blank_value_param():
    # Remove a param with blank value
    url = URL("https://example.com/path?foo=&bar=2")
    codeflash_output = url.remove_query_params("foo"); result = codeflash_output # 30.0μs -> 28.3μs (6.18% faster)

def test_remove_param_with_fragment():
    # Remove param when URL has a fragment
    url = URL("https://example.com/path?foo=1&bar=2#frag")
    codeflash_output = url.remove_query_params("foo"); result = codeflash_output # 23.7μs -> 23.2μs (1.78% faster)

def test_remove_param_with_repeated_keys():
    # Remove a param that appears multiple times (should remove all)
    url = URL("https://example.com/path?foo=1&foo=2&bar=3")
    codeflash_output = url.remove_query_params("foo"); result = codeflash_output # 29.5μs -> 29.2μs (1.19% faster)

def test_remove_param_with_empty_string_key():
    # Remove an empty string key (should not affect other params)
    url = URL("https://example.com/path?foo=1&=empty&bar=2")
    codeflash_output = url.remove_query_params(""); result = codeflash_output # 31.8μs -> 30.6μs (3.97% faster)

# Edge Test Cases

def test_remove_param_from_url_with_only_query():
    # Remove param from a URL that is just a query string
    url = URL("?foo=1&bar=2")
    codeflash_output = url.remove_query_params("foo"); result = codeflash_output # 24.5μs -> 24.1μs (1.34% faster)

def test_remove_param_with_special_characters():
    # Remove param with special characters in key
    url = URL("https://example.com/path?f%20oo=1&bar=2")
    codeflash_output = url.remove_query_params("f oo"); result = codeflash_output # 36.9μs -> 35.3μs (4.46% faster)

def test_remove_param_with_unicode_characters():
    # Remove param with unicode characters in key
    url = URL("https://example.com/path?naïve=1&bar=2")
    codeflash_output = url.remove_query_params("naïve"); result = codeflash_output # 30.3μs -> 29.4μs (3.01% faster)

def test_remove_param_with_empty_query_string():
    # Remove from URL with an empty query string
    url = URL("https://example.com/path?")
    codeflash_output = url.remove_query_params("foo"); result = codeflash_output # 15.9μs -> 15.1μs (5.13% faster)

def test_remove_param_with_no_params_left():
    # Remove all params, leaving only base URL and fragment
    url = URL("https://example.com/path?foo=1#frag")
    codeflash_output = url.remove_query_params("foo"); result = codeflash_output # 23.9μs -> 22.9μs (4.22% faster)

def test_remove_param_with_duplicate_keys_and_blank_values():
    # Remove duplicate keys with blank values
    url = URL("https://example.com/path?foo=&foo=2&bar=3")
    codeflash_output = url.remove_query_params("foo"); result = codeflash_output # 30.6μs -> 29.0μs (5.42% faster)

def test_remove_param_with_mixed_types():
    # Pass an integer key; the method is annotated for str or a sequence of str, and no runtime was recorded for this call
    url = URL("https://example.com/path?1=one&2=two")
    codeflash_output = url.remove_query_params(1); result = codeflash_output

def test_remove_param_with_list_of_nonexistent_keys():
    # Remove multiple keys, none of which exist
    url = URL("https://example.com/path?foo=1&bar=2")
    codeflash_output = url.remove_query_params(["baz", "qux"]); result = codeflash_output # 33.7μs -> 31.2μs (8.05% faster)

def test_remove_param_with_empty_keys_list():
    # Remove with empty list (should be a no-op)
    url = URL("https://example.com/path?foo=1&bar=2")
    codeflash_output = url.remove_query_params([]); result = codeflash_output # 26.2μs -> 26.4μs (0.744% slower)

def test_remove_param_with_non_string_keys():
    # Pass a non-string key; the method is annotated for str or a sequence of str, and no runtime was recorded for this call
    url = URL("https://example.com/path?123=abc&bar=2")
    codeflash_output = url.remove_query_params(123); result = codeflash_output

# Large Scale Test Cases

def test_remove_many_params():
    # Remove 100 params from a URL with 100 params
    params = [(f"key{i}", str(i)) for i in range(100)]
    query = urlencode(params)
    url = URL(f"https://example.com/path?{query}")
    codeflash_output = url.remove_query_params([f"key{i}" for i in range(100)]); result = codeflash_output # 361μs -> 111μs (224% faster)

def test_remove_half_params():
    # Remove half of the params from a URL with 100 params
    params = [(f"key{i}", str(i)) for i in range(100)]
    query = urlencode(params)
    url = URL(f"https://example.com/path?{query}")
    keys_to_remove = [f"key{i}" for i in range(50)]
    codeflash_output = url.remove_query_params(keys_to_remove); result = codeflash_output # 329μs -> 151μs (118% faster)
    expected_query = urlencode([(f"key{i}", str(i)) for i in range(50, 100)])

def test_remove_params_from_long_url():
    # Remove params from a very long URL (max 1000 params)
    params = [(f"key{i}", str(i)) for i in range(1000)]
    query = urlencode(params)
    url = URL(f"https://example.com/path?{query}")
    keys_to_remove = [f"key{i}" for i in range(500, 1000)]
    codeflash_output = url.remove_query_params(keys_to_remove); result = codeflash_output # 17.5ms -> 1.28ms (1264% faster)
    expected_query = urlencode([(f"key{i}", str(i)) for i in range(500)])

def test_remove_params_performance():
    # Test performance with 1000 params (should complete quickly)
    import time
    params = [(f"key{i}", str(i)) for i in range(1000)]
    query = urlencode(params)
    url = URL(f"https://example.com/path?{query}")
    start = time.time()
    codeflash_output = url.remove_query_params([f"key{i}" for i in range(1000)]); result = codeflash_output # 22.5ms -> 742μs (2927% faster)
    elapsed = time.time() - start

def test_remove_params_with_repeated_keys_large():
    # Remove repeated keys in a large URL
    params = [("foo", str(i)) for i in range(500)] + [("bar", str(i)) for i in range(500)]
    query = urlencode(params)
    url = URL(f"https://example.com/path?{query}")
    codeflash_output = url.remove_query_params("foo"); result = codeflash_output # 1.24ms -> 1.22ms (1.61% faster)
    expected_query = urlencode([("bar", str(i)) for i in range(500)])
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from collections.abc import Iterable, Iterator, Mapping, Sequence
from typing import Any, TypeVar, cast
# function to test (copied from above)
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

# imports
import pytest  # used for our unit tests
from starlette.datastructures import URL

# ----------- UNIT TESTS ------------

# Basic Test Cases

def test_remove_single_query_param():
    # Remove a single query parameter from a simple URL
    url = URL("https://example.com/path?foo=1&bar=2")
    codeflash_output = url.remove_query_params("foo"); result = codeflash_output # 26.9μs -> 25.5μs (5.73% faster)

def test_remove_multiple_query_params():
    # Remove multiple query parameters at once
    url = URL("https://example.com/path?foo=1&bar=2&baz=3")
    codeflash_output = url.remove_query_params(["foo", "baz"]); result = codeflash_output # 26.5μs -> 24.3μs (8.95% faster)

def test_remove_nonexistent_param():
    # Remove a parameter that doesn't exist (should not change the URL)
    url = URL("https://example.com/path?foo=1&bar=2")
    codeflash_output = url.remove_query_params("baz"); result = codeflash_output # 25.8μs -> 24.8μs (4.19% faster)

def test_remove_all_params():
    # Remove all query parameters, resulting in no query string
    url = URL("https://example.com/path?foo=1&bar=2")
    codeflash_output = url.remove_query_params(["foo", "bar"]); result = codeflash_output # 20.7μs -> 19.4μs (6.57% faster)

def test_remove_param_from_url_without_query():
    # Remove param from a URL that has no query string
    url = URL("https://example.com/path")
    codeflash_output = url.remove_query_params("foo"); result = codeflash_output # 15.9μs -> 15.4μs (3.32% faster)

def test_remove_param_with_blank_value():
    # Remove param with blank value
    url = URL("https://example.com/path?foo=&bar=2")
    codeflash_output = url.remove_query_params("foo"); result = codeflash_output # 24.1μs -> 23.5μs (2.21% faster)

# Edge Test Cases

def test_remove_param_with_duplicate_keys():
    # Remove param when the query string has duplicate keys (should remove all)
    url = URL("https://example.com/path?foo=1&foo=2&bar=3")
    codeflash_output = url.remove_query_params("foo"); result = codeflash_output # 25.5μs -> 23.7μs (7.56% faster)

def test_remove_param_with_encoded_characters():
    # Remove param with encoded characters in the key
    url = URL("https://example.com/path?f%20oo=1&bar=2")
    codeflash_output = url.remove_query_params("f oo"); result = codeflash_output # 31.8μs -> 30.3μs (4.63% faster)

def test_remove_param_with_empty_string_key():
    # Remove param with empty string key
    url = URL("https://example.com/path?=1&foo=2")
    codeflash_output = url.remove_query_params(""); result = codeflash_output # 29.6μs -> 28.0μs (5.79% faster)

def test_remove_param_with_blank_query():
    # Remove param from a URL with a blank query string
    url = URL("https://example.com/path?")
    codeflash_output = url.remove_query_params("foo"); result = codeflash_output # 18.8μs -> 17.5μs (7.74% faster)

def test_remove_param_with_special_characters():
    # Remove param with special characters in the key
    url = URL("https://example.com/path?f@o!o=1&bar=2")
    codeflash_output = url.remove_query_params("f@o!o"); result = codeflash_output # 28.5μs -> 28.2μs (1.23% faster)

def test_remove_param_with_numeric_key():
    # Remove param where key is numeric
    url = URL("https://example.com/path?123=abc&foo=bar")
    codeflash_output = url.remove_query_params("123"); result = codeflash_output # 28.5μs -> 27.6μs (3.35% faster)

def test_remove_param_with_none_key():
    # Remove param with None key (should not remove anything)
    url = URL("https://example.com/path?foo=1&bar=2")
    codeflash_output = url.remove_query_params([None]); result = codeflash_output # 25.7μs -> 24.5μs (4.96% faster)

def test_remove_param_with_empty_keys_list():
    # Remove param with empty keys list (should not remove anything)
    url = URL("https://example.com/path?foo=1&bar=2")
    codeflash_output = url.remove_query_params([]); result = codeflash_output # 24.1μs -> 23.8μs (1.04% faster)

def test_remove_param_from_url_with_fragment():
    # Remove param from URL with fragment
    url = URL("https://example.com/path?foo=1&bar=2#frag")
    codeflash_output = url.remove_query_params("foo"); result = codeflash_output # 29.6μs -> 28.0μs (5.70% faster)

def test_remove_param_with_blank_value_and_blank_key():
    # Remove param with blank key and blank value
    url = URL("https://example.com/path?=1&foo=2")
    codeflash_output = url.remove_query_params(""); result = codeflash_output # 23.4μs -> 22.4μs (4.42% faster)

# Large Scale Test Cases

def test_remove_many_params():
    # Remove many parameters from a large query string
    base_url = "https://example.com/path?"
    query_items = [f"key{i}=val{i}" for i in range(100)]
    url = URL(base_url + "&".join(query_items))
    remove_keys = [f"key{i}" for i in range(50)]
    codeflash_output = url.remove_query_params(remove_keys); result = codeflash_output # 342μs -> 162μs (111% faster)
    # Only keys 50..99 should remain
    expected_query = "&".join([f"key{i}=val{i}" for i in range(50, 100)])

def test_remove_all_params_large():
    # Remove all parameters from a large query string
    base_url = "https://example.com/path?"
    query_items = [f"key{i}=val{i}" for i in range(100)]
    url = URL(base_url + "&".join(query_items))
    remove_keys = [f"key{i}" for i in range(100)]
    codeflash_output = url.remove_query_params(remove_keys); result = codeflash_output # 344μs -> 96.7μs (257% faster)

def test_remove_no_params_large():
    # Remove no parameters from a large query string (should not change)
    base_url = "https://example.com/path?"
    query_items = [f"key{i}=val{i}" for i in range(100)]
    url = URL(base_url + "&".join(query_items))
    codeflash_output = url.remove_query_params([]); result = codeflash_output # 204μs -> 204μs (0.255% slower)

def test_remove_params_with_mixed_types_large():
    # Remove parameters with mixed types (str, int, etc.) from a large query string
    base_url = "https://example.com/path?"
    query_items = [f"{i}=val{i}" for i in range(500)]
    url = URL(base_url + "&".join(query_items))
    remove_keys = list(range(0, 500, 2))  # Remove every even key
    # Convert keys to str for compatibility
    remove_keys_str = [str(k) for k in remove_keys]
    codeflash_output = url.remove_query_params(remove_keys_str); result = codeflash_output # 4.81ms -> 666μs (622% faster)
    expected_query = "&".join([f"{i}=val{i}" for i in range(1, 500, 2)])

def test_remove_params_performance():
    # Performance test: remove 999 keys from a 999-element query string
    base_url = "https://example.com/path?"
    query_items = [f"key{i}=val{i}" for i in range(999)]
    url = URL(base_url + "&".join(query_items))
    remove_keys = [f"key{i}" for i in range(999)]
    codeflash_output = url.remove_query_params(remove_keys); result = codeflash_output # 22.8ms -> 768μs (2861% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from starlette.datastructures import URL

def test_URL_remove_query_params():
    URL.remove_query_params(URL(url='', scope=None), '')
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| codeflash_concolic_xzaz2m9_/tmpzqrtf1l3/test_concolic_coverage.py::test_URL_remove_query_params | 19.6μs | 18.5μs | 6.44%✅ |

To edit these changes, run `git checkout codeflash/optimize-URL.remove_query_params-mhbqkyz1` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 29, 2025 08:31
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Oct 29, 2025