@codeflash-ai codeflash-ai bot commented Oct 29, 2025

📄 24% (0.24x) speedup for MultiDict.setdefault in starlette/datastructures.py

⏱️ Runtime : 3.34 milliseconds → 2.69 milliseconds (best of 110 runs)
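The headline percentage can be reproduced from the two runtimes. The sketch below assumes the speedup is computed as (original - optimized) / optimized; that formula is inferred from the reported numbers, not stated in this report, and the helper name `speedup` is hypothetical.

```python
def speedup(original: float, optimized: float) -> float:
    """Relative speedup, assuming (original - optimized) / optimized."""
    return (original - optimized) / optimized


# Headline runtimes from this report, in milliseconds.
headline = speedup(3.34, 2.69)
print(f"headline speedup: {headline:.2f}x")

# One of the per-test timings from this report, in nanoseconds (851ns -> 551ns).
per_test = speedup(851, 551)
print(f"per-test speedup: {per_test * 100:.1f}%")
```

Under that assumption, the same formula matches both the 0.24x headline figure and the 54.4% reported for the 851ns to 551ns test.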

📝 Explanation and details

The optimization replaces return self[key] with return _dict[key] on the final line of MultiDict.setdefault. Reading the underlying dictionary directly avoids the overhead of an extra call through the __getitem__ method.

Key change: The return statement now uses direct dictionary access (_dict[key]) rather than the class's item access protocol (self[key]).
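A minimal sketch of the change, using a hypothetical stand-in class rather than the real starlette source. The class name `SketchMultiDict`, the two method names, and the local alias `_dict = self._dict` are assumptions for illustration; the report only states that the return line changed.

```python
from typing import Any


class SketchMultiDict:
    """Hypothetical stand-in for MultiDict; not the starlette implementation."""

    def __init__(self) -> None:
        self._dict = {}
        self._list = []

    def __getitem__(self, key: Any) -> Any:
        return self._dict[key]

    def __contains__(self, key: Any) -> bool:
        return key in self._dict

    def setdefault_before(self, key: Any, default: Any = None) -> Any:
        if key not in self:
            self._dict[key] = default
            self._list.append((key, default))
        return self[key]  # routed through __getitem__

    def setdefault_after(self, key: Any, default: Any = None) -> Any:
        _dict = self._dict  # assumed local alias for the underlying dict
        if key not in self:
            _dict[key] = default
            self._list.append((key, default))
        return _dict[key]  # direct dictionary lookup


md = SketchMultiDict()
first = md.setdefault_before("a", 1)   # key is new, default is stored
second = md.setdefault_after("a", 2)   # key exists, stored value is returned
```

Both variants return the stored value and leave `_list` unchanged for an existing key; only the path taken to read the value differs.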

Why this is faster:

  • self[key] dispatches to the class's __getitem__ method, which adds a Python-level function call before the dictionary is reached
  • _dict[key] is a direct dictionary lookup operation
  • The profiler shows the return line improved from 626.3ns per hit to 236.3ns per hit (about 62% less time on that line)
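The difference between the two access paths can be observed with a small micro-benchmark. This is a rough sketch, not the profiler run quoted above: the `Wrapper` class and its method names are hypothetical, and absolute timings will vary by machine and Python version.

```python
import timeit


class Wrapper:
    """Hypothetical class whose __getitem__ delegates to an internal dict."""

    def __init__(self):
        self._dict = {"a": 1}

    def __getitem__(self, key):
        return self._dict[key]

    def via_getitem(self, key):
        # Item access on self calls the Python-level __getitem__ above.
        return self[key]

    def via_dict(self, key):
        # Item access on the dict itself skips that extra call.
        _dict = self._dict
        return _dict[key]


w = Wrapper()
t_getitem = timeit.timeit(lambda: w.via_getitem("a"), number=200_000)
t_dict = timeit.timeit(lambda: w.via_dict("a"), number=200_000)
print(f"self[key]:  {t_getitem:.4f}s for 200,000 calls")
print(f"_dict[key]: {t_dict:.4f}s for 200,000 calls")
```

Each measurement includes the cost of the lambda and the outer method call, so the gap between the two numbers understates the relative difference on the return line alone.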

Performance characteristics: This optimization is particularly effective for:

  • High-frequency setdefault calls on existing keys (54-63% speedup in tests)
  • Large-scale operations with many existing keys (28% speedup on 1000 keys)
  • Mixed usage patterns where setdefault is called repeatedly

The optimization is safe because MultiDict.__getitem__ simply delegates to self._dict[key] anyway, so the direct access preserves identical behavior while eliminating the method call overhead.
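The equivalence argument can be checked on a toy class. The sketch below uses hypothetical names (`Base`, `Upper`, `setdefault_via_self`, `setdefault_via_dict`) and is not starlette code. It also illustrates one assumption behind the safety claim: the two forms agree only while `__getitem__` is a plain delegation, so a subclass that overrides `__getitem__` would observe a difference.

```python
class Base:
    """Hypothetical class whose __getitem__ simply delegates to _dict."""

    def __init__(self):
        self._dict = {}
        self._list = []

    def __getitem__(self, key):
        return self._dict[key]

    def setdefault_via_self(self, key, default=None):
        if key not in self._dict:
            self._dict[key] = default
            self._list.append((key, default))
        return self[key]

    def setdefault_via_dict(self, key, default=None):
        _dict = self._dict
        if key not in _dict:
            _dict[key] = default
            self._list.append((key, default))
        return _dict[key]


class Upper(Base):
    """Hypothetical subclass that customises item access."""

    def __getitem__(self, key):
        return str(self._dict[key]).upper()


base = Base()
base_same = base.setdefault_via_self("k", "v") == base.setdefault_via_dict("k", "v")

sub = Upper()
sub_via_self = sub.setdefault_via_self("k", "v")  # goes through the override
sub_via_dict = sub.setdefault_via_dict("k", "v")  # bypasses the override
```

For `Base` the two methods return the same value; for `Upper` the direct lookup skips the overridden `__getitem__`.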

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 9890 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 2 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from typing import Any

# imports
import pytest
from starlette.datastructures import MultiDict


class ImmutableMultiDict:
    def __init__(self, items=None):
        self._list = []
        self._dict = {}
        if items:
            for k, v in items:
                self._list.append((k, v))
                self._dict[k] = v

    def __getitem__(self, key):
        return self._dict[key]

    def __contains__(self, key):
        return key in self._dict

    def setlist(self, key, values):
        # Remove old
        self._list = [(k, v) for k, v in self._list if k != key]
        # Add new
        for v in values:
            self._list.append((key, v))
        # Only last value is stored in _dict (like dict)
        self._dict[key] = values[-1]
from starlette.datastructures import MultiDict

# unit tests

# -------------------------------
# 1. Basic Test Cases
# -------------------------------

def test_setdefault_basic_new_key():
    # Test adding a new key with default value
    md = MultiDict([('a', 1), ('b', 2)])
    codeflash_output = md.setdefault('c', 3); result = codeflash_output # 1.17μs -> 873ns (34.5% faster)

def test_setdefault_basic_existing_key():
    # Test getting an existing key doesn't change value
    md = MultiDict([('a', 1), ('b', 2)])
    codeflash_output = md.setdefault('a', 99); result = codeflash_output # 851ns -> 551ns (54.4% faster)

def test_setdefault_basic_default_none():
    # Test default argument omitted (should be None)
    md = MultiDict([('x', 42)])
    codeflash_output = md.setdefault('y'); result = codeflash_output # 1.08μs -> 843ns (28.7% faster)

def test_setdefault_basic_mutable_default():
    # Test with mutable default value
    md = MultiDict()
    default = []
    codeflash_output = md.setdefault('foo', default); result = codeflash_output # 1.10μs -> 837ns (31.5% faster)
    # Changing result should affect md['foo']
    result.append(1)

def test_setdefault_basic_multiple_calls():
    # Test calling setdefault multiple times on same key
    md = MultiDict()
    codeflash_output = md.setdefault('x', 1); first = codeflash_output # 1.11μs -> 904ns (22.9% faster)
    codeflash_output = md.setdefault('x', 2); second = codeflash_output # 448ns -> 310ns (44.5% faster)

# -------------------------------
# 2. Edge Test Cases
# -------------------------------

def test_setdefault_edge_key_is_none():
    # None as key
    md = MultiDict()
    md.setdefault(None, "test") # 1.18μs -> 929ns (26.9% faster)

def test_setdefault_edge_value_is_none():
    # None as value
    md = MultiDict()
    md.setdefault('key', None) # 1.10μs -> 794ns (38.9% faster)

def test_setdefault_edge_key_is_false():
    # False as key
    md = MultiDict()
    md.setdefault(False, "falsy") # 1.19μs -> 948ns (25.6% faster)

def test_setdefault_edge_key_is_zero():
    # 0 as key
    md = MultiDict()
    md.setdefault(0, "zero") # 1.16μs -> 855ns (35.8% faster)

def test_setdefault_edge_key_types():
    # Key types: int, str, tuple, frozenset
    md = MultiDict()
    keys = [123, "abc", (1,2), frozenset({1,2})]
    for k in keys:
        md.setdefault(k, "val") # 3.18μs -> 2.71μs (17.3% faster)

def test_setdefault_edge_existing_key_with_none_value():
    # Existing key with None value
    md = MultiDict([('x', None)])
    codeflash_output = md.setdefault('x', 99); result = codeflash_output # 716ns -> 486ns (47.3% faster)

def test_setdefault_edge_empty_string_key():
    # Empty string as key
    md = MultiDict()
    md.setdefault('', 'empty') # 1.09μs -> 791ns (38.2% faster)

def test_setdefault_edge_key_collision():
    # Two keys that compare equal but are different types
    md = MultiDict()
    md.setdefault(1, 'int') # 1.21μs -> 870ns (39.0% faster)
    md.setdefault(True, 'bool') # 678ns -> 633ns (7.11% faster)

def test_setdefault_edge_mutable_key():
    # Unhashable key should raise TypeError
    md = MultiDict()
    with pytest.raises(TypeError):
        md.setdefault([], 'listkey') # 1.36μs -> 1.37μs (0.365% slower)

def test_setdefault_edge_key_deletion():
    # Test that after deleting a key, setdefault can add it again
    md = MultiDict([('x', 1)])
    del md['x']
    codeflash_output = md.setdefault('x', 2); result = codeflash_output # 1.17μs -> 935ns (25.3% faster)

def test_setdefault_edge_overwrite_with_setitem():
    # setdefault should not overwrite value set by __setitem__
    md = MultiDict()
    md.setdefault('a', 1) # 1.06μs -> 798ns (32.5% faster)
    md['a'] = 2
    codeflash_output = md.setdefault('a', 3); result = codeflash_output # 470ns -> 332ns (41.6% faster)

def test_setdefault_edge_overwrite_with_setlist():
    # setdefault should not overwrite value set by setlist
    md = MultiDict()
    md.setdefault('a', 1) # 1.02μs -> 750ns (36.4% faster)
    md.setlist('a', [3, 4])
    codeflash_output = md.setdefault('a', 5); result = codeflash_output # 454ns -> 314ns (44.6% faster)

def test_setdefault_edge_key_is_object():
    # Object as key
    class KeyObj:
        pass
    obj = KeyObj()
    md = MultiDict()
    md.setdefault(obj, 'objval') # 1.25μs -> 956ns (30.4% faster)

def test_setdefault_edge_default_is_callable():
    # Callable as default value
    md = MultiDict()
    def f(): return 42
    md.setdefault('func', f) # 1.07μs -> 786ns (35.9% faster)

# -------------------------------
# 3. Large Scale Test Cases
# -------------------------------

def test_setdefault_large_scale_many_keys():
    # Add 1000 keys
    md = MultiDict()
    for i in range(1000):
        codeflash_output = md.setdefault(f'key{i}', i); val = codeflash_output # 361μs -> 294μs (23.0% faster)
    # Check all keys present
    for i in range(1000):
        assert md[f'key{i}'] == i

def test_setdefault_large_scale_existing_keys():
    # Prepopulate with 1000 keys, then setdefault with new default
    items = [(str(i), i) for i in range(1000)]
    md = MultiDict(items)
    for i in range(1000):
        codeflash_output = md.setdefault(str(i), 'new'); val = codeflash_output # 276μs -> 215μs (28.3% faster)

def test_setdefault_large_scale_mixed_usage():
    # Mix setdefault and setitem
    md = MultiDict()
    for i in range(500):
        md.setdefault(f'a{i}', i) # 182μs -> 148μs (23.2% faster)
    for i in range(500, 1000):
        md[f'a{i}'] = i
    # Check all keys
    for i in range(1000):
        assert md[f'a{i}'] == i
    # setdefault should not overwrite setitem
    for i in range(500, 1000):
        codeflash_output = md.setdefault(f'a{i}', 'shouldnotchange'); val = codeflash_output # 138μs -> 110μs (25.6% faster)

def test_setdefault_large_scale_performance():
    # Check that setdefault is O(1) for 1000 keys (no performance assertion, but should not timeout)
    md = MultiDict()
    for i in range(1000):
        md.setdefault(i, i) # 333μs -> 269μs (23.8% faster)
    # Confirm all keys
    for i in range(1000):
        assert md[i] == i

def test_setdefault_large_scale_mutable_defaults():
    # Use mutable default for many keys
    md = MultiDict()
    lists = []
    for i in range(1000):
        lst = []
        codeflash_output = md.setdefault(f'list{i}', lst); val = codeflash_output # 367μs -> 295μs (24.2% faster)
        lists.append(lst)
    # Mutate some lists and check
    for i in range(0, 1000, 100):
        lists[i].append(i)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from __future__ import annotations

from typing import Any, TypeVar

# imports
import pytest
from starlette.datastructures import MultiDict


# Dummy base class to allow MultiDict to work for testing.
class ImmutableMultiDict:
    def __init__(self, items=None):
        self._list = []
        self._dict = {}
        if items:
            for k, v in items:
                self._list.append((k, v))
                self._dict[k] = v

    def __getitem__(self, key):
        return self._dict[key]

    def __contains__(self, key):
        return key in self._dict

    def setlist(self, key, values):
        # Remove any previous values for key
        self._list = [(k, v) for k, v in self._list if k != key]
        # Add new values
        for v in values:
            self._list.append((key, v))
        self._dict[key] = values[-1] if values else None
from starlette.datastructures import MultiDict

# unit tests

# --- Basic Test Cases ---

def test_setdefault_existing_key():
    """If key exists, setdefault returns its value and does not change the dict."""
    md = MultiDict([('a', 1), ('b', 2)])
    codeflash_output = md.setdefault('a', 99); result = codeflash_output # 857ns -> 526ns (62.9% faster)

def test_setdefault_new_key():
    """If key does not exist, setdefault adds it with the default value."""
    md = MultiDict([('a', 1)])
    codeflash_output = md.setdefault('b', 42); result = codeflash_output # 1.08μs -> 711ns (52.3% faster)

def test_setdefault_default_none():
    """If default is not specified, None is used."""
    md = MultiDict([('x', 'y')])
    codeflash_output = md.setdefault('z'); result = codeflash_output # 1.03μs -> 831ns (24.5% faster)

def test_setdefault_multiple_calls():
    """Repeated calls with same key should not change value."""
    md = MultiDict([('k', 'v')])
    codeflash_output = md.setdefault('k', 'new'); first = codeflash_output # 765ns -> 494ns (54.9% faster)
    codeflash_output = md.setdefault('k', 'new2'); second = codeflash_output # 341ns -> 251ns (35.9% faster)

# --- Edge Test Cases ---

def test_setdefault_key_is_none():
    """None as a key should work."""
    md = MultiDict()
    codeflash_output = md.setdefault(None, 'null'); val = codeflash_output # 1.09μs -> 887ns (23.1% faster)

def test_setdefault_value_is_none():
    """None as a value should be accepted."""
    md = MultiDict()
    codeflash_output = md.setdefault('foo', None); val = codeflash_output # 1.02μs -> 797ns (27.7% faster)

def test_setdefault_key_is_tuple():
    """Tuple as a key should work."""
    md = MultiDict()
    key = (1, 2)
    codeflash_output = md.setdefault(key, 'bar'); val = codeflash_output # 1.35μs -> 1.06μs (27.3% faster)

def test_setdefault_key_is_int():
    """Int as a key should work."""
    md = MultiDict()
    codeflash_output = md.setdefault(123, 'number'); val = codeflash_output # 1.21μs -> 932ns (30.0% faster)

def test_setdefault_overwrite_after_del():
    """After deleting a key, setdefault should add it again."""
    md = MultiDict([('x', 1), ('y', 2)])
    del md['x']
    codeflash_output = md.setdefault('x', 99); val = codeflash_output # 1.05μs -> 889ns (18.2% faster)

def test_setdefault_with_mutable_default():
    """setdefault with a shared mutable default stores the same object under every key."""
    md = MultiDict()
    default = []
    codeflash_output = md.setdefault('a', default); val1 = codeflash_output # 1.06μs -> 822ns (29.0% faster)
    codeflash_output = md.setdefault('b', default); val2 = codeflash_output # 642ns -> 478ns (34.3% faster)
    val1.append(1)

def test_setdefault_with_falsey_values():
    """Falsey values (0, '', False) should be handled correctly."""
    md = MultiDict()
    md.setdefault('zero', 0) # 1.01μs -> 787ns (28.5% faster)
    md.setdefault('empty', '') # 616ns -> 487ns (26.5% faster)
    md.setdefault('false', False) # 496ns -> 507ns (2.17% slower)

def test_setdefault_key_collision():
    """1 and True compare equal and hash alike, so they map to the same key."""
    md = MultiDict()
    md.setdefault(1, 'int') # 1.16μs -> 902ns (28.8% faster)
    md.setdefault(True, 'bool') # 674ns -> 594ns (13.5% faster)

def test_setdefault_non_hashable_key_raises():
    """setdefault with unhashable key should raise TypeError."""
    md = MultiDict()
    with pytest.raises(TypeError):
        md.setdefault(['not', 'hashable'], 'fail') # 1.34μs -> 1.40μs (4.01% slower)

def test_setdefault_key_is_object():
    """Object instances as keys should work."""
    class Foo: pass
    f = Foo()
    md = MultiDict()
    codeflash_output = md.setdefault(f, 'bar'); val = codeflash_output # 1.22μs -> 977ns (25.0% faster)

# --- Large Scale Test Cases ---

def test_setdefault_large_number_of_keys():
    """Test setdefault with 1000 unique keys."""
    md = MultiDict()
    for i in range(1000):
        codeflash_output = md.setdefault(f'key{i}', i); val = codeflash_output # 363μs -> 294μs (23.5% faster)

def test_setdefault_large_repeated_keys():
    """Test setdefault called repeatedly on same key in large dict."""
    md = MultiDict([(f'k{i}', i) for i in range(1000)])
    for i in range(1000):
        codeflash_output = md.setdefault(f'k{i}', 'new'); val = codeflash_output # 274μs -> 214μs (27.8% faster)

def test_setdefault_large_with_falsey_defaults():
    """Test setdefault with many falsey default values."""
    md = MultiDict()
    for i in range(500):
        md.setdefault(f'zero{i}', 0) # 186μs -> 155μs (20.1% faster)
        md.setdefault(f'empty{i}', '') # 183μs -> 149μs (22.8% faster)
        md.setdefault(f'false{i}', False) # 180μs -> 147μs (22.6% faster)
    for i in range(500):
        assert md[f'zero{i}'] == 0
        assert md[f'empty{i}'] == ''
        assert md[f'false{i}'] is False

def test_setdefault_large_mutable_defaults():
    """Test setdefault with many mutable default values."""
    md = MultiDict()
    for i in range(100):
        default = []
        codeflash_output = md.setdefault(f'list{i}', default); val = codeflash_output # 41.1μs -> 33.8μs (21.6% faster)
        val.append(i)

def test_setdefault_performance():
    """Performance sanity check: setdefault should complete quickly for 1000 keys."""
    import time
    md = MultiDict()
    start = time.time()
    for i in range(1000):
        md.setdefault(i, i) # 332μs -> 268μs (23.6% faster)
    duration = time.time() - start

# --- Determinism Test ---

def test_setdefault_determinism():
    """Repeated runs should always yield the same results."""
    md1 = MultiDict()
    md2 = MultiDict()
    for i in range(100):
        md1.setdefault(i, i * 2) # 36.3μs -> 29.5μs (23.0% faster)
        md2.setdefault(i, i * 2) # 34.2μs -> 27.6μs (23.6% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from starlette.datastructures import MultiDict

def test_MultiDict_setdefault():
    MultiDict.setdefault(MultiDict(), 0, default=0)
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| codeflash_concolic_b9ikc1l3/tmp2nib0xd6/test_concolic_coverage.py::test_MultiDict_setdefault | 1.33μs | 1.15μs | 15.5%✅ |

To edit these changes, run git checkout codeflash/optimize-MultiDict.setdefault-mhc9m1ht and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 29, 2025 17:24
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: Medium Optimization Quality according to Codeflash labels Oct 29, 2025