@codeflash-ai codeflash-ai bot commented Oct 30, 2025

📄 32% (0.32x) speedup for json_safe in src/anthropic/_utils/_utils.py

⏱️ Runtime : 23.6 milliseconds → 17.9 milliseconds (best of 96 runs)
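
For reference, the headline figure can be reproduced from the two runtimes. This is a small sketch that assumes the percentage is computed as original / optimized - 1; the helper name `speedup` is hypothetical and not part of Codeflash. The same formula appears to match the per-test annotations in the generated tests (for example 996μs -> 261μs, reported as 282% faster).

```python
def speedup(original: float, optimized: float) -> float:
    """Relative speedup as a fraction (assumed formula: original / optimized - 1)."""
    return original / optimized - 1.0


# Headline runtimes from this report, in milliseconds.
ratio = speedup(23.6, 17.9)
print(f"{ratio:.2f}x, {ratio:.0%}")  # prints "0.32x, 32%"
```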

📝 Explanation and details

The optimization achieves a 32% speedup by reordering the conditional checks in the json_safe function so that datetime/date objects take the cheapest path.

Key Change: The isinstance(data, (datetime, date)) check is moved from the bottom to the top of the conditional chain.
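
A minimal sketch of the reordering, under stated assumptions: the function bodies are reconstructed from the description in this comment rather than copied from the diff, the names `json_safe_before` and `json_safe_after` are hypothetical, and plain `isinstance` checks against `collections.abc` stand in for the library's `is_mapping()` and `is_iterable()` helpers.

```python
from collections.abc import Iterable, Mapping
from datetime import date, datetime


def json_safe_before(data: object) -> object:
    """Assumed original order: container checks first, date check last."""
    if isinstance(data, Mapping):  # stand-in for is_mapping()
        return {json_safe_before(k): json_safe_before(v) for k, v in data.items()}
    if isinstance(data, Iterable) and not isinstance(data, (str, bytes, bytearray)):
        return [json_safe_before(item) for item in data]
    if isinstance(data, (datetime, date)):
        return data.isoformat()
    return data


def json_safe_after(data: object) -> object:
    """Reordered: dates return early, before any container check runs."""
    if isinstance(data, (datetime, date)):
        return data.isoformat()
    if isinstance(data, Mapping):
        return {json_safe_after(k): json_safe_after(v) for k, v in data.items()}
    if isinstance(data, Iterable) and not isinstance(data, (str, bytes, bytearray)):
        return [json_safe_after(item) for item in data]
    return data


sample = {"when": datetime(2024, 6, 1, 12, 30, 45), "days": (date(2024, 6, 1), 1, "a")}
assert json_safe_before(sample) == json_safe_after(sample)
```

Because `date` and `datetime` objects are neither mappings nor iterables, moving their check to the top should not change which branch any value takes; it only changes how many checks run before a date is handled.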

Why This Works:

  • Fast-path optimization: Date/datetime objects are handled with a simple isoformat() call and early return, avoiding the more expensive is_mapping() and is_iterable() function calls
  • Reduced function call overhead: In the original code, date/datetime objects still triggered calls to is_mapping() (26,278 hits) and is_iterable() (25,027 hits). The optimized version reduces these to 18,149 and 16,898 hits respectively
  • Fewer conditions evaluated: date/datetime leaves are resolved by the first check, which lowers the average number of conditions evaluated per call on data that contains dates
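
The drop in helper calls can be illustrated with a small counting harness. This is a sketch, not the profiler output quoted above: `is_mapping` and `is_iterable` here are simplified stand-ins for the library helpers, and `convert` and `count_calls` are hypothetical names introduced for the illustration.

```python
from collections import Counter
from collections.abc import Iterable, Mapping
from datetime import date, datetime, timedelta

calls = Counter()  # how often each stand-in helper is invoked


def is_mapping(obj: object) -> bool:
    calls["is_mapping"] += 1
    return isinstance(obj, Mapping)


def is_iterable(obj: object) -> bool:
    calls["is_iterable"] += 1
    return isinstance(obj, Iterable)


def convert(data: object, dates_first: bool) -> object:
    # dates_first=True models the optimized order, False the original order.
    if dates_first and isinstance(data, (datetime, date)):
        return data.isoformat()
    if is_mapping(data):
        return {convert(k, dates_first): convert(v, dates_first) for k, v in data.items()}
    if is_iterable(data) and not isinstance(data, (str, bytes, bytearray)):
        return [convert(item, dates_first) for item in data]
    if isinstance(data, (datetime, date)):
        return data.isoformat()
    return data


def count_calls(data: object, dates_first: bool) -> dict:
    calls.clear()
    convert(data, dates_first)
    return dict(calls)


dates = [date(2024, 6, 1) + timedelta(days=i) for i in range(1000)]
print(count_calls(dates, dates_first=False))  # {'is_mapping': 1001, 'is_iterable': 1001}
print(count_calls(dates, dates_first=True))   # {'is_mapping': 1, 'is_iterable': 1}
```

With the original order every date pays for both helper calls before reaching its own check; with the date check first, only the enclosing list does.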

Performance by Test Case Type:

  • Massive gains for date-heavy workloads: Tests with many dates show 97-283% speedups (e.g., test_large_list_of_dates goes from 1.01ms to 262μs)
  • Moderate gains for mixed structures: Nested objects containing dates see 14-29% improvements
  • Minimal impact on date-free data: Simple collections show small regressions (1-9% slower) because every non-date value now pays for one additional datetime check, but this is outweighed by the gains on data that contains dates
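
The cost added to date-free data is one extra `isinstance` check per value. A rough way to gauge it on a given machine is sketched below; `extra_check_cost` is a hypothetical helper, and the number it prints will vary by interpreter and hardware, so no expected output is shown.

```python
import timeit
from datetime import date, datetime


def extra_check_cost(value: object, number: int = 200_000) -> float:
    """Average seconds per isinstance(value, (datetime, date)) call."""
    total = timeit.timeit(
        "isinstance(value, (datetime, date))",
        globals={"value": value, "datetime": datetime, "date": date},
        number=number,
    )
    return total / number


# Cost of the additional check for a value that is not a date.
print(f"{extra_check_cost(42) * 1e9:.0f} ns per extra check on an int")
```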

The optimization is particularly effective for JSON serialization scenarios where date/datetime objects are common, which aligns with the function's purpose of translating data "in the same fashion as pydantic v2's model_dump(mode="json")".

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 85 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 1 Passed |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
from __future__ import annotations

from datetime import date, datetime, timedelta
from typing import Iterable, Mapping

# imports
import pytest  # used for our unit tests
from anthropic._utils._utils import json_safe
from typing_extensions import TypeGuard

# unit tests

# Basic Test Cases

def test_basic_int():
    # Test with a simple integer
    codeflash_output = json_safe(42) # 3.02μs -> 2.87μs (5.09% faster)

def test_basic_float():
    # Test with a simple float
    codeflash_output = json_safe(3.14159) # 2.96μs -> 2.85μs (3.89% faster)

def test_basic_str():
    # Test with a simple string
    codeflash_output = json_safe("hello") # 3.15μs -> 2.93μs (7.58% faster)

def test_basic_bool():
    # Test with a boolean value
    codeflash_output = json_safe(True) # 2.99μs -> 3.00μs (0.200% slower)
    codeflash_output = json_safe(False) # 1.32μs -> 1.31μs (0.152% faster)

def test_basic_none():
    # Test with None
    codeflash_output = json_safe(None) # 2.91μs -> 2.83μs (2.90% faster)

def test_basic_datetime():
    # Test with a datetime object
    dt = datetime(2024, 6, 1, 12, 30, 45)
    codeflash_output = json_safe(dt) # 4.88μs -> 2.41μs (103% faster)

def test_basic_date():
    # Test with a date object
    d = date(2024, 6, 1)
    codeflash_output = json_safe(d) # 3.87μs -> 1.65μs (135% faster)

def test_basic_list():
    # Test with a simple list of primitives
    codeflash_output = json_safe([1, "a", True]) # 7.42μs -> 7.50μs (1.11% slower)

def test_basic_dict():
    # Test with a simple dict of primitives
    codeflash_output = json_safe({"a": 1, "b": True}) # 7.63μs -> 7.99μs (4.56% slower)

def test_basic_nested_dict_list():
    # Test with nested dicts and lists
    data = {"a": [1, 2, {"b": datetime(2024, 6, 1)}]}
    expected = {"a": [1, 2, {"b": datetime(2024, 6, 1).isoformat()}]}
    codeflash_output = json_safe(data) # 11.9μs -> 11.2μs (6.49% faster)

def test_basic_tuple():
    # Test with a tuple
    codeflash_output = json_safe((1, 2, 3)) # 6.53μs -> 6.69μs (2.38% slower)

def test_basic_set():
    # Test with a set (order is not guaranteed)
    codeflash_output = json_safe({1, 2, 3}); result = codeflash_output # 6.41μs -> 6.51μs (1.60% slower)

def test_basic_bytes_bytearray():
    # Test with bytes and bytearray (should be returned as-is)
    b = b"abc"
    ba = bytearray(b"abc")
    codeflash_output = json_safe(b) # 3.09μs -> 3.01μs (2.93% faster)
    codeflash_output = json_safe(ba) # 1.67μs -> 1.57μs (6.64% faster)

# Edge Test Cases

def test_edge_empty_list():
    # Test with an empty list
    codeflash_output = json_safe([]) # 3.32μs -> 3.35μs (1.01% slower)

def test_edge_empty_dict():
    # Test with an empty dict
    codeflash_output = json_safe({}) # 2.62μs -> 2.67μs (2.02% slower)

def test_edge_empty_tuple():
    # Test with an empty tuple
    codeflash_output = json_safe(()) # 3.52μs -> 3.58μs (1.48% slower)

def test_edge_empty_set():
    # Test with an empty set
    codeflash_output = json_safe(set()) # 3.42μs -> 3.50μs (2.34% slower)

def test_edge_nested_empty_structures():
    # Test with nested empty structures
    data = {"a": [], "b": {}, "c": ([], {})}
    expected = {"a": [], "b": {}, "c": [[], {}]}
    codeflash_output = json_safe(data) # 12.0μs -> 12.3μs (2.27% slower)


def test_edge_dict_with_bytes_key():
    # bytes keys should be returned as-is
    d = {b"key": "value"}
    codeflash_output = json_safe(d) # 7.50μs -> 7.33μs (2.33% faster)

def test_edge_list_with_datetime_and_date():
    # List containing both datetime and date
    dt = datetime(2024, 6, 1, 15, 0, 0)
    d = date(2024, 6, 2)
    codeflash_output = json_safe([dt, d]) # 9.36μs -> 7.30μs (28.3% faster)

def test_edge_deeply_nested():
    # Deeply nested structure
    dt = datetime(2024, 6, 1)
    data = {"a": [{"b": [{"c": dt}]}]}
    expected = {"a": [{"b": [{"c": dt.isoformat()}]}]}
    codeflash_output = json_safe(data) # 12.7μs -> 12.0μs (6.18% faster)

def test_edge_iterable_not_str_bytes_bytearray():
    # Custom iterable that is not str/bytes/bytearray
    class MyIterable:
        def __iter__(self):
            return iter([1, 2, 3])
    codeflash_output = json_safe(MyIterable()) # 36.5μs -> 35.5μs (2.93% faster)

def test_edge_mapping_subclass():
    # Custom mapping subclass
    class MyMapping(dict):
        pass
    m = MyMapping({1: datetime(2024, 6, 1)})
    expected = {1: datetime(2024, 6, 1).isoformat()}
    codeflash_output = json_safe(m) # 8.87μs -> 8.41μs (5.50% faster)


def test_edge_tuple_with_mixed_types():
    # Tuple with mixed types
    dt = datetime(2024, 6, 1)
    d = date(2024, 6, 2)
    codeflash_output = json_safe((dt, d, 1, "a")) # 9.40μs -> 8.24μs (14.0% faster)

def test_edge_bytes_in_list():
    # List containing bytes and bytearray
    b = b"abc"
    ba = bytearray(b"def")
    codeflash_output = json_safe([b, ba]) # 6.02μs -> 5.67μs (6.15% faster)

def test_edge_bytes_in_dict():
    # Dict containing bytes and bytearray as values
    b = b"abc"
    ba = bytearray(b"def")
    codeflash_output = json_safe({"b": b, "ba": ba}) # 7.55μs -> 7.43μs (1.55% faster)

def test_edge_str_in_list():
    # List containing strings should remain unchanged
    codeflash_output = json_safe(["abc", "def"]) # 5.65μs -> 5.61μs (0.641% faster)

def test_edge_unicode_str():
    # Unicode string should remain unchanged
    codeflash_output = json_safe("你好,世界") # 2.94μs -> 2.89μs (1.56% faster)

def test_edge_large_integer():
    # Very large integer should remain unchanged
    large_int = 10**100
    codeflash_output = json_safe(large_int) # 2.94μs -> 2.85μs (3.16% faster)

def test_edge_large_float():
    # Very large float should remain unchanged
    large_float = 1.7976931348623157e+308
    codeflash_output = json_safe(large_float) # 2.77μs -> 2.85μs (2.74% slower)

def test_edge_none_in_list_and_dict():
    # None values in list and dict
    codeflash_output = json_safe([None, 1]) # 5.76μs -> 6.15μs (6.25% slower)
    codeflash_output = json_safe({"a": None, "b": 1}) # 5.41μs -> 5.70μs (5.14% slower)


def test_edge_dict_with_date_key():
    # Dict with date key should convert key to isoformat
    d = {date(2024, 6, 2): "value"}
    expected = {date(2024, 6, 2).isoformat(): "value"}
    codeflash_output = json_safe(d) # 7.69μs -> 7.01μs (9.71% faster)

def test_edge_dict_with_bool_key():
    # Dict with bool key should keep key as bool
    d = {True: "yes", False: "no"}
    codeflash_output = json_safe(d) # 7.82μs -> 7.92μs (1.33% slower)

# Large Scale Test Cases

def test_large_list_of_ints():
    # Large list of integers
    data = list(range(1000))
    codeflash_output = json_safe(data) # 743μs -> 740μs (0.475% faster)

def test_large_list_of_dates():
    # Large list of dates
    base = date(2024, 6, 1)
    data = [base + timedelta(days=i) for i in range(1000)]
    expected = [d.isoformat() for d in data]
    codeflash_output = json_safe(data) # 996μs -> 261μs (282% faster)

def test_large_dict_of_ints():
    # Large dict of ints
    data = {str(i): i for i in range(1000)}
    codeflash_output = json_safe(data) # 1.61ms -> 1.64ms (1.60% slower)

def test_large_dict_with_nested_lists():
    # Large dict with nested lists of dates
    base = datetime(2024, 6, 1)
    data = {str(i): [base + timedelta(days=i+j) for j in range(10)] for i in range(100)}
    expected = {str(i): [(base + timedelta(days=i+j)).isoformat() for j in range(10)] for i in range(100)}
    codeflash_output = json_safe(data) # 1.38ms -> 652μs (112% faster)

def test_large_nested_structure():
    # Large nested structure with dicts and lists
    base = date(2024, 6, 1)
    data = [{"idx": i, "dates": [base + timedelta(days=j) for j in range(10)]} for i in range(100)]
    expected = [{"idx": i, "dates": [(base + timedelta(days=j)).isoformat() for j in range(10)]} for i in range(100)]
    codeflash_output = json_safe(data) # 1.42ms -> 718μs (97.8% faster)

def test_large_set():
    # Large set of ints
    data = set(range(1000))
    codeflash_output = json_safe(data); result = codeflash_output # 744μs -> 746μs (0.281% slower)

def test_large_tuple():
    # Large tuple of ints
    data = tuple(range(1000))
    codeflash_output = json_safe(data); result = codeflash_output # 746μs -> 746μs (0.062% slower)


def test_large_dict_with_non_str_keys():
    # Large dict with int keys
    data = {i: i for i in range(1000)}
    codeflash_output = json_safe(data) # 1.53ms -> 1.52ms (0.559% faster)


#------------------------------------------------
from __future__ import annotations

from datetime import date, datetime
from typing import Iterable, Mapping

# imports
import pytest  # used for our unit tests
from anthropic._utils._utils import json_safe
from typing_extensions import TypeGuard

# unit tests

# 1. BASIC TEST CASES

def test_basic_int():
    # Should return int unchanged
    codeflash_output = json_safe(42) # 3.28μs -> 3.16μs (3.67% faster)

def test_basic_float():
    # Should return float unchanged
    codeflash_output = json_safe(3.14) # 2.93μs -> 2.99μs (2.01% slower)

def test_basic_str():
    # Should return string unchanged
    codeflash_output = json_safe("hello") # 3.16μs -> 3.11μs (1.51% faster)

def test_basic_bool():
    # Should return bool unchanged
    codeflash_output = json_safe(True) # 3.04μs -> 3.04μs (0.230% slower)
    codeflash_output = json_safe(False) # 1.33μs -> 1.32μs (0.758% faster)

def test_basic_none():
    # Should return None unchanged
    codeflash_output = json_safe(None) # 2.74μs -> 2.87μs (4.60% slower)

def test_basic_date():
    # Should convert date to ISO format string
    d = date(2023, 6, 1)
    codeflash_output = json_safe(d) # 4.51μs -> 2.07μs (118% faster)

def test_basic_datetime():
    # Should convert datetime to ISO format string
    dt = datetime(2023, 6, 1, 12, 30, 45)
    codeflash_output = json_safe(dt) # 4.43μs -> 2.24μs (98.0% faster)

def test_basic_list_of_ints():
    # Should recursively process lists
    codeflash_output = json_safe([1, 2, 3]) # 6.77μs -> 7.48μs (9.47% slower)

def test_basic_tuple_of_strs():
    # Should recursively process tuples as lists
    codeflash_output = json_safe(("a", "b", "c")) # 6.71μs -> 6.90μs (2.77% slower)

def test_basic_dict_str_int():
    # Should recursively process dicts
    codeflash_output = json_safe({"a": 1, "b": 2}) # 7.49μs -> 7.75μs (3.32% slower)

def test_basic_dict_with_date():
    # Should convert date values to ISO format
    d = date(2023, 6, 1)
    codeflash_output = json_safe({"today": d}) # 6.75μs -> 5.91μs (14.1% faster)

def test_basic_nested_dict_list():
    # Should recursively process nested structures
    dt = datetime(2023, 6, 1, 12, 30, 45)
    obj = {
        "dates": [date(2023, 1, 1), date(2023, 2, 2)],
        "info": {"dt": dt, "flag": True}
    }
    expected = {
        "dates": ["2023-01-01", "2023-02-02"],
        "info": {"dt": "2023-06-01T12:30:45", "flag": True}
    }
    codeflash_output = json_safe(obj) # 16.3μs -> 14.0μs (17.0% faster)

# 2. EDGE TEST CASES

def test_empty_list():
    # Should handle empty lists
    codeflash_output = json_safe([]) # 3.27μs -> 3.42μs (4.59% slower)

def test_empty_tuple():
    # Should handle empty tuples as empty lists
    codeflash_output = json_safe(()) # 3.44μs -> 3.40μs (1.12% faster)

def test_empty_dict():
    # Should handle empty dicts
    codeflash_output = json_safe({}) # 2.43μs -> 2.57μs (5.56% slower)

def test_dict_with_non_str_keys():
    # Should recursively process non-str keys
    d = {1: "one", date(2023, 1, 1): "date"}
    expected = {1: "one", "2023-01-01": "date"}
    codeflash_output = json_safe(d) # 8.93μs -> 8.14μs (9.72% faster)


def test_bytes_and_bytearray():
    # Should leave bytes and bytearray unchanged
    b = b"abc"
    ba = bytearray(b"xyz")
    codeflash_output = json_safe(b) # 4.49μs -> 4.40μs (2.23% faster)
    codeflash_output = json_safe(ba) # 1.78μs -> 1.58μs (12.6% faster)

def test_set_and_frozenset():
    # Should process sets and frozensets as lists
    s = {1, 2, 3}
    fs = frozenset([4, 5, 6])
    codeflash_output = json_safe(s); out1 = codeflash_output # 6.97μs -> 7.16μs (2.61% slower)
    codeflash_output = json_safe(fs); out2 = codeflash_output # 4.24μs -> 4.44μs (4.44% slower)


def test_custom_iterable():
    # Should process custom iterables as lists
    class MyIterable:
        def __iter__(self):
            yield "x"
            yield "y"
    codeflash_output = json_safe(MyIterable()) # 32.9μs -> 31.4μs (4.96% faster)

def test_custom_mapping():
    # Should process custom mappings as dicts
    class MyMapping(dict):
        pass
    mm = MyMapping({"a": date(2022, 1, 1), "b": 42})
    expected = {"a": "2022-01-01", "b": 42}
    codeflash_output = json_safe(mm) # 12.2μs -> 11.6μs (5.52% faster)

def test_nested_empty_structures():
    # Should handle nested empty lists/dicts
    obj = {"a": [], "b": {}}
    expected = {"a": [], "b": {}}
    codeflash_output = json_safe(obj) # 7.48μs -> 7.66μs (2.38% slower)

def test_deeply_nested_structure():
    # Should handle deep nesting
    obj = {"a": [{"b": [date(2020, 1, 1)]}]}
    expected = {"a": [{"b": ["2020-01-01"]}]}
    codeflash_output = json_safe(obj) # 11.2μs -> 10.3μs (8.53% faster)

def test_dict_with_mixed_types():
    # Should handle dicts with mixed value types
    obj = {"num": 1, "str": "x", "dt": datetime(2020, 1, 1, 1, 2, 3), "lst": [date(2020, 2, 2), None]}
    expected = {"num": 1, "str": "x", "dt": "2020-01-01T01:02:03", "lst": ["2020-02-02", None]}
    codeflash_output = json_safe(obj) # 15.5μs -> 14.2μs (9.43% faster)

def test_iterable_of_mappings():
    # Should handle list of dicts
    obj = [{"a": date(2020, 1, 1)}, {"b": datetime(2020, 1, 1, 1, 2, 3)}]
    expected = [{"a": "2020-01-01"}, {"b": "2020-01-01T01:02:03"}]
    codeflash_output = json_safe(obj) # 11.3μs -> 9.91μs (14.1% faster)

def test_mapping_of_iterables():
    # Should handle dict of lists
    obj = {"dates": [date(2020, 1, 1), date(2020, 2, 2)]}
    expected = {"dates": ["2020-01-01", "2020-02-02"]}
    codeflash_output = json_safe(obj) # 8.63μs -> 7.29μs (18.3% faster)

def test_string_like_iterable():
    # Should not treat str as iterable
    codeflash_output = json_safe("abc") # 2.85μs -> 2.68μs (6.12% faster)
    codeflash_output = json_safe(b"abc") # 1.54μs -> 1.51μs (2.26% faster)
    codeflash_output = json_safe(bytearray(b"abc")) # 1.51μs -> 1.33μs (13.7% faster)

def test_tuple_with_mixed_types():
    # Should process tuples and convert date/datetime
    t = (1, date(2021, 1, 1), "x", datetime(2022, 2, 2, 2, 2, 2))
    expected = [1, "2021-01-01", "x", "2022-02-02T02:02:02"]
    codeflash_output = json_safe(t) # 9.97μs -> 7.96μs (25.2% faster)

def test_dict_with_tuple_key_and_value():
    # Should fail if tuple key is converted to list (unhashable)
    d = {(1, 2): (date(2020, 1, 1),)}
    with pytest.raises(TypeError):
        json_safe(d) # 10.6μs -> 10.1μs (4.34% faster)

def test_dict_with_bytes_key():
    # Should leave bytes key unchanged
    d = {b"key": "value"}
    codeflash_output = json_safe(d) # 5.47μs -> 5.55μs (1.33% slower)


def test_large_list_of_ints():
    # Should handle large lists efficiently
    large_list = list(range(1000))
    codeflash_output = json_safe(large_list) # 743μs -> 746μs (0.335% slower)

def test_large_list_of_dates():
    # Should convert all dates to ISO format in large list
    large_dates = [date(2020, 1, 1) for _ in range(1000)]
    expected = ["2020-01-01"] * 1000
    codeflash_output = json_safe(large_dates) # 1.01ms -> 262μs (283% faster)

def test_large_dict_of_str_to_int():
    # Should handle large dicts efficiently
    large_dict = {str(i): i for i in range(1000)}
    codeflash_output = json_safe(large_dict) # 1.61ms -> 1.65ms (2.14% slower)

def test_large_dict_with_dates():
    # Should convert date values in large dict
    large_dict = {str(i): date(2020, 1, 1) for i in range(1000)}
    expected = {str(i): "2020-01-01" for i in range(1000)}
    codeflash_output = json_safe(large_dict) # 1.88ms -> 1.15ms (63.4% faster)

def test_large_nested_structure():
    # Should handle large nested lists of dicts with dates
    large_nested = [{"d": date(2020, 1, 1)} for _ in range(1000)]
    expected = [{"d": "2020-01-01"} for _ in range(1000)]
    codeflash_output = json_safe(large_nested) # 2.45ms -> 1.83ms (34.3% faster)

def test_large_mixed_structure():
    # Should handle large dict of lists of mixed types
    large_mixed = {str(i): [i, date(2020, 1, 1), None] for i in range(1000)}
    expected = {str(i): [i, "2020-01-01", None] for i in range(1000)}
    codeflash_output = json_safe(large_mixed) # 4.41ms -> 3.84ms (15.0% faster)

def test_large_deeply_nested_structure():
    # Should handle deep nesting with moderate size
    obj = {"a": [{"b": [{"c": date(2020, 1, 1)} for _ in range(10)]} for _ in range(10)]}
    expected = {"a": [{"b": [{"c": "2020-01-01"} for _ in range(10)]} for _ in range(10)]}
    codeflash_output = json_safe(obj) # 275μs -> 212μs (29.7% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import datetime  # required by datetime.date(...) below

from anthropic._utils._utils import json_safe

def test_json_safe():
    json_safe(datetime.date(1, 6, 1))

def test_json_safe_2():
    json_safe('')
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|--------------------------|-------------|--------------|---------|
| codeflash_concolic_6zuacb2h/tmpor0tq1if/test_concolic_coverage.py::test_json_safe_2 | 5.05μs | 4.72μs | 6.88%✅ |

To edit these changes, run git checkout codeflash/optimize-json_safe-mhe1zwcb and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 30, 2025 23:26
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Oct 30, 2025