@codeflash-ai codeflash-ai bot commented Oct 29, 2025

📄 13% (0.13x) speedup for BookStackDataSource.get_page in backend/python/app/sources/external/bookstack/bookstack.py

⏱️ Runtime: 3.70 milliseconds → 3.27 milliseconds (best of 230 runs)

📝 Explanation and details

The optimized code achieves a 13% runtime speedup through four micro-optimizations that reduce object allocations and string operations:

**1. Conditional URL formatting** (HTTPClient): Instead of always calling `request.url.format(**request.path_params)`, the code checks if `path_params` exists first. When empty (common case), it uses the URL directly, avoiding unnecessary string formatting overhead.

**2. Smart header merging** (HTTPClient): Rather than always creating a new dictionary with `{**self.headers, **request.headers}`, it now checks if `request.headers` is empty. If so, it reuses `self.headers` directly. When merging is needed, it uses the more efficient `copy()` + `update()` pattern, reducing dictionary allocations.

**3. Direct header reference** (BookStack): Eliminates the `dict(self.http.headers)` copy operation by using `self.http.headers` directly since headers aren't modified. This saves ~1,100 dictionary allocations in typical usage.

**4. f-string URL construction** (BookStack): Replaces string concatenation + `.format()` with a single f-string operation (`f"{self.base_url}/api/pages/{id}"`), which is faster for simple interpolation.
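
The two HTTPClient fast paths can be sketched as standalone helpers. This is a minimal illustration under assumptions, not the project's actual code: `resolve_url` and `merge_headers` are hypothetical names, and the real `HTTPClient` internals are inferred only from the description above.

```python
from typing import Dict, Mapping


def resolve_url(url: str, path_params: Mapping[str, str]) -> str:
    """Format the URL only when there are path parameters to substitute."""
    if not path_params:
        # Fast path: nothing to substitute, so skip str.format entirely.
        return url
    return url.format(**path_params)


def merge_headers(
    default_headers: Dict[str, str], request_headers: Dict[str, str]
) -> Dict[str, str]:
    """Merge per-request headers over the client defaults."""
    if not request_headers:
        # Fast path: reuse the client's dict without copying.
        # The caller must treat the returned dict as read-only.
        return default_headers
    merged = default_headers.copy()
    merged.update(request_headers)  # request headers win on key conflicts
    return merged
```

Both fast paths trade a little safety for speed. Returning `default_headers` without a copy means any later in-place mutation would leak into the shared client headers, and skipping `str.format` means a URL containing literal braces is passed through unprocessed when `path_params` is empty. Whether either matters depends on how the real client uses these values.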

The line profiler shows the largest gains in URL construction (609 µs → 385 µs) and header handling (334 µs → 248 µs). These savings matter most in high-throughput scenarios where the same call pattern repeats, which is consistent with the 1.3% throughput improvement measured across the concurrent test cases (10 to 500 requests).
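
The URL-construction difference can be checked locally with a small `timeit` comparison. The sketch below assumes the original code had roughly the shape "concatenate, then `.format()`"; the function names are hypothetical, and the absolute timings will vary with the Python version and machine.

```python
import timeit

# Placeholder host, the same one the generated tests use.
BASE_URL = "https://bookstack.example.com"


def build_with_format(base_url: str, page_id: int) -> str:
    # Assumed shape of the original: concatenation followed by .format()
    return (base_url + "/api/pages/{id}").format(id=page_id)


def build_with_fstring(base_url: str, page_id: int) -> str:
    # Shape described for the optimized code: a single f-string
    return f"{base_url}/api/pages/{page_id}"


def compare(number: int = 200_000) -> dict:
    """Return total seconds spent in each approach over `number` calls."""
    return {
        "format": timeit.timeit(lambda: build_with_format(BASE_URL, 42), number=number),
        "fstring": timeit.timeit(lambda: build_with_fstring(BASE_URL, 42), number=number),
    }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.4f}s")
```

Both builders return the same string, so the comparison isolates the cost of the formatting approach rather than any difference in output.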

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 1108 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 88.9% |

🌀 Generated Regression Tests and Runtime
import asyncio  # used to run async functions
from typing import Any, Dict, Optional, Union

import pytest  # used for our unit tests
from app.sources.external.bookstack.bookstack import BookStackDataSource

# --- Minimal stubs for dependencies (simulate BookStack API and HTTP client) ---

class HTTPRequest:
    def __init__(self, method: str, url: str, headers: Dict[str, str], query_params: Dict[str, Any], body: Any):
        self.method = method
        self.url = url
        self.headers = headers
        self.query_params = query_params
        self.body = body
        self.path_params = {}

class DummyHTTPClient:
    """Simulate HTTPClient with async execute method."""
    def __init__(self, responses: Dict[str, Any], raise_for: Optional[set]=None, headers: Optional[Dict[str, str]]=None):
        self.responses = responses
        self.raise_for = raise_for or set()
        self.headers = headers or {"Authorization": "Token dummy:dummy", "Content-Type": "application/json", "Accept": "application/json"}

    async def execute(self, request: HTTPRequest, **kwargs):
        # Simulate error for certain URLs
        if request.url in self.raise_for:
            raise RuntimeError(f"Simulated error for {request.url}")
        # Return wrapped HTTPResponse
        return HTTPResponse(self.responses.get(request.url, {"id": 0, "name": "Unknown", "content": ""}))

    def get_base_url(self):
        return "https://bookstack.example.com"

# BookStackResponse as expected by get_page
class BookStackResponse:
    def __init__(self, success: bool, data: Any = None, error: Optional[str] = None):
        self.success = success
        self.data = data
        self.error = error

# BookStackClient stub
class BookStackClient:
    def __init__(self, http_client: DummyHTTPClient):
        self._client = http_client

    def get_client(self):
        return self._client

# --- UNIT TESTS ---

# 1. BASIC TEST CASES

@pytest.mark.asyncio
async def test_get_page_returns_expected_success():
    """Test that get_page returns expected data for a valid page id."""
    # Arrange
    page_id = 42
    expected_data = {"id": page_id, "name": "My Page", "content": "Hello world"}
    url = "https://bookstack.example.com/api/pages/42"
    responses = {url: expected_data}
    client = BookStackClient(DummyHTTPClient(responses))
    datasource = BookStackDataSource(client)

    # Act
    result = await datasource.get_page(page_id)

@pytest.mark.asyncio
async def test_get_page_returns_unknown_for_missing_page():
    """Test that get_page returns default data if page is not found in responses."""
    # Arrange
    page_id = 99
    url = "https://bookstack.example.com/api/pages/99"
    responses = {}  # No entry for this page
    client = BookStackClient(DummyHTTPClient(responses))
    datasource = BookStackDataSource(client)

    # Act
    result = await datasource.get_page(page_id)

@pytest.mark.asyncio
async def test_get_page_async_await_behavior():
    """Test that get_page can be awaited and returns a coroutine."""
    # Arrange
    page_id = 7
    url = "https://bookstack.example.com/api/pages/7"
    responses = {url: {"id": 7, "name": "Async Page", "content": "Async"}}
    client = BookStackClient(DummyHTTPClient(responses))
    datasource = BookStackDataSource(client)

    # Act
    codeflash_output = datasource.get_page(page_id); coro = codeflash_output
    result = await coro

# 2. EDGE TEST CASES

@pytest.mark.asyncio
async def test_get_page_handles_exception_and_returns_error():
    """Test that get_page returns success=False and error message on exception."""
    # Arrange
    page_id = 13
    url = "https://bookstack.example.com/api/pages/13"
    responses = {}
    raise_for = {url}
    client = BookStackClient(DummyHTTPClient(responses, raise_for=raise_for))
    datasource = BookStackDataSource(client)

    # Act
    result = await datasource.get_page(page_id)

@pytest.mark.asyncio
async def test_get_page_with_non_int_id_casts_to_str_in_url():
    """Test that get_page works with non-int id (e.g., string convertible to int)."""
    # Arrange
    page_id = "21"  # string, but should be accepted as int-like
    url = "https://bookstack.example.com/api/pages/21"
    expected_data = {"id": 21, "name": "String ID Page", "content": "String"}
    responses = {url: expected_data}
    client = BookStackClient(DummyHTTPClient(responses))
    datasource = BookStackDataSource(client)

    # Act
    # The function expects int, but test user error
    result = await datasource.get_page(int(page_id))

@pytest.mark.asyncio
async def test_get_page_concurrent_requests():
    """Test concurrent calls to get_page with different ids."""
    # Arrange
    ids = [1, 2, 3, 4, 5]
    responses = {
        f"https://bookstack.example.com/api/pages/{i}": {"id": i, "name": f"Page {i}", "content": f"Content {i}"}
        for i in ids
    }
    client = BookStackClient(DummyHTTPClient(responses))
    datasource = BookStackDataSource(client)

    # Act
    results = await asyncio.gather(*(datasource.get_page(i) for i in ids))

    # Assert
    for i, result in zip(ids, results):
        pass

@pytest.mark.asyncio
async def test_get_page_handles_missing_http_client():
    """Test that get_page raises ValueError if HTTP client is missing."""
    # Arrange
    class DummyBookStackClientNoHttp:
        def get_client(self):
            return None
    # Act & Assert
    with pytest.raises(ValueError) as excinfo:
        BookStackDataSource(DummyBookStackClientNoHttp())

@pytest.mark.asyncio
async def test_get_page_handles_missing_base_url_method():
    """Test that get_page raises ValueError if HTTP client lacks get_base_url."""
    class DummyHttpNoBaseUrl:
        headers = {}
    class DummyBookStackClient:
        def get_client(self):
            return DummyHttpNoBaseUrl()
    with pytest.raises(ValueError) as excinfo:
        BookStackDataSource(DummyBookStackClient())

# 3. LARGE SCALE TEST CASES

@pytest.mark.asyncio
async def test_get_page_many_concurrent_requests():
    """Test get_page under moderate concurrent load (50 requests)."""
    ids = list(range(50))
    responses = {
        f"https://bookstack.example.com/api/pages/{i}": {"id": i, "name": f"Page {i}", "content": f"Content {i}"}
        for i in ids
    }
    client = BookStackClient(DummyHTTPClient(responses))
    datasource = BookStackDataSource(client)

    # Act
    results = await asyncio.gather(*(datasource.get_page(i) for i in ids))
    for i, result in zip(ids, results):
        pass

@pytest.mark.asyncio
async def test_get_page_large_id_values():
    """Test get_page with very large page id values."""
    large_id = 999_999_999
    url = f"https://bookstack.example.com/api/pages/{large_id}"
    expected_data = {"id": large_id, "name": "Large Page", "content": "Big"}
    responses = {url: expected_data}
    client = BookStackClient(DummyHTTPClient(responses))
    datasource = BookStackDataSource(client)

    result = await datasource.get_page(large_id)

# 4. THROUGHPUT TEST CASES

@pytest.mark.asyncio
async def test_get_page_throughput_small_load():
    """Throughput: Test get_page with small batch (10 concurrent requests)."""
    ids = list(range(10))
    responses = {
        f"https://bookstack.example.com/api/pages/{i}": {"id": i, "name": f"Page {i}", "content": f"Content {i}"}
        for i in ids
    }
    client = BookStackClient(DummyHTTPClient(responses))
    datasource = BookStackDataSource(client)

    results = await asyncio.gather(*(datasource.get_page(i) for i in ids))

@pytest.mark.asyncio
async def test_get_page_throughput_medium_load():
    """Throughput: Test get_page with medium batch (100 concurrent requests)."""
    ids = list(range(100))
    responses = {
        f"https://bookstack.example.com/api/pages/{i}": {"id": i, "name": f"Page {i}", "content": f"Content {i}"}
        for i in ids
    }
    client = BookStackClient(DummyHTTPClient(responses))
    datasource = BookStackDataSource(client)

    results = await asyncio.gather(*(datasource.get_page(i) for i in ids))

@pytest.mark.asyncio
async def test_get_page_throughput_with_some_failures():
    """Throughput: Test get_page with a mix of successes and simulated failures."""
    ids = list(range(20))
    fail_ids = {5, 10, 15}
    responses = {
        f"https://bookstack.example.com/api/pages/{i}": {"id": i, "name": f"Page {i}", "content": f"Content {i}"}
        for i in ids if i not in fail_ids
    }
    raise_for = {f"https://bookstack.example.com/api/pages/{i}" for i in fail_ids}
    client = BookStackClient(DummyHTTPClient(responses, raise_for=raise_for))
    datasource = BookStackDataSource(client)

    results = await asyncio.gather(*(datasource.get_page(i) for i in ids))
    for i, result in zip(ids, results):
        if i in fail_ids:
            pass
        else:
            pass

@pytest.mark.asyncio
async def test_get_page_throughput_high_volume():
    """Throughput: Test get_page under high volume (500 concurrent requests)."""
    ids = list(range(500))
    responses = {
        f"https://bookstack.example.com/api/pages/{i}": {"id": i, "name": f"Page {i}", "content": f"Content {i}"}
        for i in ids
    }
    client = BookStackClient(DummyHTTPClient(responses))
    datasource = BookStackDataSource(client)

    results = await asyncio.gather(*(datasource.get_page(i) for i in ids))
    for i, result in zip(ids, results):
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import asyncio  # used to run async functions

import pytest  # used for our unit tests
from app.sources.external.bookstack.bookstack import BookStackDataSource


# Minimal HTTPRequest and BookStackResponse for testing
class HTTPRequest:
    def __init__(self, method, url, headers=None, query_params=None, body=None, path_params=None):
        self.method = method
        self.url = url
        self.headers = headers or {}
        self.query_params = query_params or {}
        self.body = body
        self.path_params = path_params or {}

class BookStackResponse:
    def __init__(self, success: bool, data=None, error=None):
        self.success = success
        self.data = data
        self.error = error

# Dummy httpx.Response for testing
class DummyResponse:
    def __init__(self, json_data, status_code=200):
        self._json_data = json_data
        self.status_code = status_code

    def json(self):
        return self._json_data

# Minimal HTTPClient for testing
class DummyHTTPClient:
    def __init__(self, response_map=None, raise_map=None):
        self.headers = {"Authorization": "Token testtoken"}
        self.response_map = response_map or {}
        self.raise_map = raise_map or {}
        self.base_url = "https://bookstack.example.com"
        self._closed = False

    def get_base_url(self):
        return self.base_url

    async def execute(self, request, **kwargs):
        # Simulate raising exceptions for certain URLs
        if request.url in self.raise_map:
            raise self.raise_map[request.url]
        # Simulate returning a dummy response for certain URLs
        data = self.response_map.get(request.url, {"id": 1, "name": "Test Page"})
        return HTTPResponse(DummyResponse(data))

    def close(self):
        self._closed = True

class BookStackClient:
    def __init__(self, client):
        self.client = client

    def get_client(self):
        return self.client

# ========== UNIT TESTS ==========

# ----------- BASIC TEST CASES -----------

@pytest.mark.asyncio
async def test_get_page_basic_success():
    """Test basic successful retrieval of a page."""
    # Setup: Dummy client returns a valid page
    page_id = 42
    url = "https://bookstack.example.com/api/pages/{}".format(page_id)
    dummy_client = DummyHTTPClient(response_map={url: {"id": page_id, "title": "Hello World"}})
    datasource = BookStackDataSource(BookStackClient(dummy_client))

    # Act
    result = await datasource.get_page(page_id)

@pytest.mark.asyncio
async def test_get_page_basic_async_await_behavior():
    """Test that get_page returns a coroutine and must be awaited."""
    page_id = 1
    url = "https://bookstack.example.com/api/pages/{}".format(page_id)
    dummy_client = DummyHTTPClient(response_map={url: {"id": page_id}})
    datasource = BookStackDataSource(BookStackClient(dummy_client))

    # Act
    codeflash_output = datasource.get_page(page_id); coro = codeflash_output
    result = await coro

# ----------- EDGE TEST CASES -----------

@pytest.mark.asyncio
async def test_get_page_nonexistent_page():
    """Test get_page with a page ID that does not exist (simulate error)."""
    page_id = 9999
    url = "https://bookstack.example.com/api/pages/{}".format(page_id)
    # Simulate HTTPClient raising an exception for this ID
    dummy_client = DummyHTTPClient(raise_map={url: Exception("Page not found")})
    datasource = BookStackDataSource(BookStackClient(dummy_client))

    result = await datasource.get_page(page_id)

@pytest.mark.asyncio
async def test_get_page_invalid_id_type():
    """Test get_page with an invalid id type (string instead of int)."""
    page_id = "not-an-int"
    url = "https://bookstack.example.com/api/pages/{}".format(page_id)
    dummy_client = DummyHTTPClient(response_map={url: {"id": page_id}})
    datasource = BookStackDataSource(BookStackClient(dummy_client))

    # Should still succeed because the function does not type-check id
    result = await datasource.get_page(page_id)

@pytest.mark.asyncio
async def test_get_page_concurrent_execution():
    """Test concurrent calls to get_page for different IDs."""
    ids = [10, 20, 30]
    urls = ["https://bookstack.example.com/api/pages/{}".format(i) for i in ids]
    response_map = {url: {"id": i, "title": f"Page {i}"} for url, i in zip(urls, ids)}
    dummy_client = DummyHTTPClient(response_map=response_map)
    datasource = BookStackDataSource(BookStackClient(dummy_client))

    # Run concurrently
    results = await asyncio.gather(*(datasource.get_page(i) for i in ids))

    for result, i in zip(results, ids):
        pass

@pytest.mark.asyncio
async def test_get_page_exception_handling():
    """Test that get_page returns a failed response if the HTTP client raises an exception."""
    page_id = 123
    url = "https://bookstack.example.com/api/pages/{}".format(page_id)
    dummy_client = DummyHTTPClient(raise_map={url: RuntimeError("Network error")})
    datasource = BookStackDataSource(BookStackClient(dummy_client))

    result = await datasource.get_page(page_id)

@pytest.mark.asyncio
async def test_get_page_empty_headers():
    """Test get_page when the HTTP client returns empty headers."""
    page_id = 55
    url = "https://bookstack.example.com/api/pages/{}".format(page_id)
    dummy_client = DummyHTTPClient(response_map={url: {"id": page_id}})
    dummy_client.headers = {}  # Empty headers
    datasource = BookStackDataSource(BookStackClient(dummy_client))

    result = await datasource.get_page(page_id)

# ----------- LARGE SCALE TEST CASES -----------

@pytest.mark.asyncio
async def test_get_page_many_concurrent_requests():
    """Test many concurrent get_page requests (up to 100)."""
    ids = list(range(1, 101))
    urls = ["https://bookstack.example.com/api/pages/{}".format(i) for i in ids]
    response_map = {url: {"id": i, "title": f"Page {i}"} for url, i in zip(urls, ids)}
    dummy_client = DummyHTTPClient(response_map=response_map)
    datasource = BookStackDataSource(BookStackClient(dummy_client))

    results = await asyncio.gather(*(datasource.get_page(i) for i in ids))

    for i, result in zip(ids, results):
        pass

@pytest.mark.asyncio
async def test_get_page_large_id_value():
    """Test get_page with a very large page ID."""
    page_id = 987654321
    url = "https://bookstack.example.com/api/pages/{}".format(page_id)
    dummy_client = DummyHTTPClient(response_map={url: {"id": page_id, "title": "Big Page"}})
    datasource = BookStackDataSource(BookStackClient(dummy_client))

    result = await datasource.get_page(page_id)

# ----------- THROUGHPUT TEST CASES -----------

@pytest.mark.asyncio
async def test_get_page_throughput_small_load():
    """Throughput test: small load of concurrent requests."""
    ids = list(range(1, 11))
    urls = ["https://bookstack.example.com/api/pages/{}".format(i) for i in ids]
    response_map = {url: {"id": i, "title": f"Page {i}"} for url, i in zip(urls, ids)}
    dummy_client = DummyHTTPClient(response_map=response_map)
    datasource = BookStackDataSource(BookStackClient(dummy_client))

    results = await asyncio.gather(*(datasource.get_page(i) for i in ids))
    for i, result in zip(ids, results):
        pass

@pytest.mark.asyncio
async def test_get_page_throughput_medium_load():
    """Throughput test: medium load of concurrent requests."""
    ids = list(range(1, 51))
    urls = ["https://bookstack.example.com/api/pages/{}".format(i) for i in ids]
    response_map = {url: {"id": i, "title": f"Page {i}"} for url, i in zip(urls, ids)}
    dummy_client = DummyHTTPClient(response_map=response_map)
    datasource = BookStackDataSource(BookStackClient(dummy_client))

    results = await asyncio.gather(*(datasource.get_page(i) for i in ids))
    for i, result in zip(ids, results):
        pass

@pytest.mark.asyncio
async def test_get_page_throughput_high_volume():
    """Throughput test: high volume of concurrent requests (up to 200)."""
    ids = list(range(1, 201))
    urls = ["https://bookstack.example.com/api/pages/{}".format(i) for i in ids]
    response_map = {url: {"id": i, "title": f"Page {i}"} for url, i in zip(urls, ids)}
    dummy_client = DummyHTTPClient(response_map=response_map)
    datasource = BookStackDataSource(BookStackClient(dummy_client))

    results = await asyncio.gather(*(datasource.get_page(i) for i in ids))
    for i, result in zip(ids, results):
        pass

@pytest.mark.asyncio
async def test_get_page_throughput_mixed_success_failure():
    """Throughput test: mixed successful and failed requests."""
    ids = list(range(1, 21))
    urls = ["https://bookstack.example.com/api/pages/{}".format(i) for i in ids]
    response_map = {url: {"id": i, "title": f"Page {i}"} for url, i in zip(urls, ids) if i % 2 == 0}
    raise_map = {url: Exception("Page not found") for url, i in zip(urls, ids) if i % 2 == 1}
    dummy_client = DummyHTTPClient(response_map=response_map, raise_map=raise_map)
    datasource = BookStackDataSource(BookStackClient(dummy_client))

    results = await asyncio.gather(*(datasource.get_page(i) for i in ids))

    for i, result in zip(ids, results):
        if i % 2 == 0:
            pass
        else:
            pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from app.sources.external.bookstack.bookstack import BookStackDataSource

To edit these changes git checkout codeflash/optimize-BookStackDataSource.get_page-mhbkgp9o and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 29, 2025 05:40
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Oct 29, 2025