@codeflash-ai codeflash-ai bot commented Oct 29, 2025

📄 7% (0.07x) speedup for BookStackDataSource.export_page_plaintext in backend/python/app/sources/external/bookstack/bookstack.py

⏱️ Runtime: 874 µs → 818 µs (best of 170 runs)
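The headline figure appears to define speedup as time saved divided by the optimized runtime. Under that assumption, the sketch below reproduces the "7% (0.07x)" figure from the two reported runtimes. It also shows the slightly smaller share of the original wall time that was removed. The function names are illustrative only.

```python
def speedup(original_us: float, optimized_us: float) -> float:
    """Relative speedup, assuming the definition (original - optimized) / optimized."""
    return (original_us - optimized_us) / optimized_us


def time_saved(original_us: float, optimized_us: float) -> float:
    """Fraction of the original runtime that was eliminated."""
    return (original_us - optimized_us) / original_us


s = speedup(874, 818)     # about 0.0685, which rounds to the reported 0.07x / 7%
r = time_saved(874, 818)  # about 0.064, i.e. roughly 6.4% less wall time
print(f"speedup={s:.2%}, time saved={r:.1%}")
```

The two numbers differ because they use different denominators. "7% faster" therefore means about 6.4% of the original runtime was removed.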

📝 Explanation and details

Optimizations applied:

  1. HTTPClient:

    • Avoided redundant string formatting when building the URL, and merged headers with in-place updates to save time and memory.
    • Passed None for parameters instead of creating empty dicts where possible, reducing unnecessary object creation.
    • Reordered the body-handling logic so fast-path checks run first, reducing dictionary lookups.
    • Reused the existing header mapping directly, instead of copying it with the dict constructor, when no modifications are needed.
    • Created and merged the request-argument dictionaries only when necessary.
  2. HTTPResponse:

    • Exposed content_type as a lazily initialized, cached property, so repeated accesses do not repeat the header lookup.
    • Accessed the content through a property rather than recomputing it on each use.
  3. BookStackDataSource:

    • Reduced dictionary creation for headers and query parameters. The headers dict that HTTPClient creates once is referenced directly rather than copied with dict(), and the same header and params references are reused on each call.
    • Built the URL with an f-string instead of .format, which is typically faster for simple single-value substitutions.
    • Replaced params = {} with params = None, since this GET request sends no query parameters in this implementation, so no empty dict is allocated on the hot path.
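The HTTPResponse item describes caching content_type with lazy initialization. Below is a minimal sketch of that pattern using the standard library's functools.cached_property. The class name LazyResponse, its constructor, and the "text/plain" fallback are hypothetical. They are not the project's actual HTTPResponse API.

```python
from functools import cached_property
from typing import Dict


class LazyResponse:
    """Hypothetical stand-in for HTTPResponse, used only to illustrate lazy caching."""

    def __init__(self, body: bytes, headers: Dict[str, str]) -> None:
        self._body = body
        self._headers = headers
        self.lookups = 0  # counts how often the header dict is actually consulted

    @cached_property
    def content_type(self) -> str:
        # Computed on first access, then stored on the instance; later accesses
        # read the cached value without touching the headers again.
        # The "text/plain" fallback is an assumption for this sketch.
        self.lookups += 1
        return self._headers.get("Content-Type", "text/plain")

    @property
    def content(self) -> bytes:
        # Plain property: returns the stored body without recomputing anything.
        return self._body


resp = LazyResponse(b"Hello", {"Content-Type": "text/plain"})
assert resp.content_type == "text/plain"
assert resp.content_type == "text/plain"
print(resp.lookups)  # 1: the header lookup ran only once
```

The trade-off is staleness. cached_property stores the first result on the instance, so later edits to the underlying headers would not be reflected. That is harmless only if response headers are treated as immutable.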

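In sum, these changes cut unnecessary object creation and dictionary/array copies and streamline the hot path (request execution). For this method the measured gain is about 7% (874 µs → 818 µs, best of 170 runs). The improvement should be most noticeable in high-throughput scenarios where the method is called many times.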
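Several of the items above follow one "allocate only when needed" pattern: passing None instead of empty dicts, merging headers only when there is something to merge, reusing the client's header dict, and building the URL with an f-string. The sketch below illustrates the pattern with a hypothetical build_request_kwargs helper. The helper name and its return shape are assumptions. The /api/pages/{id}/export/plaintext path is also an assumption, modeled on the dummy client's URL parsing in the tests.

```python
from typing import Any, Dict, Mapping, Optional


def build_request_kwargs(
    base_url: str,
    page_id: int,
    client_headers: Mapping[str, str],
    extra_headers: Optional[Mapping[str, str]] = None,
    params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper illustrating the 'allocate only when needed' pattern."""
    # f-string URL construction (endpoint path is an assumption, see lead-in)
    url = f"{base_url}/api/pages/{page_id}/export/plaintext"

    headers: Mapping[str, str]
    if extra_headers:
        # Copy before merging so the client's shared dict is never mutated.
        headers = {**client_headers, **extra_headers}
    else:
        # Fast path: nothing to merge, so reuse the existing mapping as-is.
        headers = client_headers

    # None (not {}) signals "no query string" without allocating an empty dict.
    return {"method": "GET", "url": url, "headers": headers, "params": params}


shared = {"Accept": "application/json"}
plain = build_request_kwargs("http://bookstack.test", 7, shared)
assert plain["headers"] is shared  # reused, not copied
assert plain["params"] is None

merged = build_request_kwargs("http://bookstack.test", 7, shared, {"Accept": "text/plain"})
assert merged["headers"]["Accept"] == "text/plain"
assert shared == {"Accept": "application/json"}  # original left untouched
```

The fast path introduces aliasing: the returned headers object is the client's own dict. Any code that later mutates request headers in place would therefore also change the client's defaults. Reviewers may want to confirm that nothing downstream does this.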


Correctness verification report:

⚙️ Existing Unit Tests: 🔘 None Found
🌀 Generated Regression Tests: 223 Passed
⏪ Replay Tests: 🔘 None Found
🔎 Concolic Coverage Tests: 🔘 None Found
📊 Tests Coverage: 100.0%
🌀 Generated Regression Tests and Runtime
import asyncio  # used to run async functions

import pytest  # used for our unit tests
from app.sources.client.bookstack.bookstack import BookStackClient
from app.sources.client.http.http_request import HTTPRequest
from app.sources.external.bookstack.bookstack import BookStackDataSource


# Mock classes for HTTPResponse and HTTPClient
class MockHTTPResponse:
    def __init__(self, content: bytes, content_type: str = "text/plain"):
        self.content = content
        self.content_type = content_type

    def bytes(self) -> bytes:
        return self.content

class MockHTTPClient:
    def __init__(self, content: bytes = b"Hello, world!", content_type: str = "text/plain", raise_exc: Exception = None):
        self.headers = {"Content-Type": content_type, "Accept": "application/json"}
        self._content = content
        self._content_type = content_type
        self._raise_exc = raise_exc
        self.base_url = "http://mock.bookstack.local"

    def get_base_url(self):
        return self.base_url

    async def execute(self, request: HTTPRequest, **kwargs):
        # Simulate error if requested
        if self._raise_exc:
            raise self._raise_exc
        return MockHTTPResponse(self._content, self._content_type)

# Mock BookStackClient
class MockBookStackClient(BookStackClient):
    def __init__(self, http_client):
        self.client = http_client

    def get_client(self):
        return self.client

# Helper to create BookStackDataSource with mock client
def make_datasource(content: bytes = b"Hello, world!", content_type: str = "text/plain", raise_exc: Exception = None):
    http_client = MockHTTPClient(content=content, content_type=content_type, raise_exc=raise_exc)
    client = MockBookStackClient(http_client)
    return BookStackDataSource(client)

# Basic Test Cases

@pytest.mark.asyncio
async def test_export_page_plaintext_basic_empty_content():
    """Test export with empty content."""
    ds = make_datasource(content=b"", content_type="text/plain")
    result = await ds.export_page_plaintext(1)

@pytest.mark.asyncio
async def test_export_page_plaintext_basic_different_content_type():
    """Test export with a different content type."""
    ds = make_datasource(content=b"PDFDATA", content_type="application/pdf")
    result = await ds.export_page_plaintext(99)

# Edge Test Cases

@pytest.mark.asyncio
async def test_export_page_plaintext_edge_exception_handling():
    """Test that an exception in the HTTP client is handled gracefully."""
    ds = make_datasource(raise_exc=RuntimeError("Network failure"))
    result = await ds.export_page_plaintext(123)
    # The client error should be reported in the response, not raised
    assert result.success is False

@pytest.mark.asyncio
async def test_export_page_plaintext_edge_invalid_page_id():
    """Test export with an invalid page ID (the client raises an error)."""
    ds = make_datasource(raise_exc=ValueError("Invalid page ID"))
    result = await ds.export_page_plaintext(-1)
    assert result.success is False

@pytest.mark.asyncio
async def test_export_page_plaintext_edge_non_ascii_content():
    """Test export with non-ASCII content."""
    content = "你好,世界".encode("utf-8")
    ds = make_datasource(content=content, content_type="text/plain")
    result = await ds.export_page_plaintext(88)

@pytest.mark.asyncio
async def test_export_page_plaintext_edge_concurrent_execution():
    """Test concurrent execution of exports for different pages."""
    ds = make_datasource(content=b"Concurrent", content_type="text/plain")
    ids = [1, 2, 3, 4]
    coros = [ds.export_page_plaintext(page_id) for page_id in ids]
    results = await asyncio.gather(*coros)
    # One response per requested page
    assert len(results) == len(ids)

@pytest.mark.asyncio
async def test_export_page_plaintext_edge_large_binary_content():
    """Test export with large binary content (under 1000 bytes)."""
    large_content = b"A" * 999
    ds = make_datasource(content=large_content, content_type="application/octet-stream")
    result = await ds.export_page_plaintext(555)

# Large Scale Test Cases

@pytest.mark.asyncio
async def test_export_page_plaintext_large_scale_varied_content_types():
    """Test concurrent exports with varied content types."""
    types = ["text/plain", "application/pdf", "application/octet-stream", "text/markdown"]
    contents = [b"plain", b"%PDF-1.4", b"\x00\x01\x02", b"# Markdown"]
    coros = []
    for i, (ct, c) in enumerate(zip(types, contents)):
        ds = make_datasource(content=c, content_type=ct)
        coros.append(ds.export_page_plaintext(i + 100))
    results = await asyncio.gather(*coros)
    # One response per content type
    assert len(results) == len(types)

# Throughput Test Cases

@pytest.mark.asyncio
async def test_export_page_plaintext_throughput_high_volume():
    """Throughput test: high volume of 100 concurrent exports."""
    ds = make_datasource(content=b"high", content_type="text/plain")
    coros = [ds.export_page_plaintext(i) for i in range(100)]
    results = await asyncio.gather(*coros)
    assert len(results) == 100
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import asyncio  # used to run async functions

import pytest  # used for our unit tests
from app.sources.external.bookstack.bookstack import BookStackDataSource


# Mocks and minimal stubs for dependencies
class DummyResponse:
    # Simulate httpx.Response for HTTPResponse
    def __init__(self, content: bytes, content_type: str = "text/plain"):
        self.content = content
        self.headers = {"Content-Type": content_type}

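    @property
    def content_type(self):
        return self.headers.get("Content-Type", "text/plain")

class HTTPResponse:
    # Local stand-in for the project's HTTPResponse (assumed interface:
    # bytes() and content_type, as in MockHTTPResponse), so that
    # DummyHTTPClient.execute does not fail with a NameError.
    def __init__(self, response):
        self._response = response
        self.content_type = response.content_type

    def bytes(self) -> bytes:
        return self._response.content
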
class HTTPRequest:
    def __init__(self, method, url, headers, query_params, body):
        self.method = method
        self.url = url
        self.headers = headers
        self.query_params = query_params
        self.body = body
        self.path_params = {}

class DummyHTTPClient:
    def __init__(self):
        self.headers = {"Content-Type": "text/plain", "Accept": "text/plain"}
        self._base_url = "http://bookstack.test"

    def get_base_url(self):
        return self._base_url

    async def execute(self, request, **kwargs):
        # Simulate different responses based on request.url
        if "export/plaintext" in request.url:
            # Simulate successful export
            page_id = request.url.split("/")[-3]
            if page_id == "0":
                # Simulate not found
                raise Exception("Page not found")
            elif page_id == "999999":
                # Simulate large page
                content = b"A" * 1000
            elif page_id == "error":
                raise Exception("Simulated error")
            else:
                content = f"Plaintext content for page {page_id}".encode("utf-8")
            return HTTPResponse(DummyResponse(content, "text/plain"))
        raise Exception("Unknown endpoint")

class BookStackRESTClientViaToken(DummyHTTPClient):
    def __init__(self, base_url, token_id, token_secret):
        super().__init__()
        self._base_url = base_url

class BookStackClient:
    def __init__(self, client):
        self.client = client

    def get_client(self):
        return self.client

class BookStackResponse:
    def __init__(self, success, data=None, error=None):
        self.success = success
        self.data = data
        self.error = error

# ------------------- UNIT TESTS -------------------

# Helper to create a data source for tests
def make_datasource(base_url="http://bookstack.test"):
    client = BookStackRESTClientViaToken(base_url, "tokenid", "tokensecret")
    bs_client = BookStackClient(client)
    return BookStackDataSource(bs_client)

# 1. Basic Test Cases

@pytest.mark.asyncio
async def test_export_page_plaintext_basic_not_found():
    """Test exporting a non-existent page returns error and success=False."""
    datasource = make_datasource()
    page_id = 0  # Simulate not found
    resp = await datasource.export_page_plaintext(page_id)
    assert resp.success is False
    assert resp.error

@pytest.mark.asyncio
async def test_export_page_plaintext_edge_invalid_id_type():
    """Test that an invalid id type (str instead of int) yields an error response."""
    datasource = make_datasource()
    resp = await datasource.export_page_plaintext("error")  # id as str triggers error in dummy client
    assert resp.success is False

@pytest.mark.asyncio
async def test_export_page_plaintext_edge_exception_handling():
    """Test that exceptions in the HTTP client are captured and returned as error."""
    datasource = make_datasource()
    # Use a page id that triggers an exception in dummy client
    resp = await datasource.export_page_plaintext("error")
    assert resp.success is False
    assert resp.error

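@pytest.mark.asyncio
async def test_export_page_plaintext_throughput_error_handling():
    """Throughput: Export with some error-inducing IDs and ensure errors are returned."""
    datasource = make_datasource()
    ids = [1, "error", 2, 0, 3]
    results = await asyncio.gather(*(datasource.export_page_plaintext(i) for i in ids))
    assert len(results) == len(ids)
    # "error" and 0 make the dummy client raise, so those responses report failure
    assert results[1].success is False
    assert results[3].success is False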
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-BookStackDataSource.export_page_plaintext-mhblm42i and push.

Codeflash

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 29, 2025 06:12
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 29, 2025