@codeflash-ai codeflash-ai bot commented Oct 29, 2025

📄 18% (0.18x) speedup for BookStackDataSource.export_book_plaintext in backend/python/app/sources/external/bookstack/bookstack.py

⏱️ Runtime: 2.78 milliseconds → 2.35 milliseconds (best of 255 runs)
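The headline figure can be reproduced from the two runtimes. The sketch below assumes the speedup is defined as the time saved relative to the optimized runtime, which is consistent with the 18% (0.18x) reported above. The function name `speedup` is illustrative and not part of Codeflash.

```python
def speedup(original_ms: float, optimized_ms: float) -> float:
    """Relative speedup, assuming the definition (original - optimized) / optimized."""
    return (original_ms - optimized_ms) / optimized_ms


gain = speedup(2.78, 2.35)
print(f"{gain:.2f}x ({gain:.0%})")  # prints "0.18x (18%)"
```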

📝 Explanation and details

The optimized code achieves an 18% runtime improvement and a 1.2% throughput increase through three micro-optimizations:

1. Efficient String Formatting in URL Construction:

  • Original: url = self.base_url + "/api/books/{id}/export/plaintext".format(id=id)
  • Optimized: url = f"{self.base_url}/api/books/{id}/export/plaintext"

The f-string is faster than calling .format(): it is compiled to dedicated string-building bytecode, which avoids both the method call and the runtime parsing of the format string. The line profiler shows the time for this line dropping from 464,728 ns to 294,315 ns (about 37% less time for this operation).
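The difference between the two URL-building styles can be checked locally with `timeit`. This is a minimal sketch, not the project's code: `base_url` and `book_id` are hypothetical stand-ins for `self.base_url` and `id`, and absolute timings will vary by machine and Python version.

```python
import timeit

base_url = "http://localhost"  # hypothetical stand-in for self.base_url
book_id = 42                   # hypothetical book ID


def build_with_format() -> str:
    # Original style: concatenation plus a runtime .format() call
    return base_url + "/api/books/{id}/export/plaintext".format(id=book_id)


def build_with_fstring() -> str:
    # Optimized style: a single f-string
    return f"{base_url}/api/books/{book_id}/export/plaintext"


# Both styles must yield the same URL
assert build_with_format() == build_with_fstring()

if __name__ == "__main__":
    for fn in (build_with_format, build_with_fstring):
        seconds = timeit.timeit(fn, number=200_000)
        print(f"{fn.__name__}: {seconds:.3f}s")
```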

2. Optimized Header Dictionary Operations:

  • Original: merged_headers = {**self.headers, **request.headers} (dictionary unpacking)
  • Optimized: merged_headers = self.headers.copy() followed by conditional update()

Dictionary unpacking always builds the merged dictionary from both operands, even when request.headers is empty. The optimized version copies the base headers once and calls update() only when the request actually carries headers, which skips the second merge step for requests that have no headers of their own.
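The two merge strategies can be compared side by side. The sketch below uses hypothetical helper names (`merge_unpack`, `merge_copy_update`) and treats a missing or empty `extra` mapping as "no request headers"; the real HTTP client code may differ in detail.

```python
from typing import Dict, Optional


def merge_unpack(base: Dict[str, str], extra: Optional[Dict[str, str]]) -> Dict[str, str]:
    # Original style: always unpack both mappings into a new dict
    return {**base, **(extra or {})}


def merge_copy_update(base: Dict[str, str], extra: Optional[Dict[str, str]]) -> Dict[str, str]:
    # Optimized style: copy once, update only when there is something to add
    merged = base.copy()
    if extra:
        merged.update(extra)
    return merged


base = {"Authorization": "Token dummy:dummy"}
extra = {"Accept": "text/plain"}

# Same result with and without per-request headers
assert merge_unpack(base, extra) == merge_copy_update(base, extra)
assert merge_unpack(base, None) == merge_copy_update(base, None) == base
```

In both variants the request headers win on key collisions, and the base dictionary is left unmodified.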

3. Direct Header Copying:

  • Original: headers = dict(self.http.headers) (constructor call)
  • Optimized: headers = self.http.headers.copy() (direct method)

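For an existing dictionary, .copy() is slightly cheaper than the dict() constructor because it skips the global name lookup and the constructor's argument handling. Both produce a shallow copy, so the behavior is unchanged as long as self.http.headers is a plain dict (or at least provides a .copy() method).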
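A short check that the two copying styles are interchangeable for a plain dict. The header value is taken from the test fixtures; the extra `Accept` key is only there to demonstrate that the copy is independent of the original.

```python
headers = {"Authorization": "Token dummy:dummy"}

via_constructor = dict(headers)  # original style
via_copy = headers.copy()        # optimized style

# Equal contents, but distinct objects
assert via_constructor == via_copy == headers
assert via_copy is not headers

# Mutating the copy leaves the original untouched
via_copy["Accept"] = "text/plain"
assert "Accept" not in headers
```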

These optimizations pay off most in high-throughput scenarios where the same operations repeat frequently: the generated tests run concurrent loads of 50-200 requests, and every request benefits from the reduced per-operation overhead. The savings add up when many API calls are issued at once, as is typical for BookStack data export operations.
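The concurrent pattern exercised by the generated tests looks roughly like the sketch below. `fake_export` is a hypothetical stand-in for `export_book_plaintext` that performs no network I/O, so the example only illustrates how per-call overhead is paid once per request in a batch.

```python
import asyncio
from typing import Iterable, List


async def fake_export(book_id: int) -> dict:
    """Hypothetical stand-in for export_book_plaintext; performs no I/O."""
    await asyncio.sleep(0)  # yield control to the event loop once
    return {"book_id": book_id, "plaintext": "stub"}


async def export_many(book_ids: Iterable[int]) -> List[dict]:
    # gather() runs the coroutines concurrently and preserves input order
    return await asyncio.gather(*(fake_export(b) for b in book_ids))


results = asyncio.run(export_many(range(50)))
print(len(results))  # prints 50
```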

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 779 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
import asyncio  # used to run async functions
# Function under test (EXACT COPY, DO NOT MODIFY)
from typing import Dict, Union

import pytest  # used for our unit tests
from app.sources.external.bookstack.bookstack import BookStackDataSource


# Mocks and minimal implementations for dependencies
class DummyAsyncClient:
    """A dummy async client mimicking httpx.AsyncClient for testing"""
    def __init__(self, response_json):
        self.response_json = response_json
        self.closed = False

    async def request(self, method, url, **kwargs):
        # Return a dummy httpx.Response-like object
        return DummyResponse(self.response_json)

    async def aclose(self):
        self.closed = True

class DummyResponse:
    """A dummy httpx.Response-like object"""
    def __init__(self, json_data):
        self._json_data = json_data

    def json(self):
        return self._json_data

class DummyHTTPRequest:
    """Minimal HTTPRequest for testing"""
    def __init__(self, method, url, headers, query_params, body):
        self.method = method
        self.url = url
        self.headers = headers
        self.query_params = query_params
        self.body = body
        self.path_params = {}

class DummyHTTPResponse:
    """Minimal HTTPResponse for testing"""
    def __init__(self, response):
        self.response = response

    def json(self):
        return self.response.json()

class DummyHTTPClient:
    """Minimal HTTPClient for testing"""
    def __init__(self, response_json=None, raise_exc=False):
        self.headers = {"Authorization": "Token dummy:dummy"}
        self._response_json = response_json
        self._raise_exc = raise_exc
        self.base_url = "http://localhost"
        self.client = DummyAsyncClient(response_json)
        self.closed = False

    def get_base_url(self):
        return self.base_url

    async def execute(self, request, **kwargs):
        if self._raise_exc:
            raise Exception("Dummy exception")
        return DummyHTTPResponse(DummyResponse(self._response_json))

    async def close(self):
        self.closed = True

class DummyBookStackRESTClientViaToken(DummyHTTPClient):
    """Dummy BookStackRESTClientViaToken for testing"""
    def __init__(self, base_url, response_json=None, raise_exc=False):
        super().__init__(response_json, raise_exc)
        self.base_url = base_url

class DummyBookStackClient:
    """Dummy BookStackClient for testing"""
    def __init__(self, client):
        self.client = client

    def get_client(self):
        return self.client

# BookStackResponse for testing
class BookStackResponse:
    def __init__(self, success, data=None, error=None):
        self.success = success
        self.data = data
        self.error = error

# ------------------- UNIT TESTS -------------------

# 1. Basic Test Cases

@pytest.mark.asyncio
async def test_export_book_plaintext_basic_success():
    """Test basic successful export of book as plaintext"""
    expected_json = {"book_id": 1, "content": "Hello world"}
    client = DummyBookStackClient(DummyBookStackRESTClientViaToken("http://localhost", response_json=expected_json))
    datasource = BookStackDataSource(client)
    response = await datasource.export_book_plaintext(1)

@pytest.mark.asyncio
async def test_export_book_plaintext_basic_empty_content():
    """Test export with empty content"""
    expected_json = {"book_id": 2, "content": ""}
    client = DummyBookStackClient(DummyBookStackRESTClientViaToken("http://localhost", response_json=expected_json))
    datasource = BookStackDataSource(client)
    response = await datasource.export_book_plaintext(2)

@pytest.mark.asyncio
async def test_export_book_plaintext_basic_nonexistent_book():
    """Test export with a book id that does not exist (simulate error)"""
    client = DummyBookStackClient(DummyBookStackRESTClientViaToken("http://localhost", response_json=None, raise_exc=True))
    datasource = BookStackDataSource(client)
    response = await datasource.export_book_plaintext(999)

# 2. Edge Test Cases

@pytest.mark.asyncio
async def test_export_book_plaintext_invalid_id_type():
    """Test passing a non-int book id (should still format as string)"""
    expected_json = {"book_id": "abc", "content": "Text"}
    client = DummyBookStackClient(DummyBookStackRESTClientViaToken("http://localhost", response_json=expected_json))
    datasource = BookStackDataSource(client)
    # id is str, but function expects int; should not crash due to Python's format
    response = await datasource.export_book_plaintext("abc")

@pytest.mark.asyncio
async def test_export_book_plaintext_http_client_not_initialized():
    """Test error when HTTP client is not initialized in BookStackClient"""
    class BadBookStackClient:
        def get_client(self):
            return None
    with pytest.raises(ValueError) as excinfo:
        BookStackDataSource(BadBookStackClient())

@pytest.mark.asyncio
async def test_export_book_plaintext_http_client_missing_base_url():
    """Test error when HTTP client does not have get_base_url method"""
    class BadHTTPClient:
        pass
    class BadBookStackClient:
        def get_client(self):
            return BadHTTPClient()
    with pytest.raises(ValueError) as excinfo:
        BookStackDataSource(BadBookStackClient())

@pytest.mark.asyncio
async def test_export_book_plaintext_concurrent_execution():
    """Test concurrent execution of export_book_plaintext"""
    expected_json = {"book_id": 3, "content": "Concurrent"}
    client = DummyBookStackClient(DummyBookStackRESTClientViaToken("http://localhost", response_json=expected_json))
    datasource = BookStackDataSource(client)
    # Run multiple concurrent requests
    results = await asyncio.gather(
        datasource.export_book_plaintext(3),
        datasource.export_book_plaintext(3),
        datasource.export_book_plaintext(3)
    )
    for response in results:
        pass

@pytest.mark.asyncio
async def test_export_book_plaintext_exception_handling():
    """Test that exceptions in HTTP client are caught and returned as error"""
    client = DummyBookStackClient(DummyBookStackRESTClientViaToken("http://localhost", response_json=None, raise_exc=True))
    datasource = BookStackDataSource(client)
    response = await datasource.export_book_plaintext(4)

# 3. Large Scale Test Cases

@pytest.mark.asyncio
async def test_export_book_plaintext_large_scale_concurrent():
    """Test large scale concurrent execution within reasonable bounds"""
    expected_json = {"book_id": 5, "content": "Bulk"}
    client = DummyBookStackClient(DummyBookStackRESTClientViaToken("http://localhost", response_json=expected_json))
    datasource = BookStackDataSource(client)
    # 50 concurrent requests
    tasks = [datasource.export_book_plaintext(5) for _ in range(50)]
    results = await asyncio.gather(*tasks)
    for response in results:
        pass

@pytest.mark.asyncio
async def test_export_book_plaintext_large_scale_mixed_success_and_error():
    """Test large scale with mixed success and error responses"""
    success_client = DummyBookStackClient(DummyBookStackRESTClientViaToken("http://localhost", response_json={"book_id": 6, "content": "OK"}))
    error_client = DummyBookStackClient(DummyBookStackRESTClientViaToken("http://localhost", response_json=None, raise_exc=True))
    success_ds = BookStackDataSource(success_client)
    error_ds = BookStackDataSource(error_client)
    # Mix 25 successes and 25 errors
    tasks = [success_ds.export_book_plaintext(6) for _ in range(25)] + [error_ds.export_book_plaintext(7) for _ in range(25)]
    results = await asyncio.gather(*tasks)
    for i, response in enumerate(results):
        if i < 25:
            pass
        else:
            pass

# 4. Throughput Test Cases

@pytest.mark.asyncio
async def test_export_book_plaintext_throughput_small_load():
    """Throughput test: small load (5 requests)"""
    expected_json = {"book_id": 8, "content": "Small"}
    client = DummyBookStackClient(DummyBookStackRESTClientViaToken("http://localhost", response_json=expected_json))
    datasource = BookStackDataSource(client)
    tasks = [datasource.export_book_plaintext(8) for _ in range(5)]
    results = await asyncio.gather(*tasks)
    for response in results:
        pass

@pytest.mark.asyncio
async def test_export_book_plaintext_throughput_medium_load():
    """Throughput test: medium load (50 requests)"""
    expected_json = {"book_id": 9, "content": "Medium"}
    client = DummyBookStackClient(DummyBookStackRESTClientViaToken("http://localhost", response_json=expected_json))
    datasource = BookStackDataSource(client)
    tasks = [datasource.export_book_plaintext(9) for _ in range(50)]
    results = await asyncio.gather(*tasks)
    for response in results:
        pass

@pytest.mark.asyncio
async def test_export_book_plaintext_throughput_high_load():
    """Throughput test: high load (200 requests)"""
    expected_json = {"book_id": 10, "content": "High"}
    client = DummyBookStackClient(DummyBookStackRESTClientViaToken("http://localhost", response_json=expected_json))
    datasource = BookStackDataSource(client)
    tasks = [datasource.export_book_plaintext(10) for _ in range(200)]
    results = await asyncio.gather(*tasks)
    for response in results:
        pass

@pytest.mark.asyncio
async def test_export_book_plaintext_throughput_mixed_load():
    """Throughput test: mixed success and error under load"""
    ok_json = {"book_id": 11, "content": "Mixed"}
    ok_client = DummyBookStackClient(DummyBookStackRESTClientViaToken("http://localhost", response_json=ok_json))
    err_client = DummyBookStackClient(DummyBookStackRESTClientViaToken("http://localhost", response_json=None, raise_exc=True))
    ok_ds = BookStackDataSource(ok_client)
    err_ds = BookStackDataSource(err_client)
    # 100 ok, 100 error
    tasks = [ok_ds.export_book_plaintext(11) for _ in range(100)] + [err_ds.export_book_plaintext(12) for _ in range(100)]
    results = await asyncio.gather(*tasks)
    for i, response in enumerate(results):
        if i < 100:
            pass
        else:
            pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import asyncio  # used to run async functions
# Function under test (EXACT COPY, DO NOT MODIFY)
from typing import Dict, Union

import pytest  # used for our unit tests
from app.sources.external.bookstack.bookstack import BookStackDataSource


# Mocks and stubs for dependencies
class DummyHTTPResponse:
    """A dummy HTTPResponse to simulate httpx.Response behavior."""
    def __init__(self, json_data):
        self._json_data = json_data

    def json(self):
        return self._json_data

class DummyHTTPClient:
    """A dummy HTTP client that simulates async HTTP requests."""
    def __init__(self, headers=None, should_raise=False, response_data=None):
        self.headers = headers or {"Authorization": "Token testid:testsecret"}
        self._should_raise = should_raise
        self._response_data = response_data or {"plaintext": "Book content"}

    def get_base_url(self):
        return "https://bookstack.example.com"

    async def execute(self, request, **kwargs):
        if self._should_raise:
            raise Exception("Simulated HTTP error")
        return DummyHTTPResponse(self._response_data)

class DummyBookStackClient:
    """A dummy BookStackClient that returns a DummyHTTPClient."""
    def __init__(self, http_client):
        self._http_client = http_client

    def get_client(self):
        return self._http_client

# BookStackResponse for result validation
class BookStackResponse:
    def __init__(self, success, data=None, error=None):
        self.success = success
        self.data = data
        self.error = error

# HTTPRequest stub for completeness
class HTTPRequest:
    def __init__(self, method, url, headers, query_params, body):
        self.method = method
        self.url = url
        self.headers = headers
        self.query_params = query_params
        self.body = body

# ----------------- UNIT TESTS -----------------

# 1. Basic Test Cases

@pytest.mark.asyncio
async def test_export_book_plaintext_basic_success():
    """Test basic successful plaintext export of a book."""
    http_client = DummyHTTPClient(response_data={"plaintext": "Book content"})
    client = DummyBookStackClient(http_client)
    datasource = BookStackDataSource(client)
    result = await datasource.export_book_plaintext(42)

@pytest.mark.asyncio
async def test_export_book_plaintext_basic_different_id():
    """Test export with a different book ID returns expected data."""
    http_client = DummyHTTPClient(response_data={"plaintext": "Another content"})
    client = DummyBookStackClient(http_client)
    datasource = BookStackDataSource(client)
    result = await datasource.export_book_plaintext(99)

@pytest.mark.asyncio
async def test_export_book_plaintext_basic_async_behavior():
    """Test that the function is awaitable and returns a coroutine."""
    http_client = DummyHTTPClient()
    client = DummyBookStackClient(http_client)
    datasource = BookStackDataSource(client)
    codeflash_output = datasource.export_book_plaintext(1)
    coro = codeflash_output  # calling without await yields a coroutine object
    result = await coro

# 2. Edge Test Cases

@pytest.mark.asyncio
async def test_export_book_plaintext_http_error():
    """Test that an HTTP error is handled and returns success=False."""
    http_client = DummyHTTPClient(should_raise=True)
    client = DummyBookStackClient(http_client)
    datasource = BookStackDataSource(client)
    result = await datasource.export_book_plaintext(123)

@pytest.mark.asyncio
async def test_export_book_plaintext_missing_http_client():
    """Test that initializing with a missing HTTP client raises ValueError."""
    class BadClient:
        def get_client(self):
            return None
    with pytest.raises(ValueError):
        BookStackDataSource(BadClient())

@pytest.mark.asyncio
async def test_export_book_plaintext_bad_base_url_method():
    """Test that missing get_base_url method raises ValueError."""
    class BadHTTPClient:
        headers = {}
    class BadClient:
        def get_client(self):
            return BadHTTPClient()
    with pytest.raises(ValueError):
        BookStackDataSource(BadClient())

@pytest.mark.asyncio
async def test_export_book_plaintext_concurrent_execution():
    """Test concurrent execution of multiple exports."""
    http_client = DummyHTTPClient(response_data={"plaintext": "Concurrent"})
    client = DummyBookStackClient(http_client)
    datasource = BookStackDataSource(client)
    ids = [1, 2, 3, 4, 5]
    coros = [datasource.export_book_plaintext(book_id) for book_id in ids]
    results = await asyncio.gather(*coros)
    for result in results:
        pass

@pytest.mark.asyncio
async def test_export_book_plaintext_edge_case_empty_response():
    """Test export when the response is an empty dict."""
    http_client = DummyHTTPClient(response_data={})
    client = DummyBookStackClient(http_client)
    datasource = BookStackDataSource(client)
    result = await datasource.export_book_plaintext(10)

@pytest.mark.asyncio
async def test_export_book_plaintext_edge_case_non_dict_response():
    """Test export when the response is not a dict (should still succeed)."""
    class DummyHTTPResponseNonDict:
        def json(self):
            return "Not a dict"
    class DummyHTTPClientNonDict(DummyHTTPClient):
        async def execute(self, request, **kwargs):
            return DummyHTTPResponseNonDict()
    http_client = DummyHTTPClientNonDict()
    client = DummyBookStackClient(http_client)
    datasource = BookStackDataSource(client)
    result = await datasource.export_book_plaintext(11)

# 3. Large Scale Test Cases

@pytest.mark.asyncio
async def test_export_book_plaintext_large_scale_concurrent():
    """Test large scale concurrent execution (50 books)."""
    http_client = DummyHTTPClient(response_data={"plaintext": "Bulk"})
    client = DummyBookStackClient(http_client)
    datasource = BookStackDataSource(client)
    ids = list(range(100, 150))
    coros = [datasource.export_book_plaintext(book_id) for book_id in ids]
    results = await asyncio.gather(*coros)
    for result in results:
        pass

@pytest.mark.asyncio
async def test_export_book_plaintext_large_scale_error_handling():
    """Test error handling when some requests fail concurrently."""
    # Some requests will succeed, some will raise
    class FlakyHTTPClient(DummyHTTPClient):
        async def execute(self, request, **kwargs):
            if "fail" in request.url:
                raise Exception("Flaky error")
            return DummyHTTPResponse({"plaintext": "Flaky"})
    client = DummyBookStackClient(FlakyHTTPClient())
    datasource = BookStackDataSource(client)
    ids = ["ok1", "ok2", "fail1", "ok3", "fail2"]
    # Patch the url generation to include "fail" for certain ids
    orig_method = BookStackDataSource.export_book_plaintext
    async def patched_export_book_plaintext(self, id):
        params = {}
        url = self.base_url + f"/api/books/{id}/export/plaintext"
        headers = dict(self.http.headers)
        request = HTTPRequest(
            method="GET",
            url=url,
            headers=headers,
            query_params=params,
            body=None
        )
        try:
            response = await self.http.execute(request)
            return BookStackResponse(success=True, data=response.json())
        except Exception as e:
            return BookStackResponse(success=False, error=str(e))
    BookStackDataSource.export_book_plaintext = patched_export_book_plaintext
    try:
        coros = [datasource.export_book_plaintext(book_id) for book_id in ids]
        results = await asyncio.gather(*coros)
    finally:
        # Always restore the original method, even if gather() raises
        BookStackDataSource.export_book_plaintext = orig_method
    for i, result in enumerate(results):
        if "fail" in ids[i]:
            pass
        else:
            pass

# 4. Throughput Test Cases

@pytest.mark.asyncio
async def test_export_book_plaintext_throughput_small_load():
    """Throughput test: small load (5 requests)."""
    http_client = DummyHTTPClient(response_data={"plaintext": "Small load"})
    client = DummyBookStackClient(http_client)
    datasource = BookStackDataSource(client)
    ids = [101, 102, 103, 104, 105]
    coros = [datasource.export_book_plaintext(book_id) for book_id in ids]
    results = await asyncio.gather(*coros)
    for result in results:
        pass

@pytest.mark.asyncio
async def test_export_book_plaintext_throughput_medium_load():
    """Throughput test: medium load (20 requests)."""
    http_client = DummyHTTPClient(response_data={"plaintext": "Medium load"})
    client = DummyBookStackClient(http_client)
    datasource = BookStackDataSource(client)
    ids = list(range(200, 220))
    coros = [datasource.export_book_plaintext(book_id) for book_id in ids]
    results = await asyncio.gather(*coros)
    for result in results:
        pass

@pytest.mark.asyncio
async def test_export_book_plaintext_throughput_high_volume():
    """Throughput test: high volume (100 requests)."""
    http_client = DummyHTTPClient(response_data={"plaintext": "High volume"})
    client = DummyBookStackClient(http_client)
    datasource = BookStackDataSource(client)
    ids = list(range(1000, 1100))
    coros = [datasource.export_book_plaintext(book_id) for book_id in ids]
    results = await asyncio.gather(*coros)
    for result in results:
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------

To edit these changes git checkout codeflash/optimize-BookStackDataSource.export_book_plaintext-mhbi7z4c and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 29, 2025 04:37
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Oct 29, 2025