fix: Mock html2text in content truncation test#28

Merged
T-rav merged 13 commits into main from fix/html2text-mock
Oct 3, 2025

Conversation

@T-rav
Collaborator

T-rav commented Oct 3, 2025

Fixes test failure on main after dependabot merge.

  • Add @patch for html2text.HTML2Text in truncation test
  • Mock handle() to return large content for proper testing
  • Test was failing because html2text wasn't mocked, causing MagicMock to leak into assertions

All 44 tests now pass locally and should pass in CI.
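
A rough sketch of the patched test described above, assuming a module-level _truncate_if_needed helper (an assumption; the real test lives in docker/tests/unit/services/test_content_scraper.py, and the truncation suffix comes from the scraper code reviewed further down):

from unittest.mock import MagicMock, patch

from services import content_scraper  # module-level helper layout is assumed


@patch("services.content_scraper.html2text.HTML2Text")
def test_truncates_content_exceeding_max_length(mock_html2text):
    # Mock handle() so the converter yields deterministic oversized markdown
    # instead of a MagicMock leaking into the assertions.
    converter = MagicMock()
    converter.handle.return_value = "x" * 100_000
    mock_html2text.return_value = converter

    # The real test drives this through the scraper; here we only exercise the
    # truncation helper on the mocked converter output.
    markdown = content_scraper._truncate_if_needed(converter.handle("<html></html>"))

    assert markdown.endswith("... [content truncated]")
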

Summary by CodeRabbit

  • New Features

    • Improved search and streaming responses with more consistent formatting and reliable fallback sources.
  • Bug Fixes

    • Enhanced content scraping to better handle HTML, plain text, PDFs and long content, reducing truncation and parsing errors.
    • More robust health and client endpoints with clearer, standardized responses.
  • Tests

    • Expanded and stabilized test coverage with deterministic mocks and new test builders/factories to improve reliability.
  • Documentation

    • Added architecture and testing guidelines to developer docs.

- Add @patch for html2text.HTML2Text in truncation test
- Mock handle() to return large content for proper testing
- Fixes test failure on CI after dependabot merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai bot commented Oct 3, 2025

Walkthrough

Refactors and helper extraction across scraping, search, streaming, health, and client code; adds numerous test factories/builders/mocks and updates unit tests to use them; small doc updates. Changes introduce many private helpers, centralize logic, and standardize test fixtures and SSE message formatting.

Changes

  • Content scraping service (docker/services/content_scraper.py): Large refactor: extracted many private helpers (_fetch_content, _configure_html2text, _convert_with_readability, _convert_fallback, _truncate_if_needed, _process_code_blocks, _process_math_elements, etc.) and rewrote scrape_search_result to use them; added explicit content-type handling and truncation.
  • Streaming / demo servers (docker/demo_server.py, docker/simple_demo.py): Extracted SSE helpers and operation-specific handlers (_format_sse_message, _generate_webcat_stream, _handle_search_operation, _handle_health_operation, _get_server_info); replaced inline generators with a centralized generator, improved error handling, and standardized SSE messages.
  • API/search orchestration helpers (docker/api_tools.py): Added internal async/worker helpers and formatters (_fetch_with_serper, _fetch_with_duckduckgo, _format_no_results_error, _format_search_error, _process_and_format_results) and refactored create_webcat_functions to use them.
  • Clients: Serper / DuckDuckGo (docker/clients/serper_client.py, docker/clients/duckduckgo_client.py): Introduced conversion helpers (_convert_organic_results, _convert_ddg_result) and refactored result-mapping to use these helpers (delegating APISearchResult construction).
  • Health & info endpoints (docker/health.py): Split health/status/client/root handlers into many helpers (_get_health_status, _get_unhealthy_status, _load_client_file, _get_detailed_status, _get_root_info, etc.) and adjusted endpoints to call them.
  • Unit tests — content scraper update (docker/tests/unit/services/test_content_scraper.py): Added @patch("services.content_scraper.html2text.HTML2Text") to test_truncates_content_exceeding_max_length and updated the test signature to accept mock_html2text; mocked HTML2Text.handle output to control truncation assertions.
  • Tests — builders, factories, mocks (docker/tests/builders/api_search_result_builder.py, docker/tests/factories/*, docker/tests/factories/mock_http_response.py, docker/tests/factories/mock_serper_response.py, docker/tests/factories/...): Added APISearchResultBuilder and multiple test factories/mocks: DDGSFactory, MockDDGS, MockDDGSClass, MockDDGSContextManager, SerperResponseFactory, MockSerperResponse, many HTTP response factory helpers (success, html_with_title, pdf, with_http_error, error_404, timeout, connection_error), and a simplified/typed MockHttpResponse. Tests updated to use these factories.
  • Tests — unit updates to use builders/factories (docker/tests/unit/clients/test_duckduckgo_client.py, docker/tests/unit/clients/test_serper_client.py, docker/tests/unit/services/test_search_processor.py, docker/tests/unit/services/test_search_service.py, docker/tests/unit/tools/test_search_tool.py, docker/tests/test_mcp_server.py): Replaced inline/mocked responses with the new factories/builders (DDGSFactory, SerperResponseFactory, APISearchResultBuilder, HttpResponseFactory) and updated assertions/patch usage accordingly; cleaned imports.
  • Docs (CLAUDE.md): Documentation additions: Architecture Standards, No Raw Mocks guidance, Single Source of Truth/tooling notes; no code changes.

Sequence Diagram(s)

sequenceDiagram
    participant ScraperAPI as scrape_search_result
    participant HTTP as _fetch_content (requests)
    participant Readability as _convert_with_readability
    participant HTML2Text as _configure_html2text / html2text.HTML2Text
    participant Fallback as _convert_fallback
    participant Trunc as _truncate_if_needed

    ScraperAPI->>HTTP: GET url with browser headers
    alt binary/pdf
        HTTP-->>ScraperAPI: binary content
        ScraperAPI->>ScraperAPI: _handle_binary_content -> markdown/pdf metadata
    else plaintext
        HTTP-->>ScraperAPI: text/plain
        ScraperAPI->>ScraperAPI: _handle_plain_text -> markdown
    else html
        HTTP-->>ScraperAPI: html content
        ScraperAPI->>Readability: try extract main content
        alt Readability success
            Readability-->>ScraperAPI: extracted HTML
            ScraperAPI->>HTML2Text: _configure_html2text + convert
            HTML2Text-->>ScraperAPI: markdown
        else Readability fail
            Readability-->>ScraperAPI: error
            ScraperAPI->>Fallback: _convert_fallback (title + html2text)
            Fallback-->>ScraperAPI: markdown
        end
    end
    ScraperAPI->>Trunc: _truncate_if_needed(markdown)
    Trunc-->>ScraperAPI: final markdown + metadata / error

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Refactor #27 — touches services.content_scraper and related tests that patch html2text.HTML2Text; strongly related to the test and scraper refactor in this diff.

Poem

I nibble docs and craft a test,
Patch HTML to tame the rest.
Helpers hop, pipelines neat—
Truncate, convert, serve the treat.
A rabbit cheers with carrot zest 🥕

Pre-merge checks

✅ Passed checks (3 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title concisely describes the key change—mocking html2text in the content truncation test—and matches the modifications in the test file without extraneous details.
  • Docstring Coverage (✅ Passed): Docstring coverage is 81.74%, which meets the required threshold of 80.00%.

T-rav and others added 9 commits October 3, 2025 14:30
- Create MockSerperResponse and SerperResponseFactory for Serper tests
- Create MockDDGS, MockDDGSClass and DDGSFactory for DuckDuckGo tests
- Remove all raw MagicMock property assignments (mock.json.return_value = ...)
- Use factory methods instead: SerperResponseFactory.two_results()
- Follows builder/factory pattern: one class per file, no raw mocks

All 44 tests pass locally.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
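
A rough sketch of the factory pattern this commit describes; the class and method names come from the commit message, while the constructor, response fields, and "organic" payload shape are assumptions:

class MockSerperResponse:
    """Typed stand-in for a Serper HTTP response (fields assumed)."""

    def __init__(self, payload: dict) -> None:
        self._payload = payload
        self.status_code = 200

    def json(self) -> dict:
        return self._payload

    def raise_for_status(self) -> None:
        return None


class SerperResponseFactory:
    """Builds MockSerperResponse objects so tests avoid raw MagicMocks."""

    @staticmethod
    def two_results() -> MockSerperResponse:
        return MockSerperResponse(
            {
                "organic": [
                    {"title": "First result", "link": "https://example.com/1", "snippet": "..."},
                    {"title": "Second result", "link": "https://example.com/2", "snippet": "..."},
                ]
            }
        )

A test then patches the client's HTTP call to return SerperResponseFactory.two_results() instead of assigning mock.json.return_value by hand.
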
- Create APISearchResultBuilder with fluent interface
- Add a_serper_result() and a_duckduckgo_result() pre-configured builders
- Replace inline APISearchResult() creation in test_search_service.py
- All tests still passing

Remaining inline creations in:
- tests/unit/services/test_search_processor.py (3 instances)
- tests/unit/tools/test_search_tool.py (2 instances)
- tests/unit/models/test_models.py (1 instance for testing model itself)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…tion

- Replace all inline APISearchResult() calls with builder pattern
- Updated test_search_processor.py (3 instances)
- Updated test_search_tool.py (2 instances)
- Now using an_api_search_result().with_*().build() everywhere

All inline object creation eliminated except test_models.py (acceptable - testing the model itself).

No more raw mocks or inline test data creation. Full builder/factory pattern compliance.

All 44 tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
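
A sketch of the fluent builder described above; an_api_search_result(), with_link(), and build() appear in the tests, while the remaining fields and defaults are assumptions (the real model is the Pydantic APISearchResult in docker/models/api_search_result.py):

from dataclasses import dataclass


@dataclass
class APISearchResult:  # stand-in for the real Pydantic model
    title: str
    link: str
    snippet: str


class APISearchResultBuilder:
    def __init__(self) -> None:
        self._title = "Example result"
        self._link = "https://example.com"
        self._snippet = "Example snippet"

    def with_title(self, title: str) -> "APISearchResultBuilder":
        self._title = title
        return self

    def with_link(self, link: str) -> "APISearchResultBuilder":
        self._link = link
        return self

    def build(self) -> APISearchResult:
        return APISearchResult(self._title, self._link, self._snippet)


def an_api_search_result() -> APISearchResultBuilder:
    return APISearchResultBuilder()


# Usage in a test: an_api_search_result().with_link("https://docs.example.com").build()
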
- Split mock_ddgs.py (3 classes) into 3 separate files:
  - mock_ddgs_search_client.py (MockDDGS)
  - mock_ddgs_context_manager.py (MockDDGSContextManager)
  - mock_ddgs_class.py (MockDDGSClass)
- Extract _convert_organic_results() in serper_client.py to reduce nesting

Now fully compliant with 1 class per file rule.

Remaining: services/content_scraper.py has 6-level nesting (needs extraction)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Refactored scrape_search_result() from 165-line function with 6-level
nesting to well-organized helper methods with max 2-level nesting.

Changes:
- Extract _fetch_content() for HTTP requests
- Extract _handle_plain_text() and _handle_binary_content() for special content types
- Extract _configure_html2text() for converter configuration
- Extract _extract_language_from_classes() to reduce code block processing nesting
- Extract _process_code_blocks() and _process_math_elements() for HTML preprocessing
- Extract _convert_to_markdown() for markdown conversion pipeline
- Extract _convert_with_readability() and _convert_fallback() for content extraction strategies
- Extract _truncate_if_needed() for content length management

Results:
- Reduced nesting from 6 levels to 2 levels (well under 3-level limit)
- Main function now 40 lines (was 165 lines)
- Each helper method has single, clear responsibility
- All 44 unit tests pass

Follows "methods for concepts" principle and max 3-level nesting requirement.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
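
For example, the _truncate_if_needed helper named above has roughly this shape (the cap value here is an assumption; the body mirrors the snippet reviewed later in this thread):

MAX_CONTENT_LENGTH = 10_000  # assumed cap; the real constant lives in content_scraper.py


def _truncate_if_needed(content: str) -> str:
    """Cap converted markdown so downstream consumers receive bounded payloads."""
    if len(content) > MAX_CONTENT_LENGTH:
        return content[:MAX_CONTENT_LENGTH] + "... [content truncated]"
    return content

As the review below notes, appending the suffix after slicing to the cap lets the final string slightly exceed MAX_CONTENT_LENGTH.
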
Document the architecture standards established during refactoring:
- Maximum 3-level nesting depth with method extraction
- One class per file (production and test code)
- Methods for concepts (single responsibility helpers)
- No raw mocks (typed mocks + factories + builders)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Fixed nesting and raw mock violations in production and test code:

**Production code fixes:**
- clients/duckduckgo_client.py: Extract _convert_ddg_result() helper
  to reduce nesting from 6 to 3 levels

**Test infrastructure fixes:**
- Create MockHttpResponse class for typed HTTP mocks
- Create HttpResponseFactory with methods: success(), plaintext(),
  html_with_title(), pdf(), error_404(), timeout(), connection_error()
- tests/test_mcp_server.py: Replace all MagicMock() usage with
  HttpResponseFactory

**Results:**
- All 44 unit tests pass
- No raw MagicMock() with property assignment
- All high/medium priority violations fixed
- Follows typed mocks + factory pattern standards

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Reduced control flow nesting from 6 to 3 levels by extracting helpers:
- _get_health_status(), _get_unhealthy_status()
- _client_not_found_response(), _client_error_response()
- _load_client_file()
- _get_server_configuration(), _get_server_endpoints(), _get_server_capabilities()
- _get_detailed_status(), _get_status_error()
- _get_root_info()

Results:
- Max nesting now 3 levels (was 6)
- Each helper has single, clear responsibility
- All endpoints simplified to try/return pattern

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
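
The resulting try/return endpoint shape looks roughly like this, assuming a FastAPI app and stubbed helper bodies (the except branch matches the lines CodeQL flags elsewhere in this review):

import logging
import time

from fastapi import FastAPI
from fastapi.responses import JSONResponse

logger = logging.getLogger(__name__)
app = FastAPI()


def _get_health_status() -> dict:
    # Trimmed stand-in for the real helper in docker/health.py.
    return {"status": "healthy", "timestamp": time.time()}


def _get_unhealthy_status(error: str) -> dict:
    return {"status": "unhealthy", "error": error, "timestamp": time.time()}


@app.get("/health")
async def health_check():
    try:
        return _get_health_status()
    except Exception as e:
        logger.error(f"Health check failed: {str(e)}")
        return JSONResponse(status_code=500, content=_get_unhealthy_status(str(e)))
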
"""
return JSONResponse(
status_code=500,
content={"error": "Failed to serve WebCat client", "details": error},

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information
flows to this location and may be exposed to an external user.

Copilot Autofix

AI 4 months ago

To fix the problem, we need to avoid exposing potentially sensitive exception messages to end users through the API. Instead, we should log the complete exception details on the server for debugging and provide users with a generic error message. The specific fix involves modifying the _client_error_response function (lines 70-83) to not include the raw exception details in the API response. Additionally, in the /client endpoint handler (lines 210-226), when catching an exception, log the exception details and return a generic error to the client.

  • Update _client_error_response(error: str) so it does not transmit the exception details, only a generic message.
  • At the exception handling site (line 223+), ensure that the raw exception is logged but the client only receives the generic error response.
  • Fix only the code shown in the snippet, without changing the overall functionality: logging stays; the only alteration is what is returned to the user.

No new imports or dependencies are required; use the existing logger.


Suggested changeset 1
docker/health.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/docker/health.py b/docker/health.py
--- a/docker/health.py
+++ b/docker/health.py
@@ -67,18 +67,11 @@
     )
 
 
-def _client_error_response(error: str) -> JSONResponse:
-    """Create response for client loading error.
-
-    Args:
-        error: Error message
-
-    Returns:
-        JSONResponse with 500 status
-    """
+def _client_error_response() -> JSONResponse:
+    """Create response for client loading error (generic, without exposing exception details)."""
     return JSONResponse(
         status_code=500,
-        content={"error": "Failed to serve WebCat client", "details": error},
+        content={"error": "Failed to serve WebCat client"},
     )
 
 
@@ -222,7 +213,7 @@
             return _client_not_found_response(client_path)
         except Exception as e:
             logger.error(f"Failed to serve WebCat client: {str(e)}")
-            return _client_error_response(str(e))
+            return _client_error_response()
 
     @app.get("/status")
     async def server_status():
EOF
"timestamp": time.time(),
},
)
return JSONResponse(status_code=500, content=_get_unhealthy_status(str(e)))

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information
flows to this location and may be exposed to an external user.

Copilot Autofix

AI 4 months ago

To fix this problem, we should stop passing str(e) (the string representation of the exception) to the client in the JSON response.
Instead, pass a generic error message such as "An internal error has occurred" (or similar) to the client, while logging the specific exception detail (and optionally traceback) for server-side debugging.
Specifically, in setup_health_endpoints, inside the /health endpoint, change:

return JSONResponse(status_code=500, content=_get_unhealthy_status(str(e)))

to:

return JSONResponse(status_code=500, content=_get_unhealthy_status("An internal error has occurred"))

and keep the logging as-is.

No change to imports or new methods is needed.


Suggested changeset 1
docker/health.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/docker/health.py b/docker/health.py
--- a/docker/health.py
+++ b/docker/health.py
@@ -205,7 +205,7 @@
             return _get_health_status()
         except Exception as e:
             logger.error(f"Health check failed: {str(e)}")
-            return JSONResponse(status_code=500, content=_get_unhealthy_status(str(e)))
+            return JSONResponse(status_code=500, content=_get_unhealthy_status("An internal error has occurred"))
 
     @app.get("/client")
     async def sse_client():
EOF
"timestamp": time.time(),
},
)
return JSONResponse(status_code=500, content=_get_status_error(str(e)))

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information
flows to this location and may be exposed to an external user.

Copilot Autofix

AI 4 months ago

The correct way to fix this issue is to ensure that only generic error information is sent to the client, with specific or sensitive error details logged for internal diagnosis.

  • In file docker/health.py, specifically in the /status endpoint handler, modify the error response so that the "details" field does not include str(e) (exception's string representation).
  • Instead, return a general message to the client (e.g., "Internal server error") while logging the full exception details (including a stack trace) server-side using the logging module.
  • Only edit the code in the region shown above, specifically the try/except in the /status endpoint, and the argument passed to _get_status_error.
  • No additional imports are necessary as logging is already set up.

Suggested changeset 1
docker/health.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/docker/health.py b/docker/health.py
--- a/docker/health.py
+++ b/docker/health.py
@@ -230,8 +230,8 @@
         try:
             return _get_detailed_status()
         except Exception as e:
-            logger.error(f"Status check failed: {str(e)}")
-            return JSONResponse(status_code=500, content=_get_status_error(str(e)))
+            logger.error("Status check failed", exc_info=True)
+            return JSONResponse(status_code=500, content=_get_status_error("Internal server error"))
 
     @app.get("/")
     async def root():
EOF
T-rav and others added 3 commits October 3, 2025 15:15
Reduced control flow nesting from 7 to 3 levels by extracting:
- _format_sse_message() for SSE message formatting
- _handle_search_operation() for search logic
- _handle_health_operation() for health checks
- _get_server_info() for server metadata
- _generate_webcat_stream() extracted to module level

Results:
- Max nesting reduced from 7 to 3 levels
- SSE generator logic now at module level
- Each operation handler has clear responsibility
- Simplified webcat_stream endpoint

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
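
A minimal sketch of what _format_sse_message might look like; the payload shape, including the "type" key, is an assumption, and the real helpers live in docker/demo_server.py and docker/simple_demo.py:

import json
from typing import Any


def _format_sse_message(event_type: str, **fields: Any) -> str:
    """Serialize one Server-Sent Events frame: a JSON payload on a data: line."""
    payload = {"type": event_type, **fields}
    return f"data: {json.dumps(payload)}\n\n"


# Example: _format_sse_message("status", message="Searching for: python asyncio")
# yields 'data: {"type": "status", "message": "Searching for: python asyncio"}\n\n'
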
Reduced control flow nesting from 7 to 3 levels by reusing helpers:
- _format_sse_message() for SSE message formatting
- _handle_search_operation() for search logic
- _handle_health_operation() for health checks
- _get_server_info() for server metadata
- _generate_webcat_stream() extracted to module level

Results:
- Max nesting reduced from 7 to 3 levels
- Simplified webcat_stream endpoint
- Consistent structure with demo_server.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Reduced control flow nesting from 6 to 3 levels by extracting:
- _fetch_with_serper() for Serper API calls
- _fetch_with_duckduckgo() for DuckDuckGo fallback
- _format_no_results_error() for error formatting
- _format_search_error() for exception formatting
- _process_and_format_results() for result processing

Results:
- Max nesting reduced from 6 to 3 levels in search_function
- Each fetch method has clear responsibility
- Simplified error handling with dedicated formatters

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
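
Put together, the fallback flow these helpers compose looks roughly like this (a sketch with stubbed fetchers and a simplified search_function signature; the real orchestration lives in docker/api_tools.py):

from typing import Any, Dict, List, Optional, Tuple


async def _fetch_with_serper(query: str, api_key: str) -> Tuple[List[Any], str]:
    # Stand-in; the real helper (shown later in this review) calls Serper via mcp_server.
    return [], "Serper API"


async def _fetch_with_duckduckgo(query: str, has_api_key: bool) -> Tuple[List[Any], str]:
    # Stand-in; the real helper falls back to DuckDuckGo.
    return [], "DuckDuckGo (free fallback)"


def _format_no_results_error(query: str, search_source: str) -> Dict[str, Any]:
    return {
        "error": "No search results found from any source.",
        "query": query,
        "search_source": search_source,
    }


async def search_function(query: str, api_key: Optional[str] = None) -> Dict[str, Any]:
    """Try Serper when a key is configured, otherwise (or on empty results) use DuckDuckGo."""
    results: List[Any] = []
    search_source = "none"
    if api_key:
        results, search_source = await _fetch_with_serper(query, api_key)
    if not results:
        results, search_source = await _fetch_with_duckduckgo(query, bool(api_key))
    if not results:
        return _format_no_results_error(query, search_source)
    return {"query": query, "search_source": search_source, "results": results}
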
coderabbitai bot left a comment


Actionable comments posted: 13

♻️ Duplicate comments (1)
docker/health.py (1)

81-81: Information exposure through exception details.

Exposing raw exception messages (str(e)) in API responses can leak sensitive information such as stack traces, file paths, or internal implementation details. Consider sanitizing error messages before returning them to external users.

For example, use generic messages in production:

-        content={"error": "Failed to serve WebCat client", "details": error},
+        content={"error": "Failed to serve WebCat client", "details": "Internal server error"},

Or implement a sanitization function that filters sensitive patterns from error messages before exposing them.

Also applies to: 208-208, 234-234

🧹 Nitpick comments (3)
docker/health.py (1)

207-207: Use logging.exception for automatic traceback inclusion.

Replace logging.error(f"... {str(e)}") with logging.exception(...) to automatically include the exception traceback, which aids debugging.

Apply this pattern to all three error handlers:

-            logger.error(f"Health check failed: {str(e)}")
+            logger.exception("Health check failed")
-            logger.error(f"Failed to serve WebCat client: {str(e)}")
+            logger.exception("Failed to serve WebCat client")
-            logger.error(f"Status check failed: {str(e)}")
+            logger.exception("Status check failed")

Also applies to: 224-224, 233-233

docker/demo_server.py (1)

199-201: Use logger.exception when surfacing unexpected failures.

Switch to logger.exception so we capture the traceback while keeping the SSE error payload.

-    except Exception as e:
-        logger.error(f"Error in SSE stream: {str(e)}")
-        yield _format_sse_message("error", message=str(e))
+    except Exception as exc:
+        logger.exception("Error in SSE stream")
+        yield _format_sse_message("error", message=str(exc))
docker/simple_demo.py (1)

179-181: Capture tracebacks with logger.exception.

Match the other module and emit the traceback while still returning the SSE error payload.

-    except Exception as e:
-        logger.error(f"Error in SSE stream: {str(e)}")
-        yield _format_sse_message("error", message=str(e))
+    except Exception as exc:
+        logger.exception("Error in SSE stream")
+        yield _format_sse_message("error", message=str(exc))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf7787f and 818c3d0.

📒 Files selected for processing (23)
  • CLAUDE.md (1 hunks)
  • docker/api_tools.py (1 hunks)
  • docker/clients/duckduckgo_client.py (2 hunks)
  • docker/clients/serper_client.py (2 hunks)
  • docker/demo_server.py (2 hunks)
  • docker/health.py (1 hunks)
  • docker/services/content_scraper.py (2 hunks)
  • docker/simple_demo.py (2 hunks)
  • docker/tests/builders/api_search_result_builder.py (1 hunks)
  • docker/tests/factories/ddgs_factory.py (1 hunks)
  • docker/tests/factories/http_response_factory.py (1 hunks)
  • docker/tests/factories/mock_ddgs_class.py (1 hunks)
  • docker/tests/factories/mock_ddgs_context_manager.py (1 hunks)
  • docker/tests/factories/mock_ddgs_search_client.py (1 hunks)
  • docker/tests/factories/mock_http_response.py (1 hunks)
  • docker/tests/factories/mock_serper_response.py (1 hunks)
  • docker/tests/factories/serper_response_factory.py (1 hunks)
  • docker/tests/test_mcp_server.py (5 hunks)
  • docker/tests/unit/clients/test_duckduckgo_client.py (4 hunks)
  • docker/tests/unit/clients/test_serper_client.py (5 hunks)
  • docker/tests/unit/services/test_search_processor.py (4 hunks)
  • docker/tests/unit/services/test_search_service.py (3 hunks)
  • docker/tests/unit/tools/test_search_tool.py (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • CLAUDE.md
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Format all Python code with Black (line-length=88; target Python 3.11)
Sort imports with isort using profile="black"
Require type hints for all function signatures (checked by mypy)
Use Google-style docstrings for public functions

Files:

  • docker/tests/unit/clients/test_serper_client.py
  • docker/tests/unit/services/test_search_service.py
  • docker/tests/unit/clients/test_duckduckgo_client.py
  • docker/tests/factories/mock_ddgs_search_client.py
  • docker/health.py
  • docker/tests/factories/ddgs_factory.py
  • docker/services/content_scraper.py
  • docker/tests/unit/tools/test_search_tool.py
  • docker/tests/unit/services/test_search_processor.py
  • docker/simple_demo.py
  • docker/tests/factories/mock_serper_response.py
  • docker/api_tools.py
  • docker/clients/duckduckgo_client.py
  • docker/tests/factories/mock_ddgs_context_manager.py
  • docker/tests/factories/mock_http_response.py
  • docker/tests/factories/mock_ddgs_class.py
  • docker/tests/factories/serper_response_factory.py
  • docker/tests/builders/api_search_result_builder.py
  • docker/tests/test_mcp_server.py
  • docker/demo_server.py
  • docker/tests/factories/http_response_factory.py
  • docker/clients/serper_client.py
docker/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

docker/**/*.py: Prefer absolute imports (e.g., from docker.mcp_server import ...)
All Python files in docker/ are included in coverage; add tests rather than excluding files

Files:

  • docker/tests/unit/clients/test_serper_client.py
  • docker/tests/unit/services/test_search_service.py
  • docker/tests/unit/clients/test_duckduckgo_client.py
  • docker/tests/factories/mock_ddgs_search_client.py
  • docker/health.py
  • docker/tests/factories/ddgs_factory.py
  • docker/services/content_scraper.py
  • docker/tests/unit/tools/test_search_tool.py
  • docker/tests/unit/services/test_search_processor.py
  • docker/simple_demo.py
  • docker/tests/factories/mock_serper_response.py
  • docker/api_tools.py
  • docker/clients/duckduckgo_client.py
  • docker/tests/factories/mock_ddgs_context_manager.py
  • docker/tests/factories/mock_http_response.py
  • docker/tests/factories/mock_ddgs_class.py
  • docker/tests/factories/serper_response_factory.py
  • docker/tests/builders/api_search_result_builder.py
  • docker/tests/test_mcp_server.py
  • docker/demo_server.py
  • docker/tests/factories/http_response_factory.py
  • docker/clients/serper_client.py
docker/tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

docker/tests/**/*.py: Use pytest markers: @pytest.mark.unit for unit tests, @pytest.mark.integration for integration tests, @pytest.mark.api for external API tests
Unit tests should mock external dependencies and run fast; avoid network calls
Tests that depend on external services must be marked @pytest.mark.integration
When patching requests, patch at the module-under-test level (e.g., @patch('mcp_server.requests.get'))

Files:

  • docker/tests/unit/clients/test_serper_client.py
  • docker/tests/unit/services/test_search_service.py
  • docker/tests/unit/clients/test_duckduckgo_client.py
  • docker/tests/factories/mock_ddgs_search_client.py
  • docker/tests/factories/ddgs_factory.py
  • docker/tests/unit/tools/test_search_tool.py
  • docker/tests/unit/services/test_search_processor.py
  • docker/tests/factories/mock_serper_response.py
  • docker/tests/factories/mock_ddgs_context_manager.py
  • docker/tests/factories/mock_http_response.py
  • docker/tests/factories/mock_ddgs_class.py
  • docker/tests/factories/serper_response_factory.py
  • docker/tests/builders/api_search_result_builder.py
  • docker/tests/test_mcp_server.py
  • docker/tests/factories/http_response_factory.py
docker/tests/test_mcp_server.py

📄 CodeRabbit inference engine (CLAUDE.md)

Add unit tests for new MCP tools in docker/tests/test_mcp_server.py (plus additional test files as needed)

Files:

  • docker/tests/test_mcp_server.py
🧠 Learnings (1)
📚 Learning: 2025-10-03T20:19:57.835Z
Learnt from: CR
PR: Kode-Rex/webcat#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-03T20:19:57.835Z
Learning: Applies to docker/tests/test_mcp_server.py : Add unit tests for new MCP tools in docker/tests/test_mcp_server.py (plus additional test files as needed)

Applied to files:

  • docker/tests/test_mcp_server.py
🧬 Code graph analysis (19)
docker/tests/unit/clients/test_serper_client.py (2)
docker/clients/serper_client.py (1)
  • fetch_search_results (38-64)
docker/tests/factories/serper_response_factory.py (4)
  • SerperResponseFactory (13-74)
  • two_results (38-57)
  • empty (29-35)
  • with_results (17-26)
docker/tests/unit/services/test_search_service.py (2)
docker/tests/builders/api_search_result_builder.py (3)
  • a_duckduckgo_result (67-74)
  • a_serper_result (57-64)
  • build (43-49)
docker/services/search_service.py (1)
  • fetch_with_fallback (18-46)
docker/tests/unit/clients/test_duckduckgo_client.py (1)
docker/tests/factories/ddgs_factory.py (5)
  • DDGSFactory (14-83)
  • two_results (41-53)
  • empty (31-38)
  • with_exception (56-66)
  • with_results (18-28)
docker/tests/factories/ddgs_factory.py (2)
docker/tests/factories/mock_ddgs_class.py (1)
  • MockDDGSClass (12-21)
docker/tests/factories/mock_ddgs_search_client.py (1)
  • MockDDGS (11-47)
docker/services/content_scraper.py (1)
docker/tests/factories/http_response_factory.py (1)
  • timeout (80-86)
docker/tests/unit/tools/test_search_tool.py (1)
docker/tests/builders/api_search_result_builder.py (2)
  • an_api_search_result (52-54)
  • with_link (33-36)
docker/tests/unit/services/test_search_processor.py (1)
docker/tests/builders/api_search_result_builder.py (2)
  • an_api_search_result (52-54)
  • with_link (33-36)
docker/simple_demo.py (1)
docker/demo_server.py (5)
  • _format_sse_message (64-75)
  • _handle_search_operation (78-102)
  • _handle_health_operation (105-127)
  • _get_server_info (130-142)
  • _generate_webcat_stream (145-201)
docker/api_tools.py (3)
docker/clients/serper_client.py (1)
  • fetch_search_results (38-64)
customgpt/function_app.py (1)
  • fetch_duckduckgo_search_results (147-185)
docker/services/search_processor.py (1)
  • process_search_results (15-39)
docker/clients/duckduckgo_client.py (1)
docker/models/api_search_result.py (1)
  • APISearchResult (11-22)
docker/tests/factories/mock_ddgs_context_manager.py (1)
docker/tests/factories/mock_ddgs_search_client.py (1)
  • MockDDGS (11-47)
docker/tests/factories/mock_http_response.py (1)
docker/tests/factories/mock_ddgs_search_client.py (1)
  • text (29-47)
docker/tests/factories/mock_ddgs_class.py (2)
docker/tests/factories/mock_ddgs_context_manager.py (1)
  • MockDDGSContextManager (11-24)
docker/tests/factories/mock_ddgs_search_client.py (1)
  • MockDDGS (11-47)
docker/tests/factories/serper_response_factory.py (1)
docker/tests/factories/mock_serper_response.py (1)
  • MockSerperResponse (11-37)
docker/tests/builders/api_search_result_builder.py (1)
docker/models/api_search_result.py (1)
  • APISearchResult (11-22)
docker/tests/test_mcp_server.py (2)
docker/tests/factories/http_response_factory.py (4)
  • HttpResponseFactory (13-124)
  • success (17-35)
  • with_http_error (52-65)
  • plaintext (38-49)
docker/tests/conftest.py (1)
  • mock_get (67-68)
docker/demo_server.py (1)
docker/simple_demo.py (5)
  • _format_sse_message (44-55)
  • _handle_search_operation (58-82)
  • _handle_health_operation (85-107)
  • _get_server_info (110-122)
  • _generate_webcat_stream (125-181)
docker/tests/factories/http_response_factory.py (1)
docker/tests/factories/mock_http_response.py (1)
  • MockHttpResponse (11-44)
docker/clients/serper_client.py (1)
docker/models/api_search_result.py (1)
  • APISearchResult (11-22)
🪛 GitHub Check: CodeQL
docker/health.py

[warning] 81-81: Information exposure through an exception
Stack trace information flows to this location and may be exposed to an external user.


[warning] 208-208: Information exposure through an exception
Stack trace information flows to this location and may be exposed to an external user.


[warning] 234-234: Information exposure through an exception
Stack trace information flows to this location and may be exposed to an external user.

🪛 Ruff (0.13.2)
docker/health.py

109-109: Probable insecure usage of temporary file or directory: "/tmp"

(S108)


206-206: Do not catch blind exception: Exception

(BLE001)


207-207: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


207-207: Use explicit conversion flag

Replace with conversion flag

(RUF010)


223-223: Do not catch blind exception: Exception

(BLE001)


224-224: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


224-224: Use explicit conversion flag

Replace with conversion flag

(RUF010)


232-232: Do not catch blind exception: Exception

(BLE001)


233-233: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


233-233: Use explicit conversion flag

Replace with conversion flag

(RUF010)

docker/services/content_scraper.py

152-152: Use explicit conversion flag

Replace with conversion flag

(RUF010)


255-255: Do not catch blind exception: Exception

(BLE001)


257-257: Use explicit conversion flag

Replace with conversion flag

(RUF010)

docker/simple_demo.py

179-179: Do not catch blind exception: Exception

(BLE001)


180-180: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


180-180: Use explicit conversion flag

Replace with conversion flag

(RUF010)

docker/tests/factories/mock_serper_response.py

37-37: Avoid specifying long messages outside the exception class

(TRY003)

docker/api_tools.py

391-391: Do not catch blind exception: Exception

(BLE001)


392-392: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


392-392: Use explicit conversion flag

Replace with conversion flag

(RUF010)

docker/tests/factories/mock_ddgs_class.py

19-19: Unused method argument: args

(ARG002)


19-19: Unused method argument: kwargs

(ARG002)

docker/demo_server.py

199-199: Do not catch blind exception: Exception

(BLE001)


200-200: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


200-200: Use explicit conversion flag

Replace with conversion flag

(RUF010)

docker/tests/factories/http_response_factory.py

19-19: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🔇 Additional comments (11)
docker/tests/unit/tools/test_search_tool.py (2)

13-13: LGTM! Builder pattern improves test maintainability.

The migration to an_api_search_result().build() centralizes test data construction and aligns with the broader test infrastructure improvements across the codebase.

Also applies to: 25-25


65-71: LGTM! Clean fluent builder usage.

The fluent setter chaining clearly expresses test intent and makes the test data construction more readable.

docker/clients/duckduckgo_client.py (2)

25-38: LGTM! Well-structured helper function.

The extracted helper improves code organization by separating the result conversion logic. Type hints and Google-style docstring are properly provided as per coding guidelines.


63-63: LGTM! Clean refactor using the helper.

The list comprehension now delegates to _convert_ddg_result, improving maintainability while preserving the original behavior.

docker/tests/unit/services/test_search_service.py (1)

11-14: LGTM! Consistent builder usage across tests.

The migration to pre-configured builders (a_serper_result(), a_duckduckgo_result()) standardizes test data construction and improves maintainability. Updated assertions correctly reflect the builder defaults.

Also applies to: 23-23, 31-31, 39-39, 47-47, 56-56

docker/tests/unit/services/test_search_processor.py (1)

12-12: LGTM! Consistent builder pattern adoption.

All tests correctly use the an_api_search_result() builder with fluent setters, improving test maintainability and readability.

Also applies to: 21-27, 45-54, 85-91

docker/tests/factories/mock_ddgs_class.py (1)

12-21: LGTM! Clean mock wrapper implementation.

The MockDDGSClass correctly implements a callable wrapper that returns a context manager. The unused *args, **kwargs in __call__ are intentional for API compatibility with the real DDGS constructor signature. Type hints and docstrings are properly provided.

docker/tests/unit/clients/test_duckduckgo_client.py (2)

11-11: LGTM! Clean migration to DDGSFactory.

The migration from MagicMock to typed DDGSFactory improves type safety and test clarity. The double-call pattern (e.g., DDGSFactory.two_results()()) correctly constructs the mock class and then invokes it to get the context manager.

Also applies to: 20-20, 34-34, 46-48, 59-59
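
Roughly how that double call fits together, as a compressed sketch (the real classes live one per file under docker/tests/factories/ and their exact constructors are assumptions):

class MockDDGS:
    """Fake DDGS search client returning canned results from text()."""

    def __init__(self, results: list) -> None:
        self._results = results

    def text(self, *args, **kwargs) -> list:
        return self._results


class MockDDGSContextManager:
    """Context manager yielding a MockDDGS, mirroring `with DDGS() as ddgs:` usage."""

    def __init__(self, client: MockDDGS) -> None:
        self._client = client

    def __enter__(self) -> MockDDGS:
        return self._client

    def __exit__(self, *exc) -> bool:
        return False


class MockDDGSClass:
    """Callable stand-in for the DDGS class itself."""

    def __init__(self, client: MockDDGS) -> None:
        self._client = client

    def __call__(self, *args, **kwargs) -> MockDDGSContextManager:
        return MockDDGSContextManager(self._client)


# DDGSFactory.two_results() returns a MockDDGSClass instance; the second call in
# DDGSFactory.two_results()() then produces the context manager that the patched
# client code enters.
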


37-41: LGTM! Improved test approach.

The test now verifies behavior through results rather than inspecting mock call details, which is a more robust testing practice.

docker/tests/factories/mock_ddgs_context_manager.py (1)

11-24: LGTM! Correct context manager implementation.

The MockDDGSContextManager properly implements the context manager protocol with appropriate type hints and docstrings, enabling realistic with-statement usage in tests.

docker/health.py (1)

20-196: LGTM! Well-structured helper functions.

The extracted helper functions improve modularity and testability. All functions have proper type hints and Google-style docstrings as per coding guidelines. The refactoring maintains the original behavior while making the code more maintainable.

Comment on lines +254 to +272
async def _fetch_with_serper(query: str, api_key: str):
    """Fetch search results using Serper API.

    Args:
        query: Search query
        api_key: Serper API key

    Returns:
        Tuple of (results, search_source)
    """
    import asyncio

    from mcp_server import fetch_search_results

    logger.info("Using Serper API for search")
    results = await asyncio.get_event_loop().run_in_executor(
        None, fetch_search_results, query, api_key
    )
    return results, "Serper API"

🛠️ Refactor suggestion | 🟠 Major

Add return types and favour asyncio.to_thread.

New helpers must have explicit return annotations per our typing rules, and we can avoid the deprecated asyncio.get_event_loop() by switching to asyncio.to_thread(...). Consider:

-from typing import TYPE_CHECKING, Any, Dict, List, Optional
+from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple
...
-if TYPE_CHECKING:
-    pass
+if TYPE_CHECKING:
+    from docker.models.api_search_result import APISearchResult
...
-async def _fetch_with_serper(query: str, api_key: str):
+async def _fetch_with_serper(
+    query: str, api_key: str
+) -> Tuple[List["APISearchResult"], str]:
...
-    results = await asyncio.get_event_loop().run_in_executor(
-        None, fetch_search_results, query, api_key
-    )
+    results = await asyncio.to_thread(fetch_search_results, query, api_key)

Apply the same pattern to the DuckDuckGo helper so static typing and asyncio usage stay consistent.

🤖 Prompt for AI Agents
In docker/api_tools.py around lines 254 to 272, the _fetch_with_serper helper is
missing an explicit return type annotation and uses the deprecated
asyncio.get_event_loop().run_in_executor; update its signature to include the
correct async return type (e.g., -> Tuple[List[...], str] or the precise types
used by fetch_search_results) and replace the run_in_executor call with
asyncio.to_thread(fetch_search_results, query, api_key) to run the blocking
function in a thread. Also apply the same explicit return annotation and switch
to asyncio.to_thread for the DuckDuckGo helper so both helpers are properly
typed and use modern asyncio patterns.

Comment on lines +275 to +298
async def _fetch_with_duckduckgo(query: str, has_api_key: bool):
    """Fetch search results using DuckDuckGo.

    Args:
        query: Search query
        has_api_key: Whether Serper API key was configured

    Returns:
        Tuple of (results, search_source)
    """
    import asyncio

    from mcp_server import fetch_duckduckgo_search_results

    if not has_api_key:
        logger.info("No Serper API key configured, using DuckDuckGo fallback")
    else:
        logger.warning("No results from Serper API, trying DuckDuckGo fallback")

    results = await asyncio.get_event_loop().run_in_executor(
        None, fetch_duckduckgo_search_results, query
    )
    return results, "DuckDuckGo (free fallback)"


🛠️ Refactor suggestion | 🟠 Major

Ensure the DuckDuckGo helper is fully typed.

Please mirror the Serper changes here: add a precise return type (e.g., Tuple[Optional[List["APISearchResult"]], str]) and swap asyncio.get_event_loop().run_in_executor for asyncio.to_thread(...) to satisfy our typing policy and avoid deprecated APIs.

Comment on lines +300 to +358
def _format_no_results_error(query: str, search_source: str) -> dict:
    """Format error response for no results.

    Args:
        query: Search query
        search_source: Source that was used

    Returns:
        Error dictionary
    """
    return {
        "error": "No search results found from any source.",
        "query": query,
        "search_source": search_source,
    }


def _format_search_error(error: str, query: str, search_source: str) -> dict:
    """Format error response for search failure.

    Args:
        error: Error message
        query: Search query
        search_source: Source that was attempted

    Returns:
        Error dictionary
    """
    return {
        "error": f"Search failed: {error}",
        "query": query,
        "search_source": search_source,
    }


async def _process_and_format_results(results, query: str, search_source: str):
    """Process search results and format response.

    Args:
        results: Raw search results
        query: Search query
        search_source: Source of results

    Returns:
        Formatted results dictionary
    """
    import asyncio

    from mcp_server import process_search_results

    processed_results = await asyncio.get_event_loop().run_in_executor(
        None, process_search_results, results
    )

    return {
        "query": query,
        "search_source": search_source,
        "results": [result.model_dump() for result in processed_results],
    }

🛠️ Refactor suggestion | 🟠 Major

Type the formatting utilities.

Our guidelines require explicit return types. _format_no_results_error, _format_search_error, and _process_and_format_results should return Dict[str, Any] (or a refined mapping) and accept properly typed results (e.g., List["APISearchResult"]). Please add those annotations and, for _process_and_format_results, consider asyncio.to_thread for consistency with the fetch helpers.

🤖 Prompt for AI Agents
In docker/api_tools.py around lines 300 to 358, the three helpers lack explicit
typing: add return annotations of Dict[str, Any] for _format_no_results_error
and _format_search_error and for _process_and_format_results change the
signature to accept results: List["APISearchResult"] (or a more specific
Sequence type) and return Dict[str, Any]; inside _process_and_format_results
replace asyncio.get_event_loop().run_in_executor(...) with
asyncio.to_thread(process_search_results, results) for consistency with fetch
helpers and keep importing typing (Dict, Any, List) and forward-reference
APISearchResult in quotes or via TYPE_CHECKING import.

Comment on lines +78 to +196
async def _handle_search_operation(search_func, query: str, max_results: int):
    """Handle search operation and yield results.

    Args:
        search_func: Search function to call
        query: Search query
        max_results: Maximum results to return

    Yields:
        SSE formatted messages
    """
    yield _format_sse_message("status", message=f"Searching for: {query}")

    result = await search_func(query)

    # Limit results if needed
    if result.get("results") and len(result["results"]) > max_results:
        result["results"] = result["results"][:max_results]
        result["note"] = f"Results limited to {max_results} items"

    yield _format_sse_message("data", data=result)
    num_results = len(result.get("results", []))
    yield _format_sse_message(
        "complete", message=f"Search completed. Found {num_results} results."
    )


async def _handle_health_operation(health_func):
    """Handle health check operation.

    Args:
        health_func: Health check function to call (or None)

    Yields:
        SSE formatted messages
    """
    yield _format_sse_message("status", message="Checking server health...")

    if health_func:
        result = await health_func()
        yield _format_sse_message("data", data=result)
        yield _format_sse_message("complete", message="Health check completed")
    else:
        basic_health = {
            "status": "healthy",
            "service": "webcat-demo",
            "timestamp": time.time(),
        }
        yield _format_sse_message("data", data=basic_health)
        yield _format_sse_message("complete", message="Basic health check completed")


def _get_server_info() -> dict:
    """Get server information dictionary.

    Returns:
        Server info dictionary
    """
    return {
        "service": "WebCat MCP Demo Server",
        "version": "2.2.0",
        "status": "connected",
        "operations": ["search", "health"],
        "timestamp": time.time(),
    }


async def _generate_webcat_stream(
    webcat_functions, operation: str, query: str, max_results: int
):
    """Generate SSE stream for WebCat operations.

    Args:
        webcat_functions: Dictionary of WebCat functions
        operation: Operation to perform
        query: Search query
        max_results: Maximum results

    Yields:
        SSE formatted messages
    """
    try:
        # Send connection message
        yield _format_sse_message(
            "connection",
            status="connected",
            message="WebCat stream started",
            operation=operation,
        )

        if operation == "search" and query:
            search_func = webcat_functions.get("search")
            if search_func:
                async for msg in _handle_search_operation(
                    search_func, query, max_results
                ):
                    yield msg
            else:
                yield _format_sse_message(
                    "error", message="Search function not available"
                )

        elif operation == "health":
            health_func = webcat_functions.get("health_check")
            async for msg in _handle_health_operation(health_func):
                yield msg

        else:
            # Just connection - send server info
            yield _format_sse_message("data", data=_get_server_info())
            yield _format_sse_message("complete", message="Connection established")

        # Keep alive with heartbeat
        heartbeat_count = 0
        while True:
            await asyncio.sleep(30)
            heartbeat_count += 1
            yield _format_sse_message(
                "heartbeat", timestamp=time.time(), count=heartbeat_count

🛠️ Refactor suggestion | 🟠 Major

Add the missing type hints per project standard.

Our Python guideline requires every function signature to be fully annotated. Please add the concrete typing for the new helpers (_handle_search_operation, _handle_health_operation, _generate_webcat_stream, and the _get_server_info return type) so mypy stays green and usage is clear. Suggested patch:

@@
-import json
-import logging
+import json
+import logging
+from typing import Any, AsyncGenerator, Awaitable, Callable, Dict, Mapping, Optional
@@
-async def _handle_search_operation(search_func, query: str, max_results: int):
+async def _handle_search_operation(
+    search_func: Callable[[str], Awaitable[Dict[str, Any]]],
+    query: str,
+    max_results: int,
+) -> AsyncGenerator[str, None]:
@@
-async def _handle_health_operation(health_func):
+async def _handle_health_operation(
+    health_func: Optional[Callable[[], Awaitable[Dict[str, Any]]]],
+) -> AsyncGenerator[str, None]:
@@
-def _get_server_info() -> dict:
+def _get_server_info() -> Dict[str, Any]:
@@
-async def _generate_webcat_stream(
-    webcat_functions, operation: str, query: str, max_results: int
-):
+async def _generate_webcat_stream(
+    webcat_functions: Mapping[str, Callable[..., Awaitable[Any]]],
+    operation: str,
+    query: str,
+    max_results: int,
+) -> AsyncGenerator[str, None]:

As per coding guidelines

🤖 Prompt for AI Agents
In docker/demo_server.py lines 78-196, the new helper functions lack type hints;
add concrete annotations to satisfy project typing rules. Annotate
_handle_search_operation as accepting search_func: Callable[[str],
Awaitable[Dict[str, Any]]], query: str, max_results: int and returning
AsyncGenerator[str, None]; annotate _handle_health_operation as accepting
health_func: Optional[Callable[[], Awaitable[Dict[str, Any]]]] and returning
AsyncGenerator[str, None]; annotate _generate_webcat_stream as accepting
webcat_functions: Dict[str, Callable[..., Awaitable[Dict[str, Any]]]],
operation: str, query: str, max_results: int and returning AsyncGenerator[str,
None]; annotate _get_server_info return type as Dict[str, Any]; ensure required
typing imports (AsyncGenerator, Awaitable, Callable, Dict, Optional, Any) are
added at the top of the file.
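For context, every helper in this hunk funnels its output through `_format_sse_message`, which is not defined in the excerpt above. Below is a minimal sketch of what such a formatter typically looks like, assuming it wraps its keyword arguments in a JSON payload and emits a standard `data: ...` SSE frame; the actual implementation in demo_server.py may differ.

import json
from typing import Any


def _format_sse_message(event_type: str, **fields: Any) -> str:
    """Serialize an event and its fields as a Server-Sent Events data frame."""
    payload = {"type": event_type, **fields}
    # SSE frames are "data: <payload>" terminated by a blank line
    return f"data: {json.dumps(payload)}\n\n"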

Comment on lines +219 to +221
    if len(content) > MAX_CONTENT_LENGTH:
        return content[:MAX_CONTENT_LENGTH] + "... [content truncated]"
    return content

⚠️ Potential issue | 🟠 Major

Fix truncation to honor MAX_CONTENT_LENGTH.

We slice to the max limit and then append the ellipsis, so the final string overshoots MAX_CONTENT_LENGTH. That breaks the contract of the cap (and defeats downstream safeguards relying on it). Trim enough characters before appending the suffix.

-    if len(content) > MAX_CONTENT_LENGTH:
-        return content[:MAX_CONTENT_LENGTH] + "... [content truncated]"
+    if len(content) > MAX_CONTENT_LENGTH:
+        suffix = "... [content truncated]"
+        if len(suffix) >= MAX_CONTENT_LENGTH:
+            return suffix[:MAX_CONTENT_LENGTH]
+        return content[: MAX_CONTENT_LENGTH - len(suffix)] + suffix
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    if len(content) > MAX_CONTENT_LENGTH:
        return content[:MAX_CONTENT_LENGTH] + "... [content truncated]"
    return content
    if len(content) > MAX_CONTENT_LENGTH:
        suffix = "... [content truncated]"
        if len(suffix) >= MAX_CONTENT_LENGTH:
            return suffix[:MAX_CONTENT_LENGTH]
        return content[:MAX_CONTENT_LENGTH - len(suffix)] + suffix
    return content
🤖 Prompt for AI Agents
In docker/services/content_scraper.py around lines 219 to 221, the truncation
logic appends the ellipsis/suffix after slicing to MAX_CONTENT_LENGTH which
causes the returned string to exceed MAX_CONTENT_LENGTH; change to compute the
suffix (e.g. "... [content truncated]") and if MAX_CONTENT_LENGTH is larger than
the suffix length, slice content to MAX_CONTENT_LENGTH - len(suffix) and append
the suffix, otherwise return the suffix trimmed to MAX_CONTENT_LENGTH so the
final string never exceeds MAX_CONTENT_LENGTH.
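To make the cap concrete, here is a small self-contained check of the suggested truncation logic. `MAX_CONTENT_LENGTH` is set to an illustrative value here rather than the real constant from content_scraper.py, and the function name mirrors the helper mentioned in the walkthrough.

MAX_CONTENT_LENGTH = 50  # illustrative only; the real constant lives in content_scraper.py


def _truncate_if_needed(content: str) -> str:
    """Cap content at MAX_CONTENT_LENGTH, including the truncation suffix."""
    suffix = "... [content truncated]"
    if len(content) > MAX_CONTENT_LENGTH:
        if len(suffix) >= MAX_CONTENT_LENGTH:
            return suffix[:MAX_CONTENT_LENGTH]
        return content[: MAX_CONTENT_LENGTH - len(suffix)] + suffix
    return content


result = _truncate_if_needed("x" * 200)
assert len(result) <= MAX_CONTENT_LENGTH  # the cap now holds
assert result.endswith("... [content truncated]")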

Comment on lines +18 to 35
content: bytes = b"<html><body><p>Test content</p></body></html>",
headers: dict = None,
) -> MockHttpResponse:
"""Create a successful HTTP response."""
"""Create successful HTML response.

Args:
content: HTML content as bytes
headers: Response headers

Returns:
MockHttpResponse with 200 status
"""
default_headers = {"Content-Type": "text/html"}
if headers:
default_headers.update(headers)
return MockHttpResponse(
status_code=status_code,
content=content.encode("utf-8"),
text=content,
headers={"Content-Type": content_type},
should_raise_for_status=False,
content=content, headers=default_headers, status_code=200
)

🛠️ Refactor suggestion | 🟠 Major

Tighten the headers type.

headers: dict = None trips our typing checks. Please change it to headers: Optional[Dict[str, str]] = None (and import Dict, Optional) so mypy stays happy.

🧰 Tools
🪛 Ruff (0.13.2)

19-19: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🤖 Prompt for AI Agents
In docker/tests/factories/http_response_factory.py around lines 18 to 35, the
parameter annotation uses a bare dict which fails mypy; change the signature to
use headers: Optional[Dict[str, str]] = None and add the corresponding typing
imports (from typing import Dict, Optional) at the top of the file; keep the
existing runtime behavior (default_headers and update) unchanged.

Comment on lines +68 to 78
def connection_error(message: str = "Connection error"):
"""Create exception for connection error.

Args:
message: Error message

Returns:
RequestException to be used with side_effect
"""
Create a Timeout exception.
return requests.exceptions.RequestException(message)


🛠️ Refactor suggestion | 🟠 Major

Add the return annotation.

connection_error should declare it returns requests.exceptions.RequestException to satisfy our type-hint policy.

🤖 Prompt for AI Agents
In docker/tests/factories/http_response_factory.py around lines 68 to 78, the
function connection_error is missing a return type annotation; update the
signature to declare it returns requests.exceptions.RequestException (e.g. def
connection_error(message: str = "Connection error") ->
requests.exceptions.RequestException:) and ensure the requests module is
imported/available in the file so the annotation resolves.

Comment on lines +80 to +86
def timeout():
"""Create timeout exception.

Returns:
Timeout exception to be used with side_effect
"""
return requests.Timeout("Request timed out after 5 seconds")
return requests.exceptions.Timeout("Request timed out")

🛠️ Refactor suggestion | 🟠 Major

Type the timeout helper.

Likewise, timeout() needs an explicit return type (e.g., -> requests.exceptions.Timeout) per the typing guidelines.

🤖 Prompt for AI Agents
In docker/tests/factories/http_response_factory.py around lines 80 to 86, the
timeout() helper is missing an explicit return type; add a return type
annotation -> requests.exceptions.Timeout to the function signature so it
conforms to the typing guidelines, e.g., adjust the def line to include that
return type and keep the existing docstring and return statement unchanged.
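Taken together with the connection_error note above, the two fully annotated helpers would read roughly as follows (a sketch; docstrings trimmed and module placement assumed).

import requests


def connection_error(message: str = "Connection error") -> requests.exceptions.RequestException:
    """Create an exception suitable for a mock's side_effect."""
    return requests.exceptions.RequestException(message)


def timeout() -> requests.exceptions.Timeout:
    """Create a timeout exception suitable for a mock's side_effect."""
    return requests.exceptions.Timeout("Request timed out")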

Comment on lines +42 to +44
    def json(self) -> Dict[str, Any]:
        """Return empty dict for JSON responses."""
        return {}

⚠️ Potential issue | 🟠 Major

Restore JSON parsing behaviour.

Returning {} ignores any JSON payload configured on the mock and breaks callers that expect the text/body to be parsed. Please revert to decoding self.text (fall back to self.content when needed), for example:

     def json(self) -> Dict[str, Any]:
-        """Return empty dict for JSON responses."""
-        return {}
+        """Return parsed JSON payload."""
+        if self.text:
+            return json.loads(self.text)
+        if self.content:
+            return json.loads(self.content.decode())
+        return {}

(Import json at the top.) This keeps the mock faithful to requests.Response.json().

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    def json(self) -> Dict[str, Any]:
        """Return empty dict for JSON responses."""
        return {}
    def json(self) -> Dict[str, Any]:
        """Return parsed JSON payload."""
        if self.text:
            return json.loads(self.text)
        if self.content:
            return json.loads(self.content.decode())
        return {}
🤖 Prompt for AI Agents
In docker/tests/factories/mock_http_response.py around lines 42-44, the json()
method currently returns a hardcoded {} which ignores any configured payload;
restore real JSON parsing by importing json at the top and replacing the body
with code that attempts to parse self.text via json.loads(self.text) and, if
self.text is empty, fall back to decoding self.content to a str and parsing
that; let json decoding errors propagate the same way requests.Response.json()
would (do not swallow exceptions) so callers receive the same behavior as the
real requests client.
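As a quick sanity check of that behaviour, a faithful mock parses whatever payload was configured instead of discarding it. The class below is a trimmed stand-in for illustration; the real MockHttpResponse carries more fields.

import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class MockHttpResponse:  # trimmed stand-in for the project's mock
    text: str = ""
    content: bytes = b""

    def json(self) -> Dict[str, Any]:
        """Return parsed JSON payload, mirroring requests.Response.json()."""
        if self.text:
            return json.loads(self.text)
        if self.content:
            return json.loads(self.content.decode())
        return {}


resp = MockHttpResponse(text='{"results": []}')
assert resp.json() == {"results": []}  # payload is parsed, not discarded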

Comment on lines +32 to +37
    def raise_for_status(self):
        """Raise HTTPError if status code indicates error."""
        if self.status_code >= 400:
            import requests

            raise requests.HTTPError(f"{self.status_code} Error")

🛠️ Refactor suggestion | 🟠 Major

Add return type annotation for raise_for_status.

This helper is missing a return annotation, which violates the project requirement that every function signature be fully typed. Please add -> None to keep mypy and the shared guidelines happy. As per coding guidelines.

-    def raise_for_status(self):
+    def raise_for_status(self) -> None:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    def raise_for_status(self):
        """Raise HTTPError if status code indicates error."""
        if self.status_code >= 400:
            import requests
            raise requests.HTTPError(f"{self.status_code} Error")
    def raise_for_status(self) -> None:
        """Raise HTTPError if status code indicates error."""
        if self.status_code >= 400:
            import requests
            raise requests.HTTPError(f"{self.status_code} Error")
🧰 Tools
🪛 Ruff (0.13.2)

37-37: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In docker/tests/factories/mock_serper_response.py around lines 32 to 37, the
method raise_for_status is missing a return type annotation; update its
signature to include -> None so the function is fully typed per project
guidelines, leaving implementation unchanged.

@T-rav T-rav merged commit ace28ad into main Oct 3, 2025
11 checks passed
@T-rav T-rav deleted the fix/html2text-mock branch October 3, 2025 21:28
T-rav added a commit that referenced this pull request Oct 10, 2025
fix: Mock html2text in content truncation test
