fix: Mock html2text in content truncation test#28

Merged
T-rav merged 13 commits into main from fix/html2text-mock
Oct 3, 2025

Conversation

@T-rav
Collaborator

T-rav commented Oct 3, 2025

Fixes test failure on main after dependabot merge.

  • Add @patch for html2text.HTML2Text in truncation test
  • Mock handle() to return large content for proper testing
  • Test was failing because html2text wasn't mocked, causing MagicMock to leak into assertions

All 44 tests now pass locally and should pass in CI.
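
A rough sketch of the patched test described above, assuming a module-level _truncate_if_needed helper (an assumption; the real test lives in docker/tests/unit/services/test_content_scraper.py, and the truncation suffix comes from the scraper code reviewed further down):

from unittest.mock import MagicMock, patch

from services import content_scraper  # module-level helper layout is assumed


@patch("services.content_scraper.html2text.HTML2Text")
def test_truncates_content_exceeding_max_length(mock_html2text):
    # Mock handle() so the converter yields deterministic oversized markdown
    # instead of a MagicMock leaking into the assertions.
    converter = MagicMock()
    converter.handle.return_value = "x" * 100_000
    mock_html2text.return_value = converter

    # The real test drives this through the scraper; here we only exercise the
    # truncation helper on the mocked converter output.
    markdown = content_scraper._truncate_if_needed(converter.handle("<html></html>"))

    assert markdown.endswith("... [content truncated]")
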

Summary by CodeRabbit

  • New Features

    • Improved search and streaming responses with more consistent formatting and reliable fallback sources.
  • Bug Fixes

    • Enhanced content scraping to better handle HTML, plain text, PDFs and long content, reducing truncation and parsing errors.
    • More robust health and client endpoints with clearer, standardized responses.
  • Tests

    • Expanded and stabilized test coverage with deterministic mocks and new test builders/factories to improve reliability.
  • Documentation

    • Added architecture and testing guidelines to developer docs.

- Add @patch for html2text.HTML2Text in truncation test
- Mock handle() to return large content for proper testing
- Fixes test failure on CI after dependabot merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai bot commented Oct 3, 2025

Walkthrough

Refactors and helper extraction across scraping, search, streaming, health, and client code; adds numerous test factories/builders/mocks and updates unit tests to use them; small doc updates. Changes introduce many private helpers, centralize logic, and standardize test fixtures and SSE message formatting.

Changes

  • Content scraping service (docker/services/content_scraper.py): Large refactor: extracted many private helpers (_fetch_content, _configure_html2text, _convert_with_readability, _convert_fallback, _truncate_if_needed, _process_code_blocks, _process_math_elements, etc.) and rewrote scrape_search_result to use them; added explicit content-type handling and truncation.
  • Streaming / demo servers (docker/demo_server.py, docker/simple_demo.py): Extracted SSE helpers and operation-specific handlers (_format_sse_message, _generate_webcat_stream, _handle_search_operation, _handle_health_operation, _get_server_info); replaced inline generators with a centralized generator, improved error handling, and standardized SSE messages.
  • API/search orchestration helpers (docker/api_tools.py): Added internal async/worker helpers and formatters (_fetch_with_serper, _fetch_with_duckduckgo, _format_no_results_error, _format_search_error, _process_and_format_results) and refactored create_webcat_functions to use them.
  • Clients: Serper / DuckDuckGo (docker/clients/serper_client.py, docker/clients/duckduckgo_client.py): Introduced conversion helpers (_convert_organic_results, _convert_ddg_result) and refactored result-mapping to use these helpers (delegating APISearchResult construction).
  • Health & info endpoints (docker/health.py): Split health/status/client/root handlers into many helpers (_get_health_status, _get_unhealthy_status, _load_client_file, _get_detailed_status, _get_root_info, etc.) and adjusted endpoints to call them.
  • Unit tests — content scraper update (docker/tests/unit/services/test_content_scraper.py): Added @patch("services.content_scraper.html2text.HTML2Text") to test_truncates_content_exceeding_max_length and updated the test signature to accept mock_html2text; mocked HTML2Text.handle output to control truncation assertions.
  • Tests — builders, factories, mocks (docker/tests/builders/api_search_result_builder.py, docker/tests/factories/*, docker/tests/factories/mock_http_response.py, docker/tests/factories/mock_serper_response.py, docker/tests/factories/...): Added APISearchResultBuilder and multiple test factories/mocks: DDGSFactory, MockDDGS, MockDDGSClass, MockDDGSContextManager, SerperResponseFactory, MockSerperResponse, many HTTP response factory helpers (success, html_with_title, pdf, with_http_error, error_404, timeout, connection_error), and a simplified/typed MockHttpResponse. Tests updated to use these factories.
  • Tests — unit updates to use builders/factories (docker/tests/unit/clients/test_duckduckgo_client.py, docker/tests/unit/clients/test_serper_client.py, docker/tests/unit/services/test_search_processor.py, docker/tests/unit/services/test_search_service.py, docker/tests/unit/tools/test_search_tool.py, docker/tests/test_mcp_server.py): Replaced inline/mocked responses with the new factories/builders (DDGSFactory, SerperResponseFactory, APISearchResultBuilder, HttpResponseFactory) and updated assertions/patch usage accordingly; cleaned imports.
  • Docs (CLAUDE.md): Documentation additions: Architecture Standards, No Raw Mocks guidance, Single Source of Truth/tooling notes; no code changes.

Sequence Diagram(s)

sequenceDiagram
    participant ScraperAPI as scrape_search_result
    participant HTTP as _fetch_content (requests)
    participant Readability as _convert_with_readability
    participant HTML2Text as _configure_html2text / html2text.HTML2Text
    participant Fallback as _convert_fallback
    participant Trunc as _truncate_if_needed

    ScraperAPI->>HTTP: GET url with browser headers
    alt binary/pdf
        HTTP-->>ScraperAPI: binary content
        ScraperAPI->>ScraperAPI: _handle_binary_content -> markdown/pdf metadata
    else plaintext
        HTTP-->>ScraperAPI: text/plain
        ScraperAPI->>ScraperAPI: _handle_plain_text -> markdown
    else html
        HTTP-->>ScraperAPI: html content
        ScraperAPI->>Readability: try extract main content
        alt Readability success
            Readability-->>ScraperAPI: extracted HTML
            ScraperAPI->>HTML2Text: _configure_html2text + convert
            HTML2Text-->>ScraperAPI: markdown
        else Readability fail
            Readability-->>ScraperAPI: error
            ScraperAPI->>Fallback: _convert_fallback (title + html2text)
            Fallback-->>ScraperAPI: markdown
        end
    end
    ScraperAPI->>Trunc: _truncate_if_needed(markdown)
    Trunc-->>ScraperAPI: final markdown + metadata / error

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Refactor #27 — touches services.content_scraper and related tests that patch html2text.HTML2Text; strongly related to the test and scraper refactor in this diff.

Poem

I nibble docs and craft a test,
Patch HTML to tame the rest.
Helpers hop, pipelines neat—
Truncate, convert, serve the treat.
A rabbit cheers with carrot zest 🥕

Pre-merge checks

✅ Passed checks (3 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title concisely describes the key change—mocking html2text in the content truncation test—and matches the modifications in the test file without extraneous details.
  • Docstring Coverage (✅ Passed): Docstring coverage is 81.74%, which meets the required threshold of 80.00%.

T-rav and others added 9 commits October 3, 2025 14:30
- Create MockSerperResponse and SerperResponseFactory for Serper tests
- Create MockDDGS, MockDDGSClass and DDGSFactory for DuckDuckGo tests
- Remove all raw MagicMock property assignments (mock.json.return_value = ...)
- Use factory methods instead: SerperResponseFactory.two_results()
- Follows builder/factory pattern: one class per file, no raw mocks

All 44 tests pass locally.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
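
A rough sketch of the factory pattern this commit describes; the class and method names come from the commit message, while the constructor, response fields, and "organic" payload shape are assumptions:

class MockSerperResponse:
    """Typed stand-in for a Serper HTTP response (fields assumed)."""

    def __init__(self, payload: dict) -> None:
        self._payload = payload
        self.status_code = 200

    def json(self) -> dict:
        return self._payload

    def raise_for_status(self) -> None:
        return None


class SerperResponseFactory:
    """Builds MockSerperResponse objects so tests avoid raw MagicMocks."""

    @staticmethod
    def two_results() -> MockSerperResponse:
        return MockSerperResponse(
            {
                "organic": [
                    {"title": "First result", "link": "https://example.com/1", "snippet": "..."},
                    {"title": "Second result", "link": "https://example.com/2", "snippet": "..."},
                ]
            }
        )

A test then patches the client's HTTP call to return SerperResponseFactory.two_results() instead of assigning mock.json.return_value by hand.
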
- Create APISearchResultBuilder with fluent interface
- Add a_serper_result() and a_duckduckgo_result() pre-configured builders
- Replace inline APISearchResult() creation in test_search_service.py
- All tests still passing

Remaining inline creations in:
- tests/unit/services/test_search_processor.py (3 instances)
- tests/unit/tools/test_search_tool.py (2 instances)
- tests/unit/models/test_models.py (1 instance for testing model itself)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…tion

- Replace all inline APISearchResult() calls with builder pattern
- Updated test_search_processor.py (3 instances)
- Updated test_search_tool.py (2 instances)
- Now using an_api_search_result().with_*().build() everywhere

All inline object creation eliminated except test_models.py (acceptable - testing the model itself).

No more raw mocks or inline test data creation. Full builder/factory pattern compliance.

All 44 tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
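
A sketch of the fluent builder described above; an_api_search_result(), with_link(), and build() appear in the tests, while the remaining fields and defaults are assumptions (the real model is the Pydantic APISearchResult in docker/models/api_search_result.py):

from dataclasses import dataclass


@dataclass
class APISearchResult:  # stand-in for the real Pydantic model
    title: str
    link: str
    snippet: str


class APISearchResultBuilder:
    def __init__(self) -> None:
        self._title = "Example result"
        self._link = "https://example.com"
        self._snippet = "Example snippet"

    def with_title(self, title: str) -> "APISearchResultBuilder":
        self._title = title
        return self

    def with_link(self, link: str) -> "APISearchResultBuilder":
        self._link = link
        return self

    def build(self) -> APISearchResult:
        return APISearchResult(self._title, self._link, self._snippet)


def an_api_search_result() -> APISearchResultBuilder:
    return APISearchResultBuilder()


# Usage in a test: an_api_search_result().with_link("https://docs.example.com").build()
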
- Split mock_ddgs.py (3 classes) into 3 separate files:
  - mock_ddgs_search_client.py (MockDDGS)
  - mock_ddgs_context_manager.py (MockDDGSContextManager)
  - mock_ddgs_class.py (MockDDGSClass)
- Extract _convert_organic_results() in serper_client.py to reduce nesting

Now fully compliant with 1 class per file rule.

Remaining: services/content_scraper.py has 6-level nesting (needs extraction)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Refactored scrape_search_result() from 165-line function with 6-level
nesting to well-organized helper methods with max 2-level nesting.

Changes:
- Extract _fetch_content() for HTTP requests
- Extract _handle_plain_text() and _handle_binary_content() for special content types
- Extract _configure_html2text() for converter configuration
- Extract _extract_language_from_classes() to reduce code block processing nesting
- Extract _process_code_blocks() and _process_math_elements() for HTML preprocessing
- Extract _convert_to_markdown() for markdown conversion pipeline
- Extract _convert_with_readability() and _convert_fallback() for content extraction strategies
- Extract _truncate_if_needed() for content length management

Results:
- Reduced nesting from 6 levels to 2 levels (well under 3-level limit)
- Main function now 40 lines (was 165 lines)
- Each helper method has single, clear responsibility
- All 44 unit tests pass

Follows "methods for concepts" principle and max 3-level nesting requirement.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
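
For example, the _truncate_if_needed helper named above has roughly this shape (the cap value here is an assumption; the body mirrors the snippet reviewed later in this thread):

MAX_CONTENT_LENGTH = 10_000  # assumed cap; the real constant lives in content_scraper.py


def _truncate_if_needed(content: str) -> str:
    """Cap converted markdown so downstream consumers receive bounded payloads."""
    if len(content) > MAX_CONTENT_LENGTH:
        return content[:MAX_CONTENT_LENGTH] + "... [content truncated]"
    return content

As the review below notes, appending the suffix after slicing to the cap lets the final string slightly exceed MAX_CONTENT_LENGTH.
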
Document the architecture standards established during refactoring:
- Maximum 3-level nesting depth with method extraction
- One class per file (production and test code)
- Methods for concepts (single responsibility helpers)
- No raw mocks (typed mocks + factories + builders)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Fixed nesting and raw mock violations in production and test code:

**Production code fixes:**
- clients/duckduckgo_client.py: Extract _convert_ddg_result() helper
  to reduce nesting from 6 to 3 levels

**Test infrastructure fixes:**
- Create MockHttpResponse class for typed HTTP mocks
- Create HttpResponseFactory with methods: success(), plaintext(),
  html_with_title(), pdf(), error_404(), timeout(), connection_error()
- tests/test_mcp_server.py: Replace all MagicMock() usage with
  HttpResponseFactory

**Results:**
- All 44 unit tests pass
- No raw MagicMock() with property assignment
- All high/medium priority violations fixed
- Follows typed mocks + factory pattern standards

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Reduced control flow nesting from 6 to 3 levels by extracting helpers:
- _get_health_status(), _get_unhealthy_status()
- _client_not_found_response(), _client_error_response()
- _load_client_file()
- _get_server_configuration(), _get_server_endpoints(), _get_server_capabilities()
- _get_detailed_status(), _get_status_error()
- _get_root_info()

Results:
- Max nesting now 3 levels (was 6)
- Each helper has single, clear responsibility
- All endpoints simplified to try/return pattern

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
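
The resulting try/return endpoint shape looks roughly like this, assuming a FastAPI app and stubbed helper bodies (the except branch matches the lines CodeQL flags elsewhere in this review):

import logging
import time

from fastapi import FastAPI
from fastapi.responses import JSONResponse

logger = logging.getLogger(__name__)
app = FastAPI()


def _get_health_status() -> dict:
    # Trimmed stand-in for the real helper in docker/health.py.
    return {"status": "healthy", "timestamp": time.time()}


def _get_unhealthy_status(error: str) -> dict:
    return {"status": "unhealthy", "error": error, "timestamp": time.time()}


@app.get("/health")
async def health_check():
    try:
        return _get_health_status()
    except Exception as e:
        logger.error(f"Health check failed: {str(e)}")
        return JSONResponse(status_code=500, content=_get_unhealthy_status(str(e)))
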
"""
return JSONResponse(
status_code=500,
content={"error": "Failed to serve WebCat client", "details": error},

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information
flows to this location and may be exposed to an external user.

Copilot Autofix

AI 4 months ago

To fix the problem, we need to avoid exposing potentially sensitive exception messages to end users through the API. Instead, we should log the complete exception details on the server for debugging and provide users with a generic error message. The specific fix involves modifying the _client_error_response function (lines 70-83) to not include the raw exception details in the API response. Additionally, in the /client endpoint handler (lines 210-226), when catching an exception, log the exception details and return a generic error to the client.

  • Update _client_error_response(error: str) so it does not transmit the exception details, only a generic message.
  • At the exception handling site (line 223+), ensure that the raw exception is logged but the client only receives the generic error response.
  • Fix only the code shown in the snippet, without changing the overall functionality: logging stays; the only alteration is what is returned to the user.

No new imports or dependencies are required; use the existing logger.


Suggested changeset 1
docker/health.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/docker/health.py b/docker/health.py
--- a/docker/health.py
+++ b/docker/health.py
@@ -67,18 +67,11 @@
     )
 
 
-def _client_error_response(error: str) -> JSONResponse:
-    """Create response for client loading error.
-
-    Args:
-        error: Error message
-
-    Returns:
-        JSONResponse with 500 status
-    """
+def _client_error_response() -> JSONResponse:
+    """Create response for client loading error (generic, without exposing exception details)."""
     return JSONResponse(
         status_code=500,
-        content={"error": "Failed to serve WebCat client", "details": error},
+        content={"error": "Failed to serve WebCat client"},
     )
 
 
@@ -222,7 +213,7 @@
             return _client_not_found_response(client_path)
         except Exception as e:
             logger.error(f"Failed to serve WebCat client: {str(e)}")
-            return _client_error_response(str(e))
+            return _client_error_response()
 
     @app.get("/status")
     async def server_status():
EOF
"timestamp": time.time(),
},
)
return JSONResponse(status_code=500, content=_get_unhealthy_status(str(e)))

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information
flows to this location and may be exposed to an external user.

Copilot Autofix

AI 4 months ago

To fix this problem, we should stop passing str(e) (the string representation of the exception) to the client in the JSON response.
Instead, pass a generic error message such as "An internal error has occurred" (or similar) to the client, while logging the specific exception detail (and optionally traceback) for server-side debugging.
Specifically, in setup_health_endpoints, inside the /health endpoint, change:

return JSONResponse(status_code=500, content=_get_unhealthy_status(str(e)))

to:

return JSONResponse(status_code=500, content=_get_unhealthy_status("An internal error has occurred"))

and keep the logging as-is.

No change to imports or new methods is needed.


Suggested changeset 1
docker/health.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/docker/health.py b/docker/health.py
--- a/docker/health.py
+++ b/docker/health.py
@@ -205,7 +205,7 @@
             return _get_health_status()
         except Exception as e:
             logger.error(f"Health check failed: {str(e)}")
-            return JSONResponse(status_code=500, content=_get_unhealthy_status(str(e)))
+            return JSONResponse(status_code=500, content=_get_unhealthy_status("An internal error has occurred"))
 
     @app.get("/client")
     async def sse_client():
EOF
"timestamp": time.time(),
},
)
return JSONResponse(status_code=500, content=_get_status_error(str(e)))

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information
flows to this location and may be exposed to an external user.

Copilot Autofix

AI 4 months ago

The correct way to fix this issue is to ensure that only generic error information is sent to the client, with specific or sensitive error details logged for internal diagnosis.

  • In file docker/health.py, specifically in the /status endpoint handler, modify the error response so that the "details" field does not include str(e) (exception's string representation).
  • Instead, return a general message to the client (e.g., "Internal server error") while logging the full exception details (including a stack trace) server-side using the logging module.
  • Only edit the code in the region shown above, specifically the try/except in the /status endpoint, and the argument passed to _get_status_error.
  • No additional imports are necessary as logging is already set up.

Suggested changeset 1
docker/health.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/docker/health.py b/docker/health.py
--- a/docker/health.py
+++ b/docker/health.py
@@ -230,8 +230,8 @@
         try:
             return _get_detailed_status()
         except Exception as e:
-            logger.error(f"Status check failed: {str(e)}")
-            return JSONResponse(status_code=500, content=_get_status_error(str(e)))
+            logger.error("Status check failed", exc_info=True)
+            return JSONResponse(status_code=500, content=_get_status_error("Internal server error"))
 
     @app.get("/")
     async def root():
EOF
T-rav and others added 3 commits October 3, 2025 15:15
Reduced control flow nesting from 7 to 3 levels by extracting:
- _format_sse_message() for SSE message formatting
- _handle_search_operation() for search logic
- _handle_health_operation() for health checks
- _get_server_info() for server metadata
- _generate_webcat_stream() extracted to module level

Results:
- Max nesting reduced from 7 to 3 levels
- SSE generator logic now at module level
- Each operation handler has clear responsibility
- Simplified webcat_stream endpoint

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
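
A minimal sketch of what _format_sse_message might look like; the payload shape, including the "type" key, is an assumption, and the real helpers live in docker/demo_server.py and docker/simple_demo.py:

import json
from typing import Any


def _format_sse_message(event_type: str, **fields: Any) -> str:
    """Serialize one Server-Sent Events frame: a JSON payload on a data: line."""
    payload = {"type": event_type, **fields}
    return f"data: {json.dumps(payload)}\n\n"


# Example: _format_sse_message("status", message="Searching for: python asyncio")
# yields 'data: {"type": "status", "message": "Searching for: python asyncio"}\n\n'
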
Reduced control flow nesting from 7 to 3 levels by reusing helpers:
- _format_sse_message() for SSE message formatting
- _handle_search_operation() for search logic
- _handle_health_operation() for health checks
- _get_server_info() for server metadata
- _generate_webcat_stream() extracted to module level

Results:
- Max nesting reduced from 7 to 3 levels
- Simplified webcat_stream endpoint
- Consistent structure with demo_server.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Reduced control flow nesting from 6 to 3 levels by extracting:
- _fetch_with_serper() for Serper API calls
- _fetch_with_duckduckgo() for DuckDuckGo fallback
- _format_no_results_error() for error formatting
- _format_search_error() for exception formatting
- _process_and_format_results() for result processing

Results:
- Max nesting reduced from 6 to 3 levels in search_function
- Each fetch method has clear responsibility
- Simplified error handling with dedicated formatters

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
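
Put together, the fallback flow these helpers compose looks roughly like this (a sketch with stubbed fetchers and a simplified search_function signature; the real orchestration lives in docker/api_tools.py):

from typing import Any, Dict, List, Optional, Tuple


async def _fetch_with_serper(query: str, api_key: str) -> Tuple[List[Any], str]:
    # Stand-in; the real helper (shown later in this review) calls Serper via mcp_server.
    return [], "Serper API"


async def _fetch_with_duckduckgo(query: str, has_api_key: bool) -> Tuple[List[Any], str]:
    # Stand-in; the real helper falls back to DuckDuckGo.
    return [], "DuckDuckGo (free fallback)"


def _format_no_results_error(query: str, search_source: str) -> Dict[str, Any]:
    return {
        "error": "No search results found from any source.",
        "query": query,
        "search_source": search_source,
    }


async def search_function(query: str, api_key: Optional[str] = None) -> Dict[str, Any]:
    """Try Serper when a key is configured, otherwise (or on empty results) use DuckDuckGo."""
    results: List[Any] = []
    search_source = "none"
    if api_key:
        results, search_source = await _fetch_with_serper(query, api_key)
    if not results:
        results, search_source = await _fetch_with_duckduckgo(query, bool(api_key))
    if not results:
        return _format_no_results_error(query, search_source)
    return {"query": query, "search_source": search_source, "results": results}
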
coderabbitai bot left a comment


Actionable comments posted: 13

♻️ Duplicate comments (1)
docker/health.py (1)

81-81: Information exposure through exception details.

Exposing raw exception messages (str(e)) in API responses can leak sensitive information such as stack traces, file paths, or internal implementation details. Consider sanitizing error messages before returning them to external users.

For example, use generic messages in production:

-        content={"error": "Failed to serve WebCat client", "details": error},
+        content={"error": "Failed to serve WebCat client", "details": "Internal server error"},

Or implement a sanitization function that filters sensitive patterns from error messages before exposing them.

Also applies to: 208-208, 234-234

🧹 Nitpick comments (3)
docker/health.py (1)

207-207: Use logging.exception for automatic traceback inclusion.

Replace logging.error(f"... {str(e)}") with logging.exception(...) to automatically include the exception traceback, which aids debugging.

Apply this pattern to all three error handlers:

-            logger.error(f"Health check failed: {str(e)}")
+            logger.exception("Health check failed")
-            logger.error(f"Failed to serve WebCat client: {str(e)}")
+            logger.exception("Failed to serve WebCat client")
-            logger.error(f"Status check failed: {str(e)}")
+            logger.exception("Status check failed")

Also applies to: 224-224, 233-233

docker/demo_server.py (1)

199-201: Use logger.exception when surfacing unexpected failures.

Switch to logger.exception so we capture the traceback while keeping the SSE error payload.

-    except Exception as e:
-        logger.error(f"Error in SSE stream: {str(e)}")
-        yield _format_sse_message("error", message=str(e))
+    except Exception as exc:
+        logger.exception("Error in SSE stream")
+        yield _format_sse_message("error", message=str(exc))
docker/simple_demo.py (1)

179-181: Capture tracebacks with logger.exception.

Match the other module and emit the traceback while still returning the SSE error payload.

-    except Exception as e:
-        logger.error(f"Error in SSE stream: {str(e)}")
-        yield _format_sse_message("error", message=str(e))
+    except Exception as exc:
+        logger.exception("Error in SSE stream")
+        yield _format_sse_message("error", message=str(exc))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf7787f and 818c3d0.

📒 Files selected for processing (23)
  • CLAUDE.md (1 hunks)
  • docker/api_tools.py (1 hunks)
  • docker/clients/duckduckgo_client.py (2 hunks)
  • docker/clients/serper_client.py (2 hunks)
  • docker/demo_server.py (2 hunks)
  • docker/health.py (1 hunks)
  • docker/services/content_scraper.py (2 hunks)
  • docker/simple_demo.py (2 hunks)
  • docker/tests/builders/api_search_result_builder.py (1 hunks)
  • docker/tests/factories/ddgs_factory.py (1 hunks)
  • docker/tests/factories/http_response_factory.py (1 hunks)
  • docker/tests/factories/mock_ddgs_class.py (1 hunks)
  • docker/tests/factories/mock_ddgs_context_manager.py (1 hunks)
  • docker/tests/factories/mock_ddgs_search_client.py (1 hunks)
  • docker/tests/factories/mock_http_response.py (1 hunks)
  • docker/tests/factories/mock_serper_response.py (1 hunks)
  • docker/tests/factories/serper_response_factory.py (1 hunks)
  • docker/tests/test_mcp_server.py (5 hunks)
  • docker/tests/unit/clients/test_duckduckgo_client.py (4 hunks)
  • docker/tests/unit/clients/test_serper_client.py (5 hunks)
  • docker/tests/unit/services/test_search_processor.py (4 hunks)
  • docker/tests/unit/services/test_search_service.py (3 hunks)
  • docker/tests/unit/tools/test_search_tool.py (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • CLAUDE.md
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Format all Python code with Black (line-length=88; target Python 3.11)
Sort imports with isort using profile="black"
Require type hints for all function signatures (checked by mypy)
Use Google-style docstrings for public functions

Files:

  • docker/tests/unit/clients/test_serper_client.py
  • docker/tests/unit/services/test_search_service.py
  • docker/tests/unit/clients/test_duckduckgo_client.py
  • docker/tests/factories/mock_ddgs_search_client.py
  • docker/health.py
  • docker/tests/factories/ddgs_factory.py
  • docker/services/content_scraper.py
  • docker/tests/unit/tools/test_search_tool.py
  • docker/tests/unit/services/test_search_processor.py
  • docker/simple_demo.py
  • docker/tests/factories/mock_serper_response.py
  • docker/api_tools.py
  • docker/clients/duckduckgo_client.py
  • docker/tests/factories/mock_ddgs_context_manager.py
  • docker/tests/factories/mock_http_response.py
  • docker/tests/factories/mock_ddgs_class.py
  • docker/tests/factories/serper_response_factory.py
  • docker/tests/builders/api_search_result_builder.py
  • docker/tests/test_mcp_server.py
  • docker/demo_server.py
  • docker/tests/factories/http_response_factory.py
  • docker/clients/serper_client.py
docker/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

docker/**/*.py: Prefer absolute imports (e.g., from docker.mcp_server import ...)
All Python files in docker/ are included in coverage; add tests rather than excluding files

Files:

  • docker/tests/unit/clients/test_serper_client.py
  • docker/tests/unit/services/test_search_service.py
  • docker/tests/unit/clients/test_duckduckgo_client.py
  • docker/tests/factories/mock_ddgs_search_client.py
  • docker/health.py
  • docker/tests/factories/ddgs_factory.py
  • docker/services/content_scraper.py
  • docker/tests/unit/tools/test_search_tool.py
  • docker/tests/unit/services/test_search_processor.py
  • docker/simple_demo.py
  • docker/tests/factories/mock_serper_response.py
  • docker/api_tools.py
  • docker/clients/duckduckgo_client.py
  • docker/tests/factories/mock_ddgs_context_manager.py
  • docker/tests/factories/mock_http_response.py
  • docker/tests/factories/mock_ddgs_class.py
  • docker/tests/factories/serper_response_factory.py
  • docker/tests/builders/api_search_result_builder.py
  • docker/tests/test_mcp_server.py
  • docker/demo_server.py
  • docker/tests/factories/http_response_factory.py
  • docker/clients/serper_client.py
docker/tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

docker/tests/**/*.py: Use pytest markers: @pytest.mark.unit for unit tests, @pytest.mark.integration for integration tests, @pytest.mark.api for external API tests
Unit tests should mock external dependencies and run fast; avoid network calls
Tests that depend on external services must be marked @pytest.mark.integration
When patching requests, patch at the module-under-test level (e.g., @patch('mcp_server.requests.get'))

Files:

  • docker/tests/unit/clients/test_serper_client.py
  • docker/tests/unit/services/test_search_service.py
  • docker/tests/unit/clients/test_duckduckgo_client.py
  • docker/tests/factories/mock_ddgs_search_client.py
  • docker/tests/factories/ddgs_factory.py
  • docker/tests/unit/tools/test_search_tool.py
  • docker/tests/unit/services/test_search_processor.py
  • docker/tests/factories/mock_serper_response.py
  • docker/tests/factories/mock_ddgs_context_manager.py
  • docker/tests/factories/mock_http_response.py
  • docker/tests/factories/mock_ddgs_class.py
  • docker/tests/factories/serper_response_factory.py
  • docker/tests/builders/api_search_result_builder.py
  • docker/tests/test_mcp_server.py
  • docker/tests/factories/http_response_factory.py
docker/tests/test_mcp_server.py

📄 CodeRabbit inference engine (CLAUDE.md)

Add unit tests for new MCP tools in docker/tests/test_mcp_server.py (plus additional test files as needed)

Files:

  • docker/tests/test_mcp_server.py
🧠 Learnings (1)
📚 Learning: 2025-10-03T20:19:57.835Z
Learnt from: CR
PR: Kode-Rex/webcat#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-03T20:19:57.835Z
Learning: Applies to docker/tests/test_mcp_server.py : Add unit tests for new MCP tools in docker/tests/test_mcp_server.py (plus additional test files as needed)

Applied to files:

  • docker/tests/test_mcp_server.py
🧬 Code graph analysis (19)
docker/tests/unit/clients/test_serper_client.py (2)
docker/clients/serper_client.py (1)
  • fetch_search_results (38-64)
docker/tests/factories/serper_response_factory.py (4)
  • SerperResponseFactory (13-74)
  • two_results (38-57)
  • empty (29-35)
  • with_results (17-26)
docker/tests/unit/services/test_search_service.py (2)
docker/tests/builders/api_search_result_builder.py (3)
  • a_duckduckgo_result (67-74)
  • a_serper_result (57-64)
  • build (43-49)
docker/services/search_service.py (1)
  • fetch_with_fallback (18-46)
docker/tests/unit/clients/test_duckduckgo_client.py (1)
docker/tests/factories/ddgs_factory.py (5)
  • DDGSFactory (14-83)
  • two_results (41-53)
  • empty (31-38)
  • with_exception (56-66)
  • with_results (18-28)
docker/tests/factories/ddgs_factory.py (2)
docker/tests/factories/mock_ddgs_class.py (1)
  • MockDDGSClass (12-21)
docker/tests/factories/mock_ddgs_search_client.py (1)
  • MockDDGS (11-47)
docker/services/content_scraper.py (1)
docker/tests/factories/http_response_factory.py (1)
  • timeout (80-86)
docker/tests/unit/tools/test_search_tool.py (1)
docker/tests/builders/api_search_result_builder.py (2)
  • an_api_search_result (52-54)
  • with_link (33-36)
docker/tests/unit/services/test_search_processor.py (1)
docker/tests/builders/api_search_result_builder.py (2)
  • an_api_search_result (52-54)
  • with_link (33-36)
docker/simple_demo.py (1)
docker/demo_server.py (5)
  • _format_sse_message (64-75)
  • _handle_search_operation (78-102)
  • _handle_health_operation (105-127)
  • _get_server_info (130-142)
  • _generate_webcat_stream (145-201)
docker/api_tools.py (3)
docker/clients/serper_client.py (1)
  • fetch_search_results (38-64)
customgpt/function_app.py (1)
  • fetch_duckduckgo_search_results (147-185)
docker/services/search_processor.py (1)
  • process_search_results (15-39)
docker/clients/duckduckgo_client.py (1)
docker/models/api_search_result.py (1)
  • APISearchResult (11-22)
docker/tests/factories/mock_ddgs_context_manager.py (1)
docker/tests/factories/mock_ddgs_search_client.py (1)
  • MockDDGS (11-47)
docker/tests/factories/mock_http_response.py (1)
docker/tests/factories/mock_ddgs_search_client.py (1)
  • text (29-47)
docker/tests/factories/mock_ddgs_class.py (2)
docker/tests/factories/mock_ddgs_context_manager.py (1)
  • MockDDGSContextManager (11-24)
docker/tests/factories/mock_ddgs_search_client.py (1)
  • MockDDGS (11-47)
docker/tests/factories/serper_response_factory.py (1)
docker/tests/factories/mock_serper_response.py (1)
  • MockSerperResponse (11-37)
docker/tests/builders/api_search_result_builder.py (1)
docker/models/api_search_result.py (1)
  • APISearchResult (11-22)
docker/tests/test_mcp_server.py (2)
docker/tests/factories/http_response_factory.py (4)
  • HttpResponseFactory (13-124)
  • success (17-35)
  • with_http_error (52-65)
  • plaintext (38-49)
docker/tests/conftest.py (1)
  • mock_get (67-68)
docker/demo_server.py (1)
docker/simple_demo.py (5)
  • _format_sse_message (44-55)
  • _handle_search_operation (58-82)
  • _handle_health_operation (85-107)
  • _get_server_info (110-122)
  • _generate_webcat_stream (125-181)
docker/tests/factories/http_response_factory.py (1)
docker/tests/factories/mock_http_response.py (1)
  • MockHttpResponse (11-44)
docker/clients/serper_client.py (1)
docker/models/api_search_result.py (1)
  • APISearchResult (11-22)
🪛 GitHub Check: CodeQL
docker/health.py

[warning] 81-81: Information exposure through an exception
Stack trace information flows to this location and may be exposed to an external user.


[warning] 208-208: Information exposure through an exception
Stack trace information flows to this location and may be exposed to an external user.


[warning] 234-234: Information exposure through an exception
Stack trace information flows to this location and may be exposed to an external user.

🪛 Ruff (0.13.2)
docker/health.py

109-109: Probable insecure usage of temporary file or directory: "/tmp"

(S108)


206-206: Do not catch blind exception: Exception

(BLE001)


207-207: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


207-207: Use explicit conversion flag

Replace with conversion flag

(RUF010)


223-223: Do not catch blind exception: Exception

(BLE001)


224-224: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


224-224: Use explicit conversion flag

Replace with conversion flag

(RUF010)


232-232: Do not catch blind exception: Exception

(BLE001)


233-233: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


233-233: Use explicit conversion flag

Replace with conversion flag

(RUF010)

docker/services/content_scraper.py

152-152: Use explicit conversion flag

Replace with conversion flag

(RUF010)


255-255: Do not catch blind exception: Exception

(BLE001)


257-257: Use explicit conversion flag

Replace with conversion flag

(RUF010)

docker/simple_demo.py

179-179: Do not catch blind exception: Exception

(BLE001)


180-180: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


180-180: Use explicit conversion flag

Replace with conversion flag

(RUF010)

docker/tests/factories/mock_serper_response.py

37-37: Avoid specifying long messages outside the exception class

(TRY003)

docker/api_tools.py

391-391: Do not catch blind exception: Exception

(BLE001)


392-392: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


392-392: Use explicit conversion flag

Replace with conversion flag

(RUF010)

docker/tests/factories/mock_ddgs_class.py

19-19: Unused method argument: args

(ARG002)


19-19: Unused method argument: kwargs

(ARG002)

docker/demo_server.py

199-199: Do not catch blind exception: Exception

(BLE001)


200-200: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


200-200: Use explicit conversion flag

Replace with conversion flag

(RUF010)

docker/tests/factories/http_response_factory.py

19-19: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🔇 Additional comments (11)
docker/tests/unit/tools/test_search_tool.py (2)

13-13: LGTM! Builder pattern improves test maintainability.

The migration to an_api_search_result().build() centralizes test data construction and aligns with the broader test infrastructure improvements across the codebase.

Also applies to: 25-25


65-71: LGTM! Clean fluent builder usage.

The fluent setter chaining clearly expresses test intent and makes the test data construction more readable.

docker/clients/duckduckgo_client.py (2)

25-38: LGTM! Well-structured helper function.

The extracted helper improves code organization by separating the result conversion logic. Type hints and Google-style docstring are properly provided as per coding guidelines.


63-63: LGTM! Clean refactor using the helper.

The list comprehension now delegates to _convert_ddg_result, improving maintainability while preserving the original behavior.

docker/tests/unit/services/test_search_service.py (1)

11-14: LGTM! Consistent builder usage across tests.

The migration to pre-configured builders (a_serper_result(), a_duckduckgo_result()) standardizes test data construction and improves maintainability. Updated assertions correctly reflect the builder defaults.

Also applies to: 23-23, 31-31, 39-39, 47-47, 56-56

docker/tests/unit/services/test_search_processor.py (1)

12-12: LGTM! Consistent builder pattern adoption.

All tests correctly use the an_api_search_result() builder with fluent setters, improving test maintainability and readability.

Also applies to: 21-27, 45-54, 85-91

docker/tests/factories/mock_ddgs_class.py (1)

12-21: LGTM! Clean mock wrapper implementation.

The MockDDGSClass correctly implements a callable wrapper that returns a context manager. The unused *args, **kwargs in __call__ are intentional for API compatibility with the real DDGS constructor signature. Type hints and docstrings are properly provided.

docker/tests/unit/clients/test_duckduckgo_client.py (2)

11-11: LGTM! Clean migration to DDGSFactory.

The migration from MagicMock to typed DDGSFactory improves type safety and test clarity. The double-call pattern (e.g., DDGSFactory.two_results()()) correctly constructs the mock class and then invokes it to get the context manager.

Also applies to: 20-20, 34-34, 46-48, 59-59
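
Roughly how that double call fits together, as a compressed sketch (the real classes live one per file under docker/tests/factories/ and their exact constructors are assumptions):

class MockDDGS:
    """Fake DDGS search client returning canned results from text()."""

    def __init__(self, results: list) -> None:
        self._results = results

    def text(self, *args, **kwargs) -> list:
        return self._results


class MockDDGSContextManager:
    """Context manager yielding a MockDDGS, mirroring `with DDGS() as ddgs:` usage."""

    def __init__(self, client: MockDDGS) -> None:
        self._client = client

    def __enter__(self) -> MockDDGS:
        return self._client

    def __exit__(self, *exc) -> bool:
        return False


class MockDDGSClass:
    """Callable stand-in for the DDGS class itself."""

    def __init__(self, client: MockDDGS) -> None:
        self._client = client

    def __call__(self, *args, **kwargs) -> MockDDGSContextManager:
        return MockDDGSContextManager(self._client)


# DDGSFactory.two_results() returns a MockDDGSClass instance; the second call in
# DDGSFactory.two_results()() then produces the context manager that the patched
# client code enters.
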


37-41: LGTM! Improved test approach.

The test now verifies behavior through results rather than inspecting mock call details, which is a more robust testing practice.

docker/tests/factories/mock_ddgs_context_manager.py (1)

11-24: LGTM! Correct context manager implementation.

The MockDDGSContextManager properly implements the context manager protocol with appropriate type hints and docstrings, enabling realistic with-statement usage in tests.

docker/health.py (1)

20-196: LGTM! Well-structured helper functions.

The extracted helper functions improve modularity and testability. All functions have proper type hints and Google-style docstrings as per coding guidelines. The refactoring maintains the original behavior while making the code more maintainable.

Comment on lines +254 to +272
async def _fetch_with_serper(query: str, api_key: str):
    """Fetch search results using Serper API.

    Args:
        query: Search query
        api_key: Serper API key

    Returns:
        Tuple of (results, search_source)
    """
    import asyncio

    from mcp_server import fetch_search_results

    logger.info("Using Serper API for search")
    results = await asyncio.get_event_loop().run_in_executor(
        None, fetch_search_results, query, api_key
    )
    return results, "Serper API"

🛠️ Refactor suggestion | 🟠 Major

Add return types and favour asyncio.to_thread.

New helpers must have explicit return annotations per our typing rules, and we can avoid the deprecated asyncio.get_event_loop() by switching to asyncio.to_thread(...). Consider:

-from typing import TYPE_CHECKING, Any, Dict, List, Optional
+from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple
...
-if TYPE_CHECKING:
-    pass
+if TYPE_CHECKING:
+    from docker.models.api_search_result import APISearchResult
...
-async def _fetch_with_serper(query: str, api_key: str):
+async def _fetch_with_serper(
+    query: str, api_key: str
+) -> Tuple[List["APISearchResult"], str]:
...
-    results = await asyncio.get_event_loop().run_in_executor(
-        None, fetch_search_results, query, api_key
-    )
+    results = await asyncio.to_thread(fetch_search_results, query, api_key)

Apply the same pattern to the DuckDuckGo helper so static typing and asyncio usage stay consistent.

🤖 Prompt for AI Agents
In docker/api_tools.py around lines 254 to 272, the _fetch_with_serper helper is
missing an explicit return type annotation and uses the deprecated
asyncio.get_event_loop().run_in_executor; update its signature to include the
correct async return type (e.g., -> Tuple[List[...], str] or the precise types
used by fetch_search_results) and replace the run_in_executor call with
asyncio.to_thread(fetch_search_results, query, api_key) to run the blocking
function in a thread. Also apply the same explicit return annotation and switch
to asyncio.to_thread for the DuckDuckGo helper so both helpers are properly
typed and use modern asyncio patterns.

Comment on lines +275 to +298
async def _fetch_with_duckduckgo(query: str, has_api_key: bool):
    """Fetch search results using DuckDuckGo.

    Args:
        query: Search query
        has_api_key: Whether Serper API key was configured

    Returns:
        Tuple of (results, search_source)
    """
    import asyncio

    from mcp_server import fetch_duckduckgo_search_results

    if not has_api_key:
        logger.info("No Serper API key configured, using DuckDuckGo fallback")
    else:
        logger.warning("No results from Serper API, trying DuckDuckGo fallback")

    results = await asyncio.get_event_loop().run_in_executor(
        None, fetch_duckduckgo_search_results, query
    )
    return results, "DuckDuckGo (free fallback)"


🛠️ Refactor suggestion | 🟠 Major

Ensure the DuckDuckGo helper is fully typed.

Please mirror the Serper changes here: add a precise return type (e.g., Tuple[Optional[List["APISearchResult"]], str]) and swap asyncio.get_event_loop().run_in_executor for asyncio.to_thread(...) to satisfy our typing policy and avoid deprecated APIs.

Comment on lines +300 to +358
def _format_no_results_error(query: str, search_source: str) -> dict:
    """Format error response for no results.

    Args:
        query: Search query
        search_source: Source that was used

    Returns:
        Error dictionary
    """
    return {
        "error": "No search results found from any source.",
        "query": query,
        "search_source": search_source,
    }


def _format_search_error(error: str, query: str, search_source: str) -> dict:
    """Format error response for search failure.

    Args:
        error: Error message
        query: Search query
        search_source: Source that was attempted

    Returns:
        Error dictionary
    """
    return {
        "error": f"Search failed: {error}",
        "query": query,
        "search_source": search_source,
    }


async def _process_and_format_results(results, query: str, search_source: str):
    """Process search results and format response.

    Args:
        results: Raw search results
        query: Search query
        search_source: Source of results

    Returns:
        Formatted results dictionary
    """
    import asyncio

    from mcp_server import process_search_results

    processed_results = await asyncio.get_event_loop().run_in_executor(
        None, process_search_results, results
    )

    return {
        "query": query,
        "search_source": search_source,
        "results": [result.model_dump() for result in processed_results],
    }

🛠️ Refactor suggestion | 🟠 Major

Type the formatting utilities.

Our guidelines require explicit return types. _format_no_results_error, _format_search_error, and _process_and_format_results should return Dict[str, Any] (or a refined mapping) and accept properly typed results (e.g., List["APISearchResult"]). Please add those annotations and, for _process_and_format_results, consider asyncio.to_thread for consistency with the fetch helpers.

🤖 Prompt for AI Agents
In docker/api_tools.py around lines 300 to 358, the three helpers lack explicit
typing: add return annotations of Dict[str, Any] for _format_no_results_error
and _format_search_error and for _process_and_format_results change the
signature to accept results: List["APISearchResult"] (or a more specific
Sequence type) and return Dict[str, Any]; inside _process_and_format_results
replace asyncio.get_event_loop().run_in_executor(...) with
asyncio.to_thread(process_search_results, results) for consistency with fetch
helpers and keep importing typing (Dict, Any, List) and forward-reference
APISearchResult in quotes or via TYPE_CHECKING import.

Comment on lines +78 to +196
async def _handle_search_operation(search_func, query: str, max_results: int):
    """Handle search operation and yield results.

    Args:
        search_func: Search function to call
        query: Search query
        max_results: Maximum results to return

    Yields:
        SSE formatted messages
    """
    yield _format_sse_message("status", message=f"Searching for: {query}")

    result = await search_func(query)

    # Limit results if needed
    if result.get("results") and len(result["results"]) > max_results:
        result["results"] = result["results"][:max_results]
        result["note"] = f"Results limited to {max_results} items"

    yield _format_sse_message("data", data=result)
    num_results = len(result.get("results", []))
    yield _format_sse_message(
        "complete", message=f"Search completed. Found {num_results} results."
    )


async def _handle_health_operation(health_func):
    """Handle health check operation.

    Args:
        health_func: Health check function to call (or None)

    Yields:
        SSE formatted messages
    """
    yield _format_sse_message("status", message="Checking server health...")

    if health_func:
        result = await health_func()
        yield _format_sse_message("data", data=result)
        yield _format_sse_message("complete", message="Health check completed")
    else:
        basic_health = {
            "status": "healthy",
            "service": "webcat-demo",
            "timestamp": time.time(),
        }
        yield _format_sse_message("data", data=basic_health)
        yield _format_sse_message("complete", message="Basic health check completed")


def _get_server_info() -> dict:
    """Get server information dictionary.

    Returns:
        Server info dictionary
    """
    return {
        "service": "WebCat MCP Demo Server",
        "version": "2.2.0",
        "status": "connected",
        "operations": ["search", "health"],
        "timestamp": time.time(),
    }


async def _generate_webcat_stream(
    webcat_functions, operation: str, query: str, max_results: int
):
    """Generate SSE stream for WebCat operations.

    Args:
        webcat_functions: Dictionary of WebCat functions
        operation: Operation to perform
        query: Search query
        max_results: Maximum results

    Yields:
        SSE formatted messages
    """
    try:
        # Send connection message
        yield _format_sse_message(
            "connection",
            status="connected",
            message="WebCat stream started",
            operation=operation,
        )

        if operation == "search" and query:
            search_func = webcat_functions.get("search")
            if search_func:
                async for msg in _handle_search_operation(
                    search_func, query, max_results
                ):
                    yield msg
            else:
                yield _format_sse_message(
                    "error", message="Search function not available"
                )

        elif operation == "health":
            health_func = webcat_functions.get("health_check")
            async for msg in _handle_health_operation(health_func):
                yield msg

        else:
            # Just connection - send server info
            yield _format_sse_message("data", data=_get_server_info())
            yield _format_sse_message("complete", message="Connection established")

        # Keep alive with heartbeat
        heartbeat_count = 0
        while True:
            await asyncio.sleep(30)
            heartbeat_count += 1
            yield _format_sse_message(
                "heartbeat", timestamp=time.time(), count=heartbeat_count

🛠️ Refactor suggestion | 🟠 Major

Add the missing type hints per project standard.

Our Python guideline requires every function signature to be fully annotated. Please add the concrete typing for the new helpers (_handle_search_operation, _handle_health_operation, _generate_webcat_stream, and the _get_server_info return type) so mypy stays green and usage is clear. Suggested patch:

@@
-import json
-import logging
+import json
+import logging
+from typing import Any, AsyncGenerator, Awaitable, Callable, Dict, Mapping, Optional
@@
-async def _handle_search_operation(search_func, query: str, max_results: int):
+async def _handle_search_operation(
+    search_func: Callable[[str], Awaitable[Dict[str, Any]]],
+    query: str,
+    max_results: int,
+) -> AsyncGenerator[str, None]:
@@
-async def _handle_health_operation(health_func):
+async def _handle_health_operation(
+    health_func: Optional[Callable[[], Awaitable[Dict[str, Any]]]],
+) -> AsyncGenerator[str, None]:
@@
-def _get_server_info() -> dict:
+def _get_server_info() -> Dict[str, Any]:
@@
-async def _generate_webcat_stream(
-    webcat_functions, operation: str, query: str, max_results: int
-):
+async def _generate_webcat_stream(
+    webcat_functions: Mapping[str, Callable[..., Awaitable[Any]]],
+    operation: str,
+    query: str,
+    max_results: int,
+) -> AsyncGenerator[str, None]:

As per coding guidelines

🤖 Prompt for AI Agents
In docker/demo_server.py lines 78-196, the new helper functions lack type hints;
add concrete annotations to satisfy project typing rules. Annotate
_handle_search_operation as accepting search_func: Callable[[str],
Awaitable[Dict[str, Any]]], query: str, max_results: int and returning
AsyncGenerator[str, None]; annotate _handle_health_operation as accepting
health_func: Optional[Callable[[], Awaitable[Dict[str, Any]]]] and returning
AsyncGenerator[str, None]; annotate _generate_webcat_stream as accepting
webcat_functions: Dict[str, Callable[..., Awaitable[Dict[str, Any]]]],
operation: str, query: str, max_results: int and returning AsyncGenerator[str,
None]; annotate _get_server_info return type as Dict[str, Any]; ensure required
typing imports (AsyncGenerator, Awaitable, Callable, Dict, Optional, Any) are
added at the top of the file.
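For context, every helper in this hunk funnels its output through `_format_sse_message`, which is not defined in the excerpt above. Below is a minimal sketch of what such a formatter typically looks like, assuming it wraps its keyword arguments in a JSON payload and emits a standard `data: ...` SSE frame; the actual implementation in demo_server.py may differ.

import json
from typing import Any


def _format_sse_message(event_type: str, **fields: Any) -> str:
    """Serialize an event and its fields as a Server-Sent Events data frame."""
    payload = {"type": event_type, **fields}
    # SSE frames are "data: <payload>" terminated by a blank line
    return f"data: {json.dumps(payload)}\n\n"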

Comment on lines +219 to +221
    if len(content) > MAX_CONTENT_LENGTH:
        return content[:MAX_CONTENT_LENGTH] + "... [content truncated]"
    return content

⚠️ Potential issue | 🟠 Major

Fix truncation to honor MAX_CONTENT_LENGTH.

We slice to the max limit and then append the ellipsis, so the final string overshoots MAX_CONTENT_LENGTH. That breaks the contract of the cap (and defeats downstream safeguards relying on it). Trim enough characters before appending the suffix.

-    if len(content) > MAX_CONTENT_LENGTH:
-        return content[:MAX_CONTENT_LENGTH] + "... [content truncated]"
+    if len(content) > MAX_CONTENT_LENGTH:
+        suffix = "... [content truncated]"
+        if len(suffix) >= MAX_CONTENT_LENGTH:
+            return suffix[:MAX_CONTENT_LENGTH]
+        return content[: MAX_CONTENT_LENGTH - len(suffix)] + suffix
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    if len(content) > MAX_CONTENT_LENGTH:
        return content[:MAX_CONTENT_LENGTH] + "... [content truncated]"
    return content
    if len(content) > MAX_CONTENT_LENGTH:
        suffix = "... [content truncated]"
        if len(suffix) >= MAX_CONTENT_LENGTH:
            return suffix[:MAX_CONTENT_LENGTH]
        return content[:MAX_CONTENT_LENGTH - len(suffix)] + suffix
    return content
🤖 Prompt for AI Agents
In docker/services/content_scraper.py around lines 219 to 221, the truncation
logic appends the ellipsis/suffix after slicing to MAX_CONTENT_LENGTH which
causes the returned string to exceed MAX_CONTENT_LENGTH; change to compute the
suffix (e.g. "... [content truncated]") and if MAX_CONTENT_LENGTH is larger than
the suffix length, slice content to MAX_CONTENT_LENGTH - len(suffix) and append
the suffix, otherwise return the suffix trimmed to MAX_CONTENT_LENGTH so the
final string never exceeds MAX_CONTENT_LENGTH.
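To make the cap concrete, here is a small self-contained check of the suggested truncation logic. `MAX_CONTENT_LENGTH` is set to an illustrative value here rather than the real constant from content_scraper.py, and the function name mirrors the helper mentioned in the walkthrough.

MAX_CONTENT_LENGTH = 50  # illustrative only; the real constant lives in content_scraper.py


def _truncate_if_needed(content: str) -> str:
    """Cap content at MAX_CONTENT_LENGTH, including the truncation suffix."""
    suffix = "... [content truncated]"
    if len(content) > MAX_CONTENT_LENGTH:
        if len(suffix) >= MAX_CONTENT_LENGTH:
            return suffix[:MAX_CONTENT_LENGTH]
        return content[: MAX_CONTENT_LENGTH - len(suffix)] + suffix
    return content


result = _truncate_if_needed("x" * 200)
assert len(result) <= MAX_CONTENT_LENGTH  # the cap now holds
assert result.endswith("... [content truncated]")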

Comment on lines +18 to 35
content: bytes = b"<html><body><p>Test content</p></body></html>",
headers: dict = None,
) -> MockHttpResponse:
"""Create a successful HTTP response."""
"""Create successful HTML response.

Args:
content: HTML content as bytes
headers: Response headers

Returns:
MockHttpResponse with 200 status
"""
default_headers = {"Content-Type": "text/html"}
if headers:
default_headers.update(headers)
return MockHttpResponse(
status_code=status_code,
content=content.encode("utf-8"),
text=content,
headers={"Content-Type": content_type},
should_raise_for_status=False,
content=content, headers=default_headers, status_code=200
)

🛠️ Refactor suggestion | 🟠 Major

Tighten the headers type.

headers: dict = None trips our typing checks. Please change it to headers: Optional[Dict[str, str]] = None (and import Dict, Optional) so mypy stays happy.

🧰 Tools
🪛 Ruff (0.13.2)

19-19: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🤖 Prompt for AI Agents
In docker/tests/factories/http_response_factory.py around lines 18 to 35, the
parameter annotation uses a bare dict which fails mypy; change the signature to
use headers: Optional[Dict[str, str]] = None and add the corresponding typing
imports (from typing import Dict, Optional) at the top of the file; keep the
existing runtime behavior (default_headers and update) unchanged.

Comment on lines +68 to 78
def connection_error(message: str = "Connection error"):
"""Create exception for connection error.

Args:
message: Error message

Returns:
RequestException to be used with side_effect
"""
Create a Timeout exception.
return requests.exceptions.RequestException(message)


🛠️ Refactor suggestion | 🟠 Major

Add the return annotation.

connection_error should declare it returns requests.exceptions.RequestException to satisfy our type-hint policy.

🤖 Prompt for AI Agents
In docker/tests/factories/http_response_factory.py around lines 68 to 78, the
function connection_error is missing a return type annotation; update the
signature to declare it returns requests.exceptions.RequestException (e.g. def
connection_error(message: str = "Connection error") ->
requests.exceptions.RequestException:) and ensure the requests module is
imported/available in the file so the annotation resolves.

Comment on lines +80 to +86
def timeout():
"""Create timeout exception.

Returns:
Timeout exception to be used with side_effect
"""
return requests.Timeout("Request timed out after 5 seconds")
return requests.exceptions.Timeout("Request timed out")

🛠️ Refactor suggestion | 🟠 Major

Type the timeout helper.

Likewise, timeout() needs an explicit return type (e.g., -> requests.exceptions.Timeout) per the typing guidelines.

🤖 Prompt for AI Agents
In docker/tests/factories/http_response_factory.py around lines 80 to 86, the
timeout() helper is missing an explicit return type; add a return type
annotation -> requests.exceptions.Timeout to the function signature so it
conforms to the typing guidelines, e.g., adjust the def line to include that
return type and keep the existing docstring and return statement unchanged.
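Taken together with the connection_error note above, the two fully annotated helpers would read roughly as follows (a sketch; docstrings trimmed and module placement assumed).

import requests


def connection_error(message: str = "Connection error") -> requests.exceptions.RequestException:
    """Create an exception suitable for a mock's side_effect."""
    return requests.exceptions.RequestException(message)


def timeout() -> requests.exceptions.Timeout:
    """Create a timeout exception suitable for a mock's side_effect."""
    return requests.exceptions.Timeout("Request timed out")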

Comment on lines +42 to +44
    def json(self) -> Dict[str, Any]:
        """Return empty dict for JSON responses."""
        return {}

⚠️ Potential issue | 🟠 Major

Restore JSON parsing behaviour.

Returning {} ignores any JSON payload configured on the mock and breaks callers that expect the text/body to be parsed. Please revert to decoding self.text (fall back to self.content when needed), for example:

     def json(self) -> Dict[str, Any]:
-        """Return empty dict for JSON responses."""
-        return {}
+        """Return parsed JSON payload."""
+        if self.text:
+            return json.loads(self.text)
+        if self.content:
+            return json.loads(self.content.decode())
+        return {}

(Import json at the top.) This keeps the mock faithful to requests.Response.json().

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    def json(self) -> Dict[str, Any]:
        """Return empty dict for JSON responses."""
        return {}
    def json(self) -> Dict[str, Any]:
        """Return parsed JSON payload."""
        if self.text:
            return json.loads(self.text)
        if self.content:
            return json.loads(self.content.decode())
        return {}
🤖 Prompt for AI Agents
In docker/tests/factories/mock_http_response.py around lines 42-44, the json()
method currently returns a hardcoded {} which ignores any configured payload;
restore real JSON parsing by importing json at the top and replacing the body
with code that attempts to parse self.text via json.loads(self.text) and, if
self.text is empty, fall back to decoding self.content to a str and parsing
that; let json decoding errors propagate the same way requests.Response.json()
would (do not swallow exceptions) so callers receive the same behavior as the
real requests client.
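As a quick sanity check of that behaviour, a faithful mock parses whatever payload was configured instead of discarding it. The class below is a trimmed stand-in for illustration; the real MockHttpResponse carries more fields.

import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class MockHttpResponse:  # trimmed stand-in for the project's mock
    text: str = ""
    content: bytes = b""

    def json(self) -> Dict[str, Any]:
        """Return parsed JSON payload, mirroring requests.Response.json()."""
        if self.text:
            return json.loads(self.text)
        if self.content:
            return json.loads(self.content.decode())
        return {}


resp = MockHttpResponse(text='{"results": []}')
assert resp.json() == {"results": []}  # payload is parsed, not discarded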

Comment on lines +32 to +37
    def raise_for_status(self):
        """Raise HTTPError if status code indicates error."""
        if self.status_code >= 400:
            import requests

            raise requests.HTTPError(f"{self.status_code} Error")

🛠️ Refactor suggestion | 🟠 Major

Add return type annotation for raise_for_status.

This helper is missing a return annotation, which violates the project requirement that every function signature be fully typed. Please add -> None to keep mypy and the shared guidelines happy. As per coding guidelines.

-    def raise_for_status(self):
+    def raise_for_status(self) -> None:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    def raise_for_status(self):
        """Raise HTTPError if status code indicates error."""
        if self.status_code >= 400:
            import requests
            raise requests.HTTPError(f"{self.status_code} Error")
    def raise_for_status(self) -> None:
        """Raise HTTPError if status code indicates error."""
        if self.status_code >= 400:
            import requests
            raise requests.HTTPError(f"{self.status_code} Error")
🧰 Tools
🪛 Ruff (0.13.2)

37-37: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In docker/tests/factories/mock_serper_response.py around lines 32 to 37, the
method raise_for_status is missing a return type annotation; update its
signature to include -> None so the function is fully typed per project
guidelines, leaving implementation unchanged.

@T-rav T-rav merged commit ace28ad into main Oct 3, 2025
11 checks passed
@T-rav T-rav deleted the fix/html2text-mock branch October 3, 2025 21:28
T-rav added a commit that referenced this pull request Oct 10, 2025
fix: Mock html2text in content truncation test
