Modified Crawling and graceful error handling with streamlined UI. #48
Conversation
- Fixed MCP Docker Build Failure: Resolved the build error for the `mcp` service by removing the invalid readme reference in `fast-markdown-mcp/pyproject.toml`.
- Refactored File Handling (Removed In-Memory Storage):
  - Investigated the complex in-memory file handling mechanism and its inconsistencies.
  - Removed the in-memory storage logic from `backend/app/crawler.py`.
  - Removed the associated API endpoints (`/api/memory-files`, `/api/memory-files/{file_id}`) from `backend/app/main.py`.
  - Added a new backend API endpoint (`/api/storage/file-content`) to read files directly from the `storage/markdown` directory.
  - Deleted the old frontend API proxy route (`app/api/memory-file/route.ts`).
  - Created a new frontend API proxy route (`app/api/storage/file-content/route.ts`).
  - Updated frontend components (`StoredFiles.tsx`, `DiscoveredFiles.tsx`) to use the new API route for downloading file content.
- Documentation: Created markdown plans for the MCP build fix and the in-memory feature removal.
This simplifies the architecture by relying solely on disk-based consolidated files in storage/markdown. Please remember to test the file download functionality after restarting the services.
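A minimal sketch of what the new frontend proxy route could look like, assuming a Next.js App Router handler that forwards a `file_path` query parameter to the backend service; the backend URL fallback, parameter handling, and error shape here are assumptions for illustration, not the actual implementation in this PR.

```typescript
// app/api/storage/file-content/route.ts — illustrative sketch, not the PR's actual code
import { NextRequest, NextResponse } from "next/server";

// Assumption: inside Docker Compose the backend is reachable via its service name,
// not localhost; the project configures this through NEXT_PUBLIC_BACKEND_URL.
const BACKEND_URL = process.env.NEXT_PUBLIC_BACKEND_URL ?? "http://backend:24125";

export async function GET(request: NextRequest) {
  const filePath = request.nextUrl.searchParams.get("file_path");
  if (!filePath) {
    return NextResponse.json({ error: "file_path query parameter is required" }, { status: 400 });
  }

  // Proxy the request to the backend endpoint that reads from storage/markdown.
  const backendResponse = await fetch(
    `${BACKEND_URL}/api/storage/file-content?file_path=${encodeURIComponent(filePath)}`
  );

  if (!backendResponse.ok) {
    return NextResponse.json(
      { error: `Backend returned ${backendResponse.status}` },
      { status: backendResponse.status }
    );
  }

  // Pass the body through with a content type based on the file extension.
  const body = await backendResponse.text();
  const contentType = filePath.endsWith(".json") ? "application/json" : "text/markdown";
  return new NextResponse(body, { headers: { "Content-Type": contentType } });
}
```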
This commit addresses several issues and implements enhancements across the crawling workflow:

Fixes:
- Resolved 400 Bad Request error caused by incorrect query parameter (`file_path`) in the file content API route.
- Fixed backend `NameError` (`set_task_context`) in crawler.py that prevented result file saving.
- Corrected 500 Internal Server Error caused by Docker networking issue (localhost vs. service name) in the file content API route proxy.
- Ensured 'Data Extracted' statistic is correctly saved in the backend status and displayed in the UI.

UI Enhancements:
- Made "Consolidated Files" section persistent, rendering as soon as a job ID is available.
- Relocated "Crawl Selected" button inline with status details.
- Updated "Crawl Selected" button to show dynamic count and disable appropriately.
- Renamed "Job Status" section title to "Discovered Pages".
- Renamed "Processing Summary" section title to "Statistics".
- Removed the unused "Extracted Content" display section.

Backend Enhancements:
- Implemented file appending logic in crawler.py for consolidated `.md` and `.json` files. Subsequent crawls for the same job now append data and update timestamps instead of overwriting.

Changelog:

### Added
- Backend logic to append new crawl results to existing consolidated `.md` and `.json` files for the same job ID.
- Dynamic count display to "Crawl Selected" button.

### Changed
- "Consolidated Files" section now appears persistently once a job is initiated.
- "Crawl Selected" button relocated inline with status details and disables after initiating crawl.
- Renamed "Job Status" section title to "Discovered Pages".
- Renamed "Processing Summary" section title to "Statistics".
- Updated backend status management to correctly store and transmit the 'Data Extracted' statistic.

### Fixed
- Resolved 400 Bad Request error when fetching file content due to incorrect query parameter name.
- Fixed backend `NameError` in crawler that prevented saving crawl results.
- Resolved 500 Internal Server Error when fetching `.json` file content due to Docker networking issue in API proxy route.
- Corrected display issue where 'Data Extracted' statistic showed "N/A" instead of the actual value.

### Removed
- Removed the unused "Extracted Content" display section from the UI.
feat(frontend): Update Consolidated Files component for polling and downloads

- Implements polling every 10 seconds in ConsolidatedFiles.tsx to automatically refresh the list of files from the /api/storage endpoint, ensuring newly added files appear in the UI.
- Modifies the MD and JSON icon links to point to the /api/storage/download endpoint and adds the 'download' attribute, triggering file downloads instead of opening content in the browser.
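A rough sketch of the 10-second polling pattern described in that commit, assuming a React hook that lists files from `/api/storage`; the hook name, response shape, and state names are illustrative, not taken from `ConsolidatedFiles.tsx` itself.

```typescript
// Illustrative polling hook for a ConsolidatedFiles-style component (names are assumptions).
import { useEffect, useState } from "react";

interface ConsolidatedFile {
  baseName: string;
  jsonSize?: number;
  markdownSize?: number;
}

export function useConsolidatedFiles(pollIntervalMs = 10_000) {
  const [files, setFiles] = useState<ConsolidatedFile[]>([]);
  const [error, setError] = useState<string | null>(null);

  useEffect(() => {
    let cancelled = false;

    const fetchFiles = async () => {
      try {
        const res = await fetch("/api/storage");
        if (!res.ok) throw new Error(`HTTP ${res.status}`);
        const data = await res.json();
        // Keep stale data on polling errors; only replace the list on success.
        if (!cancelled) setFiles(data.files ?? []);
      } catch (err) {
        if (!cancelled) setError(err instanceof Error ? err.message : String(err));
      }
    };

    fetchFiles();
    const intervalId = setInterval(fetchFiles, pollIntervalMs);

    // Cleanup prevents leaked timers when the component unmounts.
    return () => {
      cancelled = true;
      clearInterval(intervalId);
    };
  }, [pollIntervalMs]);

  return { files, error };
}
```

The download links themselves would then point at `/api/storage/download` with the `download` attribute set, as the commit message describes.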
Summary by CodeRabbit
Walkthrough

This pull request introduces comprehensive backend and frontend refactoring for managing file storage and web crawling. The changes remove in-memory file handling and add new disk-based endpoints, incorporate job IDs and crawl status management via a multiprocessing manager, and update type definitions. New React components and UI elements are added for crawl status monitoring, consolidated file display, and URL management. Additionally, the documentation, Docker configuration, and build processes are updated, and new utility functions and a smart markdown search feature are implemented.
Sequence Diagram(s)

Crawl Job Flow

```mermaid
sequenceDiagram
    participant U as User
    participant FE as Frontend
    participant BE as Backend
    participant SM as Status Manager
    U->>FE: Submits crawl request (with URL/depth)
    FE->>BE: Sends POST /api/discover (includes job_id)
    BE->>SM: Initialize crawl job (with job_id, root URL)
    SM-->>BE: Job status set to initializing
    BE-->>FE: Returns job ID immediately
    FE->>FE: Start polling using job_id
    FE->>BE: GET /api/crawl-status/{job_id} request
    BE->>SM: Retrieve current job status
    SM-->>BE: Return updated job status (URL statuses, overall progress)
    BE-->>FE: Sends latest crawl status
    FE->>U: Updates UI with job progress and results
```
File Download Flow

```mermaid
sequenceDiagram
    participant U as User
    participant FE as Frontend
    participant BE as Backend
    participant FS as File System
    FE->>BE: GET /api/storage/file-content?file_path=<path>
    BE->>FS: Reads file content from disk (storage/markdown)
    FS-->>BE: Returns file content
    BE-->>FE: Sends file content with proper content type
    FE->>U: Initiates download with generated blob URL
```
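For the last step of the diagram, a hedged sketch of turning fetched content into a browser download via an object URL; the helper name and MIME handling are illustrative rather than the actual component code.

```typescript
// Illustrative helper: download already-fetched text content as a file via a blob URL.
export function downloadAsFile(content: string, fileName: string, mimeType = "text/markdown") {
  const blob = new Blob([content], { type: mimeType });
  const url = URL.createObjectURL(blob);

  const anchor = document.createElement("a");
  anchor.href = url;
  anchor.download = fileName; // triggers a download instead of navigation
  document.body.appendChild(anchor);
  anchor.click();

  // Clean up the temporary anchor and object URL.
  anchor.remove();
  URL.revokeObjectURL(url);
}
```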
Actionable comments posted: 74
🔭 Outside diff range comments (3)
lib/crawl-service.ts (2)
6-35: 🧹 Nitpick (assertive) — Double-check error handling for incomplete data

When `data.job_id` is missing or malformed, you throw an error. Consider adding fallback or retry logic so the UI can appropriately handle unexpected backend responses.
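One way to act on that nitpick, sketched under the assumption of a generic `fetch`-based discover call; the retry count, delay, endpoint path, and function name are illustrative and not the project's actual API.

```typescript
// Illustrative retry wrapper for a discover call that may return incomplete data.
async function discoverWithRetry(url: string, maxAttempts = 3, delayMs = 1000): Promise<string> {
  let lastError: unknown;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await fetch("/api/discover", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ url }),
      });
      const data = await res.json();

      // Treat a missing job_id as a retryable condition instead of throwing immediately.
      if (typeof data.job_id === "string" && data.job_id.length > 0) {
        return data.job_id;
      }
      lastError = new Error("Backend response missing job_id");
    } catch (err) {
      lastError = err;
    }
    // Simple fixed backoff between attempts.
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }

  throw lastError instanceof Error ? lastError : new Error("Discovery failed after retries");
}
```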
43-81: 🧹 Nitpick (assertive) — Return type consistency for error cases

When returning the failure object at lines 77–80, ensure upstream callers know to expect the shape `{ success: false, jobId, error }`. Documenting the error response structure can help maintain consistent error handling.

app/page.tsx (1)

172-220: 🧹 Nitpick (assertive) — Legacy logic commented out.

Large blocks of commented-out code are best handled in version control. Maintain a clear path for future reference, but avoid shipping them in production files unless absolutely necessary.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (35)
- .roomodes (1 hunks)
- README.md (1 hunks)
- app/api/memory-file/route.ts (0 hunks)
- app/api/storage/file-content/route.ts (1 hunks)
- app/api/storage/route.ts (2 hunks)
- app/page.tsx (4 hunks)
- backend/app/crawler.py (16 hunks)
- backend/app/main.py (4 hunks)
- backend/app/status_manager.py (1 hunks)
- backend/app/utils.py (1 hunks)
- components/ConsolidatedFiles.tsx (1 hunks)
- components/CrawlStatusMonitor.tsx (1 hunks)
- components/DiscoveredFiles.tsx (3 hunks)
- components/JobStatsSummary.tsx (1 hunks)
- components/StoredFiles.tsx (3 hunks)
- components/SubdomainList.tsx (2 hunks)
- docker/compose/docker-compose.yml (1 hunks)
- docker/dockerfiles/Dockerfile.frontend (2 hunks)
- docs/architecture/remove_in_memory_files_plan.md (1 hunks)
- docs/features/crawl_status_monitoring_plan.md (1 hunks)
- docs/features/fix_backend_url_resolution_plan.md (1 hunks)
- docs/features/fix_consolidated_files_display.md (1 hunks)
- docs/features/fix_data_extracted_display.md (1 hunks)
- docs/features/fix_discover_url.md (1 hunks)
- docs/features/fix_frontend_build_env_var_plan.md (1 hunks)
- docs/features/fix_import_error.md (1 hunks)
- docs/features/selective_consolidation_plan.md (1 hunks)
- docs/features/ui_updates_plan.md (1 hunks)
- docs/local/crawl4ai_local_fix_plan.md (1 hunks)
- docs/mcp/mcp_build_fix_plan.md (1 hunks)
- fast-markdown-mcp/pyproject.toml (0 hunks)
- fast-markdown-mcp/src/fast_markdown_mcp/server.py (5 hunks)
- lib/crawl-service.ts (3 hunks)
- lib/types.ts (2 hunks)
- lib/utils.ts (1 hunks)
💤 Files with no reviewable changes (2)
- fast-markdown-mcp/pyproject.toml
- app/api/memory-file/route.ts
🧰 Additional context used
🧬 Code Definitions (8)
lib/utils.ts (1)
- backend/app/crawler.py (1)
  - url_to_filename (60-110)

components/ConsolidatedFiles.tsx (2)
- components/ui/card.tsx (4)
  - Card (79-79), CardHeader (79-79), CardTitle (79-79), CardContent (79-79)
- components/ui/button.tsx (1)
  - Button (56-56)

app/api/storage/file-content/route.ts (1)
- app/api/storage/route.ts (1)
  - GET (32-165)

lib/types.ts (3)
- backend/app/status_manager.py (1)
  - CrawlJobStatus (10-19)
- backend/app/main.py (1)
  - CrawlRequest (62-64)
- backend/app/crawler.py (1)
  - DiscoveredPage (42-46)

fast-markdown-mcp/src/fast_markdown_mcp/server.py (1)
- fast-markdown-mcp/src/fast_markdown_mcp/document_structure.py (3)
  - DocumentStructure (15-105), Section (6-13), _make_section_id (97-105)

lib/crawl-service.ts (2)
- lib/types.ts (3)
  - DiscoverResponse (75-79), CrawlRequest (82-85), CrawlResponse (88-93)
- backend/app/main.py (1)
  - CrawlRequest (62-64)

backend/app/crawler.py (2)
- backend/app/status_manager.py (2)
  - update_overall_status (40-53), update_url_status (55-65)
- backend/app/utils.py (1)
  - normalize_url (3-9)

app/page.tsx (5)
- backend/app/crawler.py (1)
  - DiscoveredPage (42-46)
- lib/types.ts (3)
  - DiscoveredPage (7-12), CrawlJobStatus (63-72), OverallStatus (44-51)
- backend/app/status_manager.py (1)
  - CrawlJobStatus (10-19)
- lib/crawl-service.ts (2)
  - discoverSubdomains (6-41), crawlPages (43-83)
- components/SubdomainList.tsx (1)
  - SubdomainList (14-284)
🪛 markdownlint-cli2 (0.17.2)
docs/local/crawl4ai_local_fix_plan.md
7-7: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
8-8: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
10-10: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
14-14: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
17-17: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
18-18: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
20-20: Files should end with a single newline character
null
(MD047, single-trailing-newline)
docs/features/fix_data_extracted_display.md
24-24: Files should end with a single newline character
null
(MD047, single-trailing-newline)
docs/features/fix_frontend_build_env_var_plan.md
25-25: Files should end with a single newline character
null
(MD047, single-trailing-newline)
docs/features/fix_import_error.md
21-21: Multiple consecutive blank lines
Expected: 1; Actual: 2
(MD012, no-multiple-blanks)
24-24: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
25-25: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
26-26: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
27-27: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
28-28: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
29-29: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
30-30: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
31-31: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
32-32: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
33-33: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
34-34: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
35-35: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
36-36: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
36-36: Files should end with a single newline character
null
(MD047, single-trailing-newline)
docs/mcp/mcp_build_fix_plan.md
7-7: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
8-8: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
13-13: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
14-14: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
19-19: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
20-20: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
24-24: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
25-25: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
27-27: Files should end with a single newline character
null
(MD047, single-trailing-newline)
docs/features/fix_discover_url.md
10-10: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
11-11: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
16-16: Files should end with a single newline character
null
(MD047, single-trailing-newline)
docs/features/selective_consolidation_plan.md
42-42: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
45-45: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
49-49: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
53-53: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
56-56: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
60-60: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
64-64: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
68-68: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
74-74: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
78-78: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
81-81: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
89-89: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
93-93: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
96-96: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
100-100: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
106-106: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
110-110: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
114-114: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
117-117: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
121-121: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
125-125: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
133-133: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
141-141: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
145-145: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
149-149: Spaces inside emphasis markers
null
(MD037, no-space-in-emphasis)
156-156: Files should end with a single newline character
null
(MD047, single-trailing-newline)
docs/features/fix_consolidated_files_display.md
26-26: Files should end with a single newline character
null
(MD047, single-trailing-newline)
docs/features/crawl_status_monitoring_plan.md
142-142: Files should end with a single newline character
null
(MD047, single-trailing-newline)
🪛 LanguageTool
docs/features/fix_data_extracted_display.md
[typographical] ~12-~12: If specifying a range, consider using an en dash instead of a hyphen.
Context: ... 3. [ ] Propose Solutions: Outline 1-2 ways to fix the issue based on the find...
(HYPHEN_TO_EN)
[style] ~12-~12: Consider using a different verb for a more formal wording.
Context: ...ropose Solutions:** Outline 1-2 ways to fix the issue based on the findings. Evalua...
(FIX_RESOLVE)
docs/features/selective_consolidation_plan.md
[style] ~34-~34: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... total size, last updated time. * It provides buttons to view the raw markdo...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
docs/features/fix_consolidated_files_display.md
[uncategorized] ~13-~13: You might be missing the article “the” here.
Context: ... with the download attribute or using JavaScript Blob/ObjectURL approach. * Status...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[uncategorized] ~21-~21: Use a comma before “and” if it connects two independent clauses (unless they are closely connected and short).
Context: ... Ensure only necessary changes were made and existing functionality is preserved. ...
(COMMA_COMPOUND_SENTENCE_2)
docs/features/crawl_status_monitoring_plan.md
[grammar] ~49-~49: After the number ‘404’, use a plural noun. Did you mean “errors”?
Context: ...atus`. * If not found, return 404 error. ### Step 4.2: Backend - Update Status...
(CD_NNU)
[style] ~86-~86: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...f error, display error message. * If status available: * Display `...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
docs/architecture/remove_in_memory_files_plan.md
[style] ~47-~47: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ns.open = redirecting_openline. * **Remove/Simplify:**redirect_file_writes` func...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 YAMLlint (1.35.1)
docker/compose/docker-compose.yml
[warning] 7-7: too few spaces before comment
(comments)
🪛 Biome (1.9.4)
components/CrawlStatusMonitor.tsx
[error] 241-241: JSX elements without children should be marked as self-closing. In JSX, it is valid for any element to be self-closing.
Unsafe fix: Use a SelfClosingElement instead
(lint/style/useSelfClosingElements)
components/ConsolidatedFiles.tsx
[error] 15-16: An empty interface is equivalent to {}.
Safe fix: Use a type alias instead.
(lint/suspicious/noEmptyInterface)
[error] 67-67: Forbidden non-null assertion.
(lint/style/noNonNullAssertion)
🪛 Ruff (0.8.2)
backend/app/main.py
5-5: typing.List is deprecated, use list instead
(UP035)
5-5: typing.Dict is deprecated, use dict instead
(UP035)
5-5: typing.Optional imported but unused
Remove unused import
(F401)
5-5: typing.Dict imported but unused
Remove unused import
(F401)
8-8: psutil imported but unused
Remove unused import: psutil
(F401)
16-16: .crawler.CrawlResult imported but unused
Remove unused import: .crawler.CrawlResult
(F401)
21-21: Trailing comma missing
Add trailing comma
(COM812)
27-34: Module level import not at top of file
(E402)
31-31: .status_manager.update_url_status imported but unused
Remove unused import: .status_manager.update_url_status
(F401)
33-33: Trailing comma missing
Add trailing comma
(COM812)
398-398: Missing return type annotation for public function get_storage_file_content
(ANN201)
407-407: Logging statement uses f-string
(G004)
408-408: Abstract raise to an inner function
(TRY301)
418-418: Logging statement uses f-string
(G004)
419-419: Abstract raise to an inner function
(TRY301)
420-420: Do not catch blind exception: Exception
(BLE001)
421-421: Logging statement uses f-string
(G004)
422-422: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
425-425: Logging statement uses f-string
(G004)
426-426: Abstract raise to an inner function
(TRY301)
428-428: Logging statement uses f-string
(G004)
430-430: Logging statement uses f-string
(G004)
436-436: Use raise without specifying exception name
Remove exception name
(TRY201)
439-439: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
439-439: Logging statement uses f-string
(G004)
439-439: Use explicit conversion flag
Replace with conversion flag
(RUF010)
440-440: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
440-440: Use explicit conversion flag
Replace with conversion flag
(RUF010)
464-464: Missing return type annotation for public function discover_endpoint
(ANN201)
469-469: Logging statement uses f-string
(G004)
478-478: Logging statement uses f-string
(G004)
480-484: Consider moving this statement to an else block
(TRY300)
483-483: Trailing comma missing
Add trailing comma
(COM812)
486-486: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
486-486: Logging statement uses f-string
(G004)
486-486: Use explicit conversion flag
Replace with conversion flag
(RUF010)
489-489: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
489-489: Use explicit conversion flag
Replace with conversion flag
(RUF010)
491-491: Missing return type annotation for public function crawl_endpoint
(ANN201)
495-495: Logging statement uses f-string
(G004)
499-499: Logging statement uses f-string
(G004)
500-500: Abstract raise to an inner function
(TRY301)
510-510: Logging statement uses f-string
(G004)
516-516: Logging statement uses f-string
(G004)
518-522: Consider moving this statement to an else block
(TRY300)
521-521: Trailing comma missing
Add trailing comma
(COM812)
524-524: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
524-524: Logging statement uses f-string
(G004)
524-524: Use explicit conversion flag
Replace with conversion flag
(RUF010)
526-526: Use explicit conversion flag
Replace with conversion flag
(RUF010)
527-527: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
527-527: Use explicit conversion flag
Replace with conversion flag
(RUF010)
531-531: Missing return type annotation for public function get_crawl_status
(ANN201)
533-533: Logging statement uses f-string
(G004)
536-536: Logging statement uses f-string
(G004)
538-538: Logging statement uses f-string
(G004)
fast-markdown-mcp/src/fast_markdown_mcp/server.py
8-8: math imported but unused
Remove unused import: math
(F401)
10-10: typing.Dict is deprecated, use dict instead
(UP035)
10-10: typing.List is deprecated, use list instead
(UP035)
10-10: typing.Tuple is deprecated, use tuple instead
(UP035)
10-10: typing.Optional imported but unused
Remove unused import
(F401)
10-10: typing.Dict imported but unused
Remove unused import
(F401)
10-10: typing.List imported but unused
Remove unused import
(F401)
10-10: typing.Tuple imported but unused
Remove unused import
(F401)
10-10: typing.Any imported but unused
Remove unused import
(F401)
20-20: Module level import not at top of file
(E402)
20-20: .document_structure.Section imported but unused
Remove unused import: .document_structure.Section
(F401)
214-214: smart_section_search is too complex (22 > 10)
(C901)
215-215: Boolean-typed positional argument in function definition
(FBT001)
215-215: Boolean default positional argument in function definition
(FBT002)
215-215: Boolean-typed positional argument in function definition
(FBT001)
215-215: Boolean default positional argument in function definition
(FBT002)
244-244: Missing return type annotation for private function collect_sections
Add return type annotation: None
(ANN202)
244-244: Missing type annotation for function argument section_list
(ANN001)
246-246: Function definition does not bind loop variable sections
(B023)
263-263: Trailing comma missing
Add trailing comma
(COM812)
276-276: Trailing comma missing
Add trailing comma
(COM812)
289-289: Trailing comma missing
Add trailing comma
(COM812)
308-308: Trailing comma missing
Add trailing comma
(COM812)
326-326: Trailing comma missing
Add trailing comma
(COM812)
339-339: Trailing comma missing
Add trailing comma
(COM812)
374-374: Trailing comma missing
Add trailing comma
(COM812)
409-409: Do not catch blind exception: Exception
(BLE001)
410-410: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
410-410: Logging statement uses f-string
(G004)
411-411: Use explicit conversion flag
Replace with conversion flag
(RUF010)
678-678: Trailing comma missing
Add trailing comma
(COM812)
682-682: Trailing comma missing
Add trailing comma
(COM812)
686-686: Trailing comma missing
Add trailing comma
(COM812)
690-690: Trailing comma missing
Add trailing comma
(COM812)
691-691: Trailing comma missing
Add trailing comma
(COM812)
693-693: Trailing comma missing
Add trailing comma
(COM812)
694-694: Trailing comma missing
Add trailing comma
(COM812)
749-749: Avoid specifying long messages outside the exception class
(TRY003)
770-770: Trailing comma missing
Add trailing comma
(COM812)
backend/app/status_manager.py
3-3: typing.Dict is deprecated, use dict instead
(UP035)
3-3: typing.Dict imported but unused
Remove unused import: typing.Dict
(F401)
26-26: Missing return type annotation for public function initialize_job
Add return type annotation: None
(ANN201)
29-29: Logging statement uses f-string
(G004)
35-35: datetime.datetime.now() called without a tz argument
(DTZ005)
36-36: Trailing comma missing
Add trailing comma
(COM812)
38-38: Logging statement uses f-string
(G004)
40-40: Missing return type annotation for public function update_overall_status
Add return type annotation: None
(ANN201)
45-45: datetime.datetime.now() called without a tz argument
(DTZ005)
50-50: Logging statement uses f-string
(G004)
51-51: Logging statement uses f-string
(G004)
53-53: Logging statement uses f-string
(G004)
55-55: Missing return type annotation for public function update_url_status
Add return type annotation: None
(ANN201)
63-63: Logging statement uses f-string
(G004)
65-65: Logging statement uses f-string
(G004)
67-67: Missing return type annotation for public function add_pending_crawl_urls
Add return type annotation: None
(ANN201)
77-77: Logging statement uses f-string
(G004)
79-79: Logging statement uses f-string
(G004)
backend/app/crawler.py
1-1: typing.List is deprecated, use list instead
(UP035)
1-1: typing.Set is deprecated, use set instead
(UP035)
1-1: typing.Dict is deprecated, use dict instead
(UP035)
1-1: typing.Dict imported but unused
Remove unused import: typing.Dict
(F401)
141-141: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
142-142: Trailing comma missing
Add trailing comma
(COM812)
143-143: Use list instead of List for type annotation
Replace with list
(UP006)
157-157: Logging statement uses f-string
(G004)
181-181: Logging statement uses f-string
(G004)
188-188: Async functions should not call blocking HTTP methods
(ASYNC210)
192-192: Trailing comma missing
Add trailing comma
(COM812)
196-196: Logging statement uses f-string
(G004)
203-203: Logging statement uses f-string
(G004)
204-204: Async functions should not call blocking HTTP methods
(ASYNC210)
207-207: Trailing comma missing
Add trailing comma
(COM812)
214-214: Logging statement uses f-string
(G004)
219-219: Unnecessary elif after break statement
Remove unnecessary elif
(RET508)
221-221: Logging statement uses f-string
(G004)
229-229: Logging statement uses f-string
(G004)
234-234: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
234-234: Logging statement uses f-string
(G004)
234-234: Use explicit conversion flag
Replace with conversion flag
(RUF010)
239-239: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
239-239: Logging statement uses f-string
(G004)
239-239: Use explicit conversion flag
Replace with conversion flag
(RUF010)
247-247: Logging statement uses f-string
(G004)
319-319: Trailing comma missing
Add trailing comma
(COM812)
327-327: Logging statement uses f-string
(G004)
329-329: Consider moving this statement to an else block
(TRY300)
332-332: Use explicit conversion flag
Replace with conversion flag
(RUF010)
333-333: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
343-343: crawl_pages is too complex (44 > 10)
(C901)
343-343: Use list instead of List for type annotation
Replace with list
(UP006)
343-343: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
378-378: Logging statement uses f-string
(G004)
394-394: Logging statement uses f-string
(G004)
412-412: Logging statement uses f-string
(G004)
433-433: Logging statement uses f-string
(G004)
472-472: f-string without any placeholders
Remove extraneous f prefix
(F541)
483-483: Async functions should not open files with blocking methods like open
(ASYNC230)
487-487: Logging statement uses f-string
(G004)
488-488: Replace aliased errors with OSError
Replace IOError with builtin OSError
(UP024)
489-489: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
489-489: Logging statement uses f-string
(G004)
492-492: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
492-492: Logging statement uses f-string
(G004)
501-501: Unnecessary open mode parameters
Remove open mode parameters
(UP015)
501-501: Async functions should not open files with blocking methods like open
(ASYNC230)
504-504: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
504-504: Logging statement uses f-string
(G004)
506-506: Replace aliased errors with OSError
Replace IOError with builtin OSError
(UP024)
507-507: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
507-507: Logging statement uses f-string
(G004)
510-510: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
510-510: Logging statement uses f-string
(G004)
536-536: Async functions should not open files with blocking methods like open
(ASYNC230)
538-538: Logging statement uses f-string
(G004)
539-539: Replace aliased errors with OSError
Replace IOError with builtin OSError
(UP024)
540-540: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
540-540: Logging statement uses f-string
(G004)
542-542: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
542-542: Logging statement uses f-string
(G004)
543-543: Logging statement uses f-string
(G004)
544-544: Do not catch blind exception: Exception
(BLE001)
545-545: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
545-545: Logging statement uses f-string
(G004)
545-545: Use explicit conversion flag
Replace with conversion flag
(RUF010)
548-548: Unnecessary elif after break statement
Remove unnecessary elif
(RET508)
550-550: Logging statement uses f-string
(G004)
555-555: Logging statement uses f-string
(G004)
556-556: Do not catch blind exception: Exception
(BLE001)
557-557: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
557-557: Logging statement uses f-string
(G004)
557-557: Use explicit conversion flag
Replace with conversion flag
(RUF010)
563-563: Logging statement uses f-string
(G004)
603-603: Logging statement uses f-string
(G004)
611-611: Logging statement uses f-string
(G004)
617-617: Logging statement uses f-string
(G004)
624-624: Use explicit conversion flag
Replace with conversion flag
(RUF010)
625-625: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
646-646: Logging statement uses f-string
(G004)
653-653: Trailing comma missing
Add trailing comma
(COM812)
657-657: Use explicit conversion flag
Replace with conversion flag
(RUF010)
658-658: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
🔇 Additional comments (59)
docs/features/selective_consolidation_plan.md (1)
1-156: Feature plan looks comprehensive and well-structured. This feature plan for Selective URL Consolidation provides a detailed breakdown of the workflow and implementation steps. The four-step process (Discovery, Selection, Crawl & Consolidation, Display Results) is clearly defined with specific technical details for each component.
Each implementation task is marked as completed with appropriate file locations referenced, showing a well-tracked implementation process.
backend/app/utils.py (1)
1-9: Well-implemented URL normalization function. The `normalize_url` function is clean, well-documented, and handles URL normalization correctly by:
- Removing trailing slashes from paths
- Ensuring empty paths default to '/'
- Properly reconstructing the URL using scheme, netloc, and path
The function follows Python best practices with a clear docstring and type hints.
docker/dockerfiles/Dockerfile.frontend (1)
3-3: Good implementation of configurable backend URL. Adding the build argument and environment variable for `NEXT_PUBLIC_BACKEND_URL` enables flexible configuration of the backend endpoint during build time. This supports the environment setup instructions added to the README.md file. The placement is appropriate: defining the ARG early and setting the ENV before the build process.
Also applies to: 18-18
README.md (2)
110-115: Clear environment setup instructions added. The added environment configuration instructions for Mac/Linux users are clear and helpful. Explicitly mentioning the need to copy the template file and set the `NEXT_PUBLIC_BACKEND_URL` correctly will help users avoid common setup issues.

121-138: Good addition of Windows-specific setup instructions. Clearly marking the Windows instructions as "Experimental Only (Not Tested Yet)" sets appropriate expectations. The added prerequisites with links to official installation guides for WSL 2 and Docker Desktop are helpful for Windows users who might be less familiar with these requirements.
docs/local/crawl4ai_local_fix_plan.md (1)
1-4: Overall Clarity & Context
The task list is well described, and the context of addressing the browser crash is clearly communicated.

docs/features/fix_backend_url_resolution_plan.md (1)

1-4: Document Purpose & Clarity
The document clearly states its objective: to resolve the `net::ERR_NAME_NOT_RESOLVED` error through a cleanup and rebuild process. The goal and commands are well articulated.

docs/features/fix_discover_url.md (1)

1-4: Clear Objective Statement
The objective of updating the backend URL from `http://localhost:24125` to `http://backend:24125` is clearly set out, and the rationale is well documented.

docs/features/fix_data_extracted_display.md (1)

1-6: Overall Structure & Purpose
The feature objective and associated tasks are comprehensively detailed. The instructions clearly outline the expected analysis and steps for fixing the display issue in the `JobStatsSummary` component.

docs/features/fix_frontend_build_env_var_plan.md (2)

1-4: Clear and Focused Objective
The document clearly outlines the objective to resolve the `net::ERR_NAME_NOT_RESOLVED` error by ensuring proper handling of the `NEXT_PUBLIC_BACKEND_URL` during Docker builds.

13-17: Configuration Verification Steps
The verification steps for ensuring that the environment variable is correctly passed are well defined. No changes necessary here.

docs/features/ui_updates_plan.md (1)
1-36: Well-structured UI updates plan. The document provides a clear and organized plan for updating the user interface with specific tasks, files to modify, and verification steps. It aligns with the PR objectives to streamline the UI and enhance clarity.
app/api/storage/route.ts (3)
44-57: Good interface definition for DiskFileDetail. The interface clearly defines the structure for disk file details, providing strong typing for the application.

135-137: Proper removal of in-memory file handling. The code changes correctly implement the refactoring outlined in the PR objectives to remove in-memory storage mechanisms. The comments clearly indicate the rationale behind the change.

141-141: Type safety improvement. Adding the explicit type to the file parameter enhances type safety and readability.
lib/utils.ts (1)
8-65: Well-implemented URL to filename conversion utility. The `url_to_filename` function is well-structured with:
- Clear documentation explaining its purpose and relation to backend logic
- Robust error handling with a fallback mechanism
- Proper URL parsing and processing for filename generation
- Commented out code for potential length limits with clear rationale
The implementation aligns with the backend's approach while maintaining appropriate simplifications for frontend use.
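As a rough illustration only (the real `lib/utils.ts` mirrors `backend/app/crawler.py` and is not reproduced here), a filename-derivation helper along these lines would parse the URL, join host and path segments, and fall back to a sanitized string when parsing fails; the exact rules below are assumptions.

```typescript
// Hypothetical sketch of URL-to-filename conversion; not the project's actual implementation.
export function urlToFilename(rawUrl: string): string {
  try {
    const parsed = new URL(rawUrl);
    const segments = parsed.pathname.split("/").filter(Boolean);
    // Join hostname and path segments, replacing characters unsafe for filenames.
    const base = [parsed.hostname, ...segments].join("_");
    return base.replace(/[^a-zA-Z0-9._-]/g, "_");
  } catch {
    // Fallback: sanitize the raw string if URL parsing fails.
    return rawUrl.replace(/[^a-zA-Z0-9._-]/g, "_");
  }
}
```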
components/SubdomainList.tsx (3)
10-10: Good interface update making onCrawlSelected optional. Making the `onCrawlSelected` prop optional is appropriate since the functionality has been moved to another component.

14-14: Clear indication of unused prop. Commenting out the unused prop in the function parameters while keeping it in the interface provides clarity about the change without breaking compatibility.

138-138: Clear comment about UI restructuring. The comment clearly explains that the button functionality has been moved to the CrawlStatusMonitor component, which aligns with the PR objective to streamline the UI.
components/DiscoveredFiles.tsx (3)
241-242: Updated function call correctly! This change correctly updates the call to `handleDownload` to match the new function signature.

250-251: Updated function call correctly! This change correctly updates the call to `handleDownload` to match the new function signature.
15-27: ❌ Incorrect review comment

Consider reviewing the `isInMemory` property. The `isInMemory` property is still defined in the `DiscoveredFile` interface even though the in-memory file handling has been removed. Since the in-memory file handling functionality has been removed, please check if this property is still needed or if it can be safely removed:

🏁 Script executed:

```bash
#!/bin/bash
# Check if the isInMemory property is still being used elsewhere in the codebase
echo "Checking for isInMemory usage in the codebase..."
rg -A 2 -B 2 "isInMemory" --type ts --type tsx --glob "!components/DiscoveredFiles.tsx"
```

Length of output: 222

Action Required – Re-verify `isInMemory` Usage

The `DiscoveredFile` interface still defines the `isInMemory` property even though in-memory file handling has been removed. The initial check returned an error due to an unrecognized file type (tsx). Please run the updated script below to verify if `isInMemory` is referenced elsewhere in your codebase. If this script confirms no usage of `isInMemory` elsewhere, then it is safe to remove the property from the interface. Please verify the output manually before proceeding.

🏁 Script executed:

```bash
#!/bin/bash
# Re-check for isInMemory usage in .ts and .tsx files (excluding components/DiscoveredFiles.tsx)
echo "Re-checking for isInMemory usage in .ts and .tsx files (excluding components/DiscoveredFiles.tsx)..."
rg -A 2 -B 2 "isInMemory" -g "*.ts" -g "*.tsx" --glob "!components/DiscoveredFiles.tsx"
```

Length of output: 2784

Clarification: Retain `isInMemory` in DiscoveredFile Interface

Our investigation shows that although in-memory file handling was removed from one context, the `isInMemory` property remains in active use elsewhere in the codebase. Specifically:

- components/StoredFiles.tsx: The property is checked to conditionally render the in-memory file indicator.
- app/api/storage/route.ts & app/api/all-files/route.ts: The property is a documented part of the API response structure and is set with an explicit boolean value.

Given these verified usages, the `isInMemory` property should be retained. The suggestion to remove it is therefore not applicable. Likely an incorrect or invalid review comment.
docker/compose/docker-compose.yml (1)
12-12: ✅ Verification successful

Verify the NEXT_PUBLIC_BACKEND_URL configuration

The hardcoded NEXT_PUBLIC_BACKEND_URL environment variable has been commented out, which implies it's now being supplied through the build argument instead.

Ensure that the NEXT_PUBLIC_BACKEND_URL is properly passed when building and running the application:

🏁 Script executed:

```bash
#!/bin/bash
# Check if there's documentation or scripts that set this variable
echo "Checking for NEXT_PUBLIC_BACKEND_URL references in documentation or scripts..."
rg -A 2 -B 2 "NEXT_PUBLIC_BACKEND_URL" --type md --type sh
```

Length of output: 2819

NEXT_PUBLIC_BACKEND_URL is correctly configured via build arguments

The documentation confirms that the NEXT_PUBLIC_BACKEND_URL, now passed as a build argument (as noted in docs/features/fix_frontend_build_env_var_plan.md), replaces the previously hardcoded environment variable in docker/compose/docker-compose.yml. No further action is required regarding this configuration, but please ensure that:

- The build process consistently passes NEXT_PUBLIC_BACKEND_URL as an argument.
- The Dockerfile for the frontend correctly declares `ARG NEXT_PUBLIC_BACKEND_URL` before executing the build commands.

components/JobStatsSummary.tsx (2)
21-25: Well-implemented helper function! The `countUrlsByStatus` function is efficiently designed to handle both single status values and arrays of statuses, with proper type checking and handling of undefined values.
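A hedged sketch of what such a helper might look like, assuming a `urls` map keyed by URL with per-URL status objects; the actual types live in the project's `lib/types.ts` and may differ.

```typescript
// Illustrative helper: count URLs whose status matches one or more target statuses.
type UrlStatusValue = "pending_crawl" | "crawling" | "completed" | "error";

interface UrlStatusEntry {
  status: UrlStatusValue;
}

function countUrlsByStatus(
  urls: Record<string, UrlStatusEntry> | undefined,
  target: UrlStatusValue | UrlStatusValue[]
): number {
  if (!urls) return 0; // tolerate an undefined status map
  const targets = Array.isArray(target) ? target : [target];
  return Object.values(urls).filter((entry) => targets.includes(entry.status)).length;
}
```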
44-89: Well-structured UI component with responsive design! The component creates a visually appealing statistics display with proper responsive grid layout and logical organization of information. The conditional styling based on processing state enhances the user experience by providing visual feedback.
app/api/storage/file-content/route.ts (2)
36-53: Good error handling approach. The error handling is robust, attempting to parse JSON first before falling back to text. This provides detailed error information to clients.

58-65: The content type determination is appropriate. Setting the content type based on file extension is a good practice. The implementation correctly differentiates between JSON and other file types.
docs/features/fix_consolidated_files_display.md (1)
3-4: Clear objective definition. The objective clearly defines the two issues to be addressed: lack of real-time updates and incorrect click behavior. This clarity will help guide the implementation.
components/StoredFiles.tsx (3)
66-68: Simplification of API model by removing in-memory file handling. Good refactoring to remove the `isInMemory` parameter, which simplifies the code and aligns with the architectural change to read files directly from disk.

241-242: Updated function call correctly removes isInMemory parameter. The click handler has been updated correctly to match the new function signature without the isInMemory parameter.

250-251: Updated function call correctly removes isInMemory parameter. The click handler has been updated correctly to match the new function signature without the isInMemory parameter.
docs/features/crawl_status_monitoring_plan.md (4)
11-12: Well-defined goal for user experience improvement. The goal clearly articulates the user-focused benefit of providing visibility into the crawling process, which aligns well with the PR objectives.

15-18: Good architectural decision with rationale. Choosing polling over WebSockets/SSE for the MVP is a sensible approach that balances implementation simplicity with user experience improvements. The rationale is well explained.

30-32: Comprehensive status tracking model. The status model is well-defined with clear states for both overall job status and individual URL status. This will provide users with detailed progress information.

91-137: Excellent sequence diagram. The sequence diagram provides a clear visual representation of the component interactions and data flow. This will be valuable for implementation and future maintenance.
components/CrawlStatusMonitor.tsx (4)
41-57: Well-structured data processing with defensive programming. The `useMemo` implementation properly handles potential invalid data by including multiple defensive checks. This approach prevents runtime errors when dealing with potentially undefined or malformed status data.

66-70: Good state reset implementation. The `useEffect` hook appropriately resets the internal UI state when the job ID or status changes, ensuring a consistent user experience across different jobs.

72-94: Clean handling of component states with early returns. The component uses early returns to handle different states (no job, loading, error), which makes the code more readable and provides clear feedback to users.

175-192: Improved layout with repositioned button. Moving the "Crawl Selected" button to the top grid alongside the status information creates a more intuitive UI flow. Users can now see both the status and action in the same visual line.
components/ConsolidatedFiles.tsx (3)
23-84: Well-implemented file fetching with error handling. The `fetchFiles` function properly handles loading states, errors, and data processing. The decision not to clear stale data during polling errors is a good UX choice that prevents jarring UI changes.
86-100: Good polling implementation with cleanup. The polling mechanism is correctly implemented with appropriate cleanup in the useEffect hook. This ensures that the component remains responsive and doesn't create memory leaks when unmounted.
152-153: ✅ Verification successful

Ensure URL encoding is robust for special characters.

While `encodeURIComponent` is used, consider more comprehensive handling of file paths, especially for file names with special characters or unusual structures.

🏁 Script executed:

```bash
#!/bin/bash
# Description: Check if there are any special characters in file names that might cause issues
# Find all markdown files in the storage directory
fd '.md$' storage/markdown --type f | while read -r file; do
  # Extract the filename without extension
  filename=$(basename "$file" .md)
  # Check for special characters other than alphanumeric, underscore, or hyphen
  if [[ "$filename" =~ [^a-zA-Z0-9_\-] ]]; then
    echo "Warning: File with special characters found: $file"
  fi
done
```

Length of output: 156

Attention: Verify the robustness of URL encoding for file names with special characters

The current implementation using `encodeURIComponent` in `components/ConsolidatedFiles.tsx` covers many scenarios, but it's important to ensure that file names with any unusual or special characters are consistently safe. Note that our initial verification script failed due to an incorrect search path. Please update and run a revised check against all markdown files in the repository (or manually verify) to confirm there are no edge cases that might be missed. For example, you can use a script like the one below to scan from the repository root:

- Action Required: Please verify that any file names with special characters are properly handled by `encodeURIComponent` or, if necessary, consider additional sanitization steps.
- Next Steps: Re-run the above script (or perform a manual check) to ensure that no file names in your repository could lead to issues with URL encoding.

🏁 Script executed:

```bash
#!/bin/bash
# Recursively search for markdown files and flag those with unexpected special characters.
fd '\.md$' | while read -r file; do
  filename=$(basename "$file" .md)
  if [[ "$filename" =~ [^a-zA-Z0-9_\-] ]]; then
    echo "Warning: Special characters found in: $file"
  fi
done
```

Length of output: 3486

URL Encoding Check Complete

After re-running a revised script across the repository, no markdown files containing unexpected special characters were detected. This indicates that using `encodeURIComponent` to process `file.baseName` currently suffices for our file naming conventions. Nonetheless, please ensure that any future changes to file naming or external inputs are accompanied by proper validation and testing.

lib/types.ts (3)
10-10: Good addition of new status for discovered pages. Adding the `pending_crawl` status to the `DiscoveredPage` interface properly aligns the frontend types with the backend implementation and enables more fine-grained status tracking.

43-72: Comprehensive job status type definitions. The newly added `OverallStatus`, `UrlStatus`, and `CrawlJobStatus` types/interfaces provide a clear contract for the status monitoring functionality. The inclusion of optional fields for timestamps, error messages, and extracted data size enhances the UI's ability to display detailed status information.
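For orientation, the shape being described is roughly along these lines; field names beyond those mentioned in the review (timestamps, error message, extracted data size) are assumptions rather than the exact definitions in `lib/types.ts`.

```typescript
// Approximate shape of the status types discussed above (illustrative, not verbatim).
type OverallStatus =
  | "initializing"
  | "discovering"
  | "crawling"
  | "completed"
  | "completed_with_errors"
  | "error";

type UrlStatus = "pending_crawl" | "crawling" | "completed" | "error";

interface CrawlJobStatus {
  jobId: string;
  overallStatus: OverallStatus;
  urls: Record<string, { status: UrlStatus; errorMessage?: string }>;
  startTime?: string;     // optional timestamps
  lastUpdated?: string;
  error?: string;         // optional job-level error message
  dataExtracted?: string; // optional extracted-data size, e.g. "1.2 MB"
}
```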
75-93: Well-structured API request/response interfaces. The new interfaces for API interactions (`DiscoverResponse`, `CrawlRequest`, `CrawlResponse`) establish a clear contract between frontend and backend. The inclusion of `jobId` in each interface properly enables tracking and linking of jobs throughout the application.

docs/architecture/remove_in_memory_files_plan.md (1)
1-30: Documentation is well-structured and provides a clear refactoring plan.

lib/crawl-service.ts (1)

1-3: Leverage newly added types for clarity. The additional type imports help standardize request/response handling with the backend, making the interface more robust.
fast-markdown-mcp/src/fast_markdown_mcp/server.py (2)
670-696: Tool definition is clear and well-structured. Nicely done adding `smart_section_search` with detailed input parameters, improving UX for advanced search operations.
746-772: Good robust parameter handling. This code gracefully validates arguments for "smart_section_search" and sets functional defaults for fuzzy and regex matching.
backend/app/crawler.py (1)
646-650: Capture total size in the `update_overall_status`. Updating the job as completed or completed_with_errors is good, and passing `data_extracted=size_str` is valuable. Ensure the entire code path sets the size accurately, including partial/canceled crawls, to keep the status consistent for real-time UI updates.
app/page.tsx (9)
10-10: Ensure cohesive naming and usage consistency. The newly imported `CrawlStatusMonitor` aligns with the updated crawling logic. Ensure that any references elsewhere in the code or documentation (e.g., comments, readme) are updated to reflect this new component name.

54-54: Resetting job ID is good practice. Resetting `currentJobId` on new discovery prevents stale job references, avoiding confusion in subsequent crawls.

63-69: User feedback clarity. The toast notification effectively communicates the initiation of a new discovery process. This improves user understanding of asynchronous events. No immediate concerns here.

151-160: Toast usage is consistent. Nice work sending a toast to confirm the backend acknowledged the crawl request. This pattern fosters a transparent user experience. Keep it consistent with the rest of the application.

222-223: Error logs adequately surfaced. You log the crawl error to the console and show a toast. This approach keeps the user informed while also aiding developers in debugging. Looks good.

225-228: Comprehensive error details. Including specific error descriptions (e.g., the error message from the backend) is a good practice, making it easier to debug issues without diving too deeply into logs.

230-230: Consistent cleanup. Toggling `setIsCrawlingSelected(false)` on `finally` ensures the UI returns to a stable state even if an error occurs, improving robustness.

236-311: Polling logic is well-structured.

- The `useEffect` hook's cleanup of `intervalId` prevents memory leaks (a minimal sketch of this pattern follows below).
- Checking `terminalStates` to stop polling is efficient.
- Consider using a more robust library or a custom hook if additional job states or concurrency requirements emerge.
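A minimal sketch of that polling pattern, assuming a caller-supplied `getCrawlStatus` fetcher; the hook name, interval, local type, and terminal-state list are illustrative rather than the exact code in app/page.tsx.

```typescript
import { useEffect, useState } from "react";

// Minimal local shape for illustration; the real CrawlJobStatus lives in lib/types.ts.
interface JobStatusLike {
  overallStatus: string;
}

// Illustrative hook: poll a job's status until it reaches a terminal state.
export function useCrawlStatusPolling<T extends JobStatusLike>(
  jobId: string | null,
  getCrawlStatus: (id: string) => Promise<T>,
  intervalMs = 5000
) {
  const [status, setStatus] = useState<T | null>(null);

  useEffect(() => {
    if (!jobId) return;

    const terminalStates = ["completed", "completed_with_errors", "error"];
    const intervalId = setInterval(async () => {
      try {
        const latest = await getCrawlStatus(jobId); // e.g. GET /api/crawl-status/{job_id}
        setStatus(latest);
        if (terminalStates.includes(latest.overallStatus)) {
          clearInterval(intervalId); // stop polling once the job is finished
        }
      } catch (err) {
        console.error("Failed to poll crawl status", err);
      }
    }, intervalMs);

    // Cleanup prevents leaked intervals on unmount or when the job ID changes.
    return () => clearInterval(intervalId);
  }, [jobId, getCrawlStatus, intervalMs]);

  return status;
}
```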
329-332: Clean transition from older to newer components. Replacing the old `ProcessingBlock` with `JobStatsSummary` is clearly documented and helps new readers grok the flow. This fosters maintainability.
Quoted context from docs/mcp/mcp_build_fix_plan.md:

```markdown
## Feature: MCP Build Configuration Fix

### Task 1: Identify Cause
- **Goal:** Determine why the `pip install -e .` command fails during the `mcp` service build.
- **Action:** Analyze build logs.
- **Result:** Error `OSError: Readme file does not exist: README_MCP.md` indicates `pyproject.toml` points to a non-existent file.
- **Status:** Done
```
🧹 Nitpick (assertive)
Improve Spacing for Task 1 Section
The "## Feature: MCP Build Configuration Fix" header (line 5) and the "### Task 1: Identify Cause" heading (line 7) should each be surrounded by blank lines. This adherence to markdownlint (MD022 & MD032) will enhance the visual structure of the document.
Quoted context from docs/mcp/mcp_build_fix_plan.md:

```markdown
### Task 3: Apply Fix
- **Goal:** Correct the `pyproject.toml` configuration.
- **Action:** Remove the optional `readme = "README_MCP.md"` line from `fast-markdown-mcp/pyproject.toml`.
- **Status:** Pending
```
🧹 Nitpick (assertive)
Spacing Improvements for Task 3 Section
Similar to the previous sections, add a blank line before and/or after the "### Task 3: Apply Fix" heading at line 19. This adjustment will satisfy MD022 requirements and improve readability in the document’s task segments.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
19-19: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
20-20: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
### Task 2: Analyze `pyproject.toml` and Directory Contents
- **Goal:** Confirm the `readme` field setting and check for the actual file.
- **Action:** Read `fast-markdown-mcp/pyproject.toml` and list files in `fast-markdown-mcp/`.
- **Result:** `pyproject.toml` specifies `readme = "README_MCP.md"`, but the file is missing from the directory.
- **Status:** Done
🧹 Nitpick (assertive)
Enhance Spacing for Task 2 Section
For the "### Task 2: Analyze pyproject.toml and Directory Contents" heading at line 13, ensure that it is preceded and followed by a blank line to comply with markdownlint guidelines. The list items that follow (lines 14–17) are clear but could benefit from consistent spacing.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
13-13: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
14-14: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
# Task List: Fix MCP Docker Build Error (Missing Readme)

This plan outlines the steps to resolve the Docker build failure for the `mcp` service caused by a missing readme file specified in `fast-markdown-mcp/pyproject.toml`.
🧹 Nitpick (assertive)
Enhance Markdown Formatting for Introduction
Ensure that your introductory section and primary header are formatted with appropriate blank lines. For example, while the header on line 1 is correctly followed by a blank line on line 2, verify that any additional spacing conforms to markdownlint MD022 and MD032 guidelines for improved readability.
### Task 4: Verify Fix
- **Goal:** Confirm the Docker build for the `mcp` service now succeeds.
- **Action:** Instruct the user to re-run the Docker build process (e.g., `./docker-start.sh` or `docker-compose build mcp`).
- **Status:** Pending
No newline at end of file
🧹 Nitpick (assertive)
Improve Formatting for Task 4 Section and File End
Ensure the "### Task 4: Verify Fix" heading (line 24) is surrounded by blank lines. Additionally, confirm that the file ends with a single newline character to meet markdownlint MD047. A quick diff example to add a trailing newline might look like:
-(EOF immediately after line 27)
+ This minor adjustment enhances consistency and compliance with markdown formatting best practices.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
### Task 4: Verify Fix
- **Goal:** Confirm the Docker build for the `mcp` service now succeeds.
- **Action:** Instruct the user to re-run the Docker build process (e.g., `./docker-start.sh` or `docker-compose build mcp`).
- **Status:** Pending
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
24-24: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
25-25: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
27-27: Files should end with a single newline character
null
(MD047, single-trailing-newline)
// Renamed and refactored handler for the "Crawl Selected" button click
const handleCrawlSelectedClick = async () => {
  // Removed erroneous inner function declaration
  // Maybe this button should only appear *after* discovery is complete and pages are shown by the monitor?
  // Or maybe the crawl action is triggered differently now?
  // For now, let's assume we still select pages and trigger crawl, but using the currentJobId.

  // Use state variables directly
  if (!currentJobId || selectedUrls.size === 0) {
    toast({
      title: "Cannot Initiate Crawl",
      description: !currentJobId ? "No active job found." : "No URLs selected.",
      variant: "default" // Changed from "warning" as it's not a valid variant
    })
    return;
  }
🧹 Nitpick (assertive)
Ensure consistent user feedback when initiating selective crawl.
You’re providing toast messages on both success and failure, which is great. Consider additional user-facing feedback while the crawl request is pending, such as a loading spinner on the “Crawl Selected” button.
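One possible shape for that pending-state feedback is sketched below; the component name, props, and the Button import path are assumptions, and the Loader2 icon is borrowed from the lucide-react package already used elsewhere in this PR.

```tsx
import { Loader2 } from 'lucide-react'
import { Button } from '@/components/ui/button' // assumed shadcn-style button path

interface CrawlSelectedButtonProps {
  selectedCount: number
  isCrawling: boolean
  onClick: () => void
}

// Disables itself and shows a spinner while the crawl request is pending.
export function CrawlSelectedButton({ selectedCount, isCrawling, onClick }: CrawlSelectedButtonProps) {
  return (
    <Button onClick={onClick} disabled={isCrawling || selectedCount === 0}>
      {isCrawling ? (
        <>
          <Loader2 className="mr-2 h-4 w-4 animate-spin" />
          Crawling...
        </>
      ) : (
        `Crawl Selected (${selectedCount})`
      )}
    </Button>
  )
}
```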
}; // Added back closing brace for handleCrawlSelectedClick
🧹 Nitpick (assertive)
Maintain code style consistency.
A separate closing brace followed by a blank line is acceptable for clarity, but ensure style consistency across the codebase, especially if you use a code formatter.
{/* Keep SubdomainList for now, but it might be replaced by CrawlStatusMonitor's display */}
{/* Log if legacy SubdomainList condition is met */}
{(() => {
  console.log(`page.tsx: discoveredPages.length = ${discoveredPages.length}. Rendering legacy SubdomainList? ${discoveredPages.length > 0}`);
  return null; // Return null to render nothing
})()}
{discoveredPages.length > 0 && (
  <div className="bg-gray-800/50 backdrop-blur-lg rounded-2xl p-6 border border-gray-700 shadow-xl">
    <h2 className="text-2xl font-semibold mb-4 text-cyan-400">Discovered Pages (Legacy Display)</h2>
    <SubdomainList
      subdomains={discoveredPages}
      // onCrawlSelected={handleCrawlSelected} // Remove this prop, button moved
      isProcessing={isCrawling} // isCrawling state might also become redundant
    />
  </div>
)}
🛠️ Refactor suggestion
Warnings about legacy display.
Logging indicates a legacy display approach. This is fine if you intend to remove it soon. Otherwise, unify both displays to prevent confusion for users not expecting two separate flows.
// Clear previous results shown by SubdomainList? Or wait for polling?
// For now, let's clear discoveredPages, the monitor will show progress.
setDiscoveredPages([])
// No need to reset old stats state anymore
// setStats({
//   subdomainsParsed: 0,
//   pagesCrawled: 0,
//   dataExtracted: '0 KB',
//   errorsEncountered: 0
// })

/* --- Old logic expecting direct page results ---
🧹 Nitpick (assertive)
Maintain minimal commented-out blocks.
The old logic is commented out for reference. If it’s unlikely you’ll revert or reference it soon, store this in version control or documentation instead. Excess commented code can confuse maintainers.
app/page.tsx
Outdated
import { DiscoveredPage, CrawlJobStatus, OverallStatus } from '@/lib/types' // Import status types
import ConsolidatedFiles from '@/components/ConsolidatedFiles' // Import ConsolidatedFiles
🧹 Nitpick (assertive)
Validate that all imported entities are indeed required.
OverallStatus is imported along with DiscoveredPage and CrawlJobStatus. Confirm that each is being used in this file. If not, consider removing any unused imports to keep the code clean.
Introduces a new `CrawlUrls` component to display and manage discovered URLs during a crawl job. This component utilizes Shadcn UI elements (Table, Checkbox, Badge, Tooltip) to provide a detailed view of individual URL statuses, handle URL selection for targeted actions, and display status updates driven by polling managed in `app/page.tsx`.

Key changes include:
- Creation of the `CrawlUrls` component for URL list display and interaction.
- Refactoring of `CrawlStatusMonitor` to focus solely on displaying the overall job status within a Dialog component.
- Updates to `app/page.tsx` to manage essential state (job ID, job status, selected URLs) and orchestrate the polling mechanism for fetching URL-specific status updates.
- Fixed UI bugs where status icons were not updating correctly and checkbox selection state was inconsistent.
- Adjusted the styling of the info icon button for better contrast as per user feedback.

These frontend enhancements align with the ongoing backend redesign, supporting the new job-based status management and polling architecture for more granular progress tracking. Updated documentation in `docs/features/` (adjust_info_button_style_plan.md, fix_discovered_pages_ui_bugs.md, create_crawl_urls_component_plan.md, crawl_status_monitoring_plan.md) to reflect the completion of related tasks.
Actionable comments posted: 29
🔭 Outside diff range comments (2)
package.json (1)
11-30: 💡 Verification agent
🧩 Analysis chain
Verify dependency compatibility with React 19.
The project is using React 19.0.0, which is quite recent. Please ensure that all Radix UI components are compatible with this version to avoid runtime issues.
🌐 Web query:
Are @radix-ui/react-checkbox, @radix-ui/react-dialog, and @radix-ui/react-tooltip compatible with React 19.0.0?
💡 Result:
@radix-ui/react-checkbox, @radix-ui/react-dialog, and @radix-ui/react-tooltip are compatible with React 19.0.0 when using specific release candidate (RC) versions, though standard installs may require adjustments. Here's the breakdown:
## Compatibility Status
- React 19 support confirmed for RC versions of these packages through community testing and maintainer updates[1][10].
- Standard npm installs may show peer dependency conflicts due to version range specifications not including React 19's release candidate versions[1][10].
## Installation Solutions
RC Versions Required:
npm install \
  @radix-ui/react-checkbox@1.1.0-rc.6 \
  @radix-ui/react-dialog@1.1.0-rc.6 \
  @radix-ui/react-tooltip@1.1.0-rc.6
These versions explicitly support React 19[1].
For npm Users:
Use installation flags to bypass peer dependency checks: `npm install --legacy-peer-deps`. Required if using non-RC versions or encountering conflicts[1][10].
## Version Considerations
- Dialog: Latest stable (1.1.7 as of 2025-04-08) supports React 19[2][5]
- Tooltip: Version 1.2.0+ includes React 19 support[6]
- Checkbox: RC versions (1.1.0-rc.6) required for full compatibility[7]
## Verified Patterns
Developers have successfully used these dependencies with React 19 by either:
- Pinning RC versions explicitly[1]
- Using pnpm/bun which handles peer dependencies more leniently[10]
- Applying npm overrides for related dependencies like react-is[10]
The Radix team has actively updated peer dependency ranges in RC releases to include React 19, though some package versions may still need manual specification during installation[1][10].
Citations:
- 1: radix-ui/primitives#2909
- 2: https://www.npmjs.com/package/@radix-ui/react-dialog
- 3: https://www.radix-ui.com/primitives/docs/components/tooltip
- 4: https://www.radix-ui.com/primitives/docs/components/checkbox
- 5: https://www.npmjs.com/package/@radix-ui/react-dialog?activeTab=versions
- 6: https://www.npmjs.com/package/@radix-ui/react-tooltip
- 7: https://www.npmjs.com/package/@radix-ui/react-checkbox?activeTab=versions
- 8: https://www.radix-ui.com/primitives/docs/components/dialog
- 9: https://www.radix-ui.com/primitives
- 10: https://ui.shadcn.com/docs/react-19
React 19 Compatibility for Radix UI Components
- @radix-ui/react-dialog (v1.1.7) and @radix-ui/react-tooltip (v1.2.0): These versions are confirmed to support React 19.
- @radix-ui/react-checkbox: Community reports indicate that full compatibility with React 19 is only guaranteed with release candidate versions (e.g., 1.1.0-rc.6). The current version (^1.1.5) might lead to peer dependency conflicts.
Action Items:
- Consider updating `@radix-ui/react-checkbox` to the recommended RC version, or ensure that your package manager is configured to handle potential peer dependency issues (e.g., using `--legacy-peer-deps` during installation).
backend/app/crawler.py (1)
133-243: 🛠️ Refactor suggestion
Avoid blocking I/O in async functions.
Blocking HTTP calls (`requests.get`, `requests.post`) and file operations with `open` may stall the event loop in `async` functions. Switch to an async HTTP client (e.g., `httpx` with `async`/`await`) and use `aiofiles` or other non-blocking I/O libraries for disk reads/writes to maintain concurrency.
-# Example: Using requests
-response = requests.post(f"{CRAWL4AI_URL}/crawl", headers=headers, json=simple_request, timeout=30)
+# Example: Using HTTPX in an async manner
+import httpx
+async with httpx.AsyncClient() as client:
+    response = await client.post(f"{CRAWL4AI_URL}/crawl", headers=headers, json=simple_request, timeout=30)
Also applies to: 343-626
🧰 Tools
🪛 Ruff (0.8.2)
133-133: `discover_pages` is too complex (37 > 10) (C901)
137-137: Use `set` instead of `Set` for type annotation; replace with `set` (UP006)
137-137: PEP 484 prohibits implicit `Optional`; convert to `Optional[T]` (RUF013)
138-138: Use `set` instead of `Set` for type annotation; replace with `set` (UP006)
138-138: PEP 484 prohibits implicit `Optional`; convert to `Optional[T]` (RUF013)
139-139: Use `set` instead of `Set` for type annotation; replace with `set` (UP006)
139-139: PEP 484 prohibits implicit `Optional`; convert to `Optional[T]` (RUF013)
140-140: PEP 484 prohibits implicit `Optional`; convert to `Optional[T]` (RUF013)
141-141: PEP 484 prohibits implicit `Optional`; convert to `Optional[T]` (RUF013)
142-142: Trailing comma missing; add trailing comma (COM812)
143-143: Use `list` instead of `List` for type annotation; replace with `list` (UP006)
157-157: Logging statement uses f-string (G004)
162-162: Logging statement uses f-string (G004)
166-166: Logging statement uses f-string (G004)
169-169: Logging statement uses f-string (G004)
181-181: Logging statement uses f-string (G004)
188-188: Async functions should not call blocking HTTP methods (ASYNC210)
192-192: Trailing comma missing; add trailing comma (COM812)
196-196: Logging statement uses f-string (G004)
203-203: Logging statement uses f-string (G004)
204-204: Async functions should not call blocking HTTP methods (ASYNC210)
207-207: Trailing comma missing; add trailing comma (COM812)
214-214: Logging statement uses f-string (G004)
219-219: Unnecessary `elif` after `break` statement; remove unnecessary `elif` (RET508)
221-221: Logging statement uses f-string (G004)
229-229: Logging statement uses f-string (G004)
234-234: Logging `.exception(...)` should be used instead of `.error(..., exc_info=True)` (G201)
234-234: Logging statement uses f-string (G004)
234-234: Use explicit conversion flag; replace with conversion flag (RUF010)
239-239: Logging `.exception(...)` should be used instead of `.error(..., exc_info=True)` (G201)
239-239: Logging statement uses f-string (G004)
239-239: Use explicit conversion flag; replace with conversion flag (RUF010)
♻️ Duplicate comments (5)
.roomodes (1)
5-5: Remove trailing space in mode name.
The mode name has a trailing space, potentially causing future references to break or mismatch. This issue was raised previously.
- "name": "boomerang ",
+ "name": "boomerang",
docs/features/crawl_status_monitoring_plan.md (1)
49-49: 🧹 Nitpick (assertive)
Minor grammar fix needed.
After the number '404', it's better to use a plural noun.
- * [x] If not found, return 404 error.
+ * [x] If not found, return 404 errors.
Alternatively:
- * [x] If not found, return 404 error.
+ * [x] If not found, return a 404 error response.
🧰 Tools
🪛 LanguageTool
[grammar] ~49-~49: After the number ‘404’, use a plural noun. Did you mean “errors”?
Context: ...`. * [x] If not found, return 404 error. ### Step 4.2: Backend - Update Status...
(CD_NNU)
components/CrawlStatusMonitor.tsx (1)
74-76: Conditionally remove or guard console logging.
You have a console log statement at line 75 referencing "CrawlStatusMonitor rendering...". For production builds, consider wrapping logging in a development-only check or removing it entirely to avoid leaking internal details.
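A small sketch of such a development-only guard follows; the helper name is hypothetical, and it assumes the usual NODE_ENV convention of Next.js builds.

```ts
// Hypothetical helper; not existing project code.
const isDev = process.env.NODE_ENV !== 'production'

export function debugLog(...args: unknown[]): void {
  if (isDev) {
    console.log('[CrawlStatusMonitor]', ...args)
  }
}

// Usage inside the component instead of a bare console.log:
// debugLog('rendering with jobId', jobId)
```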
backend/app/status_manager.py (1)
34-38: Be mindful of in-memory concurrency.
`crawl_jobs` is maintained in-memory and is not necessarily thread-safe. For high-traffic or multi-worker setups, consider using thread locks or an external persistence layer to avoid race conditions.
🧰 Tools
🪛 Ruff (0.8.2)
38-38: Use `dict` instead of `Dict` for type annotation; replace with `dict` (UP006)
app/page.tsx (1)
67-69: 🧹 Nitpick (assertive)
Avoid console logs in production code.
Using `console.log` for debugging is common, but it can clutter the console in production. Consider removing these logs or using a logging utility with configurable levels.
-console.log('Initiating discovery for:', submittedUrl, 'with depth:', depth)
-// ...
-console.log('Discovery initiated. Job ID:', jobId)
+// Remove or replace these logs with a more robust logging approach
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
`package-lock.json` is excluded by `!**/package-lock.json`
📒 Files selected for processing (26)
- `.roo/rules-boomerang/rules.md` (1 hunks)
- `.roomodes` (1 hunks)
- `app/page.tsx` (4 hunks)
- `backend/app/crawler.py` (16 hunks)
- `backend/app/status_manager.py` (1 hunks)
- `components/CrawlStatusMonitor.tsx` (1 hunks)
- `components/CrawlUrls.tsx` (1 hunks)
- `components/ui/checkbox.tsx` (1 hunks)
- `components/ui/dialog.tsx` (1 hunks)
- `components/ui/table.tsx` (1 hunks)
- `components/ui/tooltip.tsx` (1 hunks)
- `docs/features/adjust_info_button_style_plan.md` (1 hunks)
- `docs/features/crawl_status_monitoring_plan.md` (1 hunks)
- `docs/features/create_crawl_urls_component_plan.md` (1 hunks)
- `docs/features/fix_checkbox_disable_logic.md` (1 hunks)
- `docs/features/fix_crawl_button_and_checkbox_style.md` (1 hunks)
- `docs/features/fix_crawl_button_and_checkbox_style_v2.md` (1 hunks)
- `docs/features/fix_crawl_button_and_checkbox_style_v3.md` (1 hunks)
- `docs/features/fix_crawl_selected_button_enablement.md` (1 hunks)
- `docs/features/fix_discovered_pages_ui_bugs.md` (1 hunks)
- `docs/features/fix_url_status_normalization.md` (1 hunks)
- `docs/features/frontend_crawl_status_polling_plan.md` (1 hunks)
- `docs/features/permanently_enable_crawl_selected_button.md` (1 hunks)
- `docs/features/ui_refinements_consolidated_files.md` (1 hunks)
- `lib/types.ts` (2 hunks)
- `package.json` (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (7)
components/ui/checkbox.tsx (1)
  lib/utils.ts (1)
    `cn` (4-6)
components/ui/tooltip.tsx (1)
  lib/utils.ts (1)
    `cn` (4-6)
components/ui/table.tsx (1)
  lib/utils.ts (1)
    `cn` (4-6)
components/ui/dialog.tsx (1)
  lib/utils.ts (1)
    `cn` (4-6)
components/CrawlUrls.tsx (1)
  lib/types.ts (2)
    `UrlStatus` (53-60)
    `CrawlUrlsProps` (96-103)
lib/types.ts (3)
  backend/app/status_manager.py (1)
    `CrawlJobStatus` (23-32)
  backend/app/main.py (1)
    `CrawlRequest` (62-64)
  backend/app/crawler.py (1)
    `DiscoveredPage` (42-46)
backend/app/crawler.py (2)
  backend/app/status_manager.py (2)
    `update_overall_status` (71-110)
    `update_url_status` (112-146)
  backend/app/utils.py (1)
    `normalize_url` (3-9)
🪛 LanguageTool
.roo/rules-boomerang/rules.md
[uncategorized] ~3-~3: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...u deligate to either Code or Debug modes and after you have created a subtask. Consi...
(COMMA_COMPOUND_SENTENCE)
[uncategorized] ~3-~3: Possible missing comma found.
Context: ...the most accurate and right path moving forward and then present those findings to the ...
(AI_HYDRA_LEO_MISSING_COMMA)
docs/features/fix_crawl_button_and_checkbox_style_v2.md
[uncategorized] ~3-~3: Possible missing comma found.
Context: ... Selected" button when pending URLs are checked and fix the visual styling of enabled c...
(AI_HYDRA_LEO_MISSING_COMMA)
docs/features/fix_crawl_selected_button_enablement.md
[uncategorized] ~13-~13: This verb may not be in the correct form. Consider using a different form for this context.
Context: ...Applied useMemo with Array.some and correct dependencies). 4. [ ] Report: Use ...
(AI_EN_LECTOR_REPLACEMENT_VERB_FORM)
docs/features/fix_crawl_button_and_checkbox_style.md
[uncategorized] ~15-~15: You might be missing the article “the” here.
Context: ...** Mark this task list as complete once user confirms the fixes are working. ## Aff...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
docs/features/create_crawl_urls_component_plan.md
[uncategorized] ~18-~18: Possible missing preposition found.
Context: .../page.tsx:** Examine the current state management and where CrawlStatusMonitor` is used....
(AI_HYDRA_LEO_MISSING_OF)
[grammar] ~31-~31: The verb ‘refactor’ is plural. Did you mean: “refactors”? Did you use a verb instead of a noun?
Context: ...ved from CrawlStatusMonitor.tsx. This refactor resolved the UI state issues for checkb...
(PLURAL_VERB_AFTER_THIS)
docs/features/crawl_status_monitoring_plan.md
[grammar] ~49-~49: After the number ‘404’, use a plural noun. Did you mean “errors”?
Context: ...`. * [x] If not found, return 404 error. ### Step 4.2: Backend - Update Status...
(CD_NNU)
docs/features/ui_refinements_consolidated_files.md
[uncategorized] ~17-~17: Possible missing preposition found.
Context: ...ge the color of Pending Crawl to Yellow black text and Crawl completed to Green with ...
(AI_HYDRA_LEO_MISSING_TO)
docs/features/frontend_crawl_status_polling_plan.md
[duplication] ~83-~83: Possible typo: you repeated a word.
Context: ...completed', 'completed_with_errors', or 'error'` - Or after a timeout (e.g., 5 minutes) -...
(ENGLISH_WORD_REPEAT_RULE)
[uncategorized] ~148-~148: You might be missing the article “a” here.
Context: ...ess. - Implement simple polling for minimal, robust fix. --- ## Next - Confi...
(AI_EN_LECTOR_MISSING_DETERMINER_A)
🪛 markdownlint-cli2 (0.17.2)
docs/features/fix_checkbox_disable_logic.md
14-14: Files should end with a single newline character
null
(MD047, single-trailing-newline)
docs/features/permanently_enable_crawl_selected_button.md
12-12: Files should end with a single newline character
null
(MD047, single-trailing-newline)
docs/features/fix_crawl_button_and_checkbox_style_v2.md
15-15: Files should end with a single newline character
null
(MD047, single-trailing-newline)
docs/features/fix_crawl_button_and_checkbox_style_v3.md
18-18: Files should end with a single newline character
null
(MD047, single-trailing-newline)
docs/features/fix_crawl_button_and_checkbox_style.md
3-3: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
5-5: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
17-17: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
18-18: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
18-18: Files should end with a single newline character
null
(MD047, single-trailing-newline)
docs/features/create_crawl_urls_component_plan.md
31-31: Files should end with a single newline character
null
(MD047, single-trailing-newline)
docs/features/frontend_crawl_status_polling_plan.md
155-155: Files should end with a single newline character
null
(MD047, single-trailing-newline)
🪛 Biome (1.9.4)
components/CrawlUrls.tsx
[error] 40-40: Useless case clause.
because the default clause is present:
Unsafe fix: Remove the useless case.
(lint/complexity/noUselessSwitchCase)
[error] 41-41: Useless case clause.
because the default clause is present:
Unsafe fix: Remove the useless case.
(lint/complexity/noUselessSwitchCase)
[error] 208-208: JSX elements without children should be marked as self-closing. In JSX, it is valid for any element to be self-closing.
Unsafe fix: Use a SelfClosingElement instead
(lint/style/useSelfClosingElements)
🪛 Ruff (0.8.2)
backend/app/crawler.py
1-1: typing.List is deprecated, use list instead
(UP035)
1-1: typing.Set is deprecated, use set instead
(UP035)
1-1: typing.Dict is deprecated, use dict instead
(UP035)
1-1: typing.Dict imported but unused
Remove unused import: typing.Dict
(F401)
141-141: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
142-142: Trailing comma missing
Add trailing comma
(COM812)
143-143: Use list instead of List for type annotation
Replace with list
(UP006)
157-157: Logging statement uses f-string
(G004)
181-181: Logging statement uses f-string
(G004)
188-188: Async functions should not call blocking HTTP methods
(ASYNC210)
192-192: Trailing comma missing
Add trailing comma
(COM812)
196-196: Logging statement uses f-string
(G004)
203-203: Logging statement uses f-string
(G004)
204-204: Async functions should not call blocking HTTP methods
(ASYNC210)
207-207: Trailing comma missing
Add trailing comma
(COM812)
214-214: Logging statement uses f-string
(G004)
219-219: Unnecessary elif after break statement
Remove unnecessary elif
(RET508)
221-221: Logging statement uses f-string
(G004)
229-229: Logging statement uses f-string
(G004)
234-234: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
234-234: Logging statement uses f-string
(G004)
234-234: Use explicit conversion flag
Replace with conversion flag
(RUF010)
239-239: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
239-239: Logging statement uses f-string
(G004)
239-239: Use explicit conversion flag
Replace with conversion flag
(RUF010)
247-247: Logging statement uses f-string
(G004)
319-319: Trailing comma missing
Add trailing comma
(COM812)
327-327: Logging statement uses f-string
(G004)
329-329: Consider moving this statement to an else block
(TRY300)
332-332: Use explicit conversion flag
Replace with conversion flag
(RUF010)
333-333: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
343-343: crawl_pages is too complex (45 > 10)
(C901)
343-343: Use list instead of List for type annotation
Replace with list
(UP006)
343-343: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
378-378: Logging statement uses f-string
(G004)
394-394: Logging statement uses f-string
(G004)
412-412: Logging statement uses f-string
(G004)
433-433: Logging statement uses f-string
(G004)
475-475: f-string without any placeholders
Remove extraneous f prefix
(F541)
486-486: Async functions should not open files with blocking methods like open
(ASYNC230)
490-490: Logging statement uses f-string
(G004)
491-491: Replace aliased errors with OSError
Replace IOError with builtin OSError
(UP024)
492-492: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
492-492: Logging statement uses f-string
(G004)
495-495: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
495-495: Logging statement uses f-string
(G004)
504-504: Unnecessary open mode parameters
Remove open mode parameters
(UP015)
504-504: Async functions should not open files with blocking methods like open
(ASYNC230)
507-507: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
507-507: Logging statement uses f-string
(G004)
509-509: Replace aliased errors with OSError
Replace IOError with builtin OSError
(UP024)
510-510: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
510-510: Logging statement uses f-string
(G004)
513-513: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
513-513: Logging statement uses f-string
(G004)
539-539: Async functions should not open files with blocking methods like open
(ASYNC230)
541-541: Logging statement uses f-string
(G004)
542-542: Replace aliased errors with OSError
Replace IOError with builtin OSError
(UP024)
543-543: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
543-543: Logging statement uses f-string
(G004)
545-545: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
545-545: Logging statement uses f-string
(G004)
546-546: Logging statement uses f-string
(G004)
547-547: Do not catch blind exception: Exception
(BLE001)
548-548: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
548-548: Logging statement uses f-string
(G004)
548-548: Use explicit conversion flag
Replace with conversion flag
(RUF010)
551-551: Unnecessary elif after break statement
Remove unnecessary elif
(RET508)
553-553: Logging statement uses f-string
(G004)
558-558: Logging statement uses f-string
(G004)
559-559: Do not catch blind exception: Exception
(BLE001)
560-560: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
560-560: Logging statement uses f-string
(G004)
560-560: Use explicit conversion flag
Replace with conversion flag
(RUF010)
566-566: Logging statement uses f-string
(G004)
606-606: Logging statement uses f-string
(G004)
614-614: Logging statement uses f-string
(G004)
620-620: Logging statement uses f-string
(G004)
627-627: Use explicit conversion flag
Replace with conversion flag
(RUF010)
628-628: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
649-649: Logging statement uses f-string
(G004)
656-656: Trailing comma missing
Add trailing comma
(COM812)
660-660: Use explicit conversion flag
Replace with conversion flag
(RUF010)
661-661: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
backend/app/status_manager.py
3-3: typing.Dict is deprecated, use dict instead
(UP035)
18-18: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
18-18: Logging statement uses f-string
(G004)
38-38: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
40-40: Missing return type annotation for public function initialize_job
Add return type annotation: None
(ANN201)
44-44: Logging statement uses f-string
(G004)
54-54: datetime.datetime.now() called without a tz argument
(DTZ005)
55-55: Trailing comma missing
Add trailing comma
(COM812)
65-65: Logging statement uses f-string
(G004)
67-67: Avoid specifying long messages outside the exception class
(TRY003)
69-69: Logging statement uses f-string
(G004)
71-71: Missing return type annotation for public function update_overall_status
Add return type annotation: None
(ANN201)
83-83: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
83-83: Logging statement uses f-string
(G004)
90-90: datetime.datetime.now() called without a tz argument
(DTZ005)
95-95: Logging statement uses f-string
(G004)
106-106: Logging statement uses f-string
(G004)
108-108: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
108-108: Logging statement uses f-string
(G004)
110-110: Logging statement uses f-string
(G004)
112-112: Missing return type annotation for public function update_url_status
Add return type annotation: None
(ANN201)
124-124: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
124-124: Logging statement uses f-string
(G004)
142-142: Logging statement uses f-string
(G004)
144-144: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
144-144: Logging statement uses f-string
(G004)
146-146: Logging statement uses f-string
(G004)
148-148: Missing return type annotation for public function add_pending_crawl_urls
Add return type annotation: None
(ANN201)
160-160: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
160-160: Logging statement uses f-string
(G004)
184-184: Logging statement uses f-string
(G004)
186-186: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
186-186: Logging statement uses f-string
(G004)
188-188: Logging statement uses f-string
(G004)
201-201: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
201-201: Logging statement uses f-string
(G004)
🔇 Additional comments (22)
docs/features/crawl_status_monitoring_plan.md (2)
1-146: Clear and comprehensive feature plan for implementing real-time crawl status monitoring.
The document provides a well-structured plan for implementing the crawl status monitoring feature using a backend in-memory state with frontend polling. The approach described is pragmatic, following KISS principles while still providing valuable user feedback during the crawling process.
🧰 Tools
🪛 LanguageTool
[grammar] ~49-~49: After the number ‘404’, use a plural noun. Did you mean “errors”?
Context: ...`. * [x] If not found, return 404 error. ### Step 4.2: Backend - Update Status...
(CD_NNU)
144-146: Missing step linking the status monitoring with consolidated files.
While the implementation steps for the status monitoring feature are comprehensive, there's no explicit mention of how this interacts with the consolidated files section mentioned in the PR objectives, where data should be appended during subsequent crawls.
Is there a separate plan document that addresses how the crawl status monitoring integrates with the consolidated files functionality?
lib/types.ts (4)
10-10: Status type expansion aligns with backend changes.
Adding 'pending_crawl' status properly aligns the frontend type definition with the backend implementation and supports the status tracking feature.
41-72: Well-structured type definitions for crawl status monitoring.
The new type definitions clearly match the backend implementation described in the feature plan. The `OverallStatus` and `UrlStatus` types properly enumerate all possible states, while the `CrawlJobStatus` interface accurately reflects the backend model structure.
75-93: API interface definitions maintain type safety.
The new request and response interfaces properly type the communication between frontend and backend for the crawl status monitoring feature. The commented-out fields provide good documentation of what the backend returns, even if they're not used in the frontend currently.
96-103: Component props interface enables proper type checking.
The `CrawlUrlsProps` interface provides appropriate typing for the new component's props, ensuring type safety when passing data between components.
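For orientation, an illustrative sketch of how such definitions commonly look is shown below; apart from the type names, the 'pending_crawl' value, and the terminal overall states mentioned elsewhere in this review, the fields are assumptions and may differ from the real lib/types.ts.

```ts
// Illustrative shapes only; field names are assumptions.
export type OverallStatus =
  | 'discovering'
  | 'crawling'
  | 'completed'
  | 'completed_with_errors'
  | 'error'

export type UrlStatus = 'pending_crawl' | 'crawling' | 'completed' | 'error'

export interface CrawlJobStatus {
  job_id: string
  overall_status: OverallStatus
  urls: Record<string, UrlStatus> // per-URL status map
  data_extracted?: string         // e.g. "12 KB"
}
```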
docs/features/ui_refinements_consolidated_files.md (2)
1-28: Comprehensive plan for UI refinements and file consolidation.
This task list provides a clear roadmap for improving the UI layout, implementing popup components, and ensuring file display consistency. The step-by-step approach with verification steps helps ensure a thorough implementation.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~17-~17: Possible missing preposition found.
Context: ...ge the color of Pending Crawl to Yellow black text and Crawl completed to Green with ...
(AI_HYDRA_LEO_MISSING_TO)
19-21: Integration with ConsolidatedFiles needs clearer API specification.
The task for integrating with `ConsolidatedFiles` mentions a possible API endpoint (`/api/all-files`?) but seems uncertain. Consider documenting the actual API endpoint to be used for fetching consolidated files to ensure clarity during implementation.
What is the correct API endpoint for fetching consolidated files? Is it `/api/all-files`, `/api/files`, or something else? This should be clearly specified to avoid implementation confusion.
package.json (2)
16-17: Added Radix UI Checkbox and Dialog components.
These dependencies support the UI refinements planned in the feature document, specifically for implementing checkboxes in the `CrawlUrls` component and dialog/popup functionality.
21-21: Added Radix UI Tooltip component.
The Tooltip component will help improve UI usability by providing contextual information to users.
docs/features/permanently_enable_crawl_selected_button.md (1)
1-12: Clear Documentation on Permanently Enabling the Button.
The steps to locate and remove the conditional (`canCrawl &&`) rendering logic are clearly specified, which will help ensure that the "Crawl Selected" button is always visible.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
12-12: Files should end with a single newline character
null
(MD047, single-trailing-newline)
docs/features/fix_checkbox_disable_logic.md (1)
1-14: Comprehensive Task List for Checkbox Disabling Logic Fix.
The file clearly describes steps to analyze and fix the checkbox disabling logic in the `CrawlStatusMonitor` component, ensuring that only checkboxes for URLs with a 'completed' status are disabled.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
14-14: Files should end with a single newline character
null
(MD047, single-trailing-newline)
docs/features/fix_url_status_normalization.md (1)
1-11: Clear Documentation on URL Status Normalization Fix.
This document effectively outlines the root cause and tasks for ensuring consistent URL normalization before updating the URL status. The clarity in the task list should help in verifying that UI statuses will accurately reflect the crawl results.
components/ui/tooltip.tsx (1)
1-30: Well-structured Radix UI tooltip implementation!
This tooltip component implementation follows Radix UI best practices and includes proper animation states. Good job on using the `cn` utility function for class name management and setting up appropriate defaults like the `sideOffset`.
components/ui/checkbox.tsx (1)
1-30: Clean Checkbox component implementation
The component correctly implements Radix UI's checkbox primitive with appropriate styling and accessibility features. The use of React.forwardRef ensures proper ref forwarding, and the Check icon from lucide-react provides a clear visual indicator.
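As a usage note, a controlled instance of this wrapper might look like the sketch below; the row component and markup are illustrative, while `checked`/`onCheckedChange` come from the underlying Radix primitive.

```tsx
import { useState } from 'react'
import { Checkbox } from '@/components/ui/checkbox'

// Radix reports changes via onCheckedChange, which may be a boolean or 'indeterminate'.
export function SelectUrlRow({ url }: { url: string }) {
  const [checked, setChecked] = useState(false)

  return (
    <label className="flex items-center gap-2">
      <Checkbox checked={checked} onCheckedChange={(value) => setChecked(value === true)} />
      <span>{url}</span>
    </label>
  )
}
```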
docs/features/adjust_info_button_style_plan.md (1)
1-23: Well-documented feature with clear completion status
The feature documentation clearly outlines the tasks, rationale, and resolution for adjusting the info button style. This provides good context for the UI enhancement based on user feedback.
components/ui/table.tsx (1)
1-117: Well-structured table component implementation following React best practices.
This file introduces a comprehensive set of table components using React's forwardRef pattern. The components are properly typed with TypeScript, making them type-safe while allowing flexibility through className props and spreading of additional HTML attributes.
The code follows a consistent pattern across all table components and uses the `cn` utility function from `@/lib/utils` to handle className composition elegantly.
docs/features/fix_discovered_pages_ui_bugs.md (1)
1-42: Comprehensive documentation of the bug fixing process.
This document provides excellent visibility into the process of identifying, diagnosing, and resolving UI bugs. The methodical approach, combining frontend logging, backend investigation, and iterative user testing, demonstrates good troubleshooting practices.
The root cause identification of concurrency issues with the global `crawl_jobs` dictionary and the decision to refactor the frontend with a new `CrawlUrls` component aligns well with the PR objectives to enhance the crawling workflow.
components/CrawlUrls.tsx (3)
71-144: Well-implemented component with good performance optimizations.
The component effectively manages URL selection state, including selective update handlers and memoized values. The detailed console logging is helpful for debugging but should be considered for conditional enablement in production.
The organization of state management, handlers, and derived state calculation follows React best practices, particularly the careful use of `useMemo` and `useCallback` for performance optimization.
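A compressed sketch of that memoization pattern, with assumed type and hook names (the real component derives similar flags from its own props and state):

```tsx
import { useMemo } from 'react'

interface UrlEntry {
  url: string
  status: string // e.g. 'pending_crawl', 'completed'
}

// Recomputes only when urls or the selection set change, mirroring the
// "useMemo with Array.some" approach mentioned in the feature docs.
export function useHasSelectablePending(urls: UrlEntry[], selected: Set<string>): boolean {
  return useMemo(
    () => urls.some((u) => u.status === 'pending_crawl' && selected.has(u.url)),
    [urls, selected]
  )
}
```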
150-180: Button enablement logic aligns with feature requirements.
The implementation of the "Crawl Selected" button addresses the requirements specified in `fix_crawl_button_and_checkbox_style_v3.md`. The button is correctly enabled only when pending URLs are selected and crawling is not in progress, providing clear feedback through its disabled state and label.
197-205: Checkbox styling addresses the feature requirements.
The implementation adds a white border to the checkboxes for better visibility against the dark background, addressing the requirement in `fix_crawl_button_and_checkbox_style_v3.md` for checkboxes to appear interactive rather than grayed out.
components/ui/dialog.tsx (1)
1-123: All good here!
The implementation follows best practices for Radix Dialog, with sensible styling, accessibility considerations (sr-only for Close button), and forwardRef usage. Great work!
@@ -0,0 +1,3 @@
Special Rules for Critiquing plans and strategies:

Once you have created a subtask markdown file or in memory, you are to get a second opinion from the mode called "Expert Opinion" This mode is designed to only accept your subtask plans and strategies before you present it to the user to approve. You are to do this before you deligate to either Code or Debug modes and after you have created a subtask. Consider this mode as your personal brainstorm. Argue with it from ground truth about the codebase, both you and the Expert Opinion mode as complete knowledge of the codebase. I want you to counter the points and come to a common understanding for the most accurate and right path moving forward and then present those findings to the user.
No newline at end of file
🧹 Nitpick (assertive)
Fix spelling and add missing commas.
There is a spelling mistake (“deligate”) and sentences that would benefit from commas to separate independent clauses, clarifying the text flow.
-Once you have created a subtask markdown file or in memory, you are to get a second opinion from the mode called "Expert Opinion" This mode is designed to only accept your subtask plans and strategies before you present it to the user to approve. You are to do this before you deligate to either Code or Debug modes and after you have created a subtask. Consider this mode as your personal brainstorm. Argue with it from ground truth about the codebase, both you and the Expert Opinion mode as complete knowledge of the codebase. I want you to counter the points and come to a common understanding for the most accurate and right path moving forward and then present those findings to the user.
+Once you have created a subtask markdown file or in memory, you are to get a second opinion from the mode called "Expert Opinion." This mode is designed to only accept your subtask plans and strategies before you present them to the user for approval. You are to do this before you delegate to either Code or Debug modes, and after you have created a subtask. Consider this mode as your personal brainstorm. Argue with it from ground truth about the codebase; both you and the Expert Opinion mode have complete knowledge of the codebase. I want you to counter the points and come to a common understanding for the most accurate and right path moving forward, and then present those findings to the user.
📝 Committable suggestion
🧰 Tools
🪛 LanguageTool
[uncategorized] ~3-~3: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...u deligate to either Code or Debug modes and after you have created a subtask. Consi...
(COMMA_COMPOUND_SENTENCE)
[uncategorized] ~3-~3: Possible missing comma found.
Context: ...the most accurate and right path moving forward and then present those findings to the ...
(AI_HYDRA_LEO_MISSING_COMMA)
| { | ||
| "slug": "boomerang", | ||
| "name": "boomerang ", | ||
| "roleDefinition": "You are Roo, a strategic workflow orchestrator who coordinates complex tasks by delegating them to appropriate specialized modes. You have a comprehensive understanding of each mode's capabilities and limitations, allowing you to effectively break down complex problems into discrete tasks that can be solved by different specialists.\n\nProcedure to follow is to ask the coder for an implementation plan without writing any code yet, forward the implementation plan to the Expert Opinion mode for review, and based the feedback from Expert Opinion mode give the coder the go-ahead to create a updated task list and then ask for user to approve the task list which has the feedbac of Expert Opinion mode and previous plans with pros and cons for each. Your prompts for those roles will have to ensure they know the procedure too, so they’re all singing from the same hymn-sheet.", |
🧹 Nitpick (assertive)
Add “on” for grammatical clarity.
The phrase “based the feedback” is missing “on.” This minor fix ensures the instructions flow clearly.
-...and based the feedback from Expert Opinion mode give the coder the go-ahead to create...
+...and based on the feedback from Expert Opinion mode, give the coder the go-ahead to create...
📝 Committable suggestion
| "roleDefinition": "You are Roo, a strategic workflow orchestrator who coordinates complex tasks by delegating them to appropriate specialized modes. You have a comprehensive understanding of each mode's capabilities and limitations, allowing you to effectively break down complex problems into discrete tasks that can be solved by different specialists.\n\nProcedure to follow is to ask the coder for an implementation plan without writing any code yet, forward the implementation plan to the Expert Opinion mode for review, and based the feedback from Expert Opinion mode give the coder the go-ahead to create a updated task list and then ask for user to approve the task list which has the feedbac of Expert Opinion mode and previous plans with pros and cons for each. Your prompts for those roles will have to ensure they know the procedure too, so they’re all singing from the same hymn-sheet.", | |
| "roleDefinition": "You are Roo, a strategic workflow orchestrator who coordinates complex tasks by delegating them to appropriate specialized modes. You have a comprehensive understanding of each mode's capabilities and limitations, allowing you to effectively break down complex problems into discrete tasks that can be solved by different specialists.\n\nProcedure to follow is to ask the coder for an implementation plan without writing any code yet, forward the implementation plan to the Expert Opinion mode for review, and based on the feedback from Expert Opinion mode, give the coder the go-ahead to create an updated task list and then ask for user to approve the task list which has the feedbac of Expert Opinion mode and previous plans with pros and cons for each. Your prompts for those roles will have to ensure they know the procedure too, so they’re all singing from the same hymn-sheet.", |
| "slug": "boomerang", | ||
| "name": "boomerang ", | ||
| "roleDefinition": "You are Roo, a strategic workflow orchestrator who coordinates complex tasks by delegating them to appropriate specialized modes. You have a comprehensive understanding of each mode's capabilities and limitations, allowing you to effectively break down complex problems into discrete tasks that can be solved by different specialists.\n\nProcedure to follow is to ask the coder for an implementation plan without writing any code yet, forward the implementation plan to the Expert Opinion mode for review, and based the feedback from Expert Opinion mode give the coder the go-ahead to create a updated task list and then ask for user to approve the task list which has the feedbac of Expert Opinion mode and previous plans with pros and cons for each. Your prompts for those roles will have to ensure they know the procedure too, so they’re all singing from the same hymn-sheet.", | ||
| "customInstructions": "Your role is to coordinate complex workflows by delegating tasks to specialized modes. As an orchestrator, you should:\n\n1. When given a complex task, break it down into logical subtasks that can be delegated to appropriate specialized modes.\n\n2. For each subtask, use the `new_task` tool to delegate. Choose the most appropriate mode for the subtask's specific goal and provide comprehensive instructions in the `message` parameter. These instructions must include:\n * All necessary context from the parent task or previous subtasks required to complete the work.\n * A clearly defined scope, specifying exactly what the subtask should accomplish.\n * An explicit statement that the subtask should *only* perform the work outlined in these instructions and not deviate.\n * An instruction for the subtask to signal completion by using the `attempt_completion` tool, providing a concise yet thorough summary of the outcome in the `result` parameter, keeping in mind that this summary will be the source of truth used to keep track of what was completed on this project.\n * A statement that these specific instructions supersede any conflicting general instructions the subtask's mode might have.\n * Once you have the plan created by the coder forward the implementation plan to the Expert Opinion mode for review, and based on the result ask for improvements or give the coder the go-ahead. Your prompts for those roles will have to ensure they know the procedure too, so they’re all singing from the same hymn-sheet.\n\n3. Track and manage the progress of all subtasks in a markdown file in the codebase. If its a bug then start the heading with BUG: if its a feature then write FEATURE:. When a subtask is completed, analyze its results from the user and determine the next steps and then go back to complete the markdown file subtask. \n\n4. Help the user understand how the different subtasks fit together in the overall workflow. Provide clear reasoning about why you're delegating specific tasks to specific modes.\n\n5. When all subtasks are completed, synthesize the results and provide a comprehensive overview of what was accomplished.\n\n6. Always Ask clarifying questions when necessary to better understand how to break down complex tasks effectively in as much detail as possible. \n\n7. Suggest improvements to the workflow based on the results of completed subtasks.\n\nUse subtasks to maintain clarity. If a request significantly shifts focus or requires a different expertise (mode), consider creating a subtask rather than overloading the current one.", |
🧹 Nitpick (assertive)
Use correct contraction for “it’s.”
When discussing a bug or feature heading, write “If it’s a bug” instead of “If its a bug.”
-If its a bug then start the heading with BUG:
+If it’s a bug, then start the heading with BUG:
📝 Committable suggestion
| "customInstructions": "Your role is to coordinate complex workflows by delegating tasks to specialized modes. As an orchestrator, you should:\n\n1. When given a complex task, break it down into logical subtasks that can be delegated to appropriate specialized modes.\n\n2. For each subtask, use the `new_task` tool to delegate. Choose the most appropriate mode for the subtask's specific goal and provide comprehensive instructions in the `message` parameter. These instructions must include:\n * All necessary context from the parent task or previous subtasks required to complete the work.\n * A clearly defined scope, specifying exactly what the subtask should accomplish.\n * An explicit statement that the subtask should *only* perform the work outlined in these instructions and not deviate.\n * An instruction for the subtask to signal completion by using the `attempt_completion` tool, providing a concise yet thorough summary of the outcome in the `result` parameter, keeping in mind that this summary will be the source of truth used to keep track of what was completed on this project.\n * A statement that these specific instructions supersede any conflicting general instructions the subtask's mode might have.\n * Once you have the plan created by the coder forward the implementation plan to the Expert Opinion mode for review, and based on the result ask for improvements or give the coder the go-ahead. Your prompts for those roles will have to ensure they know the procedure too, so they’re all singing from the same hymn-sheet.\n\n3. Track and manage the progress of all subtasks in a markdown file in the codebase. If its a bug then start the heading with BUG: if its a feature then write FEATURE:. When a subtask is completed, analyze its results from the user and determine the next steps and then go back to complete the markdown file subtask. \n\n4. Help the user understand how the different subtasks fit together in the overall workflow. Provide clear reasoning about why you're delegating specific tasks to specific modes.\n\n5. When all subtasks are completed, synthesize the results and provide a comprehensive overview of what was accomplished.\n\n6. Always Ask clarifying questions when necessary to better understand how to break down complex tasks effectively in as much detail as possible. \n\n7. Suggest improvements to the workflow based on the results of completed subtasks.\n\nUse subtasks to maintain clarity. If a request significantly shifts focus or requires a different expertise (mode), consider creating a subtask rather than overloading the current one.", | |
| "customInstructions": "Your role is to coordinate complex workflows by delegating tasks to specialized modes. As an orchestrator, you should:\n\n1. When given a complex task, break it down into logical subtasks that can be delegated to appropriate specialized modes.\n\n2. For each subtask, use the `new_task` tool to delegate. Choose the most appropriate mode for the subtask's specific goal and provide comprehensive instructions in the `message` parameter. These instructions must include:\n * All necessary context from the parent task or previous subtasks required to complete the work.\n * A clearly defined scope, specifying exactly what the subtask should accomplish.\n * An explicit statement that the subtask should *only* perform the work outlined in these instructions and not deviate.\n * An instruction for the subtask to signal completion by using the `attempt_completion` tool, providing a concise yet thorough summary of the outcome in the `result` parameter, keeping in mind that this summary will be the source of truth used to keep track of what was completed on this project.\n * A statement that these specific instructions supersede any conflicting general instructions the subtask's mode might have.\n * Once you have the plan created by the coder forward the implementation plan to the Expert Opinion mode for review, and based on the result ask for improvements or give the coder the go-ahead. Your prompts for those roles will have to ensure they know the procedure too, so they’re all singing from the same hymn-sheet.\n\n3. Track and manage the progress of all subtasks in a markdown file in the codebase. If it’s a bug, then start the heading with BUG: if its a feature then write FEATURE:. When a subtask is completed, analyze its results from the user and determine the next steps and then go back to complete the markdown file subtask. \n\n4. Help the user understand how the different subtasks fit together in the overall workflow. Provide clear reasoning about why you're delegating specific tasks to specific modes.\n\n5. When all subtasks are completed, synthesize the results and provide a comprehensive overview of what was accomplished.\n\n6. Always Ask clarifying questions when necessary to better understand how to break down complex tasks effectively in as much detail as possible. \n\n7. Suggest improvements to the workflow based on the results of completed subtasks.\n\nUse subtasks to maintain clarity. If a request significantly shifts focus or requires a different expertise (mode), consider creating a subtask rather than overloading the current one.", |
| 8. [x] **Verification:** Confirm the new component displays data correctly, selection works, the button triggers the action, and statuses update based on `jobStatus` polling. | ||
| 9. [x] **Seal Task:** Mark all tasks as complete and seal the feature upon user confirmation. | ||
|
|
||
| **Resolution:** The `CrawlUrls` component was successfully created in `components/CrawlUrls.tsx` and integrated into `app/page.tsx`. The URL list rendering logic was removed from `CrawlStatusMonitor.tsx`. This refactor resolved the UI state issues for checkboxes and status icons. No newline at end of file |
🧹 Nitpick (assertive)
Add a trailing newline.
A single trailing newline at the end of the file may be required by certain markdown or CI checks.
-**Resolution:** The `CrawlUrls` component...
\ No newline at end of file
+**Resolution:** The `CrawlUrls` component...
+📝 Committable suggestion
| **Resolution:** The `CrawlUrls` component was successfully created in `components/CrawlUrls.tsx` and integrated into `app/page.tsx`. The URL list rendering logic was removed from `CrawlStatusMonitor.tsx`. This refactor resolved the UI state issues for checkboxes and status icons. | |
| **Resolution:** The `CrawlUrls` component was successfully created in `components/CrawlUrls.tsx` and integrated into `app/page.tsx`. The URL list rendering logic was removed from `CrawlStatusMonitor.tsx`. This refactor resolved the UI state issues for checkboxes and status icons. |
🧰 Tools
🪛 LanguageTool
[grammar] ~31-~31: The verb ‘refactor’ is plural. Did you mean: “refactors”? Did you use a verb instead of a noun?
Context: ...ved from CrawlStatusMonitor.tsx. This refactor resolved the UI state issues for checkb...
(PLURAL_VERB_AFTER_THIS)
🪛 markdownlint-cli2 (0.17.2)
31-31: Files should end with a single newline character
null
(MD047, single-trailing-newline)
| * [x] Implement checkbox logic (`checked`, `onCheckedChange`). | ||
| * [x] Implement "Select All" functionality for 'pending_crawl' URLs. | ||
| * [x] Implement "Crawl Selected" button with count and disabled state logic. | ||
| 4. [x] **Read `app/page.tsx`:** Examine the current state management and where `CrawlStatusMonitor` is used. |
🧹 Nitpick (assertive)
Insert missing preposition for clarity.
Adding “in which” or “where” clarifies the sentence, improving readability.
-**Read `app/page.tsx`:** Examine the current state management and where `CrawlStatusMonitor` is used.
+**Read `app/page.tsx`:** Examine the current state management and where `CrawlStatusMonitor` is used.
+ (Optional improved version)
+**Read `app/page.tsx`:** Examine the current state management and the areas in which `CrawlStatusMonitor` is used.
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~18-~18: Possible missing preposition found.
Context: .../page.tsx:** Examine the current state management and where CrawlStatusMonitor` is used....
(AI_HYDRA_LEO_MISSING_OF)
| crawl_jobs_managed = manager.dict() | ||
| logger.info("Initialized multiprocessing Manager and managed dictionary for crawl_jobs.") | ||
| except Exception as e: | ||
| logger.error(f"Failed to initialize multiprocessing Manager: {e}. Falling back to regular dict (STATE WILL NOT BE SHARED BETWEEN PROCESSES).", exc_info=True) |
🧹 Nitpick (assertive)
Use logger.exception(...) instead of logger.error(..., exc_info=True).
When logging errors with stack traces, logger.exception(...) automatically adds exception info and is more concise.
Below is an example change for one occurrence. Repeat for each similar usage:
-logger.error(f"Failed to initialize multiprocessing Manager: {e}. Falling back to regular dict (STATE WILL NOT BE SHARED BETWEEN PROCESSES).", exc_info=True)
+logger.exception(f"Failed to initialize multiprocessing Manager. Falling back to regular dict (STATE WILL NOT BE SHARED BETWEEN PROCESSES).")
Also applies to: 65-65, 83-83, 108-108, 124-124, 144-144, 160-160, 186-186, 201-201
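A minimal sketch of how that fallback might look with `logger.exception` (the try/except structure mirrors the excerpt above; everything else is illustrative):

```python
import logging
import multiprocessing

logger = logging.getLogger(__name__)

try:
    # Managed dict so worker processes share the same crawl_jobs state.
    manager = multiprocessing.Manager()
    crawl_jobs_managed = manager.dict()
    logger.info("Initialized multiprocessing Manager and managed dictionary for crawl_jobs.")
except Exception:
    # logger.exception() logs at ERROR level and appends the traceback automatically,
    # so exc_info=True and interpolating the exception into the message are unnecessary.
    logger.exception(
        "Failed to initialize multiprocessing Manager. "
        "Falling back to regular dict (STATE WILL NOT BE SHARED BETWEEN PROCESSES)."
    )
    crawl_jobs_managed = {}
```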
🧰 Tools
🪛 Ruff (0.8.2)
18-18: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
18-18: Logging statement uses f-string
(G004)
| @@ -0,0 +1,203 @@ | |||
| import logging | |||
| from pydantic import BaseModel, Field | |||
| from typing import Optional, Dict | |||
🧹 Nitpick (assertive)
Consider using built-in dict for type hints.
Python 3.9+ allows the use of built-in dict[str, Any] over typing.Dict[str, Any]. This also aligns with recommended modern Python practices.
Apply this diff:
-from typing import Optional, Dict
+from typing import Optional
...
-crawl_jobs: Dict[str, CrawlJobStatus] = crawl_jobs_managed
+crawl_jobs: dict[str, CrawlJobStatus] = crawl_jobs_managed
📝 Committable suggestion
| from typing import Optional, Dict | |
| from typing import Optional | |
| # ... other imports remain unchanged ... | |
| # Somewhere later in the file, when defining crawl_jobs | |
| crawl_jobs: dict[str, CrawlJobStatus] = crawl_jobs_managed |
🧰 Tools
🪛 Ruff (0.8.2)
3-3: typing.Dict is deprecated, use dict instead
(UP035)
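For illustration, a small sketch of the module header using built-in generics; the `CrawlJobStatus` fields shown here are placeholders, not the project's actual model:

```python
import logging
from pydantic import BaseModel, Field

logger = logging.getLogger(__name__)


class CrawlJobStatus(BaseModel):
    # Placeholder fields; the real model lives in backend/app/status_manager.py.
    job_id: str
    overall_status: str = Field(default="initializing")
    urls: dict[str, str] = Field(default_factory=dict)  # built-in generic instead of typing.Dict


# PEP 585 built-in generics work directly in annotations on Python 3.9+.
crawl_jobs: dict[str, CrawlJobStatus] = {}
```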
| /* --- Old logic expecting direct page results --- | ||
| const pages = await discoverSubdomains({ url: submittedUrl, depth }) | ||
| console.log('Discovered pages:', pages) | ||
| setDiscoveredPages(pages) | ||
| setStats(prev => ({ | ||
| setDiscoveredPages(pages) // This is now handled by polling/status monitor | ||
| setStats(prev => ({ // Stats are now part of the job status | ||
| ...prev, | ||
| subdomainsParsed: pages.length | ||
| })) | ||
| toast({ | ||
| toast({ // Toast is now handled above when job starts | ||
| title: "Pages Discovered", | ||
| description: `Found ${pages.length} related pages at depth ${depth}` | ||
| }) | ||
| */ // --- End of old logic --- | ||
| } catch (error) { |
🧹 Nitpick (assertive)
Remove or minimize large commented-out code blocks.
Excessive commented-out code can be confusing. If version control is in place, you can rely on Git history for reference.
- /* --- Old logic expecting direct page results ---
- ...
- */ // --- End of old logic ---
+// Rely on version control for historical code; remove these blocks for clarity.
Also applies to: 185-238
| const [currentJobId, setCurrentJobId] = useState<string | null>(null); // Add state for Job ID | ||
| const { toast } = useToast() | ||
| // Lifted state from CrawlStatusMonitor | ||
| const [jobStatus, setJobStatus] = useState<CrawlJobStatus | null>(null); | ||
| const [jobError, setJobError] = useState<string | null>(null); | ||
| const [isPollingLoading, setIsPollingLoading] = useState<boolean>(false); | ||
| // State lifted for selective crawl | ||
| const [selectedUrls, setSelectedUrls] = useState<Set<string>>(new Set()); | ||
| const [isCrawlingSelected, setIsCrawlingSelected] = useState<boolean>(false); |
🧹 Nitpick (assertive)
Consider centralizing job-related state.
Although the approach works, the Home component is maintaining multiple job-tracking states. If state management becomes more complex or is used across components, moving it into a dedicated store (e.g., a global context or Redux) could improve scalability and maintainability.
| // Handler for selection changes from CrawlStatusMonitor | ||
| const handleSelectionChange = (newSelectedUrls: Set<string>) => { | ||
| setSelectedUrls(newSelectedUrls); | ||
| }; |
🧹 Nitpick (assertive)
Encapsulate selection logic if it grows.
Your handleSelectionChange updates state with newly selected URLs. If the logic becomes more complex, consider extracting it into a separate hook or utility to keep components lean.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
`assets/image.png` is excluded by `!**/*.png`
📒 Files selected for processing (1)
`README.md` (3 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
README.md
4-4: Inline HTML
Element: img
(MD033, no-inline-html)
124-124: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
🔇 Additional comments (2)
README.md (2)
98-99: Update of Prerequisites for Docker
The prerequisites now list only Docker (and Git for cloning) without mentioning Docker Compose. This update aligns with the intended configuration changes. Please ensure that any internal references to Docker Compose have been removed.
111-117: Clear Mac/Linux Setup Instructions
The instructions for configuring environment variables (copying `.env.template` to `.env` and specifying `NEXT_PUBLIC_BACKEND_URL`) are clear and well-documented. This approach should help users correctly set up communication between the frontend and backend.
| <div align="center"> | ||
| <img src="https://github.com/user-attachments/assets/6d4cc4df-fe5d-4483-9218-3d621f572e49" alt="DevDocs Interface" width="800"> | ||
| <img src="https://github.com/user-attachments/assets/00350dc6-2ff3-40cf-b0b3-8b3e387d983d" alt="DevDocs Interface" width="800"> | ||
| <img src="assets/image.png" alt="DevDocs Interface" width="800"> |
🧹 Nitpick (assertive)
Consider Using Markdown Image Syntax
The inline HTML `<img>` tag currently triggers markdownlint MD033. If possible, consider switching to Markdown's image syntax (e.g., `![DevDocs Interface](assets/image.png){width=800}`) to improve compatibility with markdown linters. If custom attributes (like `width`) are necessary, you may retain the HTML and simply disable the warning.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
4-4: Inline HTML
Element: img
(MD033, no-inline-html)
|
|
||
| For Windows users: | ||
| For Windows users: Experimental Only (Not Tested Yet) | ||
| ```cmd |
🧹 Nitpick (assertive)
Improve Fenced Code Block Formatting
A blank line before and after fenced code blocks can help satisfy markdownlint MD031. Please add a blank line above and below the code fence (starting at line 124) to improve readability and adherence to markdown style guidelines.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
124-124: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
| For Windows users: Experimental Only (Not Tested Yet) | ||
| ```cmd | ||
| # Clone the repository | ||
| git clone https://github.com/cyberagiinc/DevDocs.git | ||
| # Navigate to the project directory | ||
| cd DevDocs | ||
| # Configure environment variables | ||
| # Copy the template file to .env | ||
| copy .env.template .env | ||
| # Ensure NEXT_PUBLIC_BACKEND_URL in .env is set correctly (e.g., http://localhost:24125) | ||
| # This allows the frontend (running in your browser) to communicate with the backend service. | ||
| # Prerequisites: Install WSL 2 and Docker Desktop | ||
| # Docker Desktop for Windows requires WSL 2. Please ensure you have WSL 2 installed and running first. | ||
| # 1. Install WSL 2: Follow the official Microsoft guide: https://learn.microsoft.com/en-us/windows/wsl/install | ||
| # 2. Install Docker Desktop for Windows: Download and install from the official Docker website. Docker Desktop includes Docker Compose. |
🧹 Nitpick (assertive)
Enhanced Windows Setup Instructions
The updated Windows instructions now clearly state the experimental nature of the setup and outline the necessary steps, including setting environment variables and prerequisites (WSL 2 and Docker Desktop). This additional detail should aid users on Windows, though consider emphasizing that the configuration is untested and might require adjustments.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
124-124: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
Actionable comments posted: 17
🔭 Outside diff range comments (2)
backend/app/crawler.py (2)
176-279: ⚠️ Potential issue
Undefined variable and blocking I/O in async function.
`poll_url` is undefined at line 224. This causes a runtime error: `status_response = requests.get(poll_url, ...)`
Propose defining it, for example:
+ poll_url = f"{CRAWL4AI_URL}/task/{task_id}"
  status_response = requests.get( poll_url, ... )
Using `requests.get`/`post` in `async` code blocks the event loop. Consider `httpx` with async or running this in a thread pool to maintain concurrency.
🧰 Tools
🪛 Ruff (0.8.2)
181-181: Logging statement uses f-string
(G004)
188-188: Async functions should not call blocking HTTP methods
(ASYNC210)
192-192: Trailing comma missing
Add trailing comma
(COM812)
196-196: Logging statement uses f-string
(G004)
197-197: Logging statement uses f-string
(G004)
202-202: Logging statement uses f-string
(G004)
203-203: Do not catch blind exception:
Exception(BLE001)
204-204: Use
logging.exceptioninstead oflogging.errorReplace with
exception(TRY400)
204-204: Logging statement uses f-string
(G004)
204-204: Use explicit conversion flag
Replace with conversion flag
(RUF010)
205-205: Use
logging.exceptioninstead oflogging.errorReplace with
exception(TRY400)
205-205: Logging statement uses f-string
(G004)
209-209: f-string without any placeholders
Remove extraneous
fprefix(F541)
211-211: Trailing comma missing
Add trailing comma
(COM812)
216-216: Logging statement uses f-string
(G004)
223-223: Logging statement uses f-string
(G004)
224-224: Async functions should not call blocking HTTP methods
(ASYNC210)
225-225: Undefined name
poll_url(F821)
227-227: Trailing comma missing
Add trailing comma
(COM812)
235-235: Logging statement uses f-string
(G004)
238-238: f-string without any placeholders
Remove extraneous
fprefix(F541)
240-240: Trailing comma missing
Add trailing comma
(COM812)
244-244: Logging statement uses f-string
(G004)
249-249: Unnecessary
elifafterbreakstatementRemove unnecessary
elif(RET508)
251-251: Logging statement uses f-string
(G004)
259-259: Logging statement uses f-string
(G004)
264-264: Logging
.exception(...)should be used instead of.error(..., exc_info=True)(G201)
264-264: Logging statement uses f-string
(G004)
264-264: Use explicit conversion flag
Replace with conversion flag
(RUF010)
269-269: Logging
.exception(...)should be used instead of.error(..., exc_info=True)(G201)
269-269: Logging statement uses f-string
(G004)
269-269: Use explicit conversion flag
Replace with conversion flag
(RUF010)
277-277: Logging statement uses f-string
(G004)
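A sketch of the non-blocking approach suggested in the comment above, using `httpx.AsyncClient` for polling and `asyncio.to_thread` as a fallback for code that stays on `requests`. The `CRAWL4AI_URL` value and the `/task/{task_id}` endpoint shape are assumptions taken from the proposed diff, not confirmed API details:

```python
import asyncio

import httpx
import requests

CRAWL4AI_URL = "http://crawl4ai:11235"  # assumption: base URL of the crawler service


async def poll_task(task_id: str, timeout: float = 300.0, interval: float = 2.0) -> dict:
    """Poll a crawl task until it reaches a terminal state without blocking the event loop."""
    poll_url = f"{CRAWL4AI_URL}/task/{task_id}"  # define poll_url before it is used
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    async with httpx.AsyncClient(timeout=30.0) as client:
        while loop.time() < deadline:
            response = await client.get(poll_url)  # non-blocking, unlike requests.get
            response.raise_for_status()
            data = response.json()
            if data.get("status") in ("completed", "failed"):
                return data
            await asyncio.sleep(interval)
    raise TimeoutError(f"Task {task_id} did not finish within {timeout}s")


async def get_json_in_thread(url: str) -> dict:
    """Alternative: keep requests but push the blocking call onto a worker thread."""
    response = await asyncio.to_thread(requests.get, url, timeout=30)
    response.raise_for_status()
    return response.json()
```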
373-702: 🧹 Nitpick (assertive)
Large function and blocking file operations in async code.
`crawl_pages` is lengthy (>45 lines) and does multiple I/O calls (`requests.post`, `open`) in an async context. This can degrade concurrency. Consider splitting tasks into smaller functions or using async-friendly libraries:
- Refactor into multiple helper methods (e.g., retrieving data, writing files, updating states).
- Use `aiofiles` or run blocking I/O in a dedicated thread to prevent blocking the event loop.
🧰 Tools
🪛 Ruff (0.8.2)
373-373:
crawl_pagesis too complex (45 > 10)(C901)
373-373: Use
listinstead ofListfor type annotationReplace with
list(UP006)
373-373: PEP 484 prohibits implicit
OptionalConvert to
Optional[T](RUF013)
390-390: Logging statement uses f-string
(G004)
396-396: Logging statement uses f-string
(G004)
408-408: Logging statement uses f-string
(G004)
414-414: Logging statement uses f-string
(G004)
420-420: Trailing comma missing
Add trailing comma
(COM812)
424-424: Logging statement uses f-string
(G004)
427-427: Logging statement uses f-string
(G004)
428-428: Logging statement uses f-string
(G004)
429-429: Logging statement uses f-string
(G004)
432-432: Logging statement uses f-string
(G004)
433-433: Logging statement uses f-string
(G004)
434-434: Async functions should not call blocking HTTP methods
(ASYNC210)
438-438: Trailing comma missing
Add trailing comma
(COM812)
442-442: Logging statement uses f-string
(G004)
449-449: Logging statement uses f-string
(G004)
451-451: Async functions should not call blocking HTTP methods
(ASYNC210)
454-454: Trailing comma missing
Add trailing comma
(COM812)
459-459: Logging statement uses f-string
(G004)
463-463: Logging statement uses f-string
(G004)
478-478: f-string without any placeholders
Remove extraneous
fprefix(F541)
478-478: Logging statement uses f-string
(G004)
479-479: Do not implicitly
return Nonein function able to return non-NonevalueAdd explicit
Nonereturn value(RET502)
485-485: Logging statement uses f-string
(G004)
490-490: Unnecessary key check before dictionary access
Replace with
dict.get(RUF019)
492-492: Logging statement uses f-string
(G004)
505-505: f-string without any placeholders
Remove extraneous
fprefix(F541)
516-516: Async functions should not open files with blocking methods like
open(ASYNC230)
520-520: Logging statement uses f-string
(G004)
521-521: Replace aliased errors with
OSErrorReplace
IOErrorwith builtinOSError(UP024)
522-522: Logging
.exception(...)should be used instead of.error(..., exc_info=True)(G201)
522-522: Logging statement uses f-string
(G004)
525-525: Logging
.exception(...)should be used instead of.error(..., exc_info=True)(G201)
525-525: Logging statement uses f-string
(G004)
534-534: Unnecessary open mode parameters
Remove open mode parameters
(UP015)
534-534: Async functions should not open files with blocking methods like
open(ASYNC230)
537-537: Logging
.exception(...)should be used instead of.error(..., exc_info=True)(G201)
537-537: Logging statement uses f-string
(G004)
539-539: Replace aliased errors with
OSErrorReplace
IOErrorwith builtinOSError(UP024)
540-540: Logging
.exception(...)should be used instead of.error(..., exc_info=True)(G201)
540-540: Logging statement uses f-string
(G004)
543-543: Logging
.exception(...)should be used instead of.error(..., exc_info=True)(G201)
543-543: Logging statement uses f-string
(G004)
551-551:
datetime.datetime.now()called without atzargument(DTZ005)
553-553: Trailing comma missing
Add trailing comma
(COM812)
560-560:
datetime.datetime.now()called without atzargument(DTZ005)
562-562: Trailing comma missing
Add trailing comma
(COM812)
566-566:
datetime.datetime.now()called without atzargument(DTZ005)
569-569: Async functions should not open files with blocking methods like
open(ASYNC230)
571-571: Logging statement uses f-string
(G004)
572-572: Replace aliased errors with
OSErrorReplace
IOErrorwith builtinOSError(UP024)
573-573: Logging
.exception(...)should be used instead of.error(..., exc_info=True)(G201)
573-573: Logging statement uses f-string
(G004)
575-575: Logging
.exception(...)should be used instead of.error(..., exc_info=True)(G201)
575-575: Logging statement uses f-string
(G004)
576-576: Logging statement uses f-string
(G004)
577-577: Do not catch blind exception:
Exception(BLE001)
578-578: Use
logging.exceptioninstead oflogging.errorReplace with
exception(TRY400)
578-578: Logging statement uses f-string
(G004)
578-578: Use explicit conversion flag
Replace with conversion flag
(RUF010)
581-581: Unnecessary
elifafterbreakstatementRemove unnecessary
elif(RET508)
583-583: Logging statement uses f-string
(G004)
588-588: Logging statement uses f-string
(G004)
589-589: Do not catch blind exception:
Exception(BLE001)
590-590: Use
logging.exceptioninstead oflogging.errorReplace with
exception(TRY400)
590-590: Logging statement uses f-string
(G004)
590-590: Use explicit conversion flag
Replace with conversion flag
(RUF010)
596-596: Logging statement uses f-string
(G004)
603-603: Unnecessary key check before dictionary access
Replace with
dict.get(RUF019)
624-624: Trailing comma missing
Add trailing comma
(COM812)
636-636: Logging statement uses f-string
(G004)
644-644: Logging statement uses f-string
(G004)
650-650: Logging statement uses f-string
(G004)
657-657: Use explicit conversion flag
Replace with conversion flag
(RUF010)
658-658: Logging
.exception(...)should be used instead of.error(..., exc_info=True)(G201)
676-676: Trailing comma missing
Add trailing comma
(COM812)
679-679: Logging statement uses f-string
(G004)
686-686: Trailing comma missing
Add trailing comma
(COM812)
690-690: Use explicit conversion flag
Replace with conversion flag
(RUF010)
691-691: Logging
.exception(...)should be used instead of.error(..., exc_info=True)(G201)
700-700: Trailing comma missing
Add trailing comma
(COM812)
701-701: Trailing comma missing
Add trailing comma
(COM812)
♻️ Duplicate comments (5)
README.md (1)
4-5: Inline HTML usage was previously flagged.
This `<img>` usage with inline HTML triggers markdownlint MD033. Consider switching to Markdown image syntax or disabling the rule if custom attributes are required.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
4-4: Inline HTML
Element: img
(MD033, no-inline-html)
backend/app/main.py (1)
1-1: 🧹 Nitpick (assertive)
Remove unused or deprecated imports.
`psutil` (line 8) and `typing.Dict` (line 5) appear unused. Removing them keeps the codebase lean:
-import psutil
-from typing import List, Optional, Dict
+from typing import Optional
...
fast-markdown-mcp/src/fast_markdown_mcp/server.py (3)
20-20: Remove unused `Section` import.
This import is flagged as unused by past reviews and static analysis.
-from .document_structure import DocumentStructure, Section
+from .document_structure import DocumentStructure
🧰 Tools
🪛 Ruff (0.8.2)
20-20: Module level import not at top of file
(E402)
20-20:
.document_structure.Sectionimported but unusedRemove unused import:
.document_structure.Section(F401)
246-444: Function complexity is high.
This method combines multiple nested flows. Splitting it into helper functions could aid in readability and testing.
🧰 Tools
🪛 Ruff (0.8.2)
246-246:
smart_section_searchis too complex (22 > 10)(C901)
247-247: Boolean-typed positional argument in function definition
(FBT001)
247-247: Boolean default positional argument in function definition
(FBT002)
247-247: Boolean-typed positional argument in function definition
(FBT001)
247-247: Boolean default positional argument in function definition
(FBT002)
276-276: Missing return type annotation for private function
collect_sectionsAdd return type annotation:
None(ANN202)
276-276: Missing type annotation for function argument
section_list(ANN001)
278-278: Function definition does not bind loop variable
sections(B023)
295-295: Trailing comma missing
Add trailing comma
(COM812)
308-308: Trailing comma missing
Add trailing comma
(COM812)
321-321: Trailing comma missing
Add trailing comma
(COM812)
340-340: Trailing comma missing
Add trailing comma
(COM812)
358-358: Trailing comma missing
Add trailing comma
(COM812)
371-371: Trailing comma missing
Add trailing comma
(COM812)
406-406: Trailing comma missing
Add trailing comma
(COM812)
441-441: Do not catch blind exception:
Exception(BLE001)
442-442: Use
logging.exceptioninstead oflogging.errorReplace with
exception(TRY400)
442-442: Logging statement uses f-string
(G004)
443-443: Use explicit conversion flag
Replace with conversion flag
(RUF010)
10-10: 🧹 Nitpick (assertive)
Remove unused typing imports.
None of these imports are explicitly used in this file. Consider removing them or switching to built-in typings if on Python 3.9+.
-from typing import Optional, Dict, List, Tuple, Any
🧰 Tools
🪛 Ruff (0.8.2)
10-10:
typing.Dictis deprecated, usedictinstead(UP035)
10-10:
typing.Listis deprecated, uselistinstead(UP035)
10-10:
typing.Tupleis deprecated, usetupleinstead(UP035)
10-10:
typing.Optionalimported but unusedRemove unused import
(F401)
10-10:
typing.Dictimported but unusedRemove unused import
(F401)
10-10:
typing.Listimported but unusedRemove unused import
(F401)
10-10:
typing.Tupleimported but unusedRemove unused import
(F401)
10-10:
typing.Anyimported but unusedRemove unused import
(F401)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (8)
README.md (2 hunks)
app/api/storage/route.ts (2 hunks)
app/page.tsx (4 hunks)
backend/app/crawler.py (18 hunks)
backend/app/main.py (4 hunks)
docker/dockerfiles/Dockerfile.frontend (2 hunks)
fast-markdown-mcp/src/fast_markdown_mcp/server.py (5 hunks)
lib/crawl-service.ts (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
lib/crawl-service.ts (2)
  lib/types.ts (3)
    DiscoverResponse (75-79), CrawlRequest (82-85), CrawlResponse (88-93)
  backend/app/main.py (1)
    CrawlRequest (62-64)
app/page.tsx (8)
  lib/types.ts (3)
    DiscoveredPage (7-12), CrawlJobStatus (63-72), OverallStatus (44-51)
  components/ui/use-toast.ts (2)
    useToast (194-194), toast (194-194)
  backend/app/status_manager.py (1)
    CrawlJobStatus (23-32)
  lib/crawl-service.ts (2)
    discoverSubdomains (6-42), crawlPages (44-84)
  components/UrlInput.tsx (1)
    UrlInput (12-152)
  components/ui/dialog.tsx (6)
    Dialog (112-112), DialogTrigger (116-116), DialogContent (117-117), DialogHeader (118-118), DialogTitle (120-120), DialogDescription (121-121)
  components/ui/button.tsx (1)
    Button (56-56)
  components/SubdomainList.tsx (1)
    SubdomainList (14-284)
backend/app/main.py (3)
  backend/app/crawler.py (5)
    discover_pages (133-371), crawl_pages (373-702), DiscoveredPage (42-46), CrawlResult (54-56), url_to_filename (60-110)
  lib/types.ts (4)
    DiscoveredPage (7-12), CrawlResult (21-34), CrawlJobStatus (63-72), CrawlRequest (82-85)
  backend/app/status_manager.py (4)
    CrawlJobStatus (23-32), initialize_job (40-69), update_overall_status (71-110), update_url_status (112-146)
backend/app/crawler.py (2)
  backend/app/status_manager.py (2)
    update_overall_status (71-110), update_url_status (112-146)
  backend/app/utils.py (1)
    normalize_url (3-9)
🪛 markdownlint-cli2 (0.17.2)
README.md
4-4: Inline HTML
Element: img
(MD033, no-inline-html)
124-124: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
🪛 Ruff (0.8.2)
fast-markdown-mcp/src/fast_markdown_mcp/server.py
8-8: math imported but unused
Remove unused import: math
(F401)
10-10: typing.Dict is deprecated, use dict instead
(UP035)
10-10: typing.List is deprecated, use list instead
(UP035)
10-10: typing.Tuple is deprecated, use tuple instead
(UP035)
10-10: typing.Optional imported but unused
Remove unused import
(F401)
10-10: typing.Dict imported but unused
Remove unused import
(F401)
10-10: typing.List imported but unused
Remove unused import
(F401)
10-10: typing.Tuple imported but unused
Remove unused import
(F401)
10-10: typing.Any imported but unused
Remove unused import
(F401)
20-20: Module level import not at top of file
(E402)
20-20: .document_structure.Section imported but unused
Remove unused import: .document_structure.Section
(F401)
246-246: smart_section_search is too complex (22 > 10)
(C901)
247-247: Boolean-typed positional argument in function definition
(FBT001)
247-247: Boolean default positional argument in function definition
(FBT002)
247-247: Boolean-typed positional argument in function definition
(FBT001)
247-247: Boolean default positional argument in function definition
(FBT002)
276-276: Missing return type annotation for private function collect_sections
Add return type annotation: None
(ANN202)
276-276: Missing type annotation for function argument section_list
(ANN001)
278-278: Function definition does not bind loop variable sections
(B023)
295-295: Trailing comma missing
Add trailing comma
(COM812)
308-308: Trailing comma missing
Add trailing comma
(COM812)
321-321: Trailing comma missing
Add trailing comma
(COM812)
340-340: Trailing comma missing
Add trailing comma
(COM812)
358-358: Trailing comma missing
Add trailing comma
(COM812)
371-371: Trailing comma missing
Add trailing comma
(COM812)
406-406: Trailing comma missing
Add trailing comma
(COM812)
441-441: Do not catch blind exception: Exception
(BLE001)
442-442: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
442-442: Logging statement uses f-string
(G004)
443-443: Use explicit conversion flag
Replace with conversion flag
(RUF010)
710-710: Trailing comma missing
Add trailing comma
(COM812)
714-714: Trailing comma missing
Add trailing comma
(COM812)
718-718: Trailing comma missing
Add trailing comma
(COM812)
722-722: Trailing comma missing
Add trailing comma
(COM812)
723-723: Trailing comma missing
Add trailing comma
(COM812)
725-725: Trailing comma missing
Add trailing comma
(COM812)
726-726: Trailing comma missing
Add trailing comma
(COM812)
781-781: Avoid specifying long messages outside the exception class
(TRY003)
802-802: Trailing comma missing
Add trailing comma
(COM812)
backend/app/main.py
5-5: typing.List is deprecated, use list instead
(UP035)
5-5: typing.Dict is deprecated, use dict instead
(UP035)
5-5: typing.Optional imported but unused
Remove unused import
(F401)
5-5: typing.Dict imported but unused
Remove unused import
(F401)
8-8: psutil imported but unused
Remove unused import: psutil
(F401)
16-16: .crawler.CrawlResult imported but unused
Remove unused import: .crawler.CrawlResult
(F401)
21-21: Trailing comma missing
Add trailing comma
(COM812)
27-34: Module level import not at top of file
(E402)
31-31: .status_manager.update_url_status imported but unused
Remove unused import: .status_manager.update_url_status
(F401)
33-33: Trailing comma missing
Add trailing comma
(COM812)
398-398: Missing return type annotation for public function get_storage_file_content
(ANN201)
407-407: Logging statement uses f-string
(G004)
408-408: Abstract raise to an inner function
(TRY301)
418-418: Logging statement uses f-string
(G004)
419-419: Abstract raise to an inner function
(TRY301)
420-420: Do not catch blind exception: Exception
(BLE001)
421-421: Logging statement uses f-string
(G004)
422-422: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
425-425: Logging statement uses f-string
(G004)
426-426: Abstract raise to an inner function
(TRY301)
428-428: Logging statement uses f-string
(G004)
430-430: Logging statement uses f-string
(G004)
436-436: Use raise without specifying exception name
Remove exception name
(TRY201)
439-439: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
439-439: Logging statement uses f-string
(G004)
439-439: Use explicit conversion flag
Replace with conversion flag
(RUF010)
440-440: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
440-440: Use explicit conversion flag
Replace with conversion flag
(RUF010)
464-464: Missing return type annotation for public function discover_endpoint
(ANN201)
469-469: Logging statement uses f-string
(G004)
478-478: Logging statement uses f-string
(G004)
483-483: Trailing comma missing
Add trailing comma
(COM812)
485-485: Logging statement uses f-string
(G004)
485-485: Undefined name response_data
(F821)
486-486: Consider moving this statement to an else block
(TRY300)
486-486: Undefined name response_data
(F821)
488-488: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
488-488: Logging statement uses f-string
(G004)
488-488: Use explicit conversion flag
Replace with conversion flag
(RUF010)
491-491: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
491-491: Use explicit conversion flag
Replace with conversion flag
(RUF010)
493-493: Missing return type annotation for public function crawl_endpoint
(ANN201)
497-497: Logging statement uses f-string
(G004)
501-501: Logging statement uses f-string
(G004)
502-502: Abstract raise to an inner function
(TRY301)
512-512: Logging statement uses f-string
(G004)
518-518: Logging statement uses f-string
(G004)
520-524: Consider moving this statement to an else block
(TRY300)
523-523: Trailing comma missing
Add trailing comma
(COM812)
526-526: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
526-526: Logging statement uses f-string
(G004)
526-526: Use explicit conversion flag
Replace with conversion flag
(RUF010)
528-528: Use explicit conversion flag
Replace with conversion flag
(RUF010)
529-529: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
529-529: Use explicit conversion flag
Replace with conversion flag
(RUF010)
533-533: Missing return type annotation for public function get_crawl_status
(ANN201)
535-535: Logging statement uses f-string
(G004)
538-538: Logging statement uses f-string
(G004)
540-540: Logging statement uses f-string
(G004)
backend/app/crawler.py
1-1: typing.List is deprecated, use list instead
(UP035)
1-1: typing.Set is deprecated, use set instead
(UP035)
1-1: typing.Dict is deprecated, use dict instead
(UP035)
1-1: typing.Dict imported but unused
Remove unused import: typing.Dict
(F401)
141-141: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
142-142: Trailing comma missing
Add trailing comma
(COM812)
143-143: Use list instead of List for type annotation
Replace with list
(UP006)
157-157: Logging statement uses f-string
(G004)
181-181: Logging statement uses f-string
(G004)
216-216: Logging statement uses f-string
(G004)
223-223: Logging statement uses f-string
(G004)
224-224: Async functions should not call blocking HTTP methods
(ASYNC210)
225-225: Undefined name poll_url
(F821)
227-227: Trailing comma missing
Add trailing comma
(COM812)
244-244: Logging statement uses f-string
(G004)
249-249: Unnecessary elif after break statement
Remove unnecessary elif
(RET508)
251-251: Logging statement uses f-string
(G004)
259-259: Logging statement uses f-string
(G004)
264-264: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
264-264: Logging statement uses f-string
(G004)
264-264: Use explicit conversion flag
Replace with conversion flag
(RUF010)
269-269: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
269-269: Logging statement uses f-string
(G004)
269-269: Use explicit conversion flag
Replace with conversion flag
(RUF010)
277-277: Logging statement uses f-string
(G004)
349-349: Trailing comma missing
Add trailing comma
(COM812)
357-357: Logging statement uses f-string
(G004)
359-359: Consider moving this statement to an else block
(TRY300)
362-362: Use explicit conversion flag
Replace with conversion flag
(RUF010)
363-363: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
373-373: crawl_pages is too complex (45 > 10)
(C901)
373-373: Use list instead of List for type annotation
Replace with list
(UP006)
373-373: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
408-408: Logging statement uses f-string
(G004)
424-424: Logging statement uses f-string
(G004)
442-442: Logging statement uses f-string
(G004)
463-463: Logging statement uses f-string
(G004)
505-505: f-string without any placeholders
Remove extraneous f prefix
(F541)
516-516: Async functions should not open files with blocking methods like open
(ASYNC230)
520-520: Logging statement uses f-string
(G004)
521-521: Replace aliased errors with OSError
Replace IOError with builtin OSError
(UP024)
522-522: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
522-522: Logging statement uses f-string
(G004)
525-525: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
525-525: Logging statement uses f-string
(G004)
534-534: Unnecessary open mode parameters
Remove open mode parameters
(UP015)
534-534: Async functions should not open files with blocking methods like open
(ASYNC230)
537-537: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
537-537: Logging statement uses f-string
(G004)
539-539: Replace aliased errors with OSError
Replace IOError with builtin OSError
(UP024)
540-540: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
540-540: Logging statement uses f-string
(G004)
543-543: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
543-543: Logging statement uses f-string
(G004)
569-569: Async functions should not open files with blocking methods like open
(ASYNC230)
571-571: Logging statement uses f-string
(G004)
572-572: Replace aliased errors with OSError
Replace IOError with builtin OSError
(UP024)
573-573: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
573-573: Logging statement uses f-string
(G004)
575-575: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
575-575: Logging statement uses f-string
(G004)
576-576: Logging statement uses f-string
(G004)
577-577: Do not catch blind exception: Exception
(BLE001)
578-578: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
578-578: Logging statement uses f-string
(G004)
578-578: Use explicit conversion flag
Replace with conversion flag
(RUF010)
581-581: Unnecessary elif after break statement
Remove unnecessary elif
(RET508)
583-583: Logging statement uses f-string
(G004)
588-588: Logging statement uses f-string
(G004)
589-589: Do not catch blind exception: Exception
(BLE001)
590-590: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
590-590: Logging statement uses f-string
(G004)
590-590: Use explicit conversion flag
Replace with conversion flag
(RUF010)
596-596: Logging statement uses f-string
(G004)
636-636: Logging statement uses f-string
(G004)
644-644: Logging statement uses f-string
(G004)
650-650: Logging statement uses f-string
(G004)
657-657: Use explicit conversion flag
Replace with conversion flag
(RUF010)
658-658: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
679-679: Logging statement uses f-string
(G004)
686-686: Trailing comma missing
Add trailing comma
(COM812)
690-690: Use explicit conversion flag
Replace with conversion flag
(RUF010)
691-691: Logging .exception(...) should be used instead of .error(..., exc_info=True)
(G201)
🔇 Additional comments (30)
app/api/storage/route.ts (2)
60-60: Proper type annotation for diskFileDetails
Good use of the newly defined interface to type the `diskFileDetails` array. This enhances type safety and makes the code more maintainable.
186-186: Explicit type annotation in filter function
Good addition of the explicit type annotation for the file parameter in the filter function. This enhances type safety and helps with code readability.
If you implement the suggestion to rename `allFiles` to `diskFiles`, don't forget to update this line accordingly.
docker/dockerfiles/Dockerfile.frontend (1)
25-25: Ensure correct environment variable flow.
By setting `ENV NEXT_PUBLIC_BACKEND_URL=$NEXT_PUBLIC_BACKEND_URL` here, you may override the default value established at line 14. Verify that your Docker build arguments are passed correctly to avoid runtime mismatches, especially in multi-stage builds.
README.md (1)
111-117: Environment variable setup looks fine.
The steps for copying `.env.template` and setting `NEXT_PUBLIC_BACKEND_URL` are clear.
lib/crawl-service.ts (2)
6-36: Enhanced discovery workflow is properly handled.
• Returning a `Promise<DiscoverResponse>` and checking `job_id` ensures robust error handling.
• Logging helps trace potential issues.
• Overall logic is correct, ensuring the UI can handle mismatches or missing fields gracefully.
44-82: Crawl initiation refactor improves clarity.
• Accepting `{ pages, jobId }` aligns well with the updated `CrawlRequest` type.
• Verifying `success` and `job_id` in the response is a solid approach to ensure valid acknowledgments.
• The fallback response in the `catch` block is helpful for the UI.
backend/app/crawler.py (5)
12-14: Import cleanup looks good.
Removing the unused `set_task_context` import and moving `normalize_url` to `utils` helps reduce clutter.
58-58: Comment explaining function relocation is clear.
The note about moving `normalize_url` to `utils.py` is helpful for maintainability.
112-127: Good removal of the in-memory storage logic.
Eliminating unused or legacy in-memory patterns helps simplify file handling and reduce confusion. The new, disk-based approach is more transparent.
131-131: Removal of file redirection logging is appropriate.
No concerns here, as it's consistent with dropping the old in-memory infrastructure.
152-159: Initialization of discovery status is well-organized.
Setting `root_url` and updating overall status to `'discovering'` is straightforward.
🧰 Tools
🪛 Ruff (0.8.2)
157-157: Logging statement uses f-string
(G004)
backend/app/main.py (5)
26-34: Good reorganization for status management imports.
Consolidating status-related imports under one block clarifies job tracking logic.
🧰 Tools
🪛 Ruff (0.8.2)
27-34: Module level import not at top of file
(E402)
31-31:
.status_manager.update_url_statusimported but unusedRemove unused import:
.status_manager.update_url_status(F401)
33-33: Trailing comma missing
Add trailing comma
(COM812)
63-63: Linking crawl request to job ID is beneficial.
Introducing `job_id` in `CrawlRequest` ensures the crawl process can be tracked properly.
395-396: Explicit storage directory definition looks clean.
Declaring `STORAGE_DIR = Path("storage/markdown")` at the top level is clear and consistent.
397-441: Secure file read approach approved.
- Checking `file_path` to prevent directory traversal is well-implemented: `if "/" in safe_file_name or "\\" in safe_file_name ...`
- Using `.resolve().is_relative_to(base_path)` (a Python 3.9+ feature) further mitigates symlink attacks.
🧰 Tools
🪛 Ruff (0.8.2)
398-398: Missing return type annotation for public function
get_storage_file_content(ANN201)
407-407: Logging statement uses f-string
(G004)
408-408: Abstract
raiseto an inner function(TRY301)
418-418: Logging statement uses f-string
(G004)
419-419: Abstract
raiseto an inner function(TRY301)
420-420: Do not catch blind exception:
Exception(BLE001)
421-421: Logging statement uses f-string
(G004)
422-422: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
425-425: Logging statement uses f-string
(G004)
426-426: Abstract
raiseto an inner function(TRY301)
428-428: Logging statement uses f-string
(G004)
430-430: Logging statement uses f-string
(G004)
436-436: Use
raisewithout specifying exception nameRemove exception name
(TRY201)
439-439: Logging
.exception(...)should be used instead of.error(..., exc_info=True)(G201)
439-439: Logging statement uses f-string
(G004)
439-439: Use explicit conversion flag
Replace with conversion flag
(RUF010)
440-440: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
440-440: Use explicit conversion flag
Replace with conversion flag
(RUF010)
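A minimal sketch of the path-safety pattern referenced in the 397-441 comment above (a hypothetical FastAPI endpoint; only the traversal checks themselves mirror the review):

```python
from pathlib import Path

from fastapi import FastAPI, HTTPException
from fastapi.responses import PlainTextResponse

app = FastAPI()
STORAGE_DIR = Path("storage/markdown")


@app.get("/api/storage/file-content", response_class=PlainTextResponse)
def get_storage_file_content(file_path: str) -> str:
    safe_file_name = file_path
    # Reject anything that tries to walk out of the storage directory.
    if "/" in safe_file_name or "\\" in safe_file_name or safe_file_name.startswith(".."):
        raise HTTPException(status_code=400, detail="Invalid file path")

    base_path = STORAGE_DIR.resolve()
    target = (STORAGE_DIR / safe_file_name).resolve()
    # is_relative_to (Python 3.9+) also catches symlinks that escape the base directory.
    if not target.is_relative_to(base_path):
        raise HTTPException(status_code=400, detail="Invalid file path")
    if not target.is_file():
        raise HTTPException(status_code=404, detail="File not found")
    return target.read_text(encoding="utf-8")
```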
469-469: Logging statements enhance clarity.
Indicating the assigned job ID in logs helps debugging.
🧰 Tools
🪛 Ruff (0.8.2)
469-469: Logging statement uses f-string
(G004)
app/page.tsx (11)
5-6: Replacing ProcessingBlock with JobStatsSummary.
Swapping the old component with `JobStatsSummary` simplifies the code. This fosters a dedicated stats display.
10-12: UI composition improvements.
Adding `CrawlStatusMonitor`, `CrawlUrls`, and `Button` fosters a more modular UI design.
13-20: Dialog components import is neat.
Leveraging the new UI dialog abstraction keeps code consistent and reusable.
25-26: Extended type imports facilitate better status handling.
Using `DiscoveredPage`, `CrawlJobStatus`, and `OverallStatus` helps unify typed data flows.
33-40: Removing old stats state.
Removing commented-out stats helps reduce clutter, deferring to `JobStatsSummary` for all stats usage.
41-49: New job ID and selection states.
Tracking `currentJobId`, `jobStatus`, and `selectedUrls` externally is a flexible architecture.
65-91: Discovery now returns a job ID only.
Clearing `discoveredPages` and letting the monitor poll for results aligns with the new asynchronous model.
131-157: Crawl selected logic updated.
You're converting selected URLs into `DiscoveredPage` objects with `'pending_crawl'`. This is consistent with the backend.
167-178: Clear user feedback on crawl request.
Using toast notifications for success or error states ensures good UX.
254-332: Lifted polling logic for job status.
Centralizing the status fetching in a `useEffect` is a maintainable approach, especially stopping on terminal states.
425-426: Rendering consolidated files by default.
The consolidated approach provides a better user experience. Good choice to show them.
fast-markdown-mcp/src/fast_markdown_mcp/server.py (3)
16-16: SequenceMatcher usage looks good.
Leveraging `SequenceMatcher` for text similarity is an efficient and straightforward approach.
245-245: No meaningful updates.
Nothing changed except whitespace.
778-804: Implementation for the new tool looks good.
This straightforward wrapper around `smart_section_search` is consistent with the other tools and properly handles optional parameters.
🧰 Tools
🪛 Ruff (0.8.2)
781-781: Avoid specifying long messages outside the exception class
(TRY003)
802-802: Trailing comma missing
Add trailing comma
(COM812)
| // Define interface for disk file details | ||
| interface DiskFileDetail { | ||
| name: string; | ||
| jsonPath: string; | ||
| markdownPath: string; | ||
| timestamp: Date; | ||
| size: number; | ||
| wordCount: number; | ||
| charCount: number; | ||
| isConsolidated: boolean; | ||
| pagesCount: number; | ||
| rootUrl: string; | ||
| isInMemory: boolean; | ||
| } |
🧹 Nitpick (assertive)
Good addition of the DiskFileDetail interface
The introduction of the DiskFileDetail interface improves type safety and code clarity by explicitly defining the structure of disk file objects. This is a positive change that aligns with TypeScript best practices.
One suggestion: Consider whether the isInMemory property is necessary since it will always be false for disk files. It might be clearer to remove this property since in-memory file handling has been removed from the codebase.
interface DiskFileDetail {
name: string;
jsonPath: string;
markdownPath: string;
timestamp: Date;
size: number;
wordCount: number;
charCount: number;
isConsolidated: boolean;
pagesCount: number;
rootUrl: string;
- isInMemory: boolean;
}
📝 Committable suggestion
| // Define interface for disk file details | |
| interface DiskFileDetail { | |
| name: string; | |
| jsonPath: string; | |
| markdownPath: string; | |
| timestamp: Date; | |
| size: number; | |
| wordCount: number; | |
| charCount: number; | |
| isConsolidated: boolean; | |
| pagesCount: number; | |
| rootUrl: string; | |
| isInMemory: boolean; | |
| } | |
| // Define interface for disk file details | |
| interface DiskFileDetail { | |
| name: string; | |
| jsonPath: string; | |
| markdownPath: string; | |
| timestamp: Date; | |
| size: number; | |
| wordCount: number; | |
| charCount: number; | |
| isConsolidated: boolean; | |
| pagesCount: number; | |
| rootUrl: string; | |
| } |
| // Removed fetching and combining of in-memory files as that feature was removed. | ||
| // We now only work with files read from disk. | ||
| const allFiles = diskFileDetails // Keep variable name for minimal diff, even though it's just disk files now |
🧹 Nitpick (assertive)
Clean removal of in-memory file handling logic
The commented explanation clearly indicates that in-memory file handling has been removed, which aligns with the PR objectives to simplify the architecture.
Consider renaming the allFiles variable to something more descriptive like diskFiles to better reflect its current content, as it no longer combines both disk and in-memory files.
// Removed fetching and combining of in-memory files as that feature was removed.
// We now only work with files read from disk.
-const allFiles = diskFileDetails // Keep variable name for minimal diff, even though it's just disk files now
+const diskFiles = diskFileDetails // Renamed to reflect that these are only disk files now
And update any references to allFiles further down in the code.
📝 Committable suggestion
| // Removed fetching and combining of in-memory files as that feature was removed. | |
| // We now only work with files read from disk. | |
| const allFiles = diskFileDetails // Keep variable name for minimal diff, even though it's just disk files now | |
| // Removed fetching and combining of in-memory files as that feature was removed. | |
| // We now only work with files read from disk. | |
| const diskFiles = diskFileDetails // Renamed to reflect that these are only disk files now |
| @@ -1,5 +1,6 @@ | |||
| # Base Node.js image | |||
| FROM node:20-alpine | |||
| ARG NEXT_PUBLIC_BACKEND_URL | |||
🧹 Nitpick (assertive)
Consider consolidating environment variable lines.
You already define ENV NEXT_PUBLIC_BACKEND_URL http://backend:24125 at line 14. Setting ARG NEXT_PUBLIC_BACKEND_URL here and then overriding it later can be confusing. Consider removing or unifying the duplicate declarations for consistency and easier maintenance.
ARG NEXT_PUBLIC_BACKEND_URL
-ENV NEXT_PUBLIC_BACKEND_URL http://backend:24125
+ENV NEXT_PUBLIC_BACKEND_URL=$NEXT_PUBLIC_BACKEND_URL
📝 Committable suggestion
| ARG NEXT_PUBLIC_BACKEND_URL | |
| # Lines 1-3 | |
| ARG NEXT_PUBLIC_BACKEND_URL | |
| # ... other intermediate instructions ... | |
| # Line 14 (updated) | |
| ENV NEXT_PUBLIC_BACKEND_URL=$NEXT_PUBLIC_BACKEND_URL |
| For Windows users: Experimental Only (Not Tested Yet) | ||
| ```cmd | ||
| # Clone the repository | ||
| git clone https://github.com/cyberagiinc/DevDocs.git | ||
| # Navigate to the project directory | ||
| cd DevDocs | ||
| # Configure environment variables | ||
| # Copy the template file to .env | ||
| copy .env.template .env | ||
| # Ensure NEXT_PUBLIC_BACKEND_URL in .env is set correctly (e.g., http://localhost:24125) | ||
| # This allows the frontend (running in your browser) to communicate with the backend service. | ||
| # Prerequisites: Install WSL 2 and Docker Desktop | ||
| # Docker Desktop for Windows requires WSL 2. Please ensure you have WSL 2 installed and running first. | ||
| # 1. Install WSL 2: Follow the official Microsoft guide: https://learn.microsoft.com/en-us/windows/wsl/install | ||
| # 2. Install Docker Desktop for Windows: Download and install from the official Docker website. Docker Desktop includes Docker Compose. | ||
🧹 Nitpick (assertive)
Add blank lines around fenced code blocks (MD031).
In Windows instructions (around line 124), the code block should be separated by blank lines to comply with markdownlint MD031 and improve readability.
123 For Windows users: Experimental Only (Not Tested Yet)
-```cmd
+
+```cmd
copy .env.template .env
...
-```
+```
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
124-124: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
| import { DiscoveredPage, CrawlResult, DiscoverOptions, DiscoverResponse, CrawlRequest, CrawlResponse } from './types' // Added new types | ||
|
|
||
| const BACKEND_URL = process.env.NEXT_PUBLIC_BACKEND_URL || process.env.BACKEND_URL || 'http://localhost:24125' | ||
| const BACKEND_URL = process.env.NEXT_PUBLIC_BACKEND_URL || 'http://localhost:24125'; // Use env variable | ||
| console.log(`Using backend URL: ${BACKEND_URL}`); |
🧹 Nitpick (assertive)
Logging and environment variable usage.
Importing new types and referring to NEXT_PUBLIC_BACKEND_URL for the backend is consistent with your approach. The additional console logs are useful for debugging but may be too verbose for production environments.
| def _calculate_similarity(self, text1: str, text2: str) -> float: | ||
| """Calculate text similarity using SequenceMatcher.""" | ||
| return SequenceMatcher(None, text1.lower(), text2.lower()).ratio() | ||
|
|
||
| def _calculate_confidence(self, similarity: float, match_type: str) -> float: | ||
| """Calculate confidence score based on similarity and match type.""" | ||
| # Base confidence from similarity score | ||
| confidence = similarity | ||
|
|
||
| # Adjust based on match type | ||
| if match_type == "exact": | ||
| confidence = min(1.0, confidence * 1.2) # Boost exact matches | ||
| elif match_type == "fuzzy": | ||
| confidence = confidence * 0.9 # Slightly reduce fuzzy matches | ||
| elif match_type == "regex": | ||
| confidence = confidence * 0.95 # Slightly reduce regex matches | ||
|
|
||
| return round(confidence, 2) | ||
|
|
🧹 Nitpick (assertive)
Ensure consistent confidence clamping for all match types.
While only exact matches are clamped to a maximum of 1.0, you might also consider applying a similar clamp for fuzzy and regex matches to keep the scoring consistent.
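A small sketch of the uniform clamp (the multipliers mirror the excerpt above; the `max(0.0, ...)` floor is an added assumption):

```python
def calculate_confidence(similarity: float, match_type: str) -> float:
    """Score a match, clamping every match type into the [0.0, 1.0] range."""
    multipliers = {"exact": 1.2, "fuzzy": 0.9, "regex": 0.95}
    confidence = similarity * multipliers.get(match_type, 1.0)
    # Clamp uniformly so no match type can drift outside the expected range.
    return round(min(1.0, max(0.0, confidence)), 2)
```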
| def collect_sections(section_list): | ||
| for section in section_list: | ||
| sections.append(section) | ||
| collect_sections(section.subsections) | ||
|
|
🧹 Nitpick (assertive)
Add missing type annotations.
The helper function collect_sections lacks type hints for its parameter and return type.
-def collect_sections(section_list):
+def collect_sections(section_list: list[Section]) -> None:
📝 Committable suggestion
| def collect_sections(section_list): | |
| for section in section_list: | |
| sections.append(section) | |
| collect_sections(section.subsections) | |
| def collect_sections(section_list: list[Section]) -> None: | |
| for section in section_list: | |
| sections.append(section) | |
| collect_sections(section.subsections) |
🧰 Tools
🪛 Ruff (0.8.2)
276-276: Missing return type annotation for private function collect_sections
Add return type annotation: None
(ANN202)
276-276: Missing type annotation for function argument section_list
(ANN001)
278-278: Function definition does not bind loop variable sections
(B023)
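Beyond the missing annotations, the B023 finding above flags the closure over sections; one possible way to address both is to pass the accumulator explicitly. The Section dataclass and the sample tree below are stand-ins for illustration only, not the project's actual model:

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    """Stand-in for the real Section model; only the fields used here are included."""
    title: str
    subsections: list["Section"] = field(default_factory=list)

def collect_sections(section_list: list[Section], accumulator: list[Section]) -> None:
    """Recursively flatten nested sections into the accumulator list."""
    for section in section_list:
        accumulator.append(section)
        collect_sections(section.subsections, accumulator)

# Hypothetical usage with a tiny tree.
tree = [Section("intro", [Section("setup"), Section("usage", [Section("examples")])])]
flat: list[Section] = []
collect_sections(tree, flat)
print([s.title for s in flat])  # ['intro', 'setup', 'usage', 'examples']
```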
```python
import json
import re
import asyncio
import math
```
🧹 Nitpick (assertive)
Remove unused import.
math is not referenced anywhere, so removing it helps keep your codebase clean.
```diff
-import math
```
🧰 Tools
🪛 Ruff (0.8.2)
8-8: math imported but unused
Remove unused import: math
(F401)
```python
types.Tool(
    name="smart_section_search",
    description="Advanced search for sections with ranking and confidence scores",
    inputSchema={
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search query to find in sections"
            },
            "max_results": {
                "type": "number",
                "description": "Maximum number of results to return (default: 10)"
            },
            "use_fuzzy": {
                "type": "boolean",
                "description": "Whether to use fuzzy matching (default: true)"
            },
            "use_regex": {
                "type": "boolean",
                "description": "Whether to use regex matching (default: true)"
            }
        },
        "required": ["query"]
    }
```
🧹 Nitpick (assertive)
Add trailing commas in your JSON schema definition.
Static analysis suggests adding trailing commas to these object definitions (lines 710, 714, 718, 722, 723, 725, 726) for clarity and consistency.
"query": {
"type": "string",
- "description": "Search query to find in sections"
+ "description": "Search query to find in sections",
},
"max_results": {
"type": "number",
- "description": "Maximum number of results to return (default: 10)"
+ "description": "Maximum number of results to return (default: 10)",
},
...
- "required": ["query"]
+ "required": ["query"],📝 Committable suggestion
🧰 Tools
🪛 Ruff (0.8.2)
710-710: Trailing comma missing
Add trailing comma
(COM812)
714-714: Trailing comma missing
Add trailing comma
(COM812)
718-718: Trailing comma missing
Add trailing comma
(COM812)
722-722: Trailing comma missing
Add trailing comma
(COM812)
723-723: Trailing comma missing
Add trailing comma
(COM812)
725-725: Trailing comma missing
Add trailing comma
(COM812)
726-726: Trailing comma missing
Add trailing comma
(COM812)
```python
except Exception as e:
    logger.error(f"Error in smart section search: {e}")
    return f"Error performing smart section search: {str(e)}"
```
🧹 Nitpick (assertive)
Catch more specific exceptions and use logger.exception.
Avoid catching Exception blindly. Also, logger.exception automatically includes a traceback.
```diff
-except Exception as e:
-    logger.error(f"Error in smart section search: {e}")
-    return f"Error performing smart section search: {str(e)}"
+except ValueError as e:
+    logger.exception("Error in smart section search")
+    return f"Error performing smart section search: {e}"
+except Exception as e:
+    logger.exception("Unexpected error occurred")
+    return f"Error performing smart section search: {e}"
```
🧰 Tools
🪛 Ruff (0.8.2)
441-441: Do not catch blind exception: Exception
(BLE001)
442-442: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
442-442: Logging statement uses f-string
(G004)
443-443: Use explicit conversion flag
Replace with conversion flag
(RUF010)
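As a side note on the RUF010 finding above, the explicit conversion flag and the plain interpolation used in the suggested diff produce the same text; a small standalone illustration, not tied to the crawler code:

```python
e = ValueError("boom")

print(f"Error: {str(e)}")  # flagged by RUF010
print(f"Error: {e!s}")     # explicit conversion flag, what RUF010 suggests
print(f"Error: {e}")       # implicit str() conversion, as in the suggested diff
```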
Patch Fixes:
- feat: Enhance crawl workflow, UI, and fix backend issues
- feat(frontend): Update Consolidated Files component for polling and downloads