fix(graphql): enforce max_comments cap across all threads #47
Conversation
Summary of Changes

Hello @petems, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical bug in the GraphQL comment fetching mechanism where the configured max_comments cap could be exceeded.

Highlights
Walkthrough

This pull request implements a hard cap (max_comments_v) on collected comments in GraphQL comment retrieval by introducing a limit_reached flag with early-exit logic at both the thread and comment loop levels, and by emitting diagnostic warnings when the limit is breached. It also adds nine comprehensive async tests validating boundary conditions, pagination interactions, and limit-enforcement scenarios.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant fetch_pr_comments
    participant GraphQL
    participant CL as Comments List
    Caller->>fetch_pr_comments: fetch_pr_comments_graphql(max_comments)
    loop For each thread
        fetch_pr_comments->>fetch_pr_comments: Check limit_reached before thread
        alt limit_reached == true
            fetch_pr_comments->>Caller: break outer loop
        else
            loop For each comment in thread
                fetch_pr_comments->>GraphQL: retrieve comment
                GraphQL-->>fetch_pr_comments: comment data
                fetch_pr_comments->>CL: append comment
                fetch_pr_comments->>fetch_pr_comments: Increment total count
                fetch_pr_comments->>fetch_pr_comments: Check if total >= max_comments
                alt limit reached
                    fetch_pr_comments->>fetch_pr_comments: Set limit_reached=true
                    fetch_pr_comments->>fetch_pr_comments: Emit warning to stderr
                    fetch_pr_comments->>fetch_pr_comments: break inner loop
                end
            end
        end
    end
    fetch_pr_comments-->>Caller: comments, limit_reached flag
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ✅ 3 checks passed
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Code Review
The pull request addresses a bug where the max_comments cap was being exceeded when processing multiple review threads. The changes introduce a limit_reached flag to coordinate stopping across nested loops, add cap checks before processing each thread and before appending each comment, and break the outer thread loop when the limit is reached. Diagnostic logging is also added to stderr when stopping early. The changes strictly honor the PR_FETCH_MAX_COMMENTS safety cap and prevent unbounded memory/time usage on PRs with many review threads.
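As a standalone illustration (hypothetical data shapes and names, not the project's actual code), the failure mode this review describes can be reproduced with a small sketch: a lone `break` exits only the inner comment loop, so every remaining thread still appends one more comment after the cap is hit.

```python
# Hypothetical sketch reproducing the pre-fix bug: the break only exits
# the inner loop, so the outer thread loop keeps running.
def collect_with_inner_break_only(threads, max_comments):
    collected = []
    for thread in threads:
        for comment in thread:
            collected.append(comment)
            if len(collected) >= max_comments:
                break  # exits only this thread's loop; outer loop continues
    return collected

# Three threads of 4 comments each with a cap of 5: the cap is reached
# inside thread 2, but thread 3 still appends one comment before its own
# inner break, so 6 comments come back instead of 5.
threads = [[f"t{i}c{j}" for j in range(4)] for i in range(3)]
print(len(collect_with_inner_break_only(threads, 5)))  # → 6
```

This is exactly the over-collection the `limit_reached` flag is introduced to prevent.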
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/mcp_github_pr_review/server.py (1)
431-432: Good diagnostic logging for debugging over-collection issues.

The message is well-placed and clearly indicates early termination due to the cap.
Consider including the actual limit value in the message for better diagnostics:
```diff
- print("Reached max_comments limit; stopping early", file=sys.stderr)
+ print(f"Reached max_comments limit ({max_comments_v}); stopping early", file=sys.stderr)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/mcp_github_pr_review/server.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for formatting and linting; code must be formatted with Ruff and pass Ruff checks
Adhere to line length of 88 characters (Ruff configuration)
Target Python 3.10+ syntax and features
Maintain static typing that passes mypy
Keep imports sorted per Ruff’s import sorting rules
Address security findings covered by Ruff’s bandit rules
Code must compile without SyntaxError (compile check)
**/*.py: Use async/await patterns for all I/O operations (GitHub API calls, file operations)
Use type hints for all function parameters and return values
Handle errors gracefully with proper logging (log to stderr)
Pass ruff linting with zero violations
Format code with ruff format
Code must pass static type checks with mypy
Implement retry logic for transient failures (e.g., GitHub API)
Validate all input parameters (URLs, filenames, numeric limits)
Use safe file operations (exclusive create, path validation)
Enforce pagination safety caps and rate limiting when calling GitHub APIs
Ensure code is compatible with Python 3.10+
Run make compile-check to catch import/syntax errors early; code must be syntactically valid
Return meaningful error messages to MCP clients
Files:
src/mcp_github_pr_review/server.py
src/mcp_github_pr_review/server.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/mcp_github_pr_review/server.py: All GitHub API interactions must be async/await based
Implement robust error handling for timeouts, API errors, and rate limiting
Enforce pagination safety limits (max pages and max comments)
generate_markdown() must use dynamic markdown fencing to handle nested backticks
get_pr_info() must correctly parse owner, repo, and PR number from GitHub URLs
Files:
src/mcp_github_pr_review/server.py
🧠 Learnings (1)
📚 Learning: 2025-10-22T20:35:36.888Z
Learnt from: CR
PR: cool-kids-inc/github-pr-review-mcp-server#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-22T20:35:36.888Z
Learning: Applies to src/mcp_github_pr_review/server.py : Enforce pagination safety limits (max pages and max comments)
Applied to files:
src/mcp_github_pr_review/server.py
🔇 Additional comments (1)
src/mcp_github_pr_review/server.py (1)
400-416: Excellent fix for the max_comments over-collection bug!

The dual-check pattern correctly enforces the cap:

- Outer loop check (lines 402-404): prevents processing new threads once the limit is reached
- Inner loop check (lines 414-416): prevents adding comments that would reach/exceed the limit
- Coordination via the limit_reached flag: enables clean early exit and diagnostic logging

This properly solves the issue described in the PR objectives, where the previous single break allowed at least one comment per remaining thread to be added after the cap was reached.
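The dual-check pattern can be sketched in isolation (illustrative names and plain lists; the real function is async and builds full comment dicts from GraphQL data):

```python
def fetch_capped(threads, max_comments):
    """Sketch of the dual-check pattern with a coordinating flag."""
    all_comments = []
    limit_reached = False
    for thread in threads:
        if limit_reached:
            break  # outer check: skip all remaining threads
        for comment in thread:
            if len(all_comments) >= max_comments:
                limit_reached = True  # inner check: stop mid-thread
                break
            all_comments.append(comment)
    return all_comments, limit_reached
```

With three 4-comment threads and a cap of 5, exactly 5 comments are collected and the flag is set; with a cap above the total, all 12 come back with the flag unset.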
Creating a local commit for the generated unit tests... The unit tests have been committed to the current branch. Commit ID:
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/mcp_github_pr_review/server.py (1)
400-432: Critical: Missing break statement to stop pagination when limit is reached.

The limit_reached flag correctly breaks out of the thread and comment loops within a page, but after printing the diagnostic message (line 432), the pagination loop continues. If another page exists and len(all_comments) < max_comments_v, the code will fetch the next page, potentially adding more comments beyond the intended limit.

Apply this diff to break out of the pagination loop when the limit is reached:

```diff
 if limit_reached:
     print("Reached max_comments limit; stopping early", file=sys.stderr)
+    break

 # Check pagination
```

This ensures that once the limit is reached within any page, no additional pages are fetched, fully enforcing the max_comments cap as intended by the PR objectives.

tests/test_graphql_error_handling.py (1)
414-416: Critical: Limit check occurs before appending, missing the exact boundary case.

The limit check on line 414 occurs before appending the comment on line 430. This means that when the last comment reaching exactly max_comments_v is appended, the loop exits naturally without setting limit_reached = True, so the diagnostic message is never printed.

Example: with max_comments=100 and 100 comments available:

- Iterations 1-99: check len < 100, append comments
- Iteration 100: check 99 < 100, append comment 100 (len now 100)
- Loop exits naturally (no more comments)
- limit_reached remains False, no diagnostic message

Apply this diff to check the limit after appending:

```diff
 for comment in comments:
-    if len(all_comments) >= max_comments_v:
-        limit_reached = True
-        break
     # Convert GraphQL format to REST-like format with added fields
     # Guard against null author (e.g., deleted user accounts)
     author = comment.get("author") or {}
     review_comment: ReviewComment = {
         "user": {"login": author.get("login") or "unknown"},
         "path": comment.get("path", ""),
         "line": comment.get("line") or 0,
         "body": comment.get("body", ""),
         "diff_hunk": comment.get("diffHunk", ""),
         "is_resolved": is_resolved,
         "is_outdated": is_outdated,
         "resolved_by": resolved_by,
     }
     all_comments.append(review_comment)
+    if len(all_comments) >= max_comments_v:
+        limit_reached = True
+        break
```

This ensures the check occurs after each comment is added, correctly detecting when the limit is reached.
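The suggested ordering can be verified with a minimal standalone sketch (hypothetical names; the real loop builds full comment dicts): appending first and checking second means an exact hit on the cap sets the flag.

```python
def collect(comments, max_comments):
    """Sketch: append first, then check, so an exact boundary hit is detected."""
    all_comments = []
    limit_reached = False
    for comment in comments:
        all_comments.append(comment)
        if len(all_comments) >= max_comments:
            limit_reached = True
            break
    return all_comments, limit_reached
```

With exactly 100 comments and max_comments=100, the flag is now True; with 99 comments available, the loop drains naturally and the flag stays False.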
🧹 Nitpick comments (1)
TEST_COVERAGE_SUMMARY.md (1)
1-209: Documentation is comprehensive but may need updates after production code fixes.

The test coverage summary is well-structured and thorough. However, given the critical issues identified in the production code (the missing break statement after the diagnostic message, and the limit check occurring before the comment append), some test expectations and this documentation may need adjustment once those fixes are applied.

Specifically, sections describing pagination behavior (lines 72-83) and exact limit handling (lines 97-107) should be reviewed after the production code is corrected to ensure accuracy.
Consider adding a "Known Issues" section temporarily while the production code bugs are being fixed, then remove it once resolved.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- TEST_COVERAGE_SUMMARY.md (1 hunks)
- src/mcp_github_pr_review/server.py (3 hunks)
- tests/test_graphql_error_handling.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for formatting and linting; code must be formatted with Ruff and pass Ruff checks
Adhere to line length of 88 characters (Ruff configuration)
Target Python 3.10+ syntax and features
Maintain static typing that passes mypy
Keep imports sorted per Ruff’s import sorting rules
Address security findings covered by Ruff’s bandit rules
Code must compile without SyntaxError (compile check)
**/*.py: Use async/await patterns for all I/O operations (GitHub API calls, file operations)
Use type hints for all function parameters and return values
Handle errors gracefully with proper logging (log to stderr)
Pass ruff linting with zero violations
Format code with ruff format
Code must pass static type checks with mypy
Implement retry logic for transient failures (e.g., GitHub API)
Validate all input parameters (URLs, filenames, numeric limits)
Use safe file operations (exclusive create, path validation)
Enforce pagination safety caps and rate limiting when calling GitHub APIs
Ensure code is compatible with Python 3.10+
Run make compile-check to catch import/syntax errors early; code must be syntactically valid
Return meaningful error messages to MCP clients
Files:
tests/test_graphql_error_handling.py
src/mcp_github_pr_review/server.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest as the single test runner with async tests supported by shared fixtures
Avoid live GitHub calls in tests by default; use mocks/fixtures
tests/**/*.py: Name test functions with the test_ prefix
Use pytest-asyncio for async test functions
Mock external dependencies (GitHub API, filesystem) in tests
Files:
tests/test_graphql_error_handling.py
tests/test_*.py
📄 CodeRabbit inference engine (AGENTS.md)
Place test files under tests/ with filename pattern tests/test_*.py
Files:
tests/test_graphql_error_handling.py
src/mcp_github_pr_review/server.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/mcp_github_pr_review/server.py: All GitHub API interactions must be async/await based
Implement robust error handling for timeouts, API errors, and rate limiting
Enforce pagination safety limits (max pages and max comments)
generate_markdown() must use dynamic markdown fencing to handle nested backticks
get_pr_info() must correctly parse owner, repo, and PR number from GitHub URLs
Files:
src/mcp_github_pr_review/server.py
🧠 Learnings (1)
📚 Learning: 2025-10-22T20:35:36.888Z
Learnt from: CR
PR: cool-kids-inc/github-pr-review-mcp-server#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-22T20:35:36.888Z
Learning: Applies to src/mcp_github_pr_review/server.py : Enforce pagination safety limits (max pages and max comments)
Applied to files:
src/mcp_github_pr_review/server.py
🧬 Code graph analysis (1)
tests/test_graphql_error_handling.py (2)
tests/conftest.py (5)
- github_token (213-218)
- json (408-409)
- raise_for_status (411-415)
- post (138-148)
- post (440-450)

src/mcp_github_pr_review/server.py (1)

- fetch_pr_comments_graphql (261-458)
🪛 GitHub Actions: Lint
tests/test_graphql_error_handling.py
[error] 571-571: E501 Line too long (91 > 88)
[error] 918-918: E501 Line too long (93 > 88)
[error] 921-921: E501 Line too long (91 > 88)
🪛 GitHub Actions: Tests
tests/test_graphql_error_handling.py
[error] 1-1: GraphQL pagination did not stop after reaching max_comments limit; expected to stop after 80 comments but returned 120.
[error] 1-1: Missing expected log/message when max_comments is reached exactly: 'Reached max_comments limit; stopping early'.
[error] 1-1: Process completed with exit code 1 due to test failures in GraphQL limit handling tests.
Previously, the max_comments cap could be exceeded when processing multiple threads. The break statement only exited the inner comment loop, allowing at least one comment per remaining thread to be added.

This fix:

- Introduces a limit_reached flag to track cap enforcement
- Checks the limit before processing each thread and comment
- Breaks the outer thread loop when the limit is reached
- Adds stderr logging when stopping early due to the limit

The change prevents unbounded memory/time use when PRs have many threads and ensures the safety cap is strictly honored.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Force-pushed from eeee995 to a3becfc
The limit_reached flag was being reset on each page fetch, causing the max_comments cap to not be properly enforced across multiple pages. Additionally, even when the limit was reached, pagination would continue.

This fix:

- Moves limit_reached initialization outside the pagination loop
- Adds a check after thread processing to detect exact limit hits
- Adds a break statement to stop pagination when the limit is reached
- Updates test expectations to match the correct behavior
- Fixes lint issues (line-length violations)

Ensures the max_comments safety cap is strictly enforced across all pages and pagination stops immediately when the limit is reached.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
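The cross-page behavior this commit describes can be sketched with a synchronous stand-in (illustrative names and plain lists; the real code paginates via GraphQL cursors): the flag is initialized once outside the page loop and also terminates pagination.

```python
def fetch_all_pages(pages, max_comments):
    """Sketch: the flag lives outside the page loop and stops paging early."""
    all_comments = []
    limit_reached = False  # initialized once, never reset per page
    pages_fetched = 0
    for page in pages:  # stand-in for cursor-based page fetches
        pages_fetched += 1
        for comment in page:
            all_comments.append(comment)
            if len(all_comments) >= max_comments:
                limit_reached = True
                break
        if limit_reached:
            break  # stop fetching further pages
    return all_comments, limit_reached, pages_fetched
```

With pages of 80 and 40 comments and a cap of 100, both pages are fetched and exactly 100 comments are kept; with a cap of 80, the second page is never fetched at all.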
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- TEST_COVERAGE_SUMMARY.md (1 hunks)
- src/mcp_github_pr_review/server.py (3 hunks)
- tests/test_graphql_error_handling.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/mcp_github_pr_review/server.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for formatting and linting; code must be formatted with Ruff and pass Ruff checks
Adhere to line length of 88 characters (Ruff configuration)
Target Python 3.10+ syntax and features
Maintain static typing that passes mypy
Keep imports sorted per Ruff’s import sorting rules
Address security findings covered by Ruff’s bandit rules
Code must compile without SyntaxError (compile check)
**/*.py: Use async/await patterns for all I/O operations (GitHub API calls, file operations)
Use type hints for all function parameters and return values
Handle errors gracefully with proper logging (log to stderr)
Pass ruff linting with zero violations
Format code with ruff format
Code must pass static type checks with mypy
Implement retry logic for transient failures (e.g., GitHub API)
Validate all input parameters (URLs, filenames, numeric limits)
Use safe file operations (exclusive create, path validation)
Enforce pagination safety caps and rate limiting when calling GitHub APIs
Ensure code is compatible with Python 3.10+
Run make compile-check to catch import/syntax errors early; code must be syntactically valid
Return meaningful error messages to MCP clients
Files:
tests/test_graphql_error_handling.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest as the single test runner with async tests supported by shared fixtures
Avoid live GitHub calls in tests by default; use mocks/fixtures
tests/**/*.py: Name test functions with the test_ prefix
Use pytest-asyncio for async test functions
Mock external dependencies (GitHub API, filesystem) in tests
Files:
tests/test_graphql_error_handling.py
tests/test_*.py
📄 CodeRabbit inference engine (AGENTS.md)
Place test files under tests/ with filename pattern tests/test_*.py
Files:
tests/test_graphql_error_handling.py
🧬 Code graph analysis (1)
tests/test_graphql_error_handling.py (2)
tests/conftest.py (5)
- github_token (213-218)
- json (408-409)
- raise_for_status (411-415)
- post (138-148)
- post (440-450)

src/mcp_github_pr_review/server.py (1)

- fetch_pr_comments_graphql (249-448)
🔇 Additional comments (1)
tests/test_graphql_error_handling.py (1)
543-1198: Excellent comprehensive test coverage for limit enforcement!

All 8 new test functions properly validate the limit_reached flag behavior across various scenarios:
- Boundary conditions (exact limits, mid-thread, mid-comment)
- Pagination interactions
- Empty thread handling
- Diagnostic message verification
The tests follow proper pytest conventions with async patterns, proper mocking, clear docstrings, and comprehensive assertions. The pagination test (lines 847-957) correctly expects 2 API calls and 120 total comments, properly validating that page 2 is fetched to fill up to the limit.
docs: fix path typos and update test description in TEST_COVERAGE_SUMMARY.md

Addresses PR review comments from CodeRabbit:

- Fix path typo: mcp_GitHub_pr_review → mcp_github_pr_review (lines 4, 167)
- Update the test_graphql_limit_with_pagination_stops_early description to correctly reflect that both pages are fetched (120 comments total) instead of incorrectly stating that only page 1 is fetched (80 comments)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
TEST_COVERAGE_SUMMARY.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
TEST_COVERAGE_SUMMARY.md
[uncategorized] ~4-~4: The official name of this software platform is spelled with a capital “H”.
Context: ...for the limit_reached flag changes in src/mcp_github_pr_review/server.py. ## Changes Teste...
(GITHUB)
* fix(graphql): enforce max_comments cap across all threads

  Previously, the max_comments cap could be exceeded when processing multiple threads. The break statement only exited the inner comment loop, allowing at least one comment per remaining thread to be added.

  This fix:

  - Introduces a limit_reached flag to track cap enforcement
  - Checks the limit before processing each thread and comment
  - Breaks the outer thread loop when the limit is reached
  - Adds stderr logging when stopping early due to the limit

  The change prevents unbounded memory/time use when PRs have many threads and ensures the safety cap is strictly honored.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <noreply@anthropic.com>

* 📝 CodeRabbit Chat: Add unit tests and docs for GraphQL limit_reached flag

* fix(graphql): prevent limit_reached flag reset across pagination

  The limit_reached flag was being reset on each page fetch, causing the max_comments cap to not be properly enforced across multiple pages. Additionally, even when the limit was reached, pagination would continue.

  This fix:

  - Moves limit_reached initialization outside the pagination loop
  - Adds a check after thread processing to detect exact limit hits
  - Adds a break statement to stop pagination when the limit is reached
  - Updates test expectations to match the correct behavior
  - Fixes lint issues (line-length violations)

  Ensures the max_comments safety cap is strictly enforced across all pages and pagination stops immediately when the limit is reached.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <noreply@anthropic.com>

* docs: fix path typos and update test description in TEST_COVERAGE_SUMMARY.md

  Addresses PR review comments from CodeRabbit:

  - Fix path typo: mcp_GitHub_pr_review → mcp_github_pr_review (lines 4, 167)
  - Update the test_graphql_limit_with_pagination_stops_early description to correctly reflect that both pages are fetched (120 comments total) instead of incorrectly stating that only page 1 is fetched (80 comments)

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Summary
- The max_comments cap could be significantly exceeded when processing multiple review threads
- The break statement only exited the inner comment loop, allowing at least one comment per remaining thread to be added after the cap was reached

Changes

- Introduces a limit_reached flag to coordinate stopping across nested loops

Impact

- Strictly honors the PR_FETCH_MAX_COMMENTS safety cap

Test Plan
🤖 Generated with Claude Code
Summary by CodeRabbit
Improvements
Tests