feat: Enhanced GitHub API rate limit handling with structured logging (#50)
Conversation
Add comprehensive GitHub rate limit detection and retry logic:
- Distinguish between secondary (abuse) and primary rate limits
- Handle secondary limits with 60s backoff and one-retry policy
- Extract retry timing from Retry-After and X-RateLimit-Reset headers
- Log GitHub Request IDs for support escalation
- Increase exponential backoff cap from 5s to 15s
- Apply rate limit handling to both REST and GraphQL endpoints
- Add SecondaryRateLimitError exception for clean abort behavior
- Include comprehensive test suite for rate limit scenarios

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Consolidate rate limit detection and retry logic into a dedicated class:
- Create RateLimitHandler class with comprehensive docstrings
- Encapsulate secondary and primary rate limit detection
- Manage retry state within handler instance
- Remove duplicated code between REST and GraphQL endpoints
- Add 6 new unit tests for direct RateLimitHandler testing
- Reduce cognitive complexity in fetch_pr_comments functions
- Improve testability through dependency injection pattern

Benefits:
- Cleaner separation of concerns
- Easier to test rate limit logic in isolation
- Stateful tracking of retry attempts per handler
- More maintainable and extensible code

Test coverage: 350 tests passing (6 new unit tests added)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Replace stderr print statements with structured logging using Python's logging module with contextual metadata for better observability.

Changes to RateLimitHandler:
- Add structured logging for rate limit detection with extra fields
- Log secondary rate limits with context, status_code, backoff_seconds
- Log primary rate limits with rate_limit_type metadata
- Log request IDs at INFO level for support escalation
- Log rate limit exhaustion at ERROR level with retry_count

Changes to fetch functions:
- Replace print statements with logger calls for errors
- Add contextual metadata (owner, repo, pull_number) to error logs
- Log GraphQL API errors with structured error details
- Log max_comments limit reached with fetched_comments count
- Log 401 Bearer token fallback with auth_fallback metadata
- Convert page fetching to DEBUG level with structured data

Test updates:
- Configure caplog to capture logger output at INFO level
- Update assertions to check caplog.text instead of captured.err
- Add import logging and caplog.set_level() to affected tests
- All 350 tests passing with proper log capture

Benefits:
- Machine-parseable log output for monitoring/alerting
- Rich context for debugging production issues
- Consistent logging patterns across codebase
- Easy integration with log aggregation systems (ELK, Datadog, etc.)
- Structured extra fields enable filtering and grouping

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
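The extra={...} pattern this commit relies on attaches fields to the LogRecord, but the default formatter silently drops them. A hedged sketch of one way to surface them in plain-text output (the formatter class and the field list are assumptions, not the project's code):

```python
import logging

logger = logging.getLogger("mcp_github_pr_review")


class ExtraFormatter(logging.Formatter):
    """Append selected extra fields so they survive into plain-text output."""

    FIELDS = ("owner", "repo", "pull_number", "status_code", "backoff_seconds")

    def format(self, record: logging.LogRecord) -> str:
        base = super().format(record)
        # extra={...} kwargs land as attributes on the record; pick out known ones.
        ctx = {k: getattr(record, k) for k in self.FIELDS if hasattr(record, k)}
        return f"{base} {ctx}" if ctx else base


handler = logging.StreamHandler()
handler.setFormatter(ExtraFormatter("%(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.WARNING)

logger.warning(
    "Secondary rate limit detected",
    extra={"status_code": 403, "backoff_seconds": 60},
)
```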
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Introduces structured logging, a centralized RateLimitHandler and SecondaryRateLimitError for primary/secondary rate-limit detection and retry control, raises the exponential backoff cap to 15.0s, and wires rate-limit-aware status handlers into GraphQL and REST comment-fetching flows while replacing print diagnostics with logger calls.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Caller as Client
    participant Retry as _retry_http_request
    participant Status as status_handler (callback)
    participant RHandler as RateLimitHandler
    participant API as GitHub API
    Caller->>Retry: call request_fn(status_handler)
    loop attempts
        Retry->>API: perform HTTP request
        API-->>Retry: httpx.Response
        alt status_handler provided
            Retry->>Status: await status_handler(response, attempt)
            Status->>RHandler: delegate handle_rate_limit(response)
            alt Secondary rate limit
                RHandler->>RHandler: sleep(SECONDARY_RATE_LIMIT_BACKOFF)
                RHandler-->>Status: return retry or raise SecondaryRateLimitError
                Status-->>Retry: retry or abort
            else Primary rate limit
                RHandler->>RHandler: compute delay (Retry-After / X-RateLimit-Reset)
                RHandler-->>Status: return delay directive
                Status-->>Retry: retry after delay
            else No rate limit / success
                RHandler-->>Status: return None
                Status-->>Retry: return response
            end
        else
            alt transient failure
                Retry->>Retry: compute exponential backoff (cap 15s)
                Retry->>Retry: sleep(delay)
            else success
                Retry-->>Caller: return response
            end
        end
    end
```
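The loop in the diagram reduces to a small sketch: the status handler inspects each response and returns a delay to trigger a retry, or None to accept the response. The signatures below are assumptions for illustration, not the project's `_retry_http_request`:

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def retry_with_status_handler(
    request_fn: Callable[[], Awaitable[object]],
    status_handler: Optional[
        Callable[[object, int], Awaitable[Optional[float]]]
    ] = None,
    max_retries: int = 3,
) -> object:
    """Perform a request, delegating retry decisions to an optional async callback."""
    attempt = 0
    while True:
        response = await request_fn()
        if status_handler is not None and attempt < max_retries:
            # A returned delay means "sleep then retry"; None means "accept".
            delay = await status_handler(response, attempt)
            if delay is not None:
                await asyncio.sleep(delay)
                attempt += 1
                continue
        return response
```

In this shape the rate-limit logic lives entirely in the callback (which can delegate to a RateLimitHandler), keeping the transport loop free of GitHub-specific policy.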
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Summary of Changes

Hello @petems, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly upgrades the application's interaction with the GitHub API by introducing sophisticated rate limit management and structured logging. The core change involves refactoring rate limit detection and retry mechanisms into a dedicated class, allowing for differentiated handling of primary and secondary rate limits. This, combined with detailed, machine-parseable logs, aims to enhance the system's resilience, debuggability, and operational visibility when dealing with GitHub's API constraints.
Codecov Report

❌ Patch coverage is
Code Review
This is an excellent pull request that significantly improves the robustness and observability of GitHub API interactions. The introduction of the RateLimitHandler class is a great architectural change, cleanly separating concerns and removing duplicated code. The distinction between primary and secondary rate limits is handled well, and the move to structured logging is a huge step up from print statements. The new tests are comprehensive and provide good coverage for the new logic. I have a few minor suggestions to further improve the readability and maintainability of the new rate limit handling code, but overall this is a high-quality contribution.
Actionable comments posted: 1
🧹 Nitpick comments (4)
tests/test_graphql_error_handling.py (1)
545-557: Use caplog.at_level or a module-level import to reduce repetition.

You can avoid repeating `import logging` + `caplog.set_level(logging.INFO)` in each test by either:

- Using `with caplog.at_level(logging.INFO): ...` around the assertions, or
- Importing `logging` once at module scope and calling `caplog.set_level(logging.INFO)` per test only when necessary.

Assertions against `"Reached max_comments limit"` look good and align with the server log message.
"Reached max_comments limit"look good and align with the server log message.Also applies to: 631-642, 711-722, 781-791, 861-872, 971-972, 976-977, 1042-1053, 1104-1105
src/mcp_github_pr_review/server.py (2)
329-391: Replace print-based diagnostics with structured logging in retry helper.

Adopt the module logger for consistency and machine-parseable logs.

```diff
         except httpx.RequestError as e:
             if attempt < max_retries:
                 delay = _calculate_backoff_delay(attempt)
-                print(
-                    f"Request error: {e}. Retrying in {delay:.2f}s...",
-                    file=sys.stderr,
-                )
+                logger.warning(
+                    "Request error; retrying with backoff",
+                    extra={"error": str(e), "attempt": attempt, "delay_sec": round(delay, 2)},
+                )
                 await asyncio.sleep(delay)
                 attempt += 1
                 continue
             raise
@@
         if 500 <= response.status_code < 600 and attempt < max_retries:
             delay = _calculate_backoff_delay(attempt)
-            print(
-                f"Server error {response.status_code}. Retrying in {delay:.2f}s...",
-                file=sys.stderr,
-            )
+            logger.warning(
+                "Server error; retrying with backoff",
+                extra={
+                    "status_code": response.status_code,
+                    "attempt": attempt,
+                    "delay_sec": round(delay, 2),
+                },
+            )
             await asyncio.sleep(delay)
             attempt += 1
             continue
```
817-838: Unify remaining prints under structured logging.

Prefer `logger.debug/info/warning/error` with `extra=...` to complete the migration to structured logs. Example conversions:

```diff
-            print(
-                f"Fetched {len(threads)} threads, "
-                f"total comments: {len(all_comments)}",
-                file=sys.stderr,
-            )
+            logger.debug(
+                "GraphQL page fetched",
+                extra={"threads": len(threads), "total_comments": len(all_comments)},
+            )
```

```diff
-        print(
-            "DEBUG: page_count="
-            f"{page_count}, MAX_PAGES={max_pages_v}, "
-            f"comments_len={len(all_comments)}",
-            file=sys.stderr,
-        )
+        logger.debug(
+            "REST pagination progress",
+            extra={
+                "page_count": page_count,
+                "max_pages": max_pages_v,
+                "total_comments": len(all_comments),
+            },
+        )
```

```diff
-        print(f"DEBUG: next_url={next_url}", file=sys.stderr)
+        logger.debug("REST next page", extra={"next_url": next_url})
```

```diff
-        print(f"Timeout error fetching PR comments: {str(e)}", file=sys.stderr)
+        logger.error("Timeout fetching PR comments", extra={"error": str(e)})
```

```diff
-        print(error_msg, file=sys.stderr)
+        logger.error(error_msg, extra={"tool": name})
```

Also applies to: 642-647, 648-652, 654-661, 851-857, 1122-1125
tests/test_rate_limit_handling.py (1)
1-392: Solid coverage for secondary/primary RL and backoff. Add a persistent-primary case.

Consider adding a test where the primary rate limit repeats N>cap times to assert we stop retrying (after implementing the primary cap).

Example test sketch:

```diff
+@pytest.mark.asyncio
+async def test_rest_primary_rate_limit_persistent_aborts(monkeypatch: pytest.MonkeyPatch) -> None:
+    primary = _make_rest_response(
+        403,
+        {"message": "API rate limit exceeded"},
+        headers={"Retry-After": "1", "X-GitHub-Request-Id": "ghi999"},
+    )
+    client = _mock_async_client("get", [primary, primary, primary, primary])
+    with patch("httpx.AsyncClient", return_value=client):
+        recorder = SleepRecorder()
+        monkeypatch.setattr("mcp_github_pr_review.server.asyncio.sleep", recorder)
+        # With PRIMARY_RATE_LIMIT_MAX_RETRIES=3, we should attempt 4 requests total (initial + 3 retries)
+        with pytest.raises(httpx.HTTPStatusError):
+            await fetch_pr_comments("owner", "repo", 123)
+    assert client.get.call_count >= 4
+    assert len(recorder.calls) == 3
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/mcp_github_pr_review/server.py (10 hunks)
- tests/test_graphql_error_handling.py (9 hunks)
- tests/test_rate_limit_handling.py (1 hunk)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for formatting and linting; code must be formatted with Ruff and pass Ruff checks
Adhere to line length of 88 characters (Ruff configuration)
Target Python 3.10+ syntax and features
Maintain static typing that passes mypy
Keep imports sorted per Ruff’s import sorting rules
Address security findings covered by Ruff’s bandit rules
Code must compile without SyntaxError (compile check)
**/*.py: Use async/await patterns for all I/O operations (GitHub API calls, file operations)
Use type hints for all function parameters and return values
Handle errors gracefully with proper logging (log to stderr)
Pass ruff linting with zero violations
Format code with ruff format
Code must pass static type checks with mypy
Implement retry logic for transient failures (e.g., GitHub API)
Validate all input parameters (URLs, filenames, numeric limits)
Use safe file operations (exclusive create, path validation)
Enforce pagination safety caps and rate limiting when calling GitHub APIs
Ensure code is compatible with Python 3.10+
Run make compile-check to catch import/syntax errors early; code must be syntactically valid
Return meaningful error messages to MCP clients
Files:
- tests/test_graphql_error_handling.py
- tests/test_rate_limit_handling.py
- src/mcp_github_pr_review/server.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest as the single test runner with async tests supported by shared fixtures
Avoid live GitHub calls in tests by default; use mocks/fixtures
tests/**/*.py: Name test functions with the test_ prefix
Use pytest-asyncio for async test functions
Mock external dependencies (GitHub API, filesystem) in tests
Files:
- tests/test_graphql_error_handling.py
- tests/test_rate_limit_handling.py
tests/test_*.py
📄 CodeRabbit inference engine (AGENTS.md)
Place test files under tests/ with filename pattern tests/test_*.py
Files:
- tests/test_graphql_error_handling.py
- tests/test_rate_limit_handling.py
src/mcp_github_pr_review/server.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/mcp_github_pr_review/server.py: All GitHub API interactions must be async/await based
Implement robust error handling for timeouts, API errors, and rate limiting
Enforce pagination safety limits (max pages and max comments)
generate_markdown() must use dynamic markdown fencing to handle nested backticks
get_pr_info() must correctly parse owner, repo, and PR number from GitHub URLs
Files:
src/mcp_github_pr_review/server.py
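The dynamic markdown fencing guideline above (generate_markdown handling nested backticks) can be illustrated with a small helper: pick a fence one backtick longer than the longest backtick run in the body. This is a sketch, not the project's generate_markdown:

```python
def dynamic_fence(code: str) -> str:
    """Wrap code in a backtick fence longer than any backtick run inside it."""
    longest = run = 0
    for ch in code:
        # Track the longest consecutive run of backticks in the body.
        run = run + 1 if ch == "`" else 0
        longest = max(longest, run)
    fence = "`" * max(3, longest + 1)
    return f"{fence}\n{code}\n{fence}"
```

A body that itself contains a triple-backtick code block gets a four-backtick outer fence, so nested snippets render intact.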
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
PR: cool-kids-inc/github-pr-review-mcp-server#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-22T20:35:36.888Z
Learning: Applies to src/mcp_github_pr_review/server.py : Implement robust error handling for timeouts, API errors, and rate limiting
📚 Learning: 2025-10-22T20:35:36.888Z
Learnt from: CR
PR: cool-kids-inc/github-pr-review-mcp-server#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-22T20:35:36.888Z
Learning: Applies to src/mcp_github_pr_review/server.py : Implement robust error handling for timeouts, API errors, and rate limiting
Applied to files:
src/mcp_github_pr_review/server.py
📚 Learning: 2025-10-22T20:35:36.888Z
Learnt from: CR
PR: cool-kids-inc/github-pr-review-mcp-server#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-22T20:35:36.888Z
Learning: Applies to src/mcp_github_pr_review/server.py : Enforce pagination safety limits (max pages and max comments)
Applied to files:
src/mcp_github_pr_review/server.py
🧬 Code graph analysis (3)
tests/test_graphql_error_handling.py (1)
tests/conftest.py (1)
github_token(213-218)
tests/test_rate_limit_handling.py (1)
src/mcp_github_pr_review/server.py (6)
- RateLimitHandler (151-312)
- SecondaryRateLimitError (143-148)
- _calculate_backoff_delay (315-326)
- fetch_pr_comments (664-857)
- fetch_pr_comments_graphql (429-661)
- handle_rate_limit (249-312)
src/mcp_github_pr_review/server.py (1)
tests/conftest.py (3)
- get (128-136)
- get (430-438)
- json (408-409)
🔇 Additional comments (5)
src/mcp_github_pr_review/server.py (5)
466-470: LGTM: token requirement now logged. Clear, structured error on missing GITHUB_TOKEN for GraphQL path.

315-327: Backoff cap raised to 15s looks good. Matches tests; jitter and cap remain predictable and safe.
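For reference, the capped backoff under review might look like this sketch. The base constant and jitter formula are assumptions; the project's `_calculate_backoff_delay` may differ:

```python
import random

BACKOFF_BASE = 1.0
BACKOFF_CAP = 15.0  # cap raised from 5.0 in this PR


def calculate_backoff_delay(attempt: int) -> float:
    """Exponential backoff (1s, 2s, 4s, ...) capped at 15s, plus small jitter."""
    delay = min(BACKOFF_CAP, BACKOFF_BASE * (2**attempt))
    # Jitter up to 10% of the delay de-synchronizes concurrent retriers.
    return delay + random.uniform(0, 0.1 * delay)
```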
628-635: LGTM: limit-reached info log matches tests. Message prefix is stable and includes structured fields.

736-774: LGTM: REST status handler and auth fallback logging. Good structured warnings and delegating RL handling to RateLimitHandler.

798-802: Verify product intent for conservative 5xx→None behavior and add explicit test coverage. The concern is valid: the test mocks [Response(500), Response(200, json=[])], demonstrating a scenario where a 5xx error followed by a successful retry should succeed. However, the code at lines 798-802 returns None regardless, discarding any accumulated comments.
The `had_server_error` flag is set when any 5xx occurs (line 751), persists across pagination, and triggers an unconditional None return even after a successful retry. This is marked "Conservative behavior", but:
- The specific case (5xx→retry success→return None) lacks explicit test validation
- The docstring mentions "unrecoverable" errors, but successfully retried errors are arguably recoverable
- The test at line 182-188 mocks this exact scenario but doesn't assert the return value
Either add an explicit test validating the None return for this case to confirm intent, or reconsider whether returning successful data with a warning (including retry counts) better serves product expectations.
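The conservative policy under discussion reduces to a small sketch: a single flag taints the whole result, even when a later page succeeded. This is a hypothetical helper for illustration, not the project's fetch code:

```python
import asyncio


async def fetch_all_pages(fetch_page, pages: int = 3):
    """Accumulate pages, but return None if any page ever saw a 5xx (conservative)."""
    comments: list = []
    had_server_error = False
    for page in range(pages):
        status, data = await fetch_page(page)
        if 500 <= status < 600:
            # Flag persists across the rest of pagination.
            had_server_error = True
            continue
        comments.extend(data)
    # Even if later pages succeeded, the tainted flag discards everything.
    return None if had_server_error else comments
```

The reviewer's alternative would be to drop the flag once a retry succeeds and return the accumulated data with a warning; the sketch makes it easy to see that only the final `return` line encodes the policy choice.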
Address all code review feedback from gemini-code-assist and coderabbitai:

**Code Quality Improvements (gemini-code-assist):**
- Extract secondary rate limit indicators into SECONDARY_RATE_LIMIT_INDICATORS constant
- Refactor _primary_rate_limit_delay conditional using not (a or b or c) pattern
- Simplify exception handler to use pass with explanatory comment

**Critical Security Fix (coderabbitai):**
- Add PRIMARY_RATE_LIMIT_MAX_RETRIES constant (default: 3 retries)
- Track primary_retry_count in RateLimitHandler instance
- Prevent infinite retry loops by checking retry count before allowing retry
- Log error and abort after max retries exceeded
- Return None to let caller handle via response.raise_for_status()

**New Tests:**
- test_rate_limit_handler_primary_limit_exhaustion: Verify abort after 3 retries
- test_rate_limit_handler_primary_retry_count_tracking: Verify count increments
- test_rate_limit_handler_primary_and_secondary_independent: Verify independence

**Changes:**

src/mcp_github_pr_review/server.py:
- Add PRIMARY_RATE_LIMIT_MAX_RETRIES = 3 constant
- Add SECONDARY_RATE_LIMIT_INDICATORS tuple constant
- Add primary_retry_count attribute to RateLimitHandler
- Update _is_secondary_rate_limit to use constant
- Improve _primary_rate_limit_delay readability
- Add retry exhaustion check in handle_rate_limit
- Add retry_attempt to primary limit log metadata

tests/test_rate_limit_handling.py:
- Add 3 comprehensive tests for primary retry exhaustion (98 lines)

**Test Results:**
✅ 353 tests passing (3 new tests added)
✅ All linting checks pass
✅ Type checking passes (mypy)
✅ Format checks pass (ruff)

This fix prevents server hang conditions if GitHub continuously returns rate limit responses, improving reliability and preventing resource exhaustion.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implement all 4 nitpick suggestions from code review:

**1. Refactor test caplog usage (test_graphql_error_handling.py):**
- Add logging import at module level to reduce repetition
- Use caplog.set_level(logging.INFO) once per test instead of repeated imports
- Simplify test setup and improve maintainability

**2. Replace print statements in retry helper (_retry_http_request):**
- Convert request error prints to structured logging with extra fields
- Convert server error prints to structured logging with metadata
- Add error, attempt, delay_sec, and status_code to log context

**3. Complete structured logging migration in fetch functions:**
- GraphQL: Convert "Fetched threads" debug message
- GraphQL: Convert success/error messages with owner/repo/pull_number context
- REST: Convert "Fetching comments" to debug level
- REST: Convert "DEBUG: next_url" to structured debug log
- REST: Convert success message with page_count metadata
- REST: Convert timeout/error messages with full context
- All prints now use logger.debug/info/warning/error with extra fields

**4. Add test for persistent primary rate limit (test_rate_limit_handling.py):**
- test_rest_primary_rate_limit_persistent_aborts
- Verifies abort after PRIMARY_RATE_LIMIT_MAX_RETRIES (3) attempts
- Confirms HTTPStatusError raised after retry exhaustion
- Validates 4 requests made (initial + 3 retries) and 3 sleeps

**Changes:**

src/mcp_github_pr_review/server.py:
- Replace all diagnostic print statements with structured logging
- Add contextual metadata (owner, repo, pull_number, error, attempt, etc.)
- Use appropriate log levels (DEBUG, INFO, WARNING, ERROR)

tests/test_graphql_error_handling.py:
- Add logging import at module level
- Simplify caplog setup across 7 test functions

tests/test_rate_limit_handling.py:
- Add comprehensive test for primary rate limit exhaustion (33 lines)

CLAUDE.md:
- Document logging best practices for future reference

**Test Results:**
✅ 354 tests passing (1 new test added)
✅ All linting checks pass
✅ Type checking passes (mypy)
✅ Format checks pass (ruff)

Machine-parseable structured logs now provide complete observability across all fetch operations, retry logic, and error conditions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…#50)

* feat: implement robust secondary and primary rate limit handling

Add comprehensive GitHub rate limit detection and retry logic:
- Distinguish between secondary (abuse) and primary rate limits
- Handle secondary limits with 60s backoff and one-retry policy
- Extract retry timing from Retry-After and X-RateLimit-Reset headers
- Log GitHub Request IDs for support escalation
- Increase exponential backoff cap from 5s to 15s
- Apply rate limit handling to both REST and GraphQL endpoints
- Add SecondaryRateLimitError exception for clean abort behavior
- Include comprehensive test suite for rate limit scenarios

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Summary
This PR implements comprehensive GitHub API rate limit handling with differentiation between primary and secondary rate limits, along with production-grade structured logging for observability.
Key Features
- RateLimitHandler class

Changes
1. Rate Limit Detection & Retry (644bf4f)
New Components:
- SecondaryRateLimitError exception for clean abort behavior

Rate Limit Handling:
Improvements:
2. RateLimitHandler Class Extraction (be8ccd3)
Refactoring Benefits:
New Class Structure:
Test Coverage:
- RateLimitHandler class

3. Structured Logging (64c9bdc)
Logging Infrastructure:
Log Levels:
- DEBUG: Page fetching progress
- INFO: Request IDs, max_comments limit reached
- WARNING: Rate limit detection, auth fallback
- ERROR: Rate limit exhaustion, GraphQL errors

Updated Logs:
Test Updates:
- Configure caplog to capture logger output at INFO level
- Assert against caplog.text instead of captured.err

Test Results
New Tests:
- test_rate_limit_handler_ignores_success_response
- test_rate_limit_handler_secondary_limit_retry
- test_rate_limit_handler_secondary_limit_exhausted
- test_rate_limit_handler_primary_limit_retry_after
- test_rate_limit_handler_primary_limit_reset_header
- test_rate_limit_handler_custom_backoff

Benefits
Operational
Code Quality
Risk Assessment
Risk Level: ✅ Low
Future Enhancements
Follow-up opportunities identified in code review:
- Make SECONDARY_RATE_LIMIT_BACKOFF configurable via environment variable

🤖 Generated with Claude Code
Co-Authored-By: Claude noreply@anthropic.com
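The env-configurable backoff follow-up listed above could be sketched like this. The variable name comes from the follow-up list; the parsing and fallback behavior are assumptions:

```python
import os


def secondary_backoff_from_env(default: float = 60.0) -> float:
    """Read SECONDARY_RATE_LIMIT_BACKOFF from the environment, falling back safely."""
    raw = os.environ.get("SECONDARY_RATE_LIMIT_BACKOFF")
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        # Non-numeric values fall back rather than crash at import time.
        return default
    return value if value > 0 else default
```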
Summary by CodeRabbit
Improvements
Tests
Documentation