Overview
These are medium-severity reliability improvements identified during code review of issues #22 and #23. They are real issues but orthogonal to the crash bugs and schema changes, so they should be addressed in a separate PR.
Items
1. _retry_counts not persisted — retry state lost on resume
Location: orchestrator.py (_retry_counts = {} is initialized in __init__)
Problem: VERIFY_PUSH retry tracking is held only in memory. On resume, the counter resets to 0, so the retry limit never takes effect if the orchestrator is stopped and resumed repeatedly during VERIFY_PUSH, and the task can retry without bound.
Proposed fix: Persist retry counts in the tasks table or a separate table, or encode as JSON in a single column.
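One way the separate-table option could look is sketched below. It assumes a SQLite backend; the retry_counts table, its columns, and the RetryStore class are hypothetical names for illustration and are not taken from orchestrator.py.

```python
import sqlite3


class RetryStore:
    """Hypothetical helper: keeps per-task, per-state retry counts in SQLite
    so they survive an orchestrator stop/resume cycle."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        # Table and column names are assumptions, not the project's schema.
        self.conn.execute(
            """
            CREATE TABLE IF NOT EXISTS retry_counts (
                task_id  TEXT NOT NULL,
                state    TEXT NOT NULL,
                attempts INTEGER NOT NULL DEFAULT 0,
                PRIMARY KEY (task_id, state)
            )
            """
        )
        self.conn.commit()

    def get(self, task_id: str, state: str) -> int:
        """Return the stored attempt count, or 0 if none was recorded."""
        row = self.conn.execute(
            "SELECT attempts FROM retry_counts WHERE task_id = ? AND state = ?",
            (task_id, state),
        ).fetchone()
        return row[0] if row else 0

    def increment(self, task_id: str, state: str) -> int:
        """Record one more attempt and return the new count."""
        # Create the row on first use, then bump it; both run before commit.
        self.conn.execute(
            "INSERT OR IGNORE INTO retry_counts (task_id, state, attempts) "
            "VALUES (?, ?, 0)",
            (task_id, state),
        )
        self.conn.execute(
            "UPDATE retry_counts SET attempts = attempts + 1 "
            "WHERE task_id = ? AND state = ?",
            (task_id, state),
        )
        self.conn.commit()
        return self.get(task_id, state)

    def reset(self, task_id: str, state: str) -> None:
        """Clear the count once the state completes successfully."""
        self.conn.execute(
            "DELETE FROM retry_counts WHERE task_id = ? AND state = ?",
            (task_id, state),
        )
        self.conn.commit()
```

With something like this, the orchestrator would call increment() before each VERIFY_PUSH retry and compare the result against its limit, so the count carries over a resume instead of restarting at 0.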
2. GitHub API retry logic for transient failures
Location: github_api.py — _run_gh() has no retry logic
Problem: Network hiccups, GitHub rate limits, and transient API errors cause immediate failure because there is no retry or exponential backoff mechanism.
Proposed fix: Add retry with exponential backoff (3-5 attempts) for transient errors (5xx, rate limit, network errors) in _run_gh().
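A minimal sketch of the backoff loop follows. It assumes _run_gh() can tell transient failures apart and signal them with a dedicated exception. TransientError and with_backoff are hypothetical names, and the default delays are illustrative.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (5xx, rate limit, network)."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on TransientError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter of up to 50% keeps parallel callers from retrying in step.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Non-transient errors (for example a 4xx other than a rate limit) are not caught, so they still fail immediately. The sleep parameter exists so tests can inject a fake and avoid real waits.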
3. Silent GraphQL API errors in review comment fetching
Location: github_api.py — get_unresolved_review_comments() and get_latest_copilot_review_thread_ts()
Problem: GraphQL errors are logged, but the functions return an empty list or None instead of raising, which masks real API failures. The orchestrator silently treats these results as "no comments" or "no review".
Proposed fix: Raise GitHubAPIError when GraphQL returns errors, and let callers handle it appropriately.
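The check could be centralized in one helper that both functions use, roughly as sketched here. It relies on the GraphQL convention that failures are reported in a top-level "errors" list. The GitHubAPIError class is a stand-in because its real definition is not shown in this issue, and parse_graphql_response is a hypothetical name.

```python
import json
from typing import Any, Dict


class GitHubAPIError(Exception):
    """Stand-in for the project's exception; its real definition is not shown."""


def parse_graphql_response(raw: str) -> Dict[str, Any]:
    """Return the "data" object of a GraphQL response, or raise on failure."""
    payload = json.loads(raw)
    errors = payload.get("errors")
    if errors:
        # Raise rather than log and return an empty result, so the caller
        # can distinguish "no comments" from "the request failed".
        messages = "; ".join(e.get("message", "unknown error") for e in errors)
        raise GitHubAPIError(f"GraphQL request failed: {messages}")
    data = payload.get("data")
    if data is None:
        raise GitHubAPIError("GraphQL response contained no data")
    return data
```

Callers that previously received [] or None on failure would need to catch GitHubAPIError and decide whether to retry, pause, or fail the task.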
Acceptance Criteria
- Each fix has tests
- No regressions in existing behavior
🤖 autopilot-loop