fix: persist retry state, add API retry logic, raise on GraphQL errors #76

Merged
chanakyav merged 1 commit into main from
fix/issue-74-retry-reliability
Mar 25, 2026
@chanakyav
Owner

Summary

Addresses all three items in issue #74 — retry state persistence, GitHub API retry logic, and silent GraphQL error handling.

Changes

1. Persist _retry_counts for crash recovery (schema v8)

  • persistence.py: Added a retry_counts_json TEXT column, bumped the schema from v7 to v8, added the migration entry, and updated _TASK_COLUMNS.
  • orchestrator.py: __init__ loads _retry_counts from task["retry_counts_json"] (JSON deserialized). After incrementing a retry counter in _do_verify_push() or _do_verify_pr(), persists via update_task(task_id, retry_counts_json=json.dumps(...)).

Before: If the orchestrator crashed and resumed during VERIFY_PUSH, _retry_counts reset to {}, allowing infinite retry loops.
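
The persistence round trip described above can be sketched roughly as follows. This is a minimal illustration, not the code from persistence.py or orchestrator.py: the in-memory SQLite table, the helper names (open_db, load_retry_counts, bump_retry) and the "verify_push" key are all assumptions made for the example. Only the retry_counts_json TEXT column and the JSON encoding come from the description.

```python
import json
import sqlite3


def open_db():
    """Create a throwaway in-memory DB with a hypothetical tasks table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (id TEXT PRIMARY KEY, retry_counts_json TEXT)"
    )
    return conn


def load_retry_counts(conn, task_id):
    """Return the persisted retry counters, or {} if none were stored yet."""
    row = conn.execute(
        "SELECT retry_counts_json FROM tasks WHERE id = ?", (task_id,)
    ).fetchone()
    if row is None or not row[0]:
        return {}
    return json.loads(row[0])


def bump_retry(conn, task_id, key):
    """Increment one counter and write the whole dict back immediately."""
    counts = load_retry_counts(conn, task_id)
    counts[key] = counts.get(key, 0) + 1
    conn.execute(
        "UPDATE tasks SET retry_counts_json = ? WHERE id = ?",
        (json.dumps(counts), task_id),
    )
    conn.commit()
    return counts[key]
```

Because the counter is written back on every increment, a process that restarts and calls load_retry_counts() sees the retries already spent instead of starting again from an empty dict.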

2. Exponential backoff retry in _run_gh()

  • github_api.py: _run_gh() now retries up to 3 times with exponential backoff (1s, 2s, 4s) for transient errors.
  • Transient errors are detected by matching stderr against these patterns: rate limit, abuse detection, server error, 502, 503, 504, timed out, connection refused, connection reset, network is unreachable.
  • Non-transient errors (404, auth, etc.) fail immediately with no retry.
  • check=False callers are unaffected — they still get empty stdout on failure.
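
A rough sketch of the retry behaviour listed above, under stated assumptions: the function below is not the actual _run_gh() from github_api.py. The name run_gh, its parameters, the injectable run and sleep callables, and the case-insensitive substring match are illustrative choices. The pattern list, the three retries, and the 1s/2s/4s delays are taken from the description. Whether check=False calls are also retried is not stated there; this sketch retries them too.

```python
import subprocess
import time

# Patterns from the PR description; the matching strategy is an assumption.
TRANSIENT_PATTERNS = (
    "rate limit", "abuse detection", "server error", "502", "503", "504",
    "timed out", "connection refused", "connection reset",
    "network is unreachable",
)


class GitHubAPIError(RuntimeError):
    """Stand-in for the project's exception of the same name."""


def is_transient(stderr):
    """True if stderr mentions any known transient failure pattern."""
    lowered = stderr.lower()
    return any(pattern in lowered for pattern in TRANSIENT_PATTERNS)


def run_gh(args, check=True, max_retries=3, base_delay=1.0,
           run=subprocess.run, sleep=time.sleep):
    """Run `gh` with up to max_retries retries on transient errors."""
    attempt = 0
    while True:
        result = run(["gh", *args], capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
        if is_transient(result.stderr) and attempt < max_retries:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults
            attempt += 1
            continue
        if check:
            raise GitHubAPIError(result.stderr.strip())
        return ""  # check=False callers get empty stdout on failure
```

Passing run and sleep as parameters keeps the sketch testable without invoking the real gh CLI or waiting on real delays.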

3. Raise GitHubAPIError on GraphQL errors

  • github_api.py: get_unresolved_review_comments() and get_latest_copilot_review_thread_ts() now raise GitHubAPIError when GraphQL returns errors with no usable data.
  • Partial errors (errors + data present) are logged as warnings but data is still returned.
  • The orchestrator's existing try/except around these calls will catch and handle the new exceptions.
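
The distinction between total and partial GraphQL failures can be illustrated with a small sketch. The helper name parse_graphql_response and the logger setup are hypothetical, and the real get_unresolved_review_comments() and get_latest_copilot_review_thread_ts() may structure this differently. The sketch only assumes the standard GraphQL response shape with top-level "data" and "errors" keys.

```python
import json
import logging

logger = logging.getLogger(__name__)


class GitHubAPIError(RuntimeError):
    """Stand-in for the project's exception of the same name."""


def parse_graphql_response(raw):
    """Return the `data` payload, raising if errors leave no usable data."""
    payload = json.loads(raw)
    errors = payload.get("errors") or []
    data = payload.get("data")
    if errors and not data:
        # Total failure: surface it instead of returning an empty result.
        messages = "; ".join(
            str(e.get("message", e)) if isinstance(e, dict) else str(e)
            for e in errors
        )
        raise GitHubAPIError(f"GraphQL query failed: {messages}")
    if errors:
        # Partial failure: keep the data but leave a trace in the logs.
        logger.warning("GraphQL returned partial errors: %s", errors)
    return data or {}
```

Raising on the no-data case lets a caller's try/except tell "the query failed" apart from "there are no unresolved comments", which a silently returned empty result would conflate.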

Tests Added (10 new)

  • test_retry_counts_json_persists + migration test updated (test_persistence.py)
  • test_retries_on_rate_limit (test_github_api.py)
  • test_no_retry_on_permanent_error (test_github_api.py)
  • test_retries_on_server_error (test_github_api.py)
  • test_gives_up_after_max_retries (test_github_api.py)
  • test_check_false_no_raise (test_github_api.py)
  • test_unresolved_comments_raises_on_graphql_error_no_data (test_github_api.py)
  • test_unresolved_comments_partial_error_returns_data (test_github_api.py)
  • test_latest_thread_ts_raises_on_graphql_error_no_data (test_github_api.py)
  • test_latest_thread_ts_partial_error_returns_data (test_github_api.py)

Verification

  • ruff check src/ tests/ — zero errors
  • pytest tests/ — 293 tests passed, zero failures, no regressions

Closes #74


🤖 autopilot-loop

- Persist _retry_counts as retry_counts_json column (schema v8)
  Survives crash/resume, prevents infinite retry loops
- Add exponential backoff retry to _run_gh() for transient errors
  (rate limits, 5xx, network timeouts) — up to 3 retries
- Raise GitHubAPIError on GraphQL errors with no data in
  get_unresolved_review_comments() and get_latest_copilot_review_thread_ts()
  instead of silently returning empty results

Closes #74
@chanakyav chanakyav merged commit 8fcdd7e into main Mar 25, 2026
3 checks passed
@chanakyav chanakyav deleted the fix/issue-74-retry-reliability branch March 25, 2026 01:53

Development

Successfully merging this pull request may close these issues.

Further improvements: GitHub API retry logic and retry state persistence