Further improvements: GitHub API retry logic and retry state persistence #74

@chanakyav

Description

Overview

These are medium-severity reliability improvements identified during code review of issues #22 and #23. They are real issues but orthogonal to the crash bugs and schema changes, so they should be addressed in a separate PR.

Items

1. _retry_counts not persisted — retry state lost on resume

Location: orchestrator.py — `_retry_counts = {}` is initialized in `__init__`

Problem: VERIFY_PUSH retry tracking lives only in memory. On resume, the counter resets to 0, so stopping and resuming the orchestrator repeatedly during VERIFY_PUSH allows effectively unbounded retry loops.

Proposed fix: Persist retry counts in the tasks table or a separate table, or encode as JSON in a single column.
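A minimal sketch of the single-JSON-column option. The `orchestrator_state` table and its key/value layout are assumptions for illustration, not the project's actual schema:

```python
import json
import sqlite3


def save_retry_counts(conn: sqlite3.Connection, retry_counts: dict) -> None:
    """Persist the retry-count dict as JSON in a single row of a key/value table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orchestrator_state (key TEXT PRIMARY KEY, value TEXT)"
    )
    conn.execute(
        "INSERT INTO orchestrator_state (key, value) VALUES ('retry_counts', ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        (json.dumps(retry_counts),),
    )
    conn.commit()


def load_retry_counts(conn: sqlite3.Connection) -> dict:
    """Restore retry counts on resume; an absent table or row means a fresh start."""
    try:
        row = conn.execute(
            "SELECT value FROM orchestrator_state WHERE key = 'retry_counts'"
        ).fetchone()
    except sqlite3.OperationalError:  # table not created yet
        return {}
    return json.loads(row[0]) if row else {}
```

The orchestrator would call `save_retry_counts` whenever it bumps a counter and `load_retry_counts` in `__init__` instead of starting from `{}`, so a stop/resume cycle no longer resets the count.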

2. GitHub API retry logic for transient failures

Location: github_api.py — _run_gh() has no retry logic

Problem: Network hiccups, GitHub rate limits, and transient API errors cause immediate failure. No exponential backoff or retry mechanism.

Proposed fix: Add retry with exponential backoff (3 to 5 attempts) for transient errors (HTTP 5xx, rate limiting, network errors) in _run_gh().
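One way to sketch this is a generic retry wrapper that `_run_gh()` could use; the `TransientAPIError` class and `with_retries` helper are hypothetical names, and classifying which `gh` failures are transient is left to the caller:

```python
import random
import time


class TransientAPIError(Exception):
    """Raised for failures worth retrying: HTTP 5xx, rate limits, network errors."""


def with_retries(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying TransientAPIError with jittered exponential backoff.

    Non-transient exceptions propagate immediately; the last transient failure
    is re-raised once max_attempts is exhausted. `sleep` is injectable so tests
    can run without real delays.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientAPIError:
            if attempt == max_attempts - 1:
                raise
            # ~1s, 2s, 4s, ... with jitter to avoid synchronized retry bursts
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
```

`_run_gh()` would then raise `TransientAPIError` when the `gh` exit status or stderr indicates a 5xx/rate-limit/network failure, and wrap its subprocess call in `with_retries`.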

3. Silent GraphQL API errors in review comment fetching

Location: github_api.py — get_unresolved_review_comments() and get_latest_copilot_review_thread_ts()

Problem: GraphQL errors are logged but the functions return an empty list or None instead of raising, masking real API failures. The orchestrator silently treats these as "no comments" or "no review".

Proposed fix: Raise GitHubAPIError when GraphQL returns errors, let callers handle appropriately.
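A sketch of the check both fetchers could share. GraphQL reports failures in an `errors` array alongside possibly-null `data`, so even a 200 response can carry errors; surfacing that array is what stops "query failed" from looking like "no results". `parse_graphql_response` is a hypothetical helper name:

```python
import json


class GitHubAPIError(Exception):
    """Raised when the GitHub API returns an error payload."""


def parse_graphql_response(raw: str) -> dict:
    """Parse a GraphQL response body, raising on errors instead of returning empty.

    Callers such as get_unresolved_review_comments() would unpack `data` from
    the returned dict and can decide per call site how to handle the exception.
    """
    payload = json.loads(raw)
    if payload.get("errors"):
        messages = "; ".join(e.get("message", "unknown") for e in payload["errors"])
        raise GitHubAPIError(f"GraphQL query failed: {messages}")
    return payload["data"]
```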

Acceptance Criteria

  • Each fix has tests
  • No regressions in existing behavior

🤖 autopilot-loop
