Incarnation becomes unrecoverable when change history contains invalid commit SHAs #584

@peterbraden

Problem

An incarnation can enter a deadlock state where:

  • GET /api/incarnations/{id} returns an error: "the given change is in an incomplete state (commit_pushed=False). Try POST .../fix"
  • Calling the suggested fix endpoint returns {"detail": "Change not found"} (404)
  • All mutation endpoints (PUT, PATCH, POST .../changes) refuse to operate because they require the latest change to be complete
  • There is no API path to recover; the only option is direct database manipulation

Root causes

Two scenarios can produce this state:

  1. Force-push or revert rewrites repository history

When someone force-pushes to the incarnation repository or reverts commits, the commit_sha values stored in foxops' change table become invalid — those commits no longer exist in the repo. If multiple revisions are affected:

The fix endpoint deletes the latest broken change (revision N), but the previous revision (N-1) is also broken.

Revision N-1 may have commit_pushed=True (it was completed before the force-push), so the fix endpoint considers it "already complete" and does nothing — even though its SHA points at a nonexistent commit.

The incarnation is now stuck: GET reports an error, fix can't help.
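
To make the sequence concrete, here is a minimal model of the behaviour described above. It is not foxops' actual code: the `Change` class, `fix_latest`, and `is_healthy` are hypothetical names, and the only assumption is that the fix logic looks at `commit_pushed` and never checks whether the SHA still exists in the repository.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Change:
    revision: int
    commit_sha: str
    commit_pushed: bool


def fix_latest(changes: List[Change]) -> str:
    """Hypothetical model of the fix step: it only inspects commit_pushed."""
    if not changes:
        return "change not found"
    latest = changes[-1]
    if latest.commit_pushed:
        # The SHA is never validated against the repository here.
        return "already complete"
    changes.pop()  # delete the incomplete change
    return "deleted"


def is_healthy(changes: List[Change], commit_exists: Callable[[str], bool]) -> bool:
    """The latest change must be pushed AND its commit must still exist."""
    if not changes:
        return False
    latest = changes[-1]
    return latest.commit_pushed and commit_exists(latest.commit_sha)


# After a force-push, only the commit of revision 1 survives in the repo.
repo_commits = {"aaa"}
history = [
    Change(1, "aaa", True),
    Change(2, "bbb", True),   # completed before the force-push, SHA now gone
    Change(3, "ccc", False),  # latest, incomplete
]

first = fix_latest(history)    # removes revision 3
second = fix_latest(history)   # revision 2 looks complete, nothing happens
stuck = not is_healthy(history, lambda sha: sha in repo_commits)
```

In this model the second call is a no-op even though revision 2 points at a missing commit, which is the stuck state described above.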

  2. Stale foxops update branch not cleaned up

The branch name for MR changes is deterministic: foxops/update-to-{target_dir_hash}-{version}. When an MR is merged, GitLab deletes the source branch (via remove_source_branch), but if the MR is closed without merging, or merged outside GitLab's MR flow, the branch persists.

When a subsequent update targets the same version (e.g. with different template data), _prepared_change_environment generates the same branch name. The local git checkout -b succeeds (the GitLab clone is --depth=1 and doesn't see the remote branch), and the DB row is created with commit_pushed=False. But git push fails because the branch already exists on the remote with different history, triggering RebaseRequiredError.
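
A rough sketch of the collision and of one possible pre-check follows. The branch format is taken from the description above; everything else is an assumption for illustration: `update_branch_name`, `parse_remote_heads`, and `remote_branch_exists` are hypothetical helpers, not foxops functions. The check uses `git ls-remote --heads`, which queries the remote directly and therefore sees branches that a `--depth=1` clone did not fetch.

```python
import subprocess
from typing import Set


def update_branch_name(target_dir_hash: str, version: str) -> str:
    # Same inputs always give the same name; template data is not part of it.
    return f"foxops/update-to-{target_dir_hash}-{version}"


def parse_remote_heads(ls_remote_output: str) -> Set[str]:
    """Extract branch names from `git ls-remote --heads` output.

    Each line has the form "<sha>\trefs/heads/<branch>".
    """
    prefix = "refs/heads/"
    heads = set()
    for line in ls_remote_output.splitlines():
        parts = line.split("\t", 1)
        if len(parts) == 2 and parts[1].startswith(prefix):
            heads.add(parts[1][len(prefix):])
    return heads


def remote_branch_exists(repo_dir: str, branch: str, remote: str = "origin") -> bool:
    """Ask the remote whether the branch exists (hypothetical pre-check)."""
    output = subprocess.run(
        ["git", "ls-remote", "--heads", remote, branch],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    # ls-remote patterns match trailing path components, so compare exactly.
    return branch in parse_remote_heads(output)


# Two updates to the same version (with different template data) collide.
first_update = update_branch_name("abc123", "v1.2.0")
second_update = update_branch_name("abc123", "v1.2.0")

sample_output = (
    "1111111111111111111111111111111111111111\trefs/heads/main\n"
    "2222222222222222222222222222222222222222\t"
    "refs/heads/foxops/update-to-abc123-v1.2.0\n"
)
stale_branch_present = second_update in parse_remote_heads(sample_output)
```

Running such a check before the DB row is created would let the stale branch be detected up front instead of surfacing later as a failed push.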

The retry loop in _push_change_commit_and_update_database does git pull --rebase, which rebases the new commit onto the stale branch's history. This can:

  • Drop the commit entirely (if the rebase determines the changes are already applied), leaving the DB with a SHA that doesn't exist
  • Produce conflicts and exhaust all 10 retries, at which point the change is deleted — but if the process crashes or times out before cleanup, the orphaned commit_pushed=False row remains
  • Succeed, but with a different SHA that gets recorded via update_commit_sha, creating a mismatch

In all cases, the result is the same unrecoverable state.
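
Both scenarios end with a stored commit_sha that the repository no longer contains, so one way to detect the state is to validate the stored SHAs against a full (non-shallow) clone. The sketch below is an illustration only: `commit_exists` and `first_invalid_revision` are hypothetical helpers, and the (revision, commit_sha) pairs stand in for rows of the change table. `git cat-file -e <sha>^{commit}` exits with a non-zero status when the object is missing or is not a commit.

```python
import subprocess
from typing import Callable, Iterable, Optional, Tuple


def commit_exists(repo_dir: str, sha: str) -> bool:
    """Return True if `sha` resolves to a commit in the clone at repo_dir.

    Must be run against a full clone; a --depth=1 clone lacks older commits.
    """
    result = subprocess.run(
        ["git", "cat-file", "-e", f"{sha}^{{commit}}"],
        cwd=repo_dir,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_invalid_revision(
    changes: Iterable[Tuple[int, str]],
    exists: Callable[[str], bool],
) -> Optional[int]:
    """Return the lowest revision whose commit is missing, or None."""
    for revision, sha in sorted(changes):
        if not exists(sha):
            return revision
    return None


# Example with an in-memory stand-in for the repository.
known_commits = {"aaa", "bbb"}
broken_at = first_invalid_revision(
    [(3, "ddd"), (1, "aaa"), (2, "ccc")],
    lambda sha: sha in known_commits,
)
```

In real use, `exists` would be something like `lambda sha: commit_exists("/path/to/full/clone", sha)`, where the path is a placeholder.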
