fix: unreachable validation logic, add API retry, prevent workflow races by Copilot · Pull Request #27 · codigobonito/management

Copilot · 2026-01-22T21:54:21Z

Three production-blocking issues: validation script never executed (logic unreachable), API calls failed on any transient error, and concurrent workflow runs could race.

Changes

Fixed unreachable validation in scripts/validate_pr.py

Main validation logic was trapped inside user_exists() after early returns
Moved to proper main() function—validation now actually runs

Added retry logic to sync scripts

def create_session(token):
    """Create a requests session with retry logic and exponential backoff."""
    session = requests.Session()
    retries = Retry(
        total=RETRY_TOTAL,
        backoff_factor=RETRY_BACKOFF_FACTOR,
        status_forcelist=[429, 500, 502, 503, 504],
        respect_retry_after_header=True
    )
    session.mount("https://", HTTPAdapter(max_retries=retries))
    session.headers.update(auth_headers(token))
    return session

Applied to all API calls in github_to_yaml.py and yaml_to_github.py
Handles rate limits, transient 5xx errors with exponential backoff

Added workflow concurrency control

concurrency:
  group: team-sync-${{ github.repository }}
  cancel-in-progress: false

Prevents simultaneous YAML→GitHub and GitHub→YAML syncs from racing
Jobs queue instead of canceling

Added comprehensive test suite

Created tests/test_validate_pr.py with 10 test cases for validation logic, pagination, and user existence checks
Created tests/test_retry_logic.py with 6 test cases verifying retry configuration consistency and session creation
Added .github/workflows/run-tests.yml for automated testing on PRs and pushes
Added tests/README.md with testing documentation
Updated main README with development and testing instructions
All 16 tests passing ✅

Original prompt

Summary

This PR addresses three critical issues identified during production readiness review that must be fixed before this can be safely used in production.

Critical Fixes Required

1. Fix broken validation script (scripts/validate_pr.py)

Problem: The main validation logic (lines 52-99) is incorrectly indented inside the user_exists() function and never executes. The validation workflow appears to pass but doesn't actually validate anything.

Current broken code:
def user_exists(login: str) -> bool:
    """Check if a GitHub user exists."""
    try:
        r = requests.get(f"{API}/users/{login}", headers=H, timeout=60)
        if r.status_code == 403:
            # Check if it's a rate limit issue
            if (
                "X-RateLimit-Remaining" in r.headers
                and r.headers["X-RateLimit-Remaining"] == "0"
            ):
                print(f"ERROR: GitHub API rate limit exceeded")
                sys.exit(1)
        return r.status_code == 200
    except requests.exceptions.RequestException as e:
        print(f"ERROR: Failed to check user '{login}': {e}")
        sys.exit(1)

    # THIS CODE IS UNREACHABLE - it's inside user_exists() but should be in main()
    # Load teams.yaml
    teams_path = Path("teams.yaml")
    cfg = yaml.safe_load(teams_path.read_text(encoding="utf-8")) or {}
    # ... rest of validation logic that never runs
Fix: Move the main validation logic out of user_exists() and into a proper main() function that gets called.

Commit message: fix(validate): move main logic out of user_exists function

2. Add retry logic with exponential backoff to sync scripts

Problem: Both scripts/github_to_yaml.py and scripts/yaml_to_github.py use raw requests.get() calls that fail immediately on any transient error (503, rate limit, network blip). A single API hiccup crashes the entire sync.

Fix: Add a shared retry mechanism using requests.adapters.HTTPAdapter with urllib3.util.retry.Retry:
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_session(token):
    session = requests.Session()
    retries = Retry(
        total=3,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
        respect_retry_after_header=True
    )
    session.mount("https://", HTTPAdapter(max_retries=retries))
    session.headers.update(auth_headers(token))
    return session
Apply this to:

scripts/github_to_yaml.py - update paginate() to use session

scripts/yaml_to_github.py - update paginate() and all API calls in reconcile_team(), invite_by_login(), get_user_id() to use session

Commit message: fix(sync): add retry logic with exponential backoff for API calls

3. Add concurrency control to workflows

Problem: If someone merges a YAML change while the hourly export is running (or vice versa), race conditions can occur leading to inconsistent state.

Fix: Add concurrency block to both workflow files:

For .github/workflows/settings-to-yaml-sync.yml:
concurrency:
  group: team-sync-${{ github.repository }}
  cancel-in-progress: false
For .github/workflows/yaml-to-settings-sync.yml:
concurrency:
  group: team-sync-${{ github.repository }}
  cancel-in-progress: false
Using the same group name ensures only one sync runs at a time. cancel-in-progress: false means jobs queue rather than cancel each other.

Commit message: fix(workflows): add concurrency control to prevent race conditions

Commit Strategy

Please make three separate atomic commits in this order:

fix(validate): move main logic out of user_exists function

fix(sync): add retry logic with exponential backoff for API calls

fix(workflows): add concurrency control to prevent race conditions

Each commit should be self-contained and the codebase should work correctly after each commit.

This pull request was created from Copilot chat.

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Co-authored-by: lubianat <7917951+lubianat@users.noreply.github.com>

lubianat · 2026-01-22T22:09:18Z

Without tests it is always harder to approve PRs...

Copilot

Pull request overview

This PR fixes three critical production-blocking issues: unreachable validation logic that prevented PR validation from running, lack of retry handling for API calls that caused failures on transient errors, and missing concurrency control that could lead to race conditions between workflow runs.

Changes:

Moved validation logic from inside user_exists() to a proper main() function so validation actually executes
Added retry logic with exponential backoff to all GitHub API calls in sync scripts
Added concurrency control to both workflow files to prevent simultaneous syncs

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
scripts/validate_pr.py	Extracted validation logic into main() function to fix unreachable code
scripts/yaml_to_github.py	Added session-based retry mechanism for all API calls
scripts/github_to_yaml.py	Added session-based retry mechanism for all API calls
.github/workflows/yaml-to-settings-sync.yml	Added concurrency control to prevent race conditions
.github/workflows/settings-to-yaml-sync.yml	Added concurrency control and corrected cron comment

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

.github/workflows/settings-to-yaml-sync.yml

lubianat · 2026-01-23T09:43:17Z

.github/workflows/settings-to-yaml-sync.yml

+    - cron: "00 * * * *" # hourly
+
+concurrency:
+  group: team-sync-${{ github.repository }}


Does this actually provide safety against racing conditions? Say an hourly update and a pull request merge at the same time?

Also, if there are manual settings that have been changed (i.e. the state of the org is different from the previous yaml file) in the same hour a merge happens, there is a racing condition this will not catch. Right?

The concurrency control prevents workflows from running simultaneously, but you're correct that it doesn't fully prevent the race condition you described.

What it prevents: Two workflows starting at the exact same time (e.g., hourly export + manual merge).

What it doesn't prevent: The scenario where:

Export starts at 0:00, reads current GitHub state

YAML merge happens at 0:30 (export still running)

Export completes at 0:45 and creates a PR that might conflict

However, this is partially mitigated because:

The export creates a PR (not direct push), so conflicts are visible during review

The YAML→GitHub sync runs immediately on merge, so the window is small

GitHub's PR merge will detect file conflicts if both modified teams.yaml

The fundamental issue is that this is a bidirectional sync system. True prevention would require distributed locking or a single source of truth. Given the current architecture (GitHub UI changes allowed), the PR-based review is the safety mechanism.

If this is a concern in practice, consider:

Disabling manual GitHub UI changes (YAML-only)

Adding a check in the export to detect if main changed during the run

Implementing optimistic locking with version markers in teams.yaml

The issue is the other way around, and doesnt have to do with concurrency in workflows. The issue is the yaml overriding the manual changes even before a PR is open for the sync

You're absolutely right - I misunderstood the race condition. The actual issue is:

Admin manually adds user X to a team in GitHub UI

Someone merges a YAML change (before hourly export captures user X)

YAML→GitHub sync runs, reads teams.yaml (which lacks user X)

Line 151-165 in yaml_to_github.py removes user X (because it's in GitHub but not YAML)

This is data loss, not just a merge conflict. The concurrency control I added doesn't prevent this.

Per your other comment, this seems like a separate issue from the three production-blocking bugs fixed in this PR (unreachable validation, missing retries, workflow concurrency). The sync race would need architectural changes (e.g., merge instead of replace, version markers, or YAML-only mode).

Co-authored-by: lubianat <7917951+lubianat@users.noreply.github.com>

Copilot · 2026-01-23T09:47:26Z

Without tests it is always harder to approve PRs...

Added comprehensive test suite in commit 584bbb3:

16 test cases covering validation logic, pagination, retry configuration, and session creation
Automatic test runs via GitHub Actions workflow
See tests/README.md for details

All tests passing ✅

lubianat

See comments on sync. May ne separate in other pr

Initial plan

18e7b7c

Copilot AI assigned Copilot and lubianat Jan 22, 2026

Copilot started work on behalf of lubianat January 22, 2026 21:54 View session

Copilot AI and others added 5 commits January 22, 2026 21:56

fix(validate): move main logic out of user_exists function

72c5105

Co-authored-by: lubianat <7917951+lubianat@users.noreply.github.com>

fix(sync): add retry logic with exponential backoff for API calls

ac2138e

Co-authored-by: lubianat <7917951+lubianat@users.noreply.github.com>

chore: add .gitignore for Python cache files

d46f608

fix(workflows): add concurrency control to prevent race conditions

0970f17

Co-authored-by: lubianat <7917951+lubianat@users.noreply.github.com>

refactor: extract retry configuration as constants and fix cron comment

76c8451

Co-authored-by: lubianat <7917951+lubianat@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix broken validation script in production workflow~~ fix: unreachable validation logic, add API retry, prevent workflow races Jan 22, 2026

Copilot AI requested a review from lubianat January 22, 2026 22:03

Copilot finished work on behalf of lubianat January 22, 2026 22:03

lubianat requested a review from Copilot January 22, 2026 22:09

Copilot AI reviewed Jan 22, 2026

View reviewed changes

.github/workflows/settings-to-yaml-sync.yml Show resolved Hide resolved

lubianat requested changes Jan 23, 2026

View reviewed changes

Copilot started work on behalf of lubianat January 23, 2026 09:44 View session

test: add comprehensive test suite for validation and retry logic

584bbb3

Co-authored-by: lubianat <7917951+lubianat@users.noreply.github.com>

Copilot AI requested a review from lubianat January 23, 2026 09:49

Copilot finished work on behalf of lubianat January 23, 2026 09:49

lubianat requested changes Jan 23, 2026

View reviewed changes

Copilot started work on behalf of lubianat January 23, 2026 10:27 View session

Copilot finished work on behalf of lubianat January 23, 2026 10:29

Copilot AI requested a review from lubianat January 23, 2026 10:29

lubianat mentioned this pull request Jan 23, 2026

Fix race condition when manual update and YAML merge happen in the same hour #30

Open

lubianat approved these changes Jan 23, 2026

View reviewed changes

lubianat marked this pull request as ready for review January 23, 2026 11:29

lubianat merged commit 09aa496 into main Jan 23, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: unreachable validation logic, add API retry, prevent workflow races#27

fix: unreachable validation logic, add API retry, prevent workflow races#27
lubianat merged 7 commits intomainfrom
copilot/fix-validation-script-error

Copilot AI commented Jan 22, 2026 •

edited

Loading

Uh oh!

lubianat commented Jan 22, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

lubianat Jan 23, 2026

Uh oh!

Copilot AI Jan 23, 2026

Uh oh!

lubianat Jan 23, 2026

Uh oh!

lubianat Jan 23, 2026

Uh oh!

Copilot AI Jan 23, 2026

Uh oh!

Copilot AI commented Jan 23, 2026

Uh oh!

lubianat left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Copilot AI commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Summary

Critical Fixes Required

1. Fix broken validation script (scripts/validate_pr.py)

2. Add retry logic with exponential backoff to sync scripts

3. Add concurrency control to workflows

Commit Strategy

Uh oh!

lubianat commented Jan 22, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

lubianat Jan 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

Uh oh!

lubianat Jan 23, 2026

Choose a reason for hiding this comment

Uh oh!

lubianat Jan 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI commented Jan 23, 2026

Uh oh!

lubianat left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Copilot AI commented Jan 22, 2026 •

edited

Loading

1. Fix broken validation script (`scripts/validate_pr.py`)