fix: unreachable validation logic, add API retry, prevent workflow races#27
fix: unreachable validation logic, add API retry, prevent workflow races#27
Conversation
Co-authored-by: lubianat <7917951+lubianat@users.noreply.github.com>
Co-authored-by: lubianat <7917951+lubianat@users.noreply.github.com>
Co-authored-by: lubianat <7917951+lubianat@users.noreply.github.com>
Co-authored-by: lubianat <7917951+lubianat@users.noreply.github.com>
|
Without tests it is always harder to approve PRs... |
There was a problem hiding this comment.
Pull request overview
This PR fixes three critical production-blocking issues: unreachable validation logic that prevented PR validation from running, lack of retry handling for API calls that caused failures on transient errors, and missing concurrency control that could lead to race conditions between workflow runs.
Changes:
- Moved validation logic from inside
user_exists()to a propermain()function so validation actually executes - Added retry logic with exponential backoff to all GitHub API calls in sync scripts
- Added concurrency control to both workflow files to prevent simultaneous syncs
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| scripts/validate_pr.py | Extracted validation logic into main() function to fix unreachable code |
| scripts/yaml_to_github.py | Added session-based retry mechanism for all API calls |
| scripts/github_to_yaml.py | Added session-based retry mechanism for all API calls |
| .github/workflows/yaml-to-settings-sync.yml | Added concurrency control to prevent race conditions |
| .github/workflows/settings-to-yaml-sync.yml | Added concurrency control and corrected cron comment |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| - cron: "00 * * * *" # hourly | ||
|
|
||
| concurrency: | ||
| group: team-sync-${{ github.repository }} |
There was a problem hiding this comment.
Does this actually provide safety against racing conditions? Say an hourly update and a pull request merge at the same time?
Also, if there are manual settings that have been changed (i.e. the state of the org is different from the previous yaml file) in the same hour a merge happens, there is a racing condition this will not catch. Right?
There was a problem hiding this comment.
The concurrency control prevents workflows from running simultaneously, but you're correct that it doesn't fully prevent the race condition you described.
What it prevents: Two workflows starting at the exact same time (e.g., hourly export + manual merge).
What it doesn't prevent: The scenario where:
- Export starts at 0:00, reads current GitHub state
- YAML merge happens at 0:30 (export still running)
- Export completes at 0:45 and creates a PR that might conflict
However, this is partially mitigated because:
- The export creates a PR (not direct push), so conflicts are visible during review
- The YAML→GitHub sync runs immediately on merge, so the window is small
- GitHub's PR merge will detect file conflicts if both modified teams.yaml
The fundamental issue is that this is a bidirectional sync system. True prevention would require distributed locking or a single source of truth. Given the current architecture (GitHub UI changes allowed), the PR-based review is the safety mechanism.
If this is a concern in practice, consider:
- Disabling manual GitHub UI changes (YAML-only)
- Adding a check in the export to detect if main changed during the run
- Implementing optimistic locking with version markers in teams.yaml
There was a problem hiding this comment.
The issue is the other way around, and doesnt have to do with concurrency in workflows. The issue is the yaml overriding the manual changes even before a PR is open for the sync
There was a problem hiding this comment.
The issue is the other way around, and doesnt have to do with concurrency in workflows. The issue is the yaml overriding the manual changes even before a PR is open for the sync
There was a problem hiding this comment.
You're absolutely right - I misunderstood the race condition. The actual issue is:
- Admin manually adds user X to a team in GitHub UI
- Someone merges a YAML change (before hourly export captures user X)
- YAML→GitHub sync runs, reads teams.yaml (which lacks user X)
- Line 151-165 in
yaml_to_github.pyremoves user X (because it's in GitHub but not YAML)
This is data loss, not just a merge conflict. The concurrency control I added doesn't prevent this.
Per your other comment, this seems like a separate issue from the three production-blocking bugs fixed in this PR (unreachable validation, missing retries, workflow concurrency). The sync race would need architectural changes (e.g., merge instead of replace, version markers, or YAML-only mode).
Co-authored-by: lubianat <7917951+lubianat@users.noreply.github.com>
Added comprehensive test suite in commit 584bbb3:
All tests passing ✅ |
lubianat
left a comment
There was a problem hiding this comment.
See comments on sync. May ne separate in other pr
Three production-blocking issues: validation script never executed (logic unreachable), API calls failed on any transient error, and concurrent workflow runs could race.
Changes
Fixed unreachable validation in
scripts/validate_pr.pyuser_exists()after early returnsmain()function—validation now actually runsAdded retry logic to sync scripts
github_to_yaml.pyandyaml_to_github.pyAdded workflow concurrency control
Added comprehensive test suite
tests/test_validate_pr.pywith 10 test cases for validation logic, pagination, and user existence checkstests/test_retry_logic.pywith 6 test cases verifying retry configuration consistency and session creation.github/workflows/run-tests.ymlfor automated testing on PRs and pushestests/README.mdwith testing documentationOriginal prompt
This pull request was created from Copilot chat.
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.