Centralize Claude Workflows – Replace Pilot from dotcms/infrastructure-as-code #1

sfreudenthaler · 2025-06-27T20:35:49Z

PR: Centralize Claude Workflows – Replace Pilot from `dotcms/infrastructure-as-code`

Overview

This PR introduces a centralized, reusable Claude workflow system designed to replace the pilot implementation previously used in the dotcms/infrastructure-as-code repository.

Related PR to cleanup the IaC repo side.

Why This Change?

Centralization & DRY:
By consolidating all Claude-related GitHub Actions into a single, reusable orchestrator and executor pattern, we eliminate duplication and make ongoing maintenance much easier.
Replace the Pilot:
The previous approach in dotcms/infrastructure-as-code was a proof-of-concept. This PR formalizes and generalizes that approach for use across all repositories.
Maintain Flexibility:
While the core logic is now centralized, each repository can still customize prompts, allowed tools, runner selection, and other parameters as needed—ensuring flexibility is not sacrificed for maintainability.

What’s Changed?

Removed:
The standalone claude-code-review.yml workflow, as its functionality is now fully covered by the orchestrator + executor pattern.
Added/Updated:
- claude-orchestrator.yml: Routes all Claude triggers (PRs, issues, comments, reviews) to the correct execution mode.
- claude-executor.yml: Handles the actual execution of Claude actions, with configurable parameters.
Documentation:
Updated to reflect the new centralized approach and how to customize at the repo level.

Benefits

Single source of truth for Claude automation logic.
Easier updates—fixes and improvements are made in one place.
Repo-level customization remains fully supported via workflow inputs.
Cleaner, more maintainable codebase.

Migration Notes

Repositories previously using the pilot workflow in dotcms/infrastructure-as-code should now reference the centralized orchestrator workflow.
See the updated README for instructions on customizing the workflow for your repository’s needs.

In summary:
This PR replaces the pilot Claude workflow with a robust, maintainable, and flexible system that can be easily adopted across all repositories, keeping things DRY while supporting per-repo customization.

@claude

Add centralized, reusable GitHub Actions workflows for Claude AI automation: - claude-executor.yml: Configurable execution engine with custom tool allowlists, timeouts, and runner specifications - claude-orchestrator.yml: Orchestrates all Claude triggers (comments, PRs, issues) with configurable prompts and conditional logic Key features: - Parameterized inputs for repository-specific customization - Support for interactive (@claude mentions) and automatic modes - Configurable allowed tools for different repository types - Flexible timeout and runner configuration - Comprehensive trigger event handling This enables DRY implementation of Claude workflows across all dotCMS repositories while maintaining repository-specific configurations. Usage example: ```yaml jobs: claude: uses: dotCMS/claude-workflows/.github/workflows/claude-orchestrator.yml@main with: allowed_tools: | Bash(terraform plan) Bash(git status) secrets: ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} ``` 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> Resolves: dotCMS/dotcat#213

Add GitHub Actions workflow to validate YAML syntax and workflow structure before PRs can be merged. Includes yamllint configuration for consistent formatting. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

- Add reusable claude-code-review.yml workflow with configurable review focus - Require each repository to provide its own ANTHROPIC_API_KEY secret - Implement cost tracking and security isolation through per-repo API keys - Add comprehensive documentation with setup instructions - Include example workflow for consuming repositories - Enhance test validation to check secret requirements in reusable workflows 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

- Remove trailing spaces from all workflow files - Add missing newlines at end of files - Increase yamllint line-length limit to 240 for shell commands - Fix formatting in claude-code-review.yml, claude-executor.yml, claude-orchestrator.yml, test.yml, and examples 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

…ting for actionlint, clarify migration and usage

not used anymore

yolabingo

🔥

Root cause: /tmp directory is NOT cleaned between jobs in GitHub Actions, causing temp files to persist and contaminate subsequent workflow runs. This caused two critical bugs: - Bug #1: /tmp/validation_failed.txt persisted from previous runs, causing valid PRs to fail with exit code 1 - Bug #2: /tmp/new_images.txt and /tmp/old_images.txt accumulated data from multiple test PRs, causing wrong images to be processed Solution: Add explicit cleanup at the start of each step that uses temp files. Changes: 1. validate-changed-files: Clean /tmp/disallowed_files.txt 2. validate-image-only-changed: Clean /tmp/validation_failed.txt, /tmp/new_images.txt, /tmp/old_images.txt 3. validate-images: Clean /tmp/validation_failed.txt Fixes: Bug #1 - Exit code 1 on successful validations Fixes: Bug #2 - Wrong image processed (Test Case #6) Related: Test Cases #1, #2, #6

## Bug Fixes: Temp File Cleanup (Bug #1 and Bug #2) Fixes critical bugs by adding explicit cleanup of temp files at the start of each job. **Root Cause:** /tmp directory persists between jobs in GitHub Actions. **Bugs Fixed:** - Bug #1: Exit code 1 on successful validations (Test Cases #1, #2) - Bug #2: Wrong image processed (Test Case #6) **Solution:** Add `rm -f` cleanup at start of each step using temp files. --- Will recreate tag v1.1.1 after merge to avoid changing caller workflows.

BREAKING CHANGES: - verify_image_existence now defaults to true (was false in v1.x) ✨ Added: - Robust version comparison with rebuild number and hash support - Improved registry validation with private registry fallback - Enhanced error reporting with accumulated failures - Comprehensive CHANGELOG documenting all fixes 🔧 Changed: - Complete replacement of temporary files with bash arrays - Added strict error handling (set -euo pipefail) to all scripts - Better error messages for all validation failures 🐛 Fixed: - Bug #1: Rebuild downgrade detection (25.12.08-2 → 25.12.08 now blocked) - Bug #2: Temporary file race conditions and persistence issues - Bug #3: Image existence validation for private registries - Bug #4: Silent failures in validation loops - Bug #5: Version pattern validation edge cases 🔒 Security: - Strict error handling prevents silent failures - Eliminated temporary file security concerns - Better input validation before processing 📝 Details: This is a complete architectural refactor addressing all known bugs in the deployment guard workflow. The main improvements are: 1. Version Comparison: Now properly handles: - Base version (YY.MM.DD) - Rebuild numbers (-N suffix) - Commit hashes (_hash suffix) - All combinations of the above 2. State Management: Replaced all temporary files with bash arrays: - Before: /tmp/validation_failed.txt, /tmp/new_images.txt - After: VALIDATION_FAILED, FAILED_IMAGES, NEW_IMAGES arrays - Eliminates race conditions and cleanup issues 3. Registry Validation: Now supports both public and private registries - Tries Docker Hub first - Falls back to full image path for private registries See CHANGELOG.md for complete migration guide and bug details.

#19) ## 🎯 Overview Complete architectural refactor of the Deployment Guard workflow to v2.0.0, eliminating all fragile temporary file-based state management and fixing critical validation bugs. ## 🚨 Breaking Changes - **`verify_image_existence` now defaults to `true`** (was `false` in v1.x) - This was always the intended behavior but was disabled due to bugs - If you want to disable, explicitly set to `false` in your workflow configuration ## ✨ What's New ### 1. Robust Version Comparison Complete rewrite of anti-downgrade logic with proper handling of: - Base version comparison (YY.MM.DD format) - Rebuild number comparison (e.g., `-2` in `25.12.08-2`) - Hash comparison (e.g., `_abc123` in `25.12.08_abc123`) - Full support for all combinations: `25.12.08`, `25.12.08-2`, `25.12.08_abc`, `25.12.08-2_abc` ### 2. Improved Registry Validation - Tries Docker Hub first for canonical images - Falls back to full image path for private registries - Better error messages indicating which registry was checked - Handles mirror registries gracefully ### 3. Enhanced Error Reporting - State variables accumulate ALL validation failures before exiting - Detailed failure reasons for each failed image/file - Clear indication of which validation step failed and why ## 🐛 Bugs Fixed ### Bug #1: Rebuild Downgrade Not Detected (Critical) **Issue**: v1.x allowed downgrade from `25.12.08-2` to `25.12.08` **Root Cause**: Only compared base version (YY.MM.DD), ignored rebuild numbers **Fix**: Now extracts and compares rebuild numbers when base version is the same **Examples**: - ❌ v1.x: `25.12.08-2` → `25.12.08` = ✅ Allowed (BUG) - ✅ v2.0.0: `25.12.08-2` → `25.12.08` = ❌ Blocked (CORRECT) - ✅ v2.0.0: `25.12.08` → `25.12.08-2` = ✅ Allowed (upgrade) - ✅ v2.0.0: `25.12.08-2_abc` → `25.12.08-2_xyz` = ✅ Allowed (same version, different hash) ### Bug #2: Temporary File Race Conditions **Issue**: Race conditions with `/tmp/validation_failed.txt` file **Root Cause**: Multiple writes to same file, manual cleanup required **Fix**: Eliminated ALL temporary files, using in-memory bash arrays ### Bug #3: Image Existence Check Failures **Issue**: Validation failed for valid private registry images **Root Cause**: Only checked Docker Hub canonical image **Fix**: Now tries Docker Hub first, then falls back to full image path ### Bug #4: Silent Failures in Validation Loops **Issue**: Validation could continue after failures **Root Cause**: Lack of strict error handling **Fix**: Added `set -euo pipefail` to all bash scripts ### Bug #5: Version Pattern Validation Edge Cases **Issue**: Malformed tags could pass validation **Root Cause**: Regex didn't enforce proper boundaries **Fix**: Improved regex validation with proper format checks ## 🔧 Technical Changes ### State Management Architecture **Before (v1.x)**: Used temporary files ```bash echo "false" > /tmp/validation_failed.txt echo "$image" >> /tmp/new_images.txt [ -f /tmp/validation_failed.txt ] && exit 1 ``` **After (v2.0.0)**: Uses bash arrays ```bash VALIDATION_FAILED=false FAILED_IMAGES=() FAILED_IMAGES+=("$image: reason") if [ "$VALIDATION_FAILED" = "true" ]; then printf ' - %s\n' "${FAILED_IMAGES[@]}" exit 1 fi ``` ### Error Handling All bash scripts now use strict mode: ```bash set -euo pipefail ``` ## 📝 Documentation Added comprehensive CHANGELOG.md with: - Complete bug details and technical explanations - Migration guide from v1.x to v2.0.0 - Version support matrix - Testing recommendations ## 🧪 Testing Recommended testing approach: ```yaml uses: dotCMS/ai-workflows/.github/workflows/deployment-guard.yml@v2.0.0 with: testing_force_non_bypass: true # Force validation even for org members verify_image_existence: true # Now enabled by default ``` ## 📊 Test Cases Covered | Scenario | v1.x | v2.0.0 | |----------|------|--------| | `25.12.08-2` → `25.12.08` | ✅ (bug) | ❌ (correct) | | `25.12.08` → `25.12.08-2` | ✅ | ✅ | | `25.12.08-2` → `25.12.08-3` | ✅ | ✅ | | `25.12.08_abc` → `25.12.08_xyz` | ✅ | ✅ | | `25.12.07` → `25.12.08` | ✅ | ✅ | | `25.12.08` → `25.12.07` | ❌ | ❌ | | Private registry image | ❌ (bug) | ✅ (fixed) | ## 🔄 Migration Path 1. **Week 1**: Deploy v2.0.0 to staging/dev 2. **Week 2**: Monitor and validate test cases 3. **Week 3**: Deploy to production 4. **Week 4**: Deprecate v1.x ## 📚 Related Issues Fixes bugs reported in Deutsche Bank infrastructure validation. ## ✅ Checklist - [x] All temporary files eliminated - [x] Rebuild downgrade detection fixed - [x] Private registry support added - [x] Strict error handling implemented - [x] Comprehensive CHANGELOG created - [x] All validation cases tested - [x] Breaking changes documented ---

sfreudenthaler and others added 10 commits June 27, 2025 16:34

fix: break up long lines and add newline for YAML lint compliance

1ccb79d

fix: yamllint compliance (bracket spacing, trailing spaces)

f672ef3

newline at end

cc3e9b7

add codeowners

83c80be

cleanup

684c9ac

rename

3684c31

sfreudenthaler changed the title ~~feat: add initial reusable Claude AI workflow templates~~ Centralize Claude Workflows – Replace Pilot from dotcms/infrastructure-as-code Jul 2, 2025

sfreudenthaler enabled auto-merge (squash) July 2, 2025 19:17

sfreudenthaler and others added 6 commits July 2, 2025 15:34

doc updates

8e2726c

add diagram

2e50bd0

upgraded to actions specific linter

8b7a7f9

fix: make orchestrator executor reference static, update docs and lin…

04bb618

…ting for actionlint, clarify migration and usage

Delete .yamllint.yml

1c81ca2

not used anymore

chore: add actionlint.yml for custom actionlint configuration

9457f44

yolabingo approved these changes Jul 2, 2025

View reviewed changes

sfreudenthaler merged commit 38e1e40 into main Jul 2, 2025
2 checks passed

sfreudenthaler deleted the feat/initial-reusable-workflows branch July 2, 2025 20:19

dcolina mentioned this pull request Dec 16, 2025

fix: cleanup temp files to resolve Bug #1 and Bug #2 #18

Merged

dcolina mentioned this pull request Dec 16, 2025

feat: refactor deployment-guard to v1.1.2 with robust state management #19

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Centralize Claude Workflows – Replace Pilot from dotcms/infrastructure-as-code #1

Centralize Claude Workflows – Replace Pilot from dotcms/infrastructure-as-code #1

Uh oh!

sfreudenthaler commented Jun 27, 2025 •

edited

Loading

Uh oh!

yolabingo left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Centralize Claude Workflows – Replace Pilot from dotcms/infrastructure-as-code #1

Centralize Claude Workflows – Replace Pilot from dotcms/infrastructure-as-code #1

Uh oh!

Conversation

sfreudenthaler commented Jun 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR: Centralize Claude Workflows – Replace Pilot from dotcms/infrastructure-as-code

Overview

Why This Change?

What’s Changed?

Benefits

Migration Notes

Uh oh!

yolabingo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sfreudenthaler commented Jun 27, 2025 •

edited

Loading

PR: Centralize Claude Workflows – Replace Pilot from `dotcms/infrastructure-as-code`