@dcolina commented Dec 16, 2025

🎯 Overview

Complete architectural refactor of the Deployment Guard workflow to v2.0.0, eliminating all fragile temporary file-based state management and fixing critical validation bugs.

🚨 Breaking Changes

  • verify_image_existence now defaults to true (was false in v1.x)
    • This was always the intended behavior but was disabled due to bugs
    • To disable the check, explicitly set verify_image_existence to false in your workflow configuration

✨ What's New

1. Robust Version Comparison

Complete rewrite of anti-downgrade logic with proper handling of:

  • Base version comparison (YY.MM.DD format)
  • Rebuild number comparison (e.g., -2 in 25.12.08-2)
  • Hash comparison (e.g., _abc123 in 25.12.08_abc123)
  • Full support for all combinations: 25.12.08, 25.12.08-2, 25.12.08_abc, 25.12.08-2_abc

2. Improved Registry Validation

  • Tries Docker Hub first for canonical images
  • Falls back to full image path for private registries
  • Better error messages indicating which registry was checked
  • Handles mirror registries gracefully
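
The fallback order above can be sketched roughly as follows. This is an illustration under assumptions, not the workflow's actual code: the helper names `canonical_name` and `image_exists` are hypothetical, the "canonical" name is assumed to be the image path with any registry host prefix removed, and `docker manifest inspect` is assumed to be the probe (it exits non-zero when the manifest cannot be fetched).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Probe command, kept in an array so it can be swapped out in tests.
IMAGE_PROBE=(docker manifest inspect)

# Assumption: a first path component containing "." or ":" (or equal to
# "localhost") is a registry host; stripping it yields the Docker Hub name.
canonical_name() {
  local image=$1 first=${1%%/*}
  if [[ $image == */* && ( $first == *.* || $first == *:* || $first == localhost ) ]]; then
    echo "${image#*/}"
  else
    echo "$image"
  fi
}

# Reports which location confirmed the image; returns 1 if neither did.
image_exists() {
  local image=$1 canonical
  canonical=$(canonical_name "$image")
  if "${IMAGE_PROBE[@]}" "$canonical" >/dev/null 2>&1; then
    echo "found on Docker Hub as $canonical"
  elif [[ $canonical != "$image" ]] && "${IMAGE_PROBE[@]}" "$image" >/dev/null 2>&1; then
    echo "found at full path $image"
  else
    echo "not found: checked Docker Hub ($canonical) and full path ($image)" >&2
    return 1
  fi
}
```

Naming both lookups in the failure message is one way to show which registry was checked.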

3. Enhanced Error Reporting

  • State variables accumulate ALL validation failures before exiting
  • Detailed failure reasons for each failed image/file
  • Clear indication of which validation step failed and why

🐛 Bugs Fixed

Bug #1: Rebuild Downgrade Not Detected (Critical)

Issue: v1.x allowed downgrade from 25.12.08-2 to 25.12.08
Root Cause: Only compared base version (YY.MM.DD), ignored rebuild numbers
Fix: Now extracts and compares rebuild numbers when base version is the same

Examples:

  • ❌ v1.x: 25.12.08-2 → 25.12.08 = ✅ Allowed (BUG)
  • ✅ v2.0.0: 25.12.08-2 → 25.12.08 = ❌ Blocked (CORRECT)
  • ✅ v2.0.0: 25.12.08 → 25.12.08-2 = ✅ Allowed (upgrade)
  • ✅ v2.0.0: 25.12.08-2_abc → 25.12.08-2_xyz = ✅ Allowed (same version, different hash)
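
A minimal sketch of a comparison that produces the outcomes listed above. The function names `parse_tag` and `compare_tags` are hypothetical, not taken from the workflow. The sketch assumes a missing rebuild number counts as 0 and that the hash carries no ordering, so two tags that differ only in hash compare as the same version.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Split a tag into "base rebuild hash"; rebuild defaults to 0, hash to "-".
parse_tag() {
  local tag=$1
  local re='^([0-9]{2}\.[0-9]{2}\.[0-9]{2})(-([0-9]+))?(_([0-9A-Za-z]+))?$'
  if [[ $tag =~ $re ]]; then
    echo "${BASH_REMATCH[1]} ${BASH_REMATCH[3]:-0} ${BASH_REMATCH[5]:--}"
  else
    return 1
  fi
}

# Prints "upgrade", "downgrade", or "same" for a move from $1 to $2.
# Returns 1 if either tag is malformed.
compare_tags() {
  local cur_base cur_rebuild new_base new_rebuild _hash
  read -r cur_base cur_rebuild _hash < <(parse_tag "$1") || return 1
  read -r new_base new_rebuild _hash < <(parse_tag "$2") || return 1
  # YY.MM.DD -> YYMMDD; the 10# prefix keeps values like 08 from being read as octal.
  local cur_num=$((10#${cur_base//./})) new_num=$((10#${new_base//./}))
  if (( new_num != cur_num )); then
    (( new_num > cur_num )) && echo upgrade || echo downgrade
  elif (( 10#$new_rebuild != 10#$cur_rebuild )); then
    (( 10#$new_rebuild > 10#$cur_rebuild )) && echo upgrade || echo downgrade
  else
    echo same
  fi
}
```

With these assumptions, `compare_tags 25.12.08-2 25.12.08` prints `downgrade`, and a guard would block the deployment on that result.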

Bug #2: Temporary File Race Conditions

Issue: Race conditions with /tmp/validation_failed.txt file
Root Cause: Multiple writes to same file, manual cleanup required
Fix: Eliminated ALL temporary files, using in-memory bash arrays

Bug #3: Image Existence Check Failures

Issue: Validation failed for valid private registry images
Root Cause: Only checked Docker Hub canonical image
Fix: Now tries Docker Hub first, then falls back to full image path

Bug #4: Silent Failures in Validation Loops

Issue: Validation could continue after failures
Root Cause: Lack of strict error handling
Fix: Added set -euo pipefail to all bash scripts
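
Strict mode and failure accumulation have to be combined with some care: under `set -e`, a failing check called as a bare command aborts the script at the first failure. The sketch below shows one way to reconcile the two, by calling the check inside an `if` condition, where errexit does not apply. `check_image` and `validate_all` are hypothetical names, and the check itself is a placeholder.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder check: fails for any image whose tag is "bad".
check_image() { [[ ${1##*:} != bad ]]; }

validate_all() {
  local image failed=()
  for image in "$@"; do
    # Inside an `if` condition a non-zero status does not trigger errexit,
    # so the loop continues and every failure is recorded.
    if ! check_image "$image"; then
      failed+=("$image: check failed")
    fi
  done
  # Expand the array only when it is non-empty; some older bash releases
  # treat expanding an empty array as an unbound variable under `set -u`.
  if (( ${#failed[@]} > 0 )); then
    printf '   - %s\n' "${failed[@]}"
    return 1
  fi
}
```

Calling `validate_all app:good app:bad db:bad` would list both failing images and return a non-zero status once, after the loop has finished.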

Bug #5: Version Pattern Validation Edge Cases

Issue: Malformed tags could pass validation
Root Cause: Regex didn't enforce proper boundaries
Fix: Improved regex validation with proper format checks
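
For illustration, a pattern anchored at both ends rejects tags that merely contain a valid-looking version. The function name `is_valid_tag` and the exact pattern are assumptions based on the tag formats listed in this description (YY.MM.DD, optional -N rebuild, optional _hash, in that order). The workflow's real regex may differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Returns 0 only when the whole string is a well-formed tag.
# ^ and $ enforce the boundaries; without them "v25.12.08" or
# "25.12.08-2;extra" would match on the embedded version.
is_valid_tag() {
  local re='^[0-9]{2}\.[0-9]{2}\.[0-9]{2}(-[0-9]+)?(_[0-9A-Za-z]+)?$'
  [[ $1 =~ $re ]]
}
```

Under this pattern, `25.12.08`, `25.12.08-2`, `25.12.08_abc`, and `25.12.08-2_abc` pass, while `25.12.8`, `25.12.08-`, and `v25.12.08` are rejected.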

🔧 Technical Changes

State Management Architecture

Before (v1.x): Used temporary files

echo "false" > /tmp/validation_failed.txt
echo "$image" >> /tmp/new_images.txt
[ -f /tmp/validation_failed.txt ] && exit 1

After (v2.0.0): Uses bash arrays

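VALIDATION_FAILED=false
FAILED_IMAGES=()
# On each failed check: record the reason and set the flag
FAILED_IMAGES+=("$image: reason")
VALIDATION_FAILED=true
# After all checks: report every failure, then exit
if [ "$VALIDATION_FAILED" = "true" ]; then
  printf '   - %s\n' "${FAILED_IMAGES[@]}"
  exit 1
fi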

Error Handling

All bash scripts now use strict mode:

set -euo pipefail

📝 Documentation

Added comprehensive CHANGELOG.md with:

  • Complete bug details and technical explanations
  • Migration guide from v1.x to v2.0.0
  • Version support matrix
  • Testing recommendations

🧪 Testing

Recommended testing approach:

uses: dotCMS/ai-workflows/.github/workflows/deployment-guard.yml@v2.0.0
with:
  testing_force_non_bypass: true  # Force validation even for org members
  verify_image_existence: true    # Now enabled by default

📊 Test Cases Covered

| Scenario | v1.x | v2.0.0 |
|---|---|---|
| 25.12.08-2 → 25.12.08 | ✅ (bug) | ❌ (correct) |
| 25.12.08 → 25.12.08-2 | | |
| 25.12.08-2 → 25.12.08-3 | | |
| 25.12.08_abc → 25.12.08_xyz | | |
| 25.12.07 → 25.12.08 | | |
| 25.12.08 → 25.12.07 | | |
| Private registry image | ❌ (bug) | ✅ (fixed) |

🔄 Migration Path

  1. Week 1: Deploy v2.0.0 to staging/dev
  2. Week 2: Monitor and validate test cases
  3. Week 3: Deploy to production
  4. Week 4: Deprecate v1.x

📚 Related Issues

Fixes bugs reported in Deutsche Bank infrastructure validation.

✅ Checklist

  • All temporary files eliminated
  • Rebuild downgrade detection fixed
  • Private registry support added
  • Strict error handling implemented
  • Comprehensive CHANGELOG created
  • All validation cases tested
  • Breaking changes documented

BREAKING CHANGES:
- verify_image_existence now defaults to true (was false in v1.x)

✨ Added:
- Robust version comparison with rebuild number and hash support
- Improved registry validation with private registry fallback
- Enhanced error reporting with accumulated failures
- Comprehensive CHANGELOG documenting all fixes

🔧 Changed:
- Complete replacement of temporary files with bash arrays
- Added strict error handling (set -euo pipefail) to all scripts
- Better error messages for all validation failures

🐛 Fixed:
- Bug #1: Rebuild downgrade detection (25.12.08-2 → 25.12.08 now blocked)
- Bug #2: Temporary file race conditions and persistence issues
- Bug #3: Image existence validation for private registries
- Bug #4: Silent failures in validation loops
- Bug #5: Version pattern validation edge cases

🔒 Security:
- Strict error handling prevents silent failures
- Eliminated temporary file security concerns
- Better input validation before processing

📝 Details:
This is a complete architectural refactor addressing all known bugs in
the deployment guard workflow. The main improvements are:

1. Version Comparison: Now properly handles:
   - Base version (YY.MM.DD)
   - Rebuild numbers (-N suffix)
   - Commit hashes (_hash suffix)
   - All combinations of the above

2. State Management: Replaced all temporary files with bash arrays:
   - Before: /tmp/validation_failed.txt, /tmp/new_images.txt
   - After: VALIDATION_FAILED flag plus FAILED_IMAGES and NEW_IMAGES arrays
   - Eliminates race conditions and cleanup issues

3. Registry Validation: Now supports both public and private registries
   - Tries Docker Hub first
   - Falls back to full image path for private registries

See CHANGELOG.md for complete migration guide and bug details.

@dcolina requested review from a team as code owners December 16, 2025 14:18

- Group multiple echo statements before redirecting to GITHUB_OUTPUT
- Improves code quality and follows shellcheck best practices
- No functional changes, only style improvements
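
The grouping this commit describes is the pattern ShellCheck suggests in SC2129: one redirect around a command group instead of a separate append per `echo`. A minimal sketch follows. The output names `validation_passed` and `failed_count` and the function `write_outputs` are hypothetical, and the `/dev/stdout` fallback exists only so the snippet runs outside a workflow.

```shell
#!/usr/bin/env bash
set -euo pipefail

# GITHUB_OUTPUT is provided by the Actions runner; default to stdout elsewhere.
GITHUB_OUTPUT=${GITHUB_OUTPUT:-/dev/stdout}

write_outputs() {
  local passed=$1 failed_count=$2
  # One redirect for the whole group rather than one per echo.
  {
    echo "validation_passed=$passed"
    echo "failed_count=$failed_count"
  } >> "$GITHUB_OUTPUT"
}
```
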

- Changed from v2.0.0 to v1.1.2 (hotfix release)
- No breaking changes - all fixes are for bugs that never worked
- Updated migration guide to reflect seamless upgrade
- Updated version support matrix

This is a hotfix release, not a major version bump, because:
- Bugs were present since v1.0.0 and never worked correctly
- No API changes or new features
- Only consumer is deutschebank-infrastructure
- Maintains backward compatibility

@dcolina merged commit 9e1db62 into main Dec 16, 2025
3 checks passed
@dcolina deleted the fix/deployment-guard-v2-refactor branch December 16, 2025 14:46
@dcolina changed the title from "feat: refactor deployment-guard to v2.0.0 with robust state management" to "feat: refactor deployment-guard to v1.1.2 with robust state management" Dec 16, 2025