**Timeouts:** callers must supply a context with a finite deadline (e.g. 2 min) to avoid blocking forever.
**Stale-lock detection:** each lock has a TTL/heartbeat; expired leases are auto-cleaned before `AcquireLock`.
**Force cleanup:** a `--force` flag allows manual removal of stale or orphaned locks.
Usage in CLI commands:
3. Database connectivity
4. Custom configuration validation
**[Future Version] User Data Backup and Restore System:**
Rather than taking complete snapshots of the underlying databases (etcd/PostgreSQL), we'll implement a more targeted approach that backs up only the user application metadata and configuration that Radius manages:
	Timeout time.Duration // Maximum time allowed for upgrade

	EnableUserDataBackup bool   // Future Version: Whether automatic user data backup is enabled
	BackupID             string // Future Version: ID of user data backup to use for recovery
}
```
1. **Flexibility**: Support for custom configuration parameters allows adaptation to different environments
1. **Transparency**: Clear, step-by-step output keeps users informed of the upgrade process
1. **Consistency**: Ensures all Radius components are upgraded together to compatible versions
1. **Safety**: Comprehensive preflight checks prevent upgrades in unsuitable conditions, while user data backup and restore capabilities (planned for a future version) will protect user data during upgrades
#### Disadvantages of this approach
1. **Upgrade Command**: The `rad upgrade kubernetes` command implementation in the CLI codebase
2. **Version Validation**: Logic to verify compatibility between versions
3. **Lock Mechanism**: Data-store-level distributed locking system
4. **Preflight Checks**: Validation system to ensure prerequisites are met before upgrade
5. **[Future Version] Backup/Restore**: User data protection system using ConfigMaps/PVs
6. **Helm Integration**: Enhanced wrapper around Helm's upgrade capabilities
7. **Health Verification**: Component readiness and health check mechanisms
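
The version validation component in the list above could look roughly like the following sketch. It assumes plain `vMAJOR.MINOR.PATCH` version strings. Because skipping versions is covered separately later in this design, it also assumes the first release rejects any upgrade that skips a minor version. `parseVersion` and `validateUpgrade` are hypothetical helpers, and pre-release suffixes are not handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a parsed MAJOR.MINOR.PATCH triple.
type version struct{ major, minor, patch int }

// parseVersion accepts "v0.44.0" or "0.44.0". Pre-release and build suffixes
// are deliberately not handled in this sketch.
func parseVersion(s string) (version, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return version{}, fmt.Errorf("invalid version %q", s)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil || v < 0 {
			return version{}, fmt.Errorf("invalid version %q", s)
		}
		n[i] = v
	}
	return version{n[0], n[1], n[2]}, nil
}

// less reports whether v sorts before o.
func (v version) less(o version) bool {
	if v.major != o.major {
		return v.major < o.major
	}
	if v.minor != o.minor {
		return v.minor < o.minor
	}
	return v.patch < o.patch
}

// validateUpgrade applies assumed rules: no downgrades or reinstalls, no major
// version changes, and no skipping of minor versions.
func validateUpgrade(current, target string) error {
	cur, err := parseVersion(current)
	if err != nil {
		return err
	}
	tgt, err := parseVersion(target)
	if err != nil {
		return err
	}
	switch {
	case !cur.less(tgt):
		return fmt.Errorf("target %s is not newer than installed %s", target, current)
	case tgt.major != cur.major:
		return fmt.Errorf("major version change %s -> %s is not supported", current, target)
	case tgt.minor > cur.minor+1:
		return fmt.Errorf("upgrade %s -> %s skips at least one minor version", current, target)
	}
	return nil
}

func main() {
	pairs := [][2]string{{"v0.43.0", "v0.44.0"}, {"v0.44.0", "v0.43.0"}, {"v0.42.0", "v0.44.0"}}
	for _, p := range pairs {
		fmt.Printf("%s -> %s: %v\n", p[0], p[1], validateUpgrade(p[0], p[1]))
	}
}
```
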
All components will follow Radius coding standards and include comprehensive unit tests.
### Error Handling
The upgrade process will implement the following error handling strategies:

1. **Pre-flight Validation**: Catch incompatibility issues before starting the upgrade.
2. **Graceful Timeouts**: All operations will respect user-defined or default timeouts.
3. **Helm-based Rollback**: For version 1, failed upgrades will leverage Helm's built-in rollback capability to revert Kubernetes resources to their previous state. This does not restore any user data that may have been modified during the failed upgrade attempt; full user data backup and restore capabilities will be added in a future version.
4. **Detailed Error Reporting**: Clear error messages with troubleshooting guidance.
5. **Idempotent Operations**: Commands can be safely retried after addressing issues.
6. **Resource Cleanup**: Temporary resources created during the upgrade are properly removed.
## Test Plan
### Unit Tests
- Test each interface implementation independently
- Test each preflight check with various input scenarios (pass/fail/warning)
- Test the preflight check registry with multiple checks of different severities
### Integration Tests
   - Implement the upgrade functionality in the Radius Helm client: [helmclient.go](https://github.com/radius-project/radius/blob/main/pkg/cli/helm/helmclient.go).
   - Add unit tests to validate Helm upgrade logic.
   - This task can be worked on in parallel with items 2-3, 4-5. It is a blocker for item 6.

2. **Radius Contour Client Updates**

   - Implement the upgrade functionality in the Radius Contour client: [contourclient.go](https://github.com/radius-project/radius/blob/main/pkg/cli/helm/contourclient.go).
   - Add unit tests to verify correct behavior.
   - This task can be worked on in parallel with items 1, 3, 4-5. It is a blocker for item 6.

3. **Cluster Upgrade Interface**

   - Extend the existing cluster management interface ([cluster.go](https://github.com/radius-project/radius/blob/main/pkg/cli/helm/cluster.go#L249)) to include a new method for upgrading Radius.
   - Implement this method in all relevant interface implementations.
   - Integrate with version validation and custom configuration handling.
   - Add comprehensive unit tests for this functionality.
   - This task can be worked on in parallel with items 1-2, 4. It is a blocker for item 6.

4. **User Data Backup and Restore Interfaces**

   - Define two new interfaces in the `components/database` package:
     - `UserDataBackup`: Responsible for creating backups of user data before the upgrade.
     - `UserDataRestore`: Responsible for restoring data from the backup in case of rollback.
   - Design versioned backup formats to handle schema migrations between versions.
   - This task can be worked on in parallel with items 1-3. It is a blocker for item 5.

5. **User Data Backup and Restore Implementation**

   - Implement the backup and restore interfaces in the following data store implementations:
- Test edge cases where no previous version is recorded.

#### Skip versions during `rad upgrade kubernetes`

1. **Skip-Aware Pre-flight Checks**

4. **Automated Integration Tests**

   - Cover a variety of version skip paths in CI (adjacent vs. multi-minor).
   - Fail if any migration or Helm chart upgrade in the skip path is missing.

#### Support for Air-Gapped Environments

This can be discussed later.

#### Upgrading Radius on other platforms like `rad upgrade aci`

This can be discussed later.
### Implementation Risks and Mitigations

- **Rollback Reliability**: Helm-based rollback mechanisms should be thoroughly tested to ensure they can return the control plane to a working state if upgrades fail.
- **Lock Persistence**: Ensure upgrade locks have proper timeout mechanisms to avoid permanently locked systems if a process terminates unexpectedly.
### Testing Strategy

- **Unit Tests**: Cover all new code paths, especially version validation, upgrade logic, lock mechanisms, and error handling.
- **Functional Tests**: Validate end-to-end upgrade scenarios, including successful upgrades, upgrades with custom configurations, failure scenarios, and Helm-based rollback procedures.
- **Compatibility Tests**: Verify compatibility between different Radius CLI versions and control plane components.