Feature specification for in-place upgrade of Radius #85

Open. ytimocin wants to merge 3 commits into main from ytimocin/feature/upgrades

@ytimocin ytimocin commented Mar 5, 2025

No description provided.

@ytimocin ytimocin force-pushed the ytimocin/feature/upgrades branch 6 times, most recently from 61da8dc to 88314f8 Compare March 11, 2025 17:56
2. **Fetch available chart versions**: Retrieve the list of known chart versions so that the version the user selects can be validated against it.
3. **Dry-run** (when requested): Simulate the upgrade, logging each step without making changes, to verify that the upgrade would succeed. Helm supports this in the `helm upgrade` command: <https://helm.sh/docs/helm/helm_upgrade/>.
4. **Snapshot**: Automatically back up current data (e.g., etcd, resources in the API server, or Postgres) before making changes.
5. **Upgrade**: Apply the necessary Helm changes (including timeouts, set args, etc.) and, if needed, perform database migrations.

How do we know when to perform database migrations? If we introduce a breaking change to one of our schemas, is there an automatic way to detect it and upgrade? Probably not.
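
One way the detection question could be approached, sketched here purely as an illustration: if each release recorded an explicit schema version next to the stored data, the upgrade command could compare the stored version with the one the target release expects. Nothing in the spec says Radius records such a version. `SchemaVersion`, `needs_migration`, and `is_breaking` are hypothetical names, and treating a major bump as "breaking" is an assumed convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaVersion:
    """Hypothetical schema version recorded alongside the stored data."""
    major: int
    minor: int


def needs_migration(stored: SchemaVersion, target: SchemaVersion) -> bool:
    """Return True when the target release expects a newer schema than the stored one."""
    return (target.major, target.minor) > (stored.major, stored.minor)


def is_breaking(stored: SchemaVersion, target: SchemaVersion) -> bool:
    """Assumed convention: a major-version bump marks a breaking schema change."""
    return target.major > stored.major
```

With a check like this, the upgrade step would run migrations only when `needs_migration` is true, and could ask for explicit confirmation when `is_breaking` is true.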

@kachawla kachawla left a comment

How do we handle in-progress deployments when an upgrade is initiated?
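
One option, sketched below under assumptions, is for the preflight step to wait until no deployments are running and to abort the upgrade if that does not happen within a timeout. The spec does not define how in-progress deployments are counted, so `count_in_progress` is a hypothetical callback supplied by the caller, and the timeout and polling values are placeholders.

```python
import time
from typing import Callable


def wait_for_deployments(count_in_progress: Callable[[], int],
                         timeout_s: float = 300.0,
                         poll_s: float = 5.0,
                         sleep: Callable[[float], None] = time.sleep,
                         clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until no deployments are in progress; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        if count_in_progress() == 0:
            return True   # safe to start the upgrade
        if clock() >= deadline:
            return False  # caller should abort rather than upgrade mid-deployment
        sleep(poll_s)
```

The alternative of rejecting new deployments while the upgrade holds a lock would need a separate mechanism; this sketch covers only the waiting side.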

@ytimocin ytimocin marked this pull request as ready for review March 18, 2025 19:33
@ytimocin ytimocin requested a review from a team as a code owner March 18, 2025 19:33
@ytimocin ytimocin force-pushed the ytimocin/feature/upgrades branch 3 times, most recently from d9143ab to 624a1ad Compare March 18, 2025 21:52
@ytimocin ytimocin force-pushed the ytimocin/feature/upgrades branch from 624a1ad to ed05ea2 Compare April 7, 2025 18:19

- **Downgrade Support:**
  - Should we support downgrading to previous versions? If yes, what are the limitations?
  - How should we handle cases where users attempt to downgrade to versions that don't support the upgrade feature itself?

I think it's reasonable to allow downgrades only as far back as the first version that supported upgrades.
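
A minimal sketch of that rule, combined with the chart-version validation from the spec: the target must appear in the list of known chart versions, and a downgrade is rejected when it goes below the first release that shipped the upgrade feature. `parse_version` and `classify_target` are hypothetical helpers, the version numbers in the usage example are made up, and pre-release tags are not handled.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'v0.45.0' or '0.45.0' into a comparable tuple."""
    major, minor, patch = text.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def classify_target(current: str, target: str,
                    available: list[str], first_upgradable: str) -> str:
    """Return 'upgrade', 'downgrade', or 'noop'; raise ValueError if not allowed."""
    if target not in available:
        raise ValueError(f"{target} is not a known chart version")
    cur, tgt = parse_version(current), parse_version(target)
    if tgt == cur:
        return "noop"
    if tgt > cur:
        return "upgrade"
    # Downgrade: refuse versions older than the first one that supports upgrades.
    if tgt < parse_version(first_upgradable):
        raise ValueError(f"{target} predates upgrade support ({first_upgradable})")
    return "downgrade"
```

For example, with made-up versions, `classify_target("0.46.0", "0.45.0", ["0.44.0", "0.45.0", "0.46.0"], "0.45.0")` returns `"downgrade"`, while a target of `"0.44.0"` raises `ValueError`.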

4. **Snapshot**: Automatically back up current data (e.g., etcd, resources in the API server, or Postgres) before making changes.
5. **Upgrade**: Apply the necessary Helm changes (including timeouts, set args, etc.) and, if needed, perform database migrations.
6. **Rollback** (on failure): If something goes wrong, use the snapshot to restore the prior state.
7. **Post-upgrade checks**: Validate that new control plane components are healthy and confirm the upgrade was successful.

We've gotten some feedback from community users that they might have to roll back the upgrade even after the post-upgrade checks have passed (perhaps their own tests fail in the upgraded version) -- is it possible to do a rollback on the snapshot even after the upgrade successfully completes?
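
To make the request concrete, here is a rough sketch of a flow that keeps the snapshot after a successful upgrade so that a later, user-initiated rollback stays possible. It is not how the spec defines the steps: `UpgradeRun` and its four callbacks (`take_snapshot`, `apply_upgrade`, `health_check`, `restore`) are hypothetical stand-ins for the snapshot, upgrade, post-upgrade check, and rollback steps.

```python
from typing import Callable, Optional


class UpgradeRun:
    """Hypothetical orchestration of snapshot, upgrade, checks, and rollback."""

    def __init__(self,
                 take_snapshot: Callable[[], str],
                 apply_upgrade: Callable[[], None],
                 health_check: Callable[[], bool],
                 restore: Callable[[str], None]) -> None:
        self.take_snapshot = take_snapshot
        self.apply_upgrade = apply_upgrade
        self.health_check = health_check
        self.restore = restore
        self.snapshot: Optional[str] = None

    def run(self) -> bool:
        """Upgrade; restore automatically on failure; keep the snapshot on success."""
        self.snapshot = self.take_snapshot()
        try:
            self.apply_upgrade()
            healthy = self.health_check()
        except Exception:
            healthy = False
        if not healthy:
            self.restore(self.snapshot)
            return False
        return True  # snapshot is intentionally retained

    def rollback(self) -> None:
        """Manual rollback, usable even after run() reported success."""
        if self.snapshot is None:
            raise RuntimeError("no snapshot available to restore")
        self.restore(self.snapshot)
```

How long a retained snapshot stays valid is an open question: data written after the upgrade would be lost on restore, so a real design would need a retention and compatibility policy.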

Signed-off-by: ytimocin <ytimocin@microsoft.com>
@ytimocin ytimocin force-pushed the ytimocin/feature/upgrades branch 3 times, most recently from 432f53c to 82361ea Compare June 2, 2025 17:59
Signed-off-by: ytimocin <ytimocin@microsoft.com>
@ytimocin ytimocin force-pushed the ytimocin/feature/upgrades branch from 82361ea to ee3c7fa Compare June 2, 2025 18:02
Signed-off-by: ytimocin <ytimocin@microsoft.com>
@ytimocin ytimocin force-pushed the ytimocin/feature/upgrades branch from 0da75a2 to 2f847c2 Compare June 3, 2025 18:46
- Upgrading the Radius control plane using Helm directly. Running `helm upgrade` against the Radius Helm installation does not, on its own, put together all the pieces the control plane needs to work, and making that path work is out of scope for this effort.
- Zero-downtime control plane upgrades. While we aim to minimize disruption, guaranteeing absolutely no downtime for control plane components is not a goal for this initial release.
- Automatic CLI upgrades. Users must manually update their local CLI version after upgrading the control plane.
- Direct GitOps workflow integration for version 1. While users who manage Radius through HelmReleases in their GitOps pipeline will be able to update Helm charts, the complete upgrade process (including preflight checks, locking, and health verification) requires the `rad upgrade kubernetes` command in this initial version. Future versions will provide better GitOps integration options.

per @willdavsmith this is in scope now

Comment on lines +205 to +207
### GitOps Workflow Integration

In future versions, we plan to enhance GitOps integration to support users who manage Radius through HelmReleases as part of their GitOps workflow. This will include developing a Kubernetes operator that watches for HelmRelease changes and automatically performs the necessary upgrade procedures including preflight checks, locking, and health verification. This integration will allow teams to manage Radius upgrades through their existing GitOps pipelines without requiring manual CLI commands.

this is in scope now, per @willdavsmith

5 participants