OTA-962: Duration of migration to multi-arch #1690

Merged on Nov 5, 2024 (1 commit)

hongkailiu
Member

openshift-ci-robot commented Oct 1, 2024

@hongkailiu: This pull request references OTA-962 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

/cc @wking @petr-muller

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 1, 2024
@openshift-ci openshift-ci bot requested review from petr-muller and wking October 1, 2024 20:49
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 1, 2024
@hongkailiu hongkailiu force-pushed the OTA-962 branch 8 times, most recently from 837c46a to b4b8d48 Compare October 10, 2024 20:10
@hongkailiu hongkailiu force-pushed the OTA-962 branch 5 times, most recently from 44fb468 to 35465c8 Compare October 15, 2024 22:33
@hongkailiu hongkailiu changed the title [wip] OTA-962: migrate to multi-arch [wip] OTA-962: Duration of migration to multi-arch Oct 15, 2024
@hongkailiu hongkailiu force-pushed the OTA-962 branch 3 times, most recently from 4601e1d to 2174b1e Compare October 18, 2024 17:20
@hongkailiu hongkailiu force-pushed the OTA-962 branch 3 times, most recently from 430e8b3 to db78e02 Compare October 22, 2024 16:18
@hongkailiu hongkailiu changed the title [wip] OTA-962: Duration of migration to multi-arch OTA-962: Duration of migration to multi-arch Oct 22, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 22, 2024
where the value is [the image pull specification of an OCP release payload image](https://hypershift-docs.netlify.app/reference/api/#hypershift.openshift.io/v1beta1.Release).

The MCO does not run on a hosted cluster and thus our proposal in this enhancement makes no improvement (or regression) for migration of a hosted cluster to multi-arch. Currently there is no way to discover if a hosted cluster is ready to serve a cross-arch node pool. The requests are mainly from the users of standalone clusters at the moment. When hosted cluster users desire the improvement, we can work with the HyperShift team for a solution.

Contributor

cc @jeffdyoung this might be needed sooner rather than later, as HCP is in the process of using the multi payload as the default

Member

This is already the case. All new ROSA HCP clusters are multi-arch by default. ARO HCP is planning to do this as well.

Member Author

Do I understand this correctly?

  • Existing ROSA hosted clusters have already been migrated to multi-arch, and new ones are multi-arch by default.
  • ARO will catch up with ROSA soon.

Since ROSA has done it without our fix, the ARO migration is expected to go smoothly as well.

If so, a solution for HCP is not urgent at the moment, which gives us some room to do it in a later phase, e.g., after we handle the standalone clusters.


Member

Currently there is no way to discover if a hosted cluster is ready to serve a cross-arch node pool.

This is not true. There is a status condition on the HostedCluster that signals whether the HostedCluster can support NodePools of different CPU architectures. (Note that only one CPU architecture type is allowed per NodePool.)

Member Author

hongkailiu, Oct 28, 2024

My understanding from this is that "payloadArch" could mean the cluster either is ready or will be ready soon (i.e., still under reconciliation) to serve a secondary-arch node pool. And we want a condition that excludes "will be ready".

What did I miss there?

Member

Maybe the statement just needs to be clarified then. As currently written, it could be read as "there is no way to tell if a HostedCluster can support NodePools of different CPU architectures", when there is currently a way: checking that status condition and verifying that the HC upgrade to a multi-arch release image has completed.

Is the point rather that there is no way to determine, in the middle of the upgrade process, whether a hosted cluster is ready to serve NodePools of varying CPU architectures? Is that what you are trying to convey here?
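
As a rough illustration of the distinction discussed in this thread, the sketch below separates "payloadArch reports Multi" from "the upgrade to the multi-arch release image has completed". It is not the HyperShift API: the `HostedClusterView` class and its `desired_image` and `completed_image` fields are hypothetical stand-ins, and only the `payloadArch` name and the `Multi` value come from the discussion.

```python
from dataclasses import dataclass


@dataclass
class HostedClusterView:
    """Hypothetical, simplified view of a HostedCluster (not the real schema)."""
    payload_arch: str      # stands in for the payloadArch value, e.g. "Multi"
    desired_image: str     # release image the cluster is being moved to (assumed field)
    completed_image: str   # release image whose rollout has finished (assumed field)


def ready_for_cross_arch_node_pool(hc: HostedClusterView) -> bool:
    """Return True only when both checks described in the thread hold.

    payload_arch == "Multi" alone may still mean "will be ready soon",
    so the rollout of the desired image must also have completed.
    """
    reports_multi = hc.payload_arch == "Multi"
    upgrade_done = hc.completed_image == hc.desired_image
    return reports_multi and upgrade_done


# Mid-upgrade: payloadArch already says Multi, but the rollout is not finished.
mid_upgrade = HostedClusterView("Multi", "example/release:multi", "example/release:amd64")
# After the upgrade: both checks agree.
done = HostedClusterView("Multi", "example/release:multi", "example/release:multi")
```

Under these assumptions, `mid_upgrade` is not treated as ready and `done` is, which is the "is ready" versus "will be ready" gap raised above.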

Member Author

Updated. Let me know if I made the correct clarification.

Contributor

Prashanth684, Oct 29, 2024

@bryan-cox how/when is this payloadArch populated? @hongkailiu when hypershift adds the image version in the status could we use this payloadArch field as a "it is ready" field all the time for the purposes of CVO?

Member

The HyperShift Operator updates the field for the HostedCluster here.

Member Author

@hongkailiu when hypershift adds the image version in the status could we use this payloadArch field as a "it is ready" field all the time for the purposes of CVO?

Are you suggesting that we use payloadArch=Multi to indicate that the hosted cluster is ready for creating a cross-arch node pool? We cannot, because it could also mean it-will-be-ready-soon, i.e., not ready yet. Adding status for HyperShift's handcrafted COs is not relevant, because we are looking for a signal equivalent to co/machine-config having finished its upgrade, and no such signal exists on hosted clusters.

@Prashanth684
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 4, 2024
Contributor

openshift-ci bot commented Nov 4, 2024

@hongkailiu: all tests passed!


Contributor

PratikMahajan left a comment

/lgtm

Contributor

openshift-ci bot commented Nov 5, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: PratikMahajan


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 5, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit fda5961 into openshift:master Nov 5, 2024
2 checks passed
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
lgtm: Indicates that a PR is ready to be merged.
9 participants