enable two node cluster deployment #3671
Conversation
/assign @travisn @malayparida2000 /cherry-pick release-4.21 |
@parth-gr: once the present PR merges, I will cherry-pick it on top of release-4.21.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
The floating mon needs to run alongside the other mon deployments. Also made the mgr count 1, and made the default max replica count 2.

Signed-off-by: parth-gr <partharora1010@gmail.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: parth-gr. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files.
```go
// IsTwoNodeDeployment returns true if cluster has only two nodes.
func IsTwoNodeDeployment(nodeCount int) bool {
```
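The diff above is truncated before the function body; from the doc comment it presumably reduces to a simple node-count check. A self-contained sketch (the body shown here is an assumption, since the diff is cut off):

```go
package main

import "fmt"

// IsTwoNodeDeployment returns true if the cluster has only two nodes.
// NOTE: the body is reconstructed from the doc comment; the actual
// PR diff is truncated above.
func IsTwoNodeDeployment(nodeCount int) bool {
	return nodeCount == 2
}

func main() {
	fmt.Println(IsTwoNodeDeployment(2)) // true
	fmt.Println(IsTwoNodeDeployment(3)) // false
}
```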
I don't think we should rely on this to confirm whether it's a two-node deployment. This can be dangerous, as someone may try to deploy ODF on two nodes even when it's not a TNF cluster.
A better way IMO would be to either use an env var for the two-node deployment, similar to the single-node deployment, or detect a two-node deployment by getting the ClusterVersion CR and checking its topology field. We already get the ClusterVersion CR in the operator code, so there won't be a performance cost either.
"checking its topology field."
What info will it provide?
here is the output
rider:Downloads$ oc get clusterversion -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
creationTimestamp: "2026-01-22T16:21:56Z"
generation: 2
name: version
resourceVersion: "1448558"
uid: c1718d72-7559-4d39-9ad2-25f297abb7a8
spec:
channel: stable-4.20
clusterID: 59e5579a-f880-4f1a-ae6b-b97b51ff0063
status:
availableUpdates:
- channels:
- candidate-4.20
- candidate-4.21
- candidate-4.22
- eus-4.20
- fast-4.20
- stable-4.20
image: quay.io/openshift-release-dev/ocp-release@sha256:2d228e6d0b5a5ef2d7eb40bc171ad44f06b990d7adb678914e5d9d047e72568d
url: https://access.redhat.com/errata/RHBA-2026:370
version: 4.20.10
- channels:
- candidate-4.20
- candidate-4.21
- candidate-4.22
- eus-4.20
- fast-4.20
- stable-4.20
image: quay.io/openshift-release-dev/ocp-release@sha256:91606a5f04331ed3293f71034d4f480e38645560534805fe5a821e6b64a3f203
url: https://access.redhat.com/errata/RHBA-2025:23103
version: 4.20.8
- channels:
- candidate-4.20
- candidate-4.21
- candidate-4.22
- eus-4.20
- fast-4.20
- stable-4.20
image: quay.io/openshift-release-dev/ocp-release@sha256:24da924c84a1dfa28525f85525356cf1ac4fbe23faec7c66d1890e0b3bcba7a0
url: https://access.redhat.com/errata/RHSA-2025:19890
version: 4.20.3
- channels:
- candidate-4.20
- candidate-4.21
- candidate-4.22
- eus-4.20
- fast-4.20
- stable-4.20
image: quay.io/openshift-release-dev/ocp-release@sha256:0e232879e27fb821eeb1d0e34f9bd8f85e28533836e59cc7fee96fcc9f3851cd
url: https://access.redhat.com/errata/RHSA-2025:19296
version: 4.20.2
- channels:
- candidate-4.20
- candidate-4.21
- candidate-4.22
- eus-4.20
- fast-4.20
- stable-4.20
image: quay.io/openshift-release-dev/ocp-release@sha256:cbde13fe6ed4db88796be201fbdb2bbb63df5763ae038a9eb20bc793d5740416
url: https://access.redhat.com/errata/RHSA-2025:19003
version: 4.20.1
capabilities:
enabledCapabilities:
- Build
- CSISnapshot
- CloudControllerManager
- CloudCredential
- Console
- DeploymentConfig
- ImageRegistry
- Ingress
- Insights
- MachineAPI
- NodeTuning
- OperatorLifecycleManager
- OperatorLifecycleManagerV1
- Storage
- baremetal
- marketplace
- openshift-samples
knownCapabilities:
- Build
- CSISnapshot
- CloudControllerManager
- CloudCredential
- Console
- DeploymentConfig
- ImageRegistry
- Ingress
- Insights
- MachineAPI
- NodeTuning
- OperatorLifecycleManager
- OperatorLifecycleManagerV1
- Storage
- baremetal
- marketplace
- openshift-samples
conditionalUpdates:
- conditions:
- lastTransitionTime: "2026-01-22T16:22:21Z"
message: Some runc 1.2 releases fail to launch containers in some Pods where
shareProcessNamespace is explicitly set true. https://issues.redhat.com/browse/RUN-3748
reason: RuncShareProcessNamespace
status: "False"
type: Recommended
release:
channels:
- candidate-4.20
- candidate-4.21
- candidate-4.22
- eus-4.20
- fast-4.20
- stable-4.20
image: quay.io/openshift-release-dev/ocp-release@sha256:a29bcbc9f286d68b394ffa0288c5de7e487c90077c06cbaf7a4cadeb0398ce28
url: https://access.redhat.com/errata/RHSA-2025:22257
version: 4.20.6
risks:
- matchingRules:
- type: Always
message: Some runc 1.2 releases fail to launch containers in some Pods where
shareProcessNamespace is explicitly set true.
name: RuncShareProcessNamespace
url: https://issues.redhat.com/browse/RUN-3748
- conditions:
- lastTransitionTime: "2026-01-22T16:22:21Z"
message: Some runc 1.2 releases fail to launch containers in some Pods where
shareProcessNamespace is explicitly set true. https://issues.redhat.com/browse/RUN-3748
reason: RuncShareProcessNamespace
status: "False"
type: Recommended
release:
channels:
- candidate-4.20
- candidate-4.21
- candidate-4.22
- eus-4.20
- fast-4.20
- stable-4.20
image: quay.io/openshift-release-dev/ocp-release@sha256:c1568bf00f149d16b4cbe5cd8aedf3bef110c1460a91f81688aca8e338806a2c
url: https://access.redhat.com/errata/RHBA-2025:21811
version: 4.20.5
risks:
- matchingRules:
- type: Always
message: Some runc 1.2 releases fail to launch containers in some Pods where
shareProcessNamespace is explicitly set true.
name: RuncShareProcessNamespace
url: https://issues.redhat.com/browse/RUN-3748
- conditions:
- lastTransitionTime: "2026-01-22T16:22:21Z"
message: Some runc 1.2 releases fail to launch containers in some Pods where
shareProcessNamespace is explicitly set true. https://issues.redhat.com/browse/RUN-3748
reason: RuncShareProcessNamespace
status: "False"
type: Recommended
release:
channels:
- candidate-4.20
- candidate-4.21
- candidate-4.22
- eus-4.20
- fast-4.20
- stable-4.20
image: quay.io/openshift-release-dev/ocp-release@sha256:5b87a665045cdfe0a1b271024be936a0c46de17b25a112d6a136c5af89d861c4
url: https://access.redhat.com/errata/RHBA-2025:21228
version: 4.20.4
risks:
- matchingRules:
- type: Always
message: Some runc 1.2 releases fail to launch containers in some Pods where
shareProcessNamespace is explicitly set true.
name: RuncShareProcessNamespace
url: https://issues.redhat.com/browse/RUN-3748
conditions:
- lastTransitionTime: "2026-01-22T16:22:22Z"
status: "True"
type: RetrievedUpdates
- lastTransitionTime: "2026-01-22T16:22:22Z"
message: |-
Multiple cluster operators should not be upgraded between minor versions:
* Cluster operator config-operator should not be upgraded between minor versions: FeatureGates_RestrictedFeatureGates_TechPreviewNoUpgrade: FeatureGatesUpgradeable: "TechPreviewNoUpgrade" does not allow updates
* Cluster operator etcd should not be upgraded between minor versions: UnsupportedConfigOverrides_UnsupportedConfigOverridesSet: UnsupportedConfigOverridesUpgradeable: setting: [useExternalEtcdSupport useUnsupportedUnsafeEtcdContainerRemoval]
reason: ClusterOperatorsNotUpgradeable
status: "False"
type: Upgradeable
- lastTransitionTime: "2026-01-22T16:22:22Z"
message: Capabilities match configured spec
reason: AsExpected
status: "False"
type: ImplicitlyEnabledCapabilities
- lastTransitionTime: "2026-01-22T16:22:22Z"
message: Payload loaded version="4.20.0" image="quay.io/openshift-release-dev/ocp-release@sha256:d1dc76522d1e235b97675b28e977cb8c452f47d39c0eb519cde02114925f91d2"
architecture="amd64"
reason: PayloadLoaded
status: "True"
type: ReleaseAccepted
- lastTransitionTime: "2026-01-22T16:48:48Z"
message: Done applying 4.20.0
status: "True"
type: Available
- lastTransitionTime: "2026-01-27T05:46:18Z"
status: "False"
type: Failing
- lastTransitionTime: "2026-01-22T16:48:48Z"
message: Cluster version is 4.20.0
status: "False"
type: Progressing
desired:
channels:
- candidate-4.20
- candidate-4.21
- candidate-4.22
- eus-4.20
- fast-4.20
- stable-4.20
image: quay.io/openshift-release-dev/ocp-release@sha256:d1dc76522d1e235b97675b28e977cb8c452f47d39c0eb519cde02114925f91d2
url: https://access.redhat.com/errata/RHSA-2025:9562
version: 4.20.0
history:
- completionTime: "2026-01-22T16:48:48Z"
image: quay.io/openshift-release-dev/ocp-release@sha256:d1dc76522d1e235b97675b28e977cb8c452f47d39c0eb519cde02114925f91d2
startedTime: "2026-01-22T16:22:22Z"
state: Completed
verified: false
version: 4.20.0
observedGeneration: 2
versionHash: PF7438UmreY=
kind: List
metadata:
resourceVersion: ""
Hm, seems like this does not have any info regarding that. Can you please check the Infrastructure CR?
nothing like that
rider:Downloads$ oc get infrastructure -o yaml
```yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: Infrastructure
  metadata:
    creationTimestamp: "2026-01-22T16:21:50Z"
    generation: 1
    name: cluster
    resourceVersion: "543"
    uid: 8bdcd1a4-b9ab-4e5f-87aa-8d2608613de8
  spec:
    cloudConfig:
      name: ""
    platformSpec:
      type: None
  status:
    apiServerInternalURI: https://api-int.2nodehp-test.hubcluster-1.lab.eng.cert.redhat.com:6443
    apiServerURL: https://api.2nodehp-test.hubcluster-1.lab.eng.cert.redhat.com:6443
    controlPlaneTopology: DualReplica
    cpuPartitioning: None
    etcdDiscoveryDomain: ""
    infrastructureName: 2nodehp-test-spxcz
    infrastructureTopology: HighlyAvailable
    platform: None
    platformStatus:
      type: None
kind: List
metadata:
  resourceVersion: ""
```
Maybe you are looking for `controlPlaneTopology: DualReplica`?
Yes, this `controlPlaneTopology: DualReplica` can give us a definite sign of this being a TNF cluster.
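A minimal sketch of such a topology-based check (the helper name is hypothetical; in the real operator the string would come from the Infrastructure CR's `status.controlPlaneTopology` field, as in the `oc get infrastructure` output above):

```go
package main

import "fmt"

// dualReplicaTopology matches the controlPlaneTopology value observed
// in the `oc get infrastructure` output above for a TNF cluster.
const dualReplicaTopology = "DualReplica"

// isTNFCluster (hypothetical name) reports whether the given
// control-plane topology indicates a two-node (TNF) cluster.
func isTNFCluster(controlPlaneTopology string) bool {
	return controlPlaneTopology == dualReplicaTopology
}

func main() {
	fmt.Println(isTNFCluster("DualReplica"))     // true
	fmt.Println(isTNFCluster("HighlyAvailable")) // false
}
```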
```go
// IsTwoNodeDeployment returns true if cluster has only two nodes.
func IsTwoNodeDeployment(nodeCount int) bool {
```
The name of the func could also be more indicative of the actual intention, e.g. `IsOCPTNFDeployment` or something like that, instead of just `IsTwoNodeDeployment`.
```go
// cluster-wide encryption is enabled or any of the device set is encrypted
// ie, sc.Spec.Encryption.ClusterWide/sc.Spec.Encryption.Enable is True or any device is encrypted
// and KMS ConfigMap is available
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated change
```go
    return 1
}
if statusutil.IsTwoNodeDeployment(nodeCount) {
    return 2
```
I don't think it's necessary. AFAIK this reflects the minimum Replica in a deviceSet. In case of TNF these are bare-metal clusters, so the DeviceSet may look like Count: 2, Replica: 1. Please cross-check.
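For illustration, the reviewer's expectation is that on a TNF bare-metal cluster the device set would carry the two-way replication in its count rather than its replica field, roughly like this (hypothetical fragment; the values are this comment's example, not the PR's actual defaults):

```yaml
storageDeviceSets:
- name: default
  count: 2      # one OSD per node on the two-node cluster
  replica: 1    # minimum replica per device set, per the comment above
```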
Part of https://issues.redhat.com/browse/RHSTOR-8071

Solution: run the floating mon alongside the other mon deployments; also set the mgr count to 1 and the default max replica count to 2.