Implement Clustering and App related functions #4407

Open

naiming-zededa wants to merge 1 commit into master

Conversation

naiming-zededa
Contributor

  • change VMI to VMI ReplicaSet for kubernetes (a sketch of the ReplicaSet wrapping follows this list)
  • change Pod to Pod ReplicaSet for containers
  • change functions handling replicaset names in services
  • subscribe EdgeNodeInfo in domainmgr, zedmanager to get node-name for cluster
  • add Designated Node ID to several structs for App
  • not to delete domain from kubernetes if not a Designated App node
  • parse config for EdgeNodeClusterConfig in zedagent
  • handle ENClusterAppStatus publication in zedmanager in multi-node clustering case
  • zedmanager handling effective-activation to include ENClusterAppStatus
  • kubevirt hypervisor changes to handle VMI/Pod ReplicaSets
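
For context on the first two items, here is a minimal sketch of wrapping a VMI spec in a VirtualMachineInstanceReplicaSet, assuming the kubevirt.io/api/core/v1 types; the name, label, and single-replica choice are illustrative, not taken from this PR.

package kubevirt

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	kubevirtv1 "kubevirt.io/api/core/v1"
)

// buildVMIReplicaSet wraps a VMI spec in a ReplicaSet of one replica so
// the cluster's controller can reschedule the instance if a node fails.
// The name and label scheme here are illustrative, not this PR's code.
func buildVMIReplicaSet(name string, vmiSpec kubevirtv1.VirtualMachineInstanceSpec) *kubevirtv1.VirtualMachineInstanceReplicaSet {
	replicas := int32(1)
	labels := map[string]string{"eve-app": name} // illustrative label
	return &kubevirtv1.VirtualMachineInstanceReplicaSet{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: kubevirtv1.VirtualMachineInstanceReplicaSetSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: &kubevirtv1.VirtualMachineInstanceTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec:       vmiSpec,
			},
		},
	}
}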

	case <-stillRunning.C:
	}
	ps.StillRunning(agentName, warningTime, errorTime)
	if time.Since(wtTime) > 5*time.Minute { // wait for max of 5 minutes
Contributor

Why do you only want to wait for 5 minutes? Shouldn't zedagent always publish something even if there is no network connectivity to the controller for instance?

Contributor Author

OK, I wasn't sure whether it does in all cases. I'll remove this 5-minute limit.
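
A minimal sketch of what the loop might look like without the cap, following EVE's pubsub idioms; subEdgeNodeInfo, gotNodeInfo, and Synchronized() as the completion check are assumptions, not the final diff:

// Wait indefinitely for the publication, but keep touching the
// watchdog so the agent is not killed while it waits.
for !gotNodeInfo {
	select {
	case change := <-subEdgeNodeInfo.MsgChan():
		subEdgeNodeInfo.ProcessChange(change)
		gotNodeInfo = subEdgeNodeInfo.Synchronized() // assumed completion check
	case <-stillRunning.C:
	}
	ps.StillRunning(agentName, warningTime, errorTime)
}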

}
if err := hyper.Task(status).Cleanup(status.DomainName); err != nil {
	log.Errorf("failed to cleanup domain: %s (%v)", status.DomainName, err)
// in cluster mode, we cannot delete the pod just because we failed to get app info
Contributor

Was the issue this is fixing something appearing when there is a failover/takeover and another node in the cluster starts running the app instance?
Or is it something which could happen when an app instance is first provisioned on the cluster?

Contributor Author

This can happen even on the first node of the app deployment: sometimes we cannot get the status from the k3s cluster, and sometimes it just takes time for the app to reach the running state. But we should not remove the kubernetes configuration. The cluster has the config stored in its database, and it has its own scheduling and control process that will eventually bring the app to the intended state. If we delete the config from the cluster, we have to wait another 10 minutes to retry, etc., and that causes confusion.
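
In other words, a status-fetch failure is treated as transient rather than as proof the workload is gone. A hedged sketch of that policy; getAppInfoFromCluster and updateStatusFromClusterInfo are hypothetical helper names, not this PR's code:

// Only remove the kubernetes object on positive knowledge that the app
// instance was removed, never on a mere failure to query the cluster.
info, err := getAppInfoFromCluster(appName) // hypothetical helper
if err != nil {
	// Transient: k3s keeps the config in its database and its controllers
	// will reconcile; deleting now would force a long retry cycle.
	log.Warnf("app %s: cluster status unavailable (%v), keeping config", appName, err)
	return
}
updateStatusFromClusterInfo(status, info) // hypothetical helper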

Contributor Author

@naiming-zededa Nov 7, 2024

So a new boolean, DomainConfigDeleted, is introduced in DomainStatus. It allows the Designated node, once it knows for sure the app instance has been removed from the device, to go ahead and delete the app/domain from the cluster.
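
A minimal sketch of how such a gate might read in domainmgr; DomainConfigDeleted comes from the comment above, while isDesignatedNode and the placement of the Delete call are assumptions:

// Only the designated node deletes the domain cluster-wide, and only
// once DomainConfigDeleted confirms the config is gone from the device.
if status.DomainConfigDeleted && isDesignatedNode(status) { // isDesignatedNode is an assumed helper
	if err := hyper.Task(status).Delete(status.DomainName); err != nil {
		log.Errorf("failed to delete domain: %s (%v)", status.DomainName, err)
	}
}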

- change VMI to VMI ReplicaSet for kubernetes
- change Pod to Pod ReplicaSet for containers
- change functions handling replicaset names in services
- subscribe EdgeNodeInfo in domainmgr, zedmanager to get
  node-name for cluster
- add Designated Node ID to several structs for App
- not to delete domain from kubernetes if not a Designated
  App node
- parse config for EdgeNodeClusterConfig in zedagent
- handle ENClusterAppStatus publication in zedmanager in
  multi-node clustering case
- zedmanager handling effective-activation to include ENClusterAppStatus
- kubevirt hypervisor changes to handle VMI/Pod ReplicaSets

Signed-off-by: Naiming Shen <naiming@zededa.com>