-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Clustering and App related functions #4407
base: master
Are you sure you want to change the base?
Implement Clustering and App related functions #4407
Conversation
naiming-zededa
commented
Oct 30, 2024
- change VMI to VMI ReplicaSet for kubernetes
- change Pod to Pod RelicaSet for containers
- change functions handling replicaset names in services
- subscribe EdgeNodeInfo in domainmgr, zedmanager to get node-name for cluster
- add Designated Node ID to several structs for App
- not to delete domain from kubernetes if not a Designated App node
- parse config for EdgeNodeClusterConfig in zedagent
- handle ENClusterAppStatus publication in zedmanger in multi-node clustering case
- zedmanager handling effective-activation include ENClusterAppStatus
- kubevirt hypervisor changes to handle VMI/Pod ReplicaSets
631cc70
to
1aed95e
Compare
case <-stillRunning.C: | ||
} | ||
ps.StillRunning(agentName, warningTime, errorTime) | ||
if time.Since(wtTime) > 5*time.Minute { // wait for max of 5 minutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you only want to wait for 5 minutes? Shouldn't zedagent always publish something even if there is no network connectivity to the controller for instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, i was sure about if it does in all the cases. I'll remove this 5 min limit.
} | ||
if err := hyper.Task(status).Cleanup(status.DomainName); err != nil { | ||
log.Errorf("failed to cleanup domain: %s (%v)", status.DomainName, err) | ||
// in cluster mode, we can not delete the pod due to failing to get app info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was the issue this is fixing something appearing when there is a failover/takeover and another node in the cluster starts running the app instance?
Or is it something which could happen when an app instance is first provisioned on the cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can happen even on the first node of the app deployment, sometimes we can not get the status from the k3s cluster, or somethings it takes time to come up running state, but we should not remove this kubernetes configuration, it has the config stored in the database, it has it's own scheduling and control process to eventually bring it to the intended state. If we delete the config from the cluster, then we need to wait for another 10 minutes to retry, etc. and it will cause confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, a new boolean is introduced in the domainstatus, DomainConfigDeleted, allow the Designated node, if it knows for sure the app instance is removed from the device, then it can go ahead to delete the app/domain from the cluster.
1aed95e
to
2e3e102
Compare
- change VMI to VMI ReplicaSet for kubernetes - change Pod to Pod RelicaSet for containers - change functions handling replicaset names in services - subscribe EdgeNodeInfo in domainmgr, zedmanager to get node-name for cluster - add Designated Node ID to several structs for App - not to delete domain from kubernetes if not a Designated App node - parse config for EdgeNodeClusterConfig in zedagent - handle ENClusterAppStatus publication in zedmanger in multi-node clustering case - zedmanager handling effective-activation include ENClusterAppStatus - kubevirt hypervisor changes to handle VMI/Pod ReplicaSets Signed-off-by: Naiming Shen <naiming@zededa.com>
2e3e102
to
8a13e80
Compare