Implement Clustering and App related functions #4407

Open

naiming-zededa wants to merge 1 commit into master

Conversation

naiming-zededa
Contributor

  • change VMI to VMI ReplicaSet for kubernetes (a sketch of the ReplicaSet wrapping follows this list)
  • change Pod to Pod ReplicaSet for containers
  • change functions handling replicaset names in services
  • subscribe EdgeNodeInfo in domainmgr, zedmanager to get node-name for cluster
  • add Designated Node ID to several structs for App
  • not to delete domain from kubernetes if not a Designated App node
  • parse config for EdgeNodeClusterConfig in zedagent
  • handle ENClusterAppStatus publication in zedmanager in multi-node clustering case
  • zedmanager handling effective-activation to include ENClusterAppStatus
  • kubevirt hypervisor changes to handle VMI/Pod ReplicaSets
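
For context on the first two items, here is a minimal sketch of wrapping a VMI spec in a VirtualMachineInstanceReplicaSet, assuming the kubevirt.io/api/core/v1 types; the name, label, and single-replica choice are illustrative, not taken from this PR.

package kubevirt

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	kubevirtv1 "kubevirt.io/api/core/v1"
)

// buildVMIReplicaSet wraps a VMI spec in a ReplicaSet of one replica so
// the cluster's controller can reschedule the instance if a node fails.
// The name and label scheme here are illustrative, not this PR's code.
func buildVMIReplicaSet(name string, vmiSpec kubevirtv1.VirtualMachineInstanceSpec) *kubevirtv1.VirtualMachineInstanceReplicaSet {
	replicas := int32(1)
	labels := map[string]string{"eve-app": name} // illustrative label
	return &kubevirtv1.VirtualMachineInstanceReplicaSet{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: kubevirtv1.VirtualMachineInstanceReplicaSetSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: &kubevirtv1.VirtualMachineInstanceTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec:       vmiSpec,
			},
		},
	}
}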

	case <-stillRunning.C:
	}
	ps.StillRunning(agentName, warningTime, errorTime)
	if time.Since(wtTime) > 5*time.Minute { // wait for max of 5 minutes
Contributor

Why do you only want to wait for 5 minutes? Shouldn't zedagent always publish something even if there is no network connectivity to the controller for instance?

Contributor Author

OK, I wasn't sure whether it does in all cases. I'll remove this 5-minute limit.
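
A minimal sketch of what the loop might look like without the cap, following EVE's pubsub idioms; subEdgeNodeInfo, gotNodeInfo, and Synchronized() as the completion check are assumptions, not the final diff:

// Wait indefinitely for the publication, but keep touching the
// watchdog so the agent is not killed while it waits.
for !gotNodeInfo {
	select {
	case change := <-subEdgeNodeInfo.MsgChan():
		subEdgeNodeInfo.ProcessChange(change)
		gotNodeInfo = subEdgeNodeInfo.Synchronized() // assumed completion check
	case <-stillRunning.C:
	}
	ps.StillRunning(agentName, warningTime, errorTime)
}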

}
if err := hyper.Task(status).Cleanup(status.DomainName); err != nil {
	log.Errorf("failed to cleanup domain: %s (%v)", status.DomainName, err)
// in cluster mode, we cannot delete the pod just because we failed to get app info
Contributor

Was the issue this is fixing something appearing when there is a failover/takeover and another node in the cluster starts running the app instance?
Or is it something which could happen when an app instance is first provisioned on the cluster?

Contributor Author

This can happen even on the first node of the app deployment: sometimes we cannot get the status from the k3s cluster, and sometimes it just takes time for the app to reach the running state. But we should not remove the kubernetes configuration. The cluster has the config stored in its database, and it has its own scheduling and control process that will eventually bring the app to the intended state. If we delete the config from the cluster, we have to wait another 10 minutes to retry, etc., and that causes confusion.
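
In other words, a status-fetch failure is treated as transient rather than as proof the workload is gone. A hedged sketch of that policy; getAppInfoFromCluster and updateStatusFromClusterInfo are hypothetical helper names, not this PR's code:

// Only remove the kubernetes object on positive knowledge that the app
// instance was removed, never on a mere failure to query the cluster.
info, err := getAppInfoFromCluster(appName) // hypothetical helper
if err != nil {
	// Transient: k3s keeps the config in its database and its controllers
	// will reconcile; deleting now would force a long retry cycle.
	log.Warnf("app %s: cluster status unavailable (%v), keeping config", appName, err)
	return
}
updateStatusFromClusterInfo(status, info) // hypothetical helper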

Contributor Author

@naiming-zededa Nov 7, 2024

So a new boolean, DomainConfigDeleted, is introduced in DomainStatus. It allows the Designated node, once it knows for sure the app instance has been removed from the device, to go ahead and delete the app/domain from the cluster.
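
A minimal sketch of how such a gate might read in domainmgr; DomainConfigDeleted comes from the comment above, while isDesignatedNode and the placement of the Delete call are assumptions:

// Only the designated node deletes the domain cluster-wide, and only
// once DomainConfigDeleted confirms the config is gone from the device.
if status.DomainConfigDeleted && isDesignatedNode(status) { // isDesignatedNode is an assumed helper
	if err := hyper.Task(status).Delete(status.DomainName); err != nil {
		log.Errorf("failed to delete domain: %s (%v)", status.DomainName, err)
	}
}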

- change VMI to VMI ReplicaSet for kubernetes
- change Pod to Pod ReplicaSet for containers
- change functions handling replicaset names in services
- subscribe EdgeNodeInfo in domainmgr, zedmanager to get
  node-name for cluster
- add Designated Node ID to several structs for App
- not to delete domain from kubernetes if not a Designated
  App node
- parse config for EdgeNodeClusterConfig in zedagent
- handle ENClusterAppStatus publication in zedmanager in
  multi-node clustering case
- zedmanager handling effective-activation to include ENClusterAppStatus
- kubevirt hypervisor changes to handle VMI/Pod ReplicaSets

Signed-off-by: Naiming Shen <naiming@zededa.com>