-
I think that the operators are responsible for the "overall consistency".
Currently, the agent only checks the status of the running systemd services and updates the pod status accordingly. The service files are not checked for modifications; we can discuss adding this feature.
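To make "updates the pod status accordingly" concrete, here is a minimal sketch that translates a systemd unit's `ActiveState` (as printed by `systemctl show -p ActiveState`) into a Kubernetes pod phase. The mapping table and the helper names `pod_phase` and `parse_show_output` are hypothetical and are not taken from the agent's code; in particular, treating `inactive` as `Succeeded` is an assumption.

```python
from typing import Dict

# Hypothetical mapping from common systemd ActiveState values to pod phases.
# The real agent may map these differently (e.g. for "inactive").
_ACTIVE_STATE_TO_PHASE: Dict[str, str] = {
    "activating": "Pending",
    "active": "Running",
    "reloading": "Running",
    "deactivating": "Running",
    "inactive": "Succeeded",  # assumption: a cleanly stopped service counts as done
    "failed": "Failed",
}


def pod_phase(active_state: str) -> str:
    """Return a pod phase for a systemd ActiveState; unrecognized states give "Unknown"."""
    return _ACTIVE_STATE_TO_PHASE.get(active_state, "Unknown")


def parse_show_output(output: str) -> Dict[str, str]:
    """Parse `systemctl show -p ...` output (one KEY=VALUE per line) into a dict."""
    props: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without "="
            props[key] = value
    return props
```

For example, `pod_phase(parse_show_output("ActiveState=failed\nSubState=failed")["ActiveState"])` yields `"Failed"`. Note that this only reflects whether systemd considers the service running, not whether the application inside it is healthy.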
The agent is in charge of mapping the pod specification to a systemd service. As long as the service is running, the agent is done. If, e.g., the Apache Kafka service is running but not able to process events, then it is not the job of the agent to detect or fix this. I am not sure exactly what the operators are responsible for.
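As an illustration of "mapping the pod specification to a systemd service", the sketch below renders a single container into a unit file. The `Container` type, the `render_unit` helper, and the chosen directives are hypothetical simplifications, not the agent's actual template; quoting is only approximated (systemd's `ExecStart=` accepts shell-like quotes, and `Environment=` values containing `"` would need escaping).

```python
import shlex
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Container:
    """Hypothetical, heavily reduced stand-in for a container in a pod spec."""
    name: str
    command: List[str]  # command[0] should be an absolute path for ExecStart=
    env: Dict[str, str] = field(default_factory=dict)


def render_unit(pod_name: str, container: Container) -> str:
    """Render one container as the text of a systemd service unit."""
    lines = [
        "[Unit]",
        f"Description={pod_name}-{container.name}",
        "",
        "[Service]",
        # shlex.join quotes arguments with single quotes, which systemd also accepts.
        f"ExecStart={shlex.join(container.command)}",
    ]
    for key, value in sorted(container.env.items()):
        lines.append(f'Environment="{key}={value}"')  # assumes no '"' in values
    lines += ["", "[Install]", "WantedBy=multi-user.target", ""]
    return "\n".join(lines)
```

Once such a unit is installed and started, the agent's job (as described above) is only to watch whether systemd reports it as running.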
I think that the agent behaves very similarly to the Kubernetes kubelet:
It is the job of the kubelet to restart the containers according to the pod's restart policy.
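With systemd as the runtime, restarting can be delegated to systemd itself through the service's `Restart=` directive. A minimal sketch, assuming a hypothetical helper `restart_directive` that translates a Kubernetes `restartPolicy` (`Always`, `OnFailure`, `Never`) into that directive; the one-to-one mapping is an assumption, not the agent's documented behavior.

```python
from typing import Dict

# Hypothetical mapping from Kubernetes restartPolicy to systemd Restart= values.
RESTART_POLICY_TO_SYSTEMD: Dict[str, str] = {
    "Always": "always",
    "OnFailure": "on-failure",
    "Never": "no",
}


def restart_directive(policy: str = "Always") -> str:
    """Return the [Service] line for a restartPolicy (Kubernetes defaults to Always)."""
    try:
        return f"Restart={RESTART_POLICY_TO_SYSTEMD[policy]}"
    except KeyError:
        raise ValueError(f"unsupported restartPolicy: {policy!r}") from None
```

A consequence of this design is that restarts happen inside systemd, so nothing reaches the API server unless the agent inspects the unit (for example its restart counter) and reports it.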
We made the decision that the services should stay up and running even if the agent is not available. But there could be use cases where this is not desirable, so this can be discussed.
-
Will the agent be periodically checking the systemd unit files (in a 'controller loop' fashion)? Such that if one is removed or edited by mistake, the agent will put it back?
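A controller loop of the kind asked about here could be as simple as periodically comparing each desired unit file with what is on disk and rewriting any file that is missing or edited. This is a hypothetical sketch: `reconcile_units`, `control_loop`, the `on_change` hook, and the 30-second default interval are illustrative assumptions, not existing agent code. In practice `on_change` would need to run something like `systemctl daemon-reload` so systemd picks up the rewritten files.

```python
import time
from pathlib import Path
from typing import Callable, Dict, List, Optional


def reconcile_units(desired: Dict[str, str], unit_dir: Path) -> List[str]:
    """Rewrite every unit file that is missing or differs from the desired content.

    Returns the names of the unit files that were (re)written.
    """
    changed: List[str] = []
    for name, content in sorted(desired.items()):
        path = unit_dir / name
        current = path.read_text(encoding="utf-8") if path.exists() else None
        if current != content:  # removed or edited by mistake
            path.write_text(content, encoding="utf-8")
            changed.append(name)
    return changed


def control_loop(
    get_desired: Callable[[], Dict[str, str]],
    unit_dir: Path,
    on_change: Callable[[List[str]], None],
    interval_s: float = 30.0,
    iterations: Optional[int] = None,
) -> None:
    """Reconcile every `interval_s` seconds; `iterations=None` runs forever."""
    n = 0
    while iterations is None or n < iterations:
        changed = reconcile_units(get_desired(), unit_dir)
        if changed:
            # Hypothetical hook, e.g. `systemctl daemon-reload` plus a restart.
            on_change(changed)
        n += 1
        time.sleep(interval_s)
```

Comparing full file contents keeps the sketch simple; a real implementation might also need to notice unit files it created earlier that are no longer desired.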
I'm also thinking about which controller is the one doing the actual monitoring/controlling.
If a pod stops, then, as I understand it, the kubelet reports this to the controller and the controller schedules a replacement.
In stackable, if the daemon (e.g., zookeeper) stops, then actually systemd will restart it, so the krustlet would not report this to k8s. Do I understand correctly?
Similarly, on boot, "normally" in k8s the scheduler would allocate a pod to a node[1]; so without the kubelet running, the server would not run any applications. But in stackable, systemd will start up the applications regardless of the state of the krustlet or k8s. If the node has been offline for a while, it could start up with an out-of-date configuration, since the krustlet hasn't received new information from the api-server while it was offline. This relates to https://docs.stackable.tech/home/adr/ADR005-systemd_unit_file_location.html. Did you consider having the unit files on volatile storage, so that during a reboot they are intentionally lost and the operator is the one who decides whether an application should start or not?
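One way the volatile-storage idea could look: systemd also loads units from the runtime directory `/run/systemd/system`, which normally lives on a tmpfs and is therefore empty after a reboot, whereas `/etc/systemd/system` persists. Likewise, `systemctl enable --runtime` only enables a unit until the next reboot. The sketch below is hypothetical (`unit_path`, `enable_command`, and the `volatile` flag are not part of the agent) and only builds paths and commands without executing anything.

```python
from pathlib import Path
from typing import List

PERSISTENT_UNIT_DIR = Path("/etc/systemd/system")  # survives reboots
RUNTIME_UNIT_DIR = Path("/run/systemd/system")     # usually tmpfs, lost on reboot


def unit_path(unit_name: str, volatile: bool) -> Path:
    """Where to write a unit file, depending on whether it should survive a reboot."""
    base = RUNTIME_UNIT_DIR if volatile else PERSISTENT_UNIT_DIR
    return base / unit_name


def enable_command(unit_name: str, volatile: bool) -> List[str]:
    """Build the systemctl invocation; --runtime makes the enablement non-persistent."""
    cmd = ["systemctl", "enable"]
    if volatile:
        cmd.append("--runtime")
    cmd.append(unit_name)
    return cmd
```

With volatile units, nothing starts after a reboot until the agent has reconnected to the API server and rewritten the units from the current pod specs, which is the behavior the question proposes.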
[1] As I understand it, anyway; let me know if I'm wrong.
(This was moved from #180 to here)