kep-number: draft-20171127
title: Cloud Provider Controller Manager
authors:
  - "@cheftako"
  - "@calebamiles"
owning-sig: sig-apimachinery
participating-sigs:
  - sig-storage
  - sig-apps
  - sig-network
reviewers:
  - "@wlan0"
  - "@calebamiles"
approvers:
  - "@thockin"
editor: TBD
status: approved
replaces:
  - contributors/design-proposals/cloud-provider/cloud-provider-refactoring.md

Remove Cloud Provider Code From Kubernetes Core


Summary

We want to remove any cloud provider specific logic from the kubernetes/kubernetes repo. We want to restructure the code to make it easy for any cloud provider to extend the Kubernetes core in a consistent manner for their cloud. New cloud providers should look at the Creating a Custom Cluster from Scratch guide and at the cloud provider interface, which they will need to implement.

Motivation

We are trying to remove all dependencies from Kubernetes Core on any specific cloud provider. Currently we have seven such dependencies. To prevent this number from growing we have locked Kubernetes Core against the addition of any new dependencies. This means all new cloud providers have to implement all of their pieces outside of the core. However, everyone still ends up consuming the current set of seven in-repo dependencies. For the seven in-repo cloud providers, any change to their specific cloud provider code requires OSS PR approvals and a deployment to get those changes into an official build. The relevant dependencies require changes in the areas described below.

For the cloud providers that are in repo, moving out would allow them to iterate on their solutions more quickly and decouple cloud provider fixes from open source releases. Moving the cloud provider code out of the open source processes also means that those processes no longer need to load or run code that is unnecessary for the environment they are in. We would like to abstract a core controller manager library to help standardize the behavior of the cloud controller managers produced by each cloud provider. We would also like to minimize the number and scope of controllers running in the cloud controller manager, so as to minimize the surface area for per cloud provider deviation.

Goals

  • Get to a point where we do not load the cloud interface in any of the Kubernetes core processes.
  • Remove all cloud provider specific code from kubernetes/kubernetes.
  • Have a generic controller manager library available for use by the per cloud provider controller managers.
  • Move the cloud provider specific controller manager logic into repos appropriate for those cloud providers.

Intermediary Goals

Have a cloud controller manager in the Kubernetes main repo which hosts all of the controller loops for the in-repo cloud providers. Do not run any cloud provider logic in the kube-controller-manager, the kube-apiserver or the kubelet. At intermediary points we may move only some of the cloud specific controllers out (e.g. volumes may move later than the rest).

Non-Goals

Forcing cloud providers to use the generic cloud manager.

Proposal

Controller Manager Changes

For the controller manager we would like to create a set of common code which can be used by both the cloud controller manager and the kube controller manager. The cloud controller manager would then be responsible for running controllers whose function is specific to the cloud provider. The kube controller manager would then be responsible for running all controllers whose function is not related to a cloud provider.

In order to create a 100% cloud independent controller manager, the controller-manager will be split into multiple binaries.

  1. Cloud dependent controller-manager binaries
  2. Cloud independent controller-manager binary - this is the existing kube-controller-manager that ships with Kubernetes releases.

The cloud dependent binaries will run those loops that rely on a cloud provider in one or more separate processes within the Kubernetes control plane. The rest of the controllers will run in the cloud independent controller manager. The decision to run entire controller loops, rather than only the small parts that rely on the cloud provider, was made because it keeps the implementation simple. Otherwise, the shared data structures and utility functions would have to be disentangled and carefully separated to avoid any concurrency issues. This approach, among other things, prevents code duplication and improves development velocity.

Note that the controller loop implementations will continue to reside in the core repository. Each takes a cloudprovider.Interface as an input in its constructor. A vendor-maintained cloud-controller-manager binary can link these controllers in, since they serve as the reference form of the controller implementation.
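To make this concrete, here is a minimal sketch of what a vendor-maintained cloud-controller-manager binary might look like. The example.com import paths, the mycloud.NewCloud constructor, and the NewCloudControllerManagerCommand helper are placeholders assumed for this sketch, not the final library layout; the point is only that the vendor binary supplies a cloudprovider.Interface implementation and reuses the shared controller loops.

```go
// Hypothetical vendor main for a cloud-controller-manager; package paths and
// helper names are illustrative assumptions.
package main

import (
	"io"
	"os"

	cloudprovider "k8s.io/cloud-provider" // assumed home of cloudprovider.Interface

	"example.com/mycloud/pkg/mycloud"                             // hypothetical vendor provider implementation
	ccmapp "example.com/mycloud/pkg/cloud-controller-manager/app" // hypothetical shared controller-manager library
)

func init() {
	// Register the provider under a name so the shared library can look it up
	// (for example from a --cloud-provider flag).
	cloudprovider.RegisterCloudProvider("mycloud", func(config io.Reader) (cloudprovider.Interface, error) {
		return mycloud.NewCloud(config)
	})
}

func main() {
	// The shared library is expected to wire the cloud-dependent controller
	// loops (node, route, service) to whichever provider was registered.
	command := ccmapp.NewCloudControllerManagerCommand()
	if err := command.Execute(); err != nil {
		os.Exit(1)
	}
}
```

Keeping the controller loops in the shared library and only injecting the interface is what keeps per-provider binaries thin and their behavior consistent.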

There are four controllers that rely on cloud provider specific code. These are the node controller, service controller, route controller and attach/detach controller. Copies of each of these controllers have been bundled together into one binary. The cloud dependent binary registers itself as a controller and runs the cloud specific controller loops with the user agent "external-controller-manager".

The routeController and serviceController are entirely cloud specific. Therefore, it is straightforward to move these two controller loops out of the cloud-independent binary and into the cloud dependent binary.

The nodeController does much more than just talk to the cloud. It performs the following operations:

  1. CIDR management
  2. Monitor Node Status
  3. Node Pod Eviction

While monitoring node status, if the status reported by the kubelet is either 'ConditionUnknown' or 'ConditionFalse', the controller checks whether the node has been deleted from the cloud provider. If it has already been deleted from the cloud provider, the controller deletes the node object without waiting the monitorGracePeriod amount of time. This is the only operation that needs to be moved into the cloud dependent controller manager.
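The sketch below shows the shape of that single operation as it would live in the cloud dependent controller manager. The particular cloudprovider lookup call (an existence check by provider ID) is an assumption used for illustration; the logic simply mirrors the text above: act only when the kubelet-reported status is unknown or false, and skip the monitorGracePeriod wait if the cloud says the instance is gone.

```go
// Sketch of the cloud-existence check that moves to the cloud dependent
// controller manager; the exact cloudprovider call is illustrative.
package nodecontroller

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	cloudprovider "k8s.io/cloud-provider"
)

// nodeReadyCondition returns the NodeReady condition, or nil if not reported.
func nodeReadyCondition(node *v1.Node) *v1.NodeCondition {
	for i := range node.Status.Conditions {
		if node.Status.Conditions[i].Type == v1.NodeReady {
			return &node.Status.Conditions[i]
		}
	}
	return nil
}

// maybeDeleteVanishedNode deletes the Node object immediately if the kubelet
// reports it unhealthy and the cloud no longer knows about the instance.
func maybeDeleteVanishedNode(ctx context.Context, cloud cloudprovider.Interface, client kubernetes.Interface, node *v1.Node) error {
	ready := nodeReadyCondition(node)
	if ready == nil || ready.Status == v1.ConditionTrue {
		return nil // node looks healthy; nothing to do
	}

	instances, ok := cloud.Instances()
	if !ok {
		return nil // provider exposes no instance information
	}

	// Illustrative existence check against the cloud for this node.
	exists, err := instances.InstanceExistsByProviderID(ctx, node.Spec.ProviderID)
	if err != nil {
		return err
	}
	if !exists {
		// The backing instance is gone: delete the Node object without
		// waiting out monitorGracePeriod.
		return client.CoreV1().Nodes().Delete(ctx, node.Name, metav1.DeleteOptions{})
	}
	return nil
}
```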

Finally, the attachDetachController is tricky and cannot easily be disentangled from the controller-manager, so it will be addressed with Flex Volumes (discussed in a separate section below).

The kube-controller-manager has many controller loops. See NewControllerInitializers for the full list.

Among these controller loops, the following are cloud provider dependent.

The nodeController uses the cloudprovider to check if a node has been deleted from the cloud. If the cloud provider reports a node as deleted, then this controller immediately deletes the node from Kubernetes. This check removes the need to wait a specific amount of time before concluding that an inactive node is actually dead.

The volumeController uses the cloudprovider to create, delete, attach and detach volumes to nodes. For instance, the logic for provisioning, attaching, and detaching an EBS volume resides in the AWS cloudprovider. The volumeController uses this code to perform its operations.

The routeController configures routes for hosts in the cloud provider.

The serviceController maintains a list of currently active nodes, and is responsible for creating and deleting LoadBalancers in the underlying cloud.
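The sketch below summarizes which slice of the cloud provider interface each of these loops consumes. The method names follow the cloud provider interface as the author understands it and should be read as illustrative rather than a definitive listing.

```go
// Illustration of the cloudprovider.Interface sub-interfaces consumed by the
// cloud-dependent controller loops.
package sketch

import cloudprovider "k8s.io/cloud-provider"

// cloudDependencies shows which optional sub-interfaces each controller needs
// before it can run its loop.
func cloudDependencies(cloud cloudprovider.Interface) {
	instances, _ := cloud.Instances()        // nodeController: "does this node still exist in the cloud?"
	routes, _ := cloud.Routes()              // routeController: program routes for hosts
	loadBalancers, _ := cloud.LoadBalancer() // serviceController: create/delete cloud load balancers
	_, _, _ = instances, routes, loadBalancers
	// The volume (attach/detach) path is handled separately via Flex Volumes,
	// as described later in this document.
}
```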

Kubelet Changes

Moving on to the kubelet, the following cloud provider dependencies exist in the kubelet.

  • Find the cloud nodename of the host that the kubelet is running on, for the following reasons:
    1. To obtain the config map for the kubelet, if one already exists
    2. To uniquely identify the current node using a nodeInformer
    3. To instantiate a reference to the current node object
  • Find the InstanceID, ProviderID, ExternalID, and zone info of the node object while initializing it
  • Periodically poll the cloud provider to figure out if the node has any new IP addresses associated with it
  • Set a condition that makes the node unschedulable until cloud routes are configured
  • Allow the cloud provider to post-process DNS settings

The majority of the calls from the kubelet to the cloud are made during the initialization of the Node object. The other uses are for configuring routes (in the case of GCE), scrubbing DNS, and periodically polling for IP addresses.

All of the above steps, except the Node initialization step, can be moved into a controller. Specifically, IP address polling and the configuration of routes can be moved into the cloud dependent controller manager.

Scrubbing DNS was found to be redundant, so it can be disregarded; it is being removed.

Finally, Node initialization needs to be addressed. This is the trickiest part. Pods will be scheduled even on uninitialized nodes, which can lead to pods being scheduled in incompatible zones and to other unexpected errors. Therefore, an approach is needed where the kubelet can create a Node but mark it as "NotReady". Then some asynchronous process can update it and mark it as ready. This is now possible because of the concept of taints.

This approach requires the kubelet to be started with known taints. This makes the node unschedulable until these taints are removed. The external cloud controller manager will asynchronously update the node objects and remove the taints.
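A rough sketch of that taint handoff is shown below. The taint key and value used here are an assumption for illustration; the mechanics are what matter: the kubelet registers the node with the taint, and the cloud controller manager removes it once it has filled in the cloud-specific fields.

```go
// Sketch of the taint-based handoff between kubelet and the cloud controller
// manager; the taint key is assumed for illustration.
package sketch

import v1 "k8s.io/api/core/v1"

// cloudTaint is the taint the kubelet adds when started with an external
// cloud provider (key name assumed for this sketch).
var cloudTaint = v1.Taint{
	Key:    "node.cloudprovider.kubernetes.io/uninitialized",
	Value:  "true",
	Effect: v1.TaintEffectNoSchedule,
}

// markUninitialized is the kubelet side: the taint keeps the scheduler away
// from the freshly created Node until initialization finishes.
func markUninitialized(node *v1.Node) {
	node.Spec.Taints = append(node.Spec.Taints, cloudTaint)
}

// markInitialized is the cloud controller manager side: after looking up the
// InstanceID, ProviderID, zone and addresses from the cloud, drop the taint
// so the node becomes schedulable.
func markInitialized(node *v1.Node) {
	kept := node.Spec.Taints[:0]
	for _, t := range node.Spec.Taints {
		if t.Key != cloudTaint.Key {
			kept = append(kept, t)
		}
	}
	node.Spec.Taints = kept
}
```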

API Server Changes

Finally, in the kube-apiserver, the cloud provider is used for transferring SSH keys to all of the nodes and within an admission controller for setting labels on persistent volumes.

Kube-apiserver uses the cloud provider for two purposes:

  1. Distribute SSH Keys - This can be moved to the cloud dependent controller manager
  2. Admission Controller for PV - This can be refactored using the taints approach used in Kubelet
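As an illustration of the second item, the sketch below moves persistent volume labelling to the cloud-aware side rather than an in-tree admission controller, in the same spirit as the kubelet taints approach. The lookup hook and the label keys are hypothetical, introduced only for this sketch.

```go
// Sketch of labelling PersistentVolumes asynchronously from the cloud-aware
// side instead of inside an in-tree admission controller in kube-apiserver.
package sketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
)

// volumeZoneLookup is a hypothetical hook standing in for whatever per-volume
// zone lookup a provider exposes; it exists only for this sketch.
type volumeZoneLookup interface {
	ZoneForVolume(ctx context.Context, pv *v1.PersistentVolume) (zone, region string, err error)
}

// labelVolume fills in topology labels on a newly created PersistentVolume by
// asking the cloud where the underlying disk lives. Label keys are illustrative.
func labelVolume(ctx context.Context, lookup volumeZoneLookup, pv *v1.PersistentVolume) error {
	zone, region, err := lookup.ZoneForVolume(ctx, pv)
	if err != nil {
		return err
	}
	if pv.Labels == nil {
		pv.Labels = map[string]string{}
	}
	pv.Labels["example.com/zone"] = zone
	pv.Labels["example.com/region"] = region
	return nil
}
```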

Volume Management Changes

Volumes need cloud providers, but each only needs its specific cloud provider. The majority of volume management logic resides in the controller manager. These controller loops need to be moved into the cloud controller manager. The cloud controller manager also needs a mechanism to read parameters for initialization from the cloud config. This can be done via config maps.
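One possible shape of that initialization is sketched below, assuming the cloud config is mounted into the cloud-controller-manager pod from a ConfigMap as a file. The provider name, the mount path, and the use of the InitCloudProvider helper are assumptions for this sketch.

```go
// Sketch of initializing the cloud controller manager from parameters stored
// in a ConfigMap mounted as a file; names and paths are placeholders.
package main

import (
	"log"

	cloudprovider "k8s.io/cloud-provider"
)

const (
	providerName    = "mycloud"                                   // hypothetical provider name
	cloudConfigPath = "/etc/kubernetes/cloud-config/cloud.conf"   // assumed ConfigMap mount path
)

func main() {
	// Look up the registered provider by name and hand it the config file to
	// parse; the helper is assumed to be part of the shared library.
	cloud, err := cloudprovider.InitCloudProvider(providerName, cloudConfigPath)
	if err != nil {
		log.Fatalf("could not initialize cloud provider %q: %v", providerName, err)
	}
	log.Printf("initialized cloud provider %q", cloud.ProviderName())
}
```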

There are two entirely different approaches to refactoring volumes: Flex Volumes and CSI (the Container Storage Interface). There is an ongoing effort to move all of the volume logic from the controller-manager into plugins called Flex Volumes. In the Flex Volumes world, all of the vendor specific code will be packaged in a separate binary as a plugin. After discussion with @thockin, this was decided to be the best approach for removing all cloud provider dependencies for volumes from Kubernetes core. Some of the discovery information for this can be found at https://goo.gl/CtzpVm.
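To make the plugin model concrete, below is a minimal skeleton of a Flex Volume driver. The operation names and JSON output follow the Flex Volume contract as the author understands it and should be treated as illustrative; the point is that vendor-specific attach/detach/mount logic lives in a separate executable rather than in the controller-manager.

```go
// Minimal Flex Volume driver skeleton; operation names and JSON fields are
// illustrative of the plugin call-out model.
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// result is the JSON status a Flex Volume driver prints on stdout.
type result struct {
	Status  string `json:"status"`
	Message string `json:"message,omitempty"`
}

func emit(r result) {
	out, _ := json.Marshal(r)
	fmt.Println(string(out))
}

func main() {
	if len(os.Args) < 2 {
		emit(result{Status: "Failure", Message: "no operation given"})
		os.Exit(1)
	}
	switch os.Args[1] {
	case "init":
		// Called once when the plugin is discovered.
		emit(result{Status: "Success"})
	case "attach", "detach", "mount", "unmount":
		// Vendor-specific cloud calls (e.g. attaching an EBS disk) would go here.
		emit(result{Status: "Success"})
	default:
		// Operations this driver does not implement.
		emit(result{Status: "Not supported"})
	}
}
```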

Deployment Changes

This change will introduce new binaries to the list of binaries required to run Kubernetes. The change will be designed such that these binaries can be installed via kubectl apply -f and the appropriate instances of the binaries will be run.

Issues such as monitoring and configuring the new binaries will generally be left to the cloud provider. However, providers should ensure that test runs upload the logs for these new processes to testgrid.

Applying the cloud controller manager is the only step in the upgrade process that differs. In order to complete the upgrade, you need to apply the cloud-controller-manager deployment to the setup. A deployment descriptor file will be provided with this change. You apply this change using

kubectl apply -f cloud-controller-manager.yml

This will start the cloud specific controller manager in your kubernetes setup.

The downgrade steps are also the same as before for all components except the cloud-controller-manager. In the case of the cloud-controller-manager, the deployment should be deleted using

kubectl delete -f cloud-controller-manager.yml

Implementation Details/Notes/Constraints [optional]

Security Considerations

Make sure that you consider the impact of this feature from the point of view of Security.

Graduation Criteria

How will we know that this has succeeded? Gathering user feedback is crucial for building high quality experiences and SIGs have the important responsibility of setting milestones for stability and completeness. Hopefully the content previously contained in umbrella issues will be tracked in the Graduation Criteria section.

Implementation History

Major milestones in the life cycle of a KEP should be tracked in Implementation History. Major milestones might include

  • the Summary and Motivation sections being merged signaling SIG acceptance
  • the Proposal section being merged signaling agreement on a proposed design
  • the date implementation started
  • the first Kubernetes release where an initial version of the KEP was available
  • the version of Kubernetes where the KEP graduated to general availability
  • when the KEP was retired or superseded

Alternatives [optional]

One alternative to consider is the use of a side-car. The cloud interface in tree could then be a gRPC call out to that side-car. We could then leave the kube-apiserver, kube-controller-manager and kubelet pretty much as is. We would still need separate repos to hold the code for the side-car and to handle cluster setup for the cloud provider. However, we believe that different cloud providers will (already) want different control loops. As such we are likely to need something like the cloud controller manager anyway. From that perspective it seems easier to centralize the effort in that direction. In addition, it should limit the proliferation of new processes across the entire cluster.