| status | stage |
| --- | --- |
| provisional | alpha |
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
This RFC defines a method for users to declaratively manage multiple deployments of instances across many locations as a single logical entity in the form of a Workload. Because a Workload is a declarative entity, the intent it describes may never be fully satisfied; however, the control plane will continually drive toward the desired state.
A Workload allows users to define the type of instances to deploy, where to deploy them, and how to manage them.
The type of instance to deploy is influenced by many factors, such as:
- Instance resource requirements (CPU, Memory, Disk, Accelerators, etc)
- Runtime (Container or Virtual Machine)
- Runtime configuration (Environment Variables, Named Ports, Init Containers, Cloud-init)
- Network interfaces (Networks to attach to, Addressing configuration, IP Families)
- Volumes
The placement of instances can also be influenced by many factors, such as:
- An IATA airport code
- The minimum and maximum number of replicas
- Autoscaling settings, such as what metric to observe for scaling actions, or how quickly to scale up or down.
- Network topology constraints (future)
A Workload allows defining multiple deployment configurations to enable flexible management of instances, such as an EU deployment requiring a larger number of minimum replicas than a US deployment, or separate scaling settings.
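For example, a sketch of the `placements` portion of a Workload (using the schema shown in full later in this document) could express that asymmetry, with the EU placement requiring a higher minimum replica count and its own scaling settings:

```yaml
# Hypothetical excerpt of `spec.placements` only; values are illustrative.
placements:
- name: eu
  cityCodes: ['LHR', 'AMS']
  scaleSettings:
    minReplicas: 5
- name: us
  cityCodes: ['DFW', 'SEA']
  scaleSettings:
    minReplicas: 2
    maxReplicas: 6
```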
- Provide a single API call that can result in the deployment of many compute instances.
- Allow the definition of multi-container instances that will be deployed in managed virtualized sandboxes.
- Allow the definition of virtual machine-based instances.
- Allow attaching one or more network interfaces to the same or separate networks.
- Allow the definition of network-interface-specific network policies.
- Provide the ability to remotely manage instances via serial console or VNC for virtual machine-based instances, and via remote container execution for container-based instances.
- Direct definition of single instances.
- Define functionality of Datum Cloud Networks (A separate RFC will define this).
Below is an example workload definition in YAML form:
> [!CAUTION]
> The structure below has not yet been finalized and is subject to change.
apiVersion: compute.datumapis.com/v1alpha
kind: Workload
metadata:
# The workload's name will influence the name of each instance in the
# workload. This must be a valid DNS name.
name: my-workload
namespace: my-namespace
uid: 6e3d1b5f-5d58-40ac-9c4a-b93433c672f9
# Arbitrary string key/value entries that can be used to influence Datum Cloud
# platform behaviors at a workload level, or the behavior of external systems
# that may read these annotations.
annotations:
compute.datumapis.com/enable-anycast: "true"
# Arbitrary string key/value entries that can be used in network policies,
# services, discovery services (DNS discovery, Metadata API).
# These labels WILL NOT propagate to instances managed by the workload,
# allowing updates to these labels without impacting the lifecycle of an
# instance.
labels:
tier: app
creationTimestamp: 1970-01-01T00:00:00Z
# Defines the expectations of the workload
spec:
# The template defines settings for each instance
template:
# Arbitrary string key/value entries that can be used in network policies,
# services, discovery services (DNS discovery, Metadata API).
#
# Any changes to these labels may result in a lifecycle event of instances.
labels:
tier: app
spec:
# The runtime type of the instance, such as a set of containers or a VM.
runtime:
# Resources each instance must be allocated.
#
# A sandbox runtime's containers may specify resource requests and
# limits. When limits are defined on all containers, they MUST consume
# the entire amount of resources defined here.
#
# A virtual machine runtime will be provided all requested resources.
resources:
# Resources for an instance may be defined by referencing a predefined
# instance type, or by referencing an instance family and
# providing specific resource requests for CPU, Memory, etc.
#
# Instance type is a required field.
#
# When customizing an instance type the value will be in the form
# `<instanceFamily>-custom`.
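#
# For example, a customized instance based on the `d1` family would
# presumably use `datumcloud/d1-custom` together with explicit resource
# requests below.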
instanceType: datumcloud/d1-standard-2
# Directly specifying desired resources.
#
# NOTE: This MUST be combined with `instanceType` referencing an
# instance family. The predefined type above is shown for illustration
# only.
requests:
cpu: "2"
memory: "2Gi"
# Example of a GPU attachment
datumcloud.com/nvidia-tesla-a100: "1"
# A sandbox is a managed isolated environment capable of running
# containers. Each sandbox comes with its own kernel, isolated from the
# host machine.
#
# NOTE: This field is mutually exclusive with `virtualMachine`.
sandbox:
# Multiple containers can be deployed in a single instance. They behave
# similarly to a Kubernetes Pod in that they can communicate over the
# instance's network interfaces and share attached volumes.
containers:
- name: netdata
# Only accept fully qualified container images so it's known where
# the image is coming from. Otherwise, we'd have to decide on a list
# of registries to fall back to, which can introduce security holes.
image: docker.io/netdata/netdata
# Resource requests and limits for the container.
# This field is optional. If left undefined, there will be no
# limits placed on the container beyond those imposed by the
# amount of resources requested for the runtime.
resources:
# Optional resource requests for the container.
#
# Each resource type MUST also be defined in `runtime.resources`,
# and cannot exceed the amount requested in `runtime.resources`.
#
# If only a single container is defined, its resource requests must
# either match those defined in `runtime.resources`, or be left blank
# to grant the container access to all requested resources.
#
# Setting requests can help hint the OS scheduler about resource
# priority between multiple containers in the same instance.
requests:
cpu: "1"
memory: "1Gi"
# Optional resource limits for the container.
#
# If not set, the upper limit on container resource usage will
# be the resources requested for the runtime.
limits:
# The container CPU usage will be throttled to 1 cpu-second
# per second.
cpu: "1"
# If the container memory usage exceeds 1Gi, it will be
# terminated and restarted by the control plane.
memory: "1536Mi"
volumeAttachments:
- name: logs
mountPath: /app/logs
# Named ports to be used in network policies and discovery services.
ports:
- name: http
port: 80
- name: https
port: 443
- name: sidecar
image: docker.io/timberio/vector
resources:
requests:
cpu: "100m"
memory: "256m"
limits:
cpu: "1"
memory: "512Mi"
volumeAttachments:
- name: logs
mountPath: /app/logs
# A virtual machine is a classical VM environment, booting a full OS
# provided by the user via an image.
#
# NOTE: This field is mutually exclusive with `sandbox`.
virtualMachine:
volumeAttachments:
# A VM based instance must have a volume attached as the boot disk.
# The `name` field must align with the `name` of a volume defined
# in `spec.volumes`.
#
# The order of volumes is important, and the first volume will be
# used to boot the VM's OS.
- name: boot
- name: logs
# Specifies a unique device name that is reflected into the
# `/dev/disk/by-id/datumcloud-*` tree of a Linux operating system
# running within the instance. This name can be used to reference
# the device for mounting, resizing, and so on, from within the
# instance.
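#
# For example, `deviceName: logs` below would presumably surface as
# `/dev/disk/by-id/datumcloud-logs` inside the instance.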
#
# If not specified, the server chooses a default device name to
# apply to this disk, in the form persistent-disk-x, where x is a
# number assigned by Datum Cloud.
#
# This field is only applicable for persistent disks.
deviceName: logs
# One or more network interfaces can be attached to each instance
networkInterfaces:
# A default network will exist in each project
- network:
name: default
ipFamilies:
- IPv4
- IPv6
# Additional networks can be created and attached to
- network:
name: corp
ipFamilies:
- IPv4
# Volumes defines volumes that must be available to attach to an instance.
#
# Each volume may be of a different type, such as disk backed, backed by
# downward API information, secrets, memory, etc.
#
# Mounting or presenting the volume to the instance is controlled via the
# `volumeAttachments` field in a container or virtual machine settings.
volumes:
# Name will be used to reference the volume in `volumeAttachments` for
# containers and VMs, and will be used to derive the platform resource
# name when required by prefixing this name with the instance name upon
# creation.
- name: boot
disk:
template:
spec:
populator:
# Using a source image will default storage resource requests to
# the size of the source image.
image:
name: datumcloud/ubuntu-2204-lts
- name: logs
disk:
template:
spec:
populator:
filesystem:
type: ext4
resources:
requests:
storage: 10Gi
# Defines where instances should be deployed, and at what scope a deployment
# will live in (such as in a city, or region).
#
# Each placement will result in zero or more deployments. A deployment is
# responsible for managing the lifecycle of instances within its scope.
#
# NOTE: Only the deployment scope of `cityCode` will be supported at this time.
placements:
# Basic examples of leveraging IATA airport codes to define expectations on
# where deployments should be created.
- name: us
# Each city code which can accept the workload will result in a unique
# deployment.
#
# In this case two deployments will be created, one in DFW and another in
# SEA. In each of these deployments, there will be a minimum of 5
# instances deployed.
#
# NOTE: A deployment may be created in a city that cannot currently meet
# the minimum replica requirements of the placement. However, the control
# plane will constantly work toward creating the replicas when capacity is
# available.
cityCodes: ['DFW', 'SEA']
scaleSettings:
minReplicas: 5
# Same example as above, but to build different logical "regions".
- name: eu
cityCodes: ['LHR', 'AMS']
scaleSettings:
minReplicas: 3
- name: oc
cityCodes: ['SYD', 'MEL']
scaleSettings:
minReplicas: 3
# Forward-looking autoscaling based on external metrics. There is an
# expectation that typical metrics that cover CPU and network related
# concerns will be made available automatically.
- name: us
cityCodes: ['DFW', 'SEA']
# Read
# - https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
# - https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale
scaleSettings:
minReplicas: 5
maxReplicas: 10
metrics:
# The intent here is that `External` will interact with an external
# metrics store owned by the user when making scaling decisions.
#
# In this example, the system will scale up or down instances with
# the goal of meeting the target request latency.
- external:
metric:
name: app:request_latency_ms:95
selector:
# Metric in the external system must have labels that match
# the expected values.
matchLabels:
tier: app
# Implies the metric series has a label `city_code`
# with a value matching the IATA cityCode of a potential
# location for the deployment
city_code: "{{ context.locality.cityCode }}"
target:
type: AverageValue
averageValue: 30
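# Assuming the scaling math mirrors the Kubernetes HPA algorithm
# referenced above, an AverageValue target would imply roughly:
#   desiredReplicas = ceil(currentReplicas * currentAverageValue / targetAverageValue)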
# Future flexible format
- name: test-eu
# Scope for a deployment. Each unique value of the locality attribute
# referenced will have a deployment created.
#
# For example, each `region` below may have one or more distinct cities
# available. In this configuration, a deployment would be created in each
# of those cities.
# TODO(jreese) consider a fully qualified label such as `locality.datumcloud.com/cityCode`
scope: cityCode
selector:
# CEL expression
constraint: region in ['US', 'EU', 'OC']
scaleSettings:
minReplicas: 3
# Status is an output-only system populated field used to communicate
# information regarding the state of the workload.
#
# Status conditions are aggregated from WorkloadDeployments into each
# placement's conditions, and from placement conditions into the workload
# conditions.
status:
# Status of each placement
placements:
- name: us
currentReplicas: 7
desiredReplicas: 10
conditions:
- type: Available
status: "True"
- type: Progressing
status: "False"
reason: QuotaExhausted
message: CPU quota has been exhausted, please contact support.
# ...
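The aggregation described above implies workload-level conditions rolled up from the placement conditions. A possible shape, assuming the top-level condition types mirror the placement-level ones (the status schema is not finalized by this RFC):

```yaml
# Hypothetical workload-level conditions, rolled up from the placement
# conditions shown above: Available because every placement reports
# Available=True, and Progressing=False because the `us` placement is
# blocked on quota.
status:
  conditions:
  - type: Available
    status: "True"
  - type: Progressing
    status: "False"
    reason: QuotaExhausted
    message: One or more placements cannot progress; CPU quota exhausted.
```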
Below are examples of the minimum requirements for creating a container or VM based workload.
apiVersion: compute.datumapis.com/v1alpha
kind: Workload
metadata:
name: my-container-workload
spec:
template:
spec:
runtime:
resources:
instanceType: datumcloud/d1-standard-2
sandbox:
containers:
- name: httpbin
image: docker.io/mccutchen/go-httpbin
networkInterfaces:
- network:
name: default
placements:
- name: us
cityCodes: ['DFW', 'SEA']
scaleSettings:
minReplicas: 1
apiVersion: compute.datumapis.com/v1alpha
kind: Workload
metadata:
name: my-vm-workload
spec:
template:
spec:
runtime:
resources:
instanceType: datumcloud/d1-standard-2
virtualMachine:
volumeAttachments:
- name: boot
networkInterfaces:
- network:
    name: default
volumes:
- name: boot
disk:
template:
spec:
populator:
image:
name: datumcloud/ubuntu-2204-lts
placements:
- name: us
cityCodes: ['DFW', 'SEA']
scaleSettings:
minReplicas: 1
%%{init: {
"sequence": {
"showSequenceNumbers": true
}
}}%%
sequenceDiagram
box API entities
participant workload as Workload
participant workloaddeployment as WorkloadDeployment
participant location as Location
end
box controllers
participant workload-controller
end
workload -->> workload-controller: Reconcile Workload
activate workload-controller
alt Workload Deleting
loop For each WorkloadDeployment
workload-controller ->> workloaddeployment: Delete
workload-controller ->> workload: Remove Finalizer
end
else Workload Create or Update
workload-controller ->> workload: Ensure Finalizer
loop For each Placement
loop For each cityCode
workload-controller ->> location: Locate viable locations
location ->> workload-controller: Locations
alt Locations found
workload-controller ->> workloaddeployment: Create/Update Deployment
else No Locations found
workload-controller ->> workload: Update Placement Status
end
end
end
workload-controller ->> workloaddeployment: Find orphaned deployments
loop For each orphaned WorkloadDeployment
workload-controller ->> workloaddeployment: Delete
end
end
deactivate workload-controller
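For illustration, a WorkloadDeployment created by the workload-controller for the `us` placement in DFW might look like the sketch below. This RFC does not define the WorkloadDeployment schema, so every field name here is an assumption derived from the flow above:

```yaml
# Hypothetical WorkloadDeployment for placement `us`, city code DFW.
apiVersion: compute.datumapis.com/v1alpha
kind: WorkloadDeployment
metadata:
  # Assumed naming convention: <workload>-<placement>-<cityCode>.
  name: my-workload-us-dfw
  namespace: my-namespace
  # An owner reference would tie the deployment's lifecycle to the workload.
  ownerReferences:
  - apiVersion: compute.datumapis.com/v1alpha
    kind: Workload
    name: my-workload
    uid: 6e3d1b5f-5d58-40ac-9c4a-b93433c672f9
spec:
  placementName: us
  cityCode: DFW
  scaleSettings:
    minReplicas: 5
  # The instance template would be copied from the parent workload's
  # `spec.template` (elided here).
  template: {}
```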
%%{init: {
"sequence": {
"showSequenceNumbers": true
}
}}%%
sequenceDiagram
participant Controller
participant WorkloadDeploymentScheduler
participant KubernetesAPI
participant LocationList
participant DeploymentStatus
Controller->>WorkloadDeploymentScheduler: Reconcile(req)
WorkloadDeploymentScheduler->>KubernetesAPI: Get WorkloadDeployment
KubernetesAPI-->>WorkloadDeploymentScheduler: WorkloadDeployment or NotFound error
alt WorkloadDeployment Not Found
WorkloadDeploymentScheduler-->>Controller: Return (no requeue)
else WorkloadDeployment Found
WorkloadDeploymentScheduler->>KubernetesAPI: List Locations
KubernetesAPI-->>WorkloadDeploymentScheduler: LocationList or Error
alt No Locations
WorkloadDeploymentScheduler->>DeploymentStatus: Set "NoLocations" Condition
WorkloadDeploymentScheduler->>KubernetesAPI: Update Deployment Status
KubernetesAPI-->>WorkloadDeploymentScheduler: Success
WorkloadDeploymentScheduler-->>Controller: Return (RequeueAfter 30s)
else Locations Found
WorkloadDeploymentScheduler->>KubernetesAPI: Get Workload for Deployment
KubernetesAPI-->>WorkloadDeploymentScheduler: Workload or Error
alt Placement not found in Workload
WorkloadDeploymentScheduler-->>Controller: Return (no action)
else Placement Found
WorkloadDeploymentScheduler->>LocationList: Find matching Location (based on CityCode)
alt No Matching Location
WorkloadDeploymentScheduler->>DeploymentStatus: Set "NoCandidateLocations" Condition
WorkloadDeploymentScheduler->>KubernetesAPI: Update Deployment Status
KubernetesAPI-->>WorkloadDeploymentScheduler: Success
else Matching Location Found
WorkloadDeploymentScheduler->>KubernetesAPI: Update Deployment with Location
WorkloadDeploymentScheduler->>DeploymentStatus: Set "LocationAssigned" Condition
WorkloadDeploymentScheduler->>KubernetesAPI: Update Deployment Status
KubernetesAPI-->>WorkloadDeploymentScheduler: Success
end
end
end
end
WorkloadDeploymentScheduler-->>Controller: Return (no requeue)
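The scheduler communicates its outcome through conditions on the WorkloadDeployment status. Below is a sketch of the three outcomes implied by the diagram; whether `NoLocations`, `NoCandidateLocations`, and `LocationAssigned` appear as condition types or as reasons under a single type is an open detail, and the shape shown is an assumption:

```yaml
# Hypothetical status conditions set by the WorkloadDeploymentScheduler.
status:
  conditions:
  # No Locations exist at all; the reconcile is requeued after 30s.
  - type: Scheduled
    status: "False"
    reason: NoLocations
  # Alternative outcomes (one condition per reconcile result):
  # Locations exist, but none match the placement's cityCodes.
  # - type: Scheduled
  #   status: "False"
  #   reason: NoCandidateLocations
  # A matching Location was found and written to the deployment.
  # - type: Scheduled
  #   status: "True"
  #   reason: LocationAssigned
```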
- Feature gate
  - Feature gate name:
  - Components depending on the feature gate:
- Other
  - Describe the mechanism:
  - Will enabling / disabling the feature require downtime of the control plane?
  - Will enabling / disabling the feature require downtime or reprovisioning of a node?
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
- Events
  - Event Reason:
- API .status
  - Condition name:
  - Other field:
- Other (treat as last resort)
  - Details:
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
- Metrics
  - Metric name:
  - [Optional] Aggregation method:
  - Components exposing the metric:
- Other (treat as last resort)
  - Details: