Status: provisional
Stage: alpha

Workloads

Summary

This RFC defines a method for users to declaratively manage multiple deployments of instances across many locations as a single logical entity: a Workload. Because a Workload is a declarative entity, the intent described in the Workload may never be fully satisfied; the control plane will, however, continually drive toward the desired state.

A Workload allows users to define the type of instances to deploy, where to deploy them, and how to manage them.

The type of instance to deploy is influenced by many factors, such as:

  • Instance resource requirements (CPU, Memory, Disk, Accelerators, etc.)
  • Runtime (Container or Virtual Machine)
  • Runtime configuration (Environment Variables, Named Ports, Init Containers, Cloud-init)
  • Network interfaces (Networks to attach to, Addressing configuration, IP Families)
  • Volumes
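
For illustration, the fragment below (from an instance template's spec) exercises several of these factors. It is a sketch only: the `env` field in particular is an assumption of this example and is not part of the schema defined later in this RFC.

runtime:
  resources:
    instanceType: datumcloud/d1-standard-2
  sandbox:
    containers:
      - name: app
        image: docker.io/library/nginx
        # Assumed field: environment variables for the container.
        env:
          - name: LOG_LEVEL
            value: info
        # Named ports used in network policies and discovery services.
        ports:
          - name: http
            port: 80
networkInterfaces:
  - network:
      name: default
    ipFamilies:
      - IPv4
      - IPv6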

The placement of instances can also be influenced by many factors, such as:

  • An IATA airport code
  • The minimum and maximum number of replicas
  • Autoscaling settings, such as what metric to observe for scaling actions, or how quickly to scale up or down.
  • Network topology constraints (future)

A Workload allows multiple deployment configurations to be defined, enabling flexible management of instances; for example, an EU deployment may require a larger minimum replica count than a US deployment, or need separate scaling settings.
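
Using the placement schema detailed under Design Details below, that EU/US example could be expressed as two placements on a single Workload:

placements:
  - name: us
    cityCodes: ['DFW', 'SEA']
    scaleSettings:
      minReplicas: 2
  - name: eu
    cityCodes: ['LHR', 'AMS']
    scaleSettings:
      minReplicas: 5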

Motivation

Goals

  • Provide a single API call that can result in the deployment of many compute instances.
  • Allow the definition of multi-container based instances that will be deployed in managed virtualized sandboxes.
  • Allow the definition of virtual machine based instances.
  • Allow attaching one or more network interfaces to the same or separate networks.
  • Allow the definition of network interface specific network policies.
  • Provide the ability to remotely manage instances via the use of serial console or VNC for virtual machine based instances, and remote container execution for container based instances.

Non-Goals

  • Direct definition of single instances.
  • Define functionality of Datum Cloud Networks (A separate RFC will define this).

Proposal

User Stories (Optional)

Story 1

Story 2

Notes/Constraints/Caveats (Optional)

Risks and Mitigations

Design Details

Workload Entity

Below is an example workload definition in YAML form:

Caution

The structure below has not yet been finalized and is subject to change.

apiVersion: compute.datumapis.com/v1alpha
kind: Workload
metadata:
  # The workload's name will influence the name of each instance in the
  # workload. This must be a valid DNS name.
  name: my-workload
  namespace: my-namespace
  uid: 6e3d1b5f-5d58-40ac-9c4a-b93433c672f9
  # Arbitrary string key/value entries that can be used to influence Datum Cloud
  # platform behaviors at a workload level, or the behavior of external systems
  # that may read these annotations.
  annotations:
    compute.datumapis.com/enable-anycast: "true"
  # Arbitrary string key/value entries that can be used in network policies,
  # services, discovery services (DNS discovery, Metadata API).
  # WILL NOT propagate to instances managed by the workload, allowing updates to
  # these labels without impacting the lifecycle of an instance.
  labels:
    tier: app
  creationTimestamp: 1970-01-01T00:00:00Z
# Defines the expectations of the workload
spec:
  # The template defines settings for each instance
  template:
    # Arbitrary string key/value entries that can be used in network policies,
    # services, discovery services (DNS discovery, Metadata API).
    #
    # Any changes to these labels may result in a lifecycle event of instances.
    labels:
      tier: app
    spec:
      # The runtime type of the instance, such as a set of containers or a VM.
      runtime:
        # Resources each instance must be allocated.
        #
        # A sandbox runtime's containers may specify resource requests and
        # limits. When limits are defined on all containers, the limits MUST
        # in aggregate consume the entire amount of resources defined here.
        #
        # A virtual machine runtime will be provided all requested resources.
        resources:
          # Resources for an instance may be defined by referencing a
          # predefined instance type, or by referencing an instance family and
          # providing specific resource requests for CPU, Memory, etc.
          #
          # Instance type is a required field.
          #
          # When customizing an instance type the value will be in the form
          # `<instanceFamily>-custom`.
          instanceType: datumcloud/d1-standard-2

          # Alternatively, desired resources may be specified directly.
          #
          # NOTE: This MUST be combined with `instanceType` referencing an
          # instance family (e.g. `instanceType: datumcloud/d1-custom`); it is
          # shown alongside the predefined instance type above for
          # illustration only.
          requests:
            cpu: "2"
            memory: "2Gi"
            # Example of a GPU attachment
            datumcloud.com/nvidia-tesla-a100: "1"
        # A sandbox is a managed isolated environment capable of running
        # containers. Each sandbox comes with its own kernel, isolated from the
        # host machine.
        #
        # NOTE: This field is mutually exclusive with `virtualMachine`.
        sandbox:
          # Multiple containers can be deployed in a single instance. They
          # behave similarly to a Kubernetes Pod, in that they can interact
          # over the instance's network interfaces and attached volumes.
          containers:
            - name: netdata
              # Only accept fully qualified container images so it's known where
              # the image is coming from. Otherwise, we'd have to decide on a list
              # of registries to fall back to, which can introduce security holes.
              image: docker.io/netdata/netdata
              # Resource requests and limits for the container.
              # This field is optional. If left undefined, there will be no
              # limits placed on the container beyond those imposed by the
              # amount of resources requested for the runtime.
              resources:
                # Optional resource requests for the container.
                #
                # Each resource type MUST also be defined in `runtime.resources`,
                # and cannot exceed the amount requested in `runtime.resources`.
                #
                # If only a single container is defined, its resource requests
                # must either match those defined in `runtime.resources`, or
                # the field must be left blank to grant the container all
                # requested resources.
                #
                # This can be helpful to hint the OS scheduler regarding resource
                # priority between multiple containers in the same instance.
                requests:
                  cpu: "1"
                  memory: "1Gi"
                # Optional resource limits for the container.
                #
                # If not set, the upper limit on container resource usage will
                # be the resources requested for the runtime.
                limits:
                  # The container CPU usage will be throttled to 1 cpu-second
                  # per second.
                  cpu: "1"
                  # If the container memory usage exceeds 1536Mi, it will be
                  # terminated and restarted by the control plane.
                  memory: "1536Mi"
              volumeAttachments:
                - name: logs
                  mountPath: /app/logs
              # Named ports to be used in network policies and discovery services.
              ports:
                - name: http
                  port: 80
                - name: https
                  port: 443
            - name: sidecar
              image: docker.io/timberio/vector
              resources:
                requests:
                  cpu: "100m"
                  memory: "256m"
                limits:
                  cpu: "1"
                  memory: "512Mi"
              volumeAttachments:
                - name: logs
                  mountPath: /app/logs
        # A virtual machine is a classical VM environment, booting a full OS
        # provided by the user via an image.
        #
        # NOTE: This field is mutually exclusive with `sandbox`.
        virtualMachine:
          volumeAttachments:
            # A VM-based instance must have a volume attached as the boot
            # disk. The `name` field must align with the `name` of a volume
            # defined in the instance template's `volumes` list.
            #
            # The order of volumes is important, and the first volume will be
            # used to boot the VM's OS.
            - name: boot
            - name: logs
              # Specifies a unique device name that is reflected into the
              # `/dev/disk/by-id/datumcloud-*` tree of a Linux operating system
              # running within the instance. This name can be used to reference
              # the device for mounting, resizing, and so on, from within the
              # instance.
              #
              # If not specified, the server chooses a default device name to
              # apply to this disk, in the form persistent-disk-x, where x is a
              # number assigned by Datum Cloud.
              #
              # This field is only applicable for persistent disks.
              deviceName: logs

      # One or more network interfaces can be attached to each instance
      networkInterfaces:
        # A default network will exist in each project
        - network:
            name: default
          ipFamilies:
            - IPv4
            - IPv6
        # Additional networks can be created and attached to
        - network:
            name: corp
          ipFamilies:
            - IPv4

      # Defines volumes that must be available to attach to an instance.
      #
      # Each volume may be of a different type, such as disk backed, backed by
      # downward API information, secrets, memory, etc.
      #
      # Mounting or presenting the volume to the instance is controlled via the
      # `volumeAttachments` field in a container or virtual machine settings.
      volumes:
        # The name will be used to reference the volume in `volumeAttachments`
        # for containers and VMs, and, when required, to derive the platform
        # resource name by prefixing this name with the instance name upon
        # creation.
        - name: boot
          disk:
            template:
              spec:
                populator:
                  # Using a source image will default storage resource requests to
                  # the size of the source image.
                  image:
                    name: datumcloud/ubuntu-2204-lts
        - name: logs
          disk:
            template:
              spec:
                populator:
                  filesystem:
                    type: ext4
                resources:
                  requests:
                    storage: 10Gi

  # Defines where instances should be deployed, and the scope in which a
  # deployment will live (such as a city or region).
  #
  # Each placement will result in zero or more deployments. A deployment is
  # responsible for managing the lifecycle of instances within its scope.
  #
  # NOTE: Only the deployment scope of `cityCode` will be supported at this time.
  placements:
    # Basic examples of leveraging IATA airport codes to define expectations on
    # where deployments should be created.
    - name: us
      # Each city code which can accept the workload will result in a unique
      # deployment.
      #
      # In this case two deployments will be created, one in DFW and another in
      # SEA. In each of these deployments, there will be a minimum of 5
      # instances deployed.
      #
      # NOTE: A deployment may be created in a city that cannot currently meet
      # the minimum replica requirements of the placement. However, the control
      # plane will constantly work toward creating the replicas when capacity is
      # available.
      cityCodes: ['DFW', 'SEA']
      scaleSettings:
        minReplicas: 5
    # Same as above, but building different logical "regions".
    - name: eu
      cityCodes: ['LHR', 'AMS']
      scaleSettings:
        minReplicas: 3
    - name: oc
      cityCodes: ['SYD', 'MEL']
      scaleSettings:
        minReplicas: 3
    # Forward-looking autoscaling based on external metrics. Typical metrics
    # covering CPU and network concerns are expected to be made available
    # automatically.
    - name: us-autoscaled
      cityCodes: ['DFW', 'SEA']
      # Read
      # - https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
      # - https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale
      scaleSettings:
        minReplicas: 5
        maxReplicas: 10
        metrics:
          # The intent here is that `External` will interact with an external
          # metrics store owned by the user when making scaling decisions.
          #
          # In this example, the system will scale up or down instances with
          # the goal of meeting the target request latency.
          - external:
              metric:
                name: app:request_latency_ms:95
                selector:
                  # Metric in the external system must have labels that match
                  # the expected values.
                  matchLabels:
                    tier: app
                    # Implies the metric series has a label `city_code`
                    # with a value matching the IATA cityCode of a potential
                    # location for the deployment
                    city_code: "{{ context.locality.cityCode }}"
              target:
                type: AverageValue
                averageValue: 30

    # Future flexible format
    - name: test-eu
      # Scope for a deployment. Each unique value of the referenced locality
      # attribute will have a deployment created.
      #
      # For example, each `region` below may have one or more distinct cities
      # available. In this configuration, a deployment would be created in each
      # of those cities.
      # TODO(jreese) consider a fully qualified label such as `locality.datumcloud.com/cityCode`
      scope: cityCode
      selector:
        # CEL expression
        constraint: region in ['US', 'EU', 'OC']
      scaleSettings:
        minReplicas: 3
# Status is an output-only, system-populated field used to communicate
# information regarding the state of the workload.
#
# Status conditions are aggregated from WorkloadDeployments into each
# placement's conditions, and from placement conditions into the workload's
# conditions.
status:
  # Status of each placement
  placements:
    - name: us
      currentReplicas: 7
      desiredReplicas: 10
      conditions:
        - type: Available
          status: "True"
        - type: Progressing
          status: "False"
          reason: QuotaExhausted
          message: CPU quota has been exhausted; please contact support.
    # ...
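  # Workload-level conditions aggregated from the placement conditions above.
  # This is a sketch; the exact condition types surfaced at the workload
  # level are illustrative and not finalized.
  conditions:
    - type: Available
      status: "True"
    - type: Progressing
      status: "False"
      reason: QuotaExhausted
      message: One or more placements cannot progress.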

Minimum specifications

Below are examples of the minimum requirements for creating a container-based or VM-based workload.

Container-based instance
apiVersion: compute.datumapis.com/v1alpha
kind: Workload
metadata:
  name: my-container-workload
spec:
  template:
    spec:
      runtime:
        resources:
          instanceType: datumcloud/d1-standard-2
        sandbox:
          containers:
            - name: httpbin
              image: docker.io/mccutchen/go-httpbin
      networkInterfaces:
        - network:
            name: default
  placements:
    - name: us
      cityCodes: ['DFW', 'SEA']
      scaleSettings:
        minReplicas: 1
VM-based instance
apiVersion: compute.datumapis.com/v1alpha
kind: Workload
metadata:
  name: my-vm-workload
spec:
  template:
    spec:
      runtime:
        resources:
          instanceType: datumcloud/d1-standard-2
        virtualMachine:
          volumeAttachments:
            - name: boot
      networkInterfaces:
        - network:
            name: default
      volumes:
        - name: boot
          disk:
            template:
              spec:
                populator:
                  image:
                    name: datumcloud/ubuntu-2204-lts
  placements:
    - name: us
      cityCodes: ['DFW', 'SEA']
      scaleSettings:
        minReplicas: 1

Control Loops

Workload

%%{init: {
  "sequence": {
    "showSequenceNumbers": true
  }
}}%%
sequenceDiagram

  box API entities
    participant workload as Workload
    participant workloaddeployment as WorkloadDeployment
    participant location as Location
  end

  box controllers
    participant workload-controller
  end

  workload -->> workload-controller: Reconcile Workload
  activate workload-controller

  alt Workload Deleting
    loop For each WorkloadDeployment
      workload-controller ->> workloaddeployment: Delete
    end
    workload-controller ->> workload: Remove Finalizer
  else Workload Create or Update
    workload-controller ->> workload: Ensure Finalizer
    loop For each Placement
      loop For each cityCode
        workload-controller ->> location: Locate viable locations
        location ->> workload-controller: Locations

        alt Locations found
          workload-controller ->> workloaddeployment: Create/Update Deployment
        else No Locations found
          workload-controller ->> workload: Update Placement Status
        end
      end
    end

    workload-controller ->> workloaddeployment: Find orphaned deployments
    loop For each orphaned WorkloadDeployment
      workload-controller ->> workloaddeployment: Delete
    end
  end

  deactivate workload-controller

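Each placement is materialized as one or more WorkloadDeployment resources, whose schema is not defined by this RFC. The sketch below only illustrates the relationship between the entities in the diagram above; every field name in it is an assumption.

apiVersion: compute.datumapis.com/v1alpha
kind: WorkloadDeployment
metadata:
  # Hypothetical naming convention: <workload>-<placement>-<cityCode>.
  name: my-workload-us-dfw
  namespace: my-namespace
spec:
  # Assumed back-reference to the owning Workload and placement.
  workloadRef:
    name: my-workload
  placementName: us
  cityCode: DFW
  scaleSettings:
    minReplicas: 5
status:
  conditions:
    # Set by the workload scheduler once a Location has been assigned (see
    # the scheduler diagram below).
    - type: LocationAssigned
      status: "True"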

Workload Scheduler

%%{init: {
  "sequence": {
    "showSequenceNumbers": true
  }
}}%%
sequenceDiagram
  participant Controller
  participant WorkloadDeploymentScheduler
  participant KubernetesAPI
  participant LocationList
  participant DeploymentStatus

  Controller->>WorkloadDeploymentScheduler: Reconcile(req)
  WorkloadDeploymentScheduler->>KubernetesAPI: Get WorkloadDeployment
  KubernetesAPI-->>WorkloadDeploymentScheduler: WorkloadDeployment or NotFound error

  alt WorkloadDeployment Not Found
    WorkloadDeploymentScheduler-->>Controller: Return (no requeue)
  else WorkloadDeployment Found
    WorkloadDeploymentScheduler->>KubernetesAPI: List Locations
    KubernetesAPI-->>WorkloadDeploymentScheduler: LocationList or Error

    alt No Locations
      WorkloadDeploymentScheduler->>DeploymentStatus: Set "NoLocations" Condition
      WorkloadDeploymentScheduler->>KubernetesAPI: Update Deployment Status
      KubernetesAPI-->>WorkloadDeploymentScheduler: Success
      WorkloadDeploymentScheduler-->>Controller: Return (RequeueAfter 30s)
    else Locations Found
      WorkloadDeploymentScheduler->>KubernetesAPI: Get Workload for Deployment
      KubernetesAPI-->>WorkloadDeploymentScheduler: Workload or Error

      alt Placement not found in Workload
        WorkloadDeploymentScheduler-->>Controller: Return (no action)
      else Placement Found
        WorkloadDeploymentScheduler->>LocationList: Find matching Location (based on CityCode)
        alt No Matching Location
          WorkloadDeploymentScheduler->>DeploymentStatus: Set "NoCandidateLocations" Condition
          WorkloadDeploymentScheduler->>KubernetesAPI: Update Deployment Status
          KubernetesAPI-->>WorkloadDeploymentScheduler: Success
        else Matching Location Found
          WorkloadDeploymentScheduler->>KubernetesAPI: Update Deployment with Location
          WorkloadDeploymentScheduler->>DeploymentStatus: Set "LocationAssigned" Condition
          WorkloadDeploymentScheduler->>KubernetesAPI: Update Deployment Status
          KubernetesAPI-->>WorkloadDeploymentScheduler: Success
        end
      end
    end
  end
  WorkloadDeploymentScheduler-->>Controller: Return (no requeue)
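
The scheduler matches a deployment to a Location by city code. The Location entity is also out of scope for this RFC; the sketch below is illustrative only, with all field names assumed (the `region` attribute mirrors the CEL constraint example earlier).

apiVersion: compute.datumapis.com/v1alpha
kind: Location
metadata:
  name: dfw-1
spec:
  # Assumed locality attributes that placements match against.
  cityCode: DFW
  region: US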

Production Readiness Review Questionnaire

Feature Enablement and Rollback

How can this feature be enabled / disabled in a live cluster?

  • Feature gate
    • Feature gate name:
    • Components depending on the feature gate:
  • Other
    • Describe the mechanism:
    • Will enabling / disabling the feature require downtime of the control plane?
    • Will enabling / disabling the feature require downtime or reprovisioning of a node?

Does enabling the feature change any default behavior?

Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

What happens if we reenable the feature if it was previously rolled back?

Are there any tests for feature enablement/disablement?

Rollout, Upgrade and Rollback Planning

How can a rollout or rollback fail? Can it impact already running workloads?

What specific metrics should inform a rollback?

Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

Monitoring Requirements

How can an operator determine if the feature is in use by workloads?

How can someone using this feature know that it is working for their instance?

  • Events
    • Event Reason:
  • API .status
    • Condition name:
    • Other field:
  • Other (treat as last resort)
    • Details:

What are the reasonable SLOs (Service Level Objectives) for the enhancement?

What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

  • Metrics
    • Metric name:
    • [Optional] Aggregation method:
    • Components exposing the metric:
  • Other (treat as last resort)
    • Details:

Are there any missing metrics that would be useful to have to improve observability of this feature?

Dependencies

Does this feature depend on any specific services running in the cluster?

Scalability

Will enabling / using this feature result in any new API calls?

Will enabling / using this feature result in introducing new API types?

Will enabling / using this feature result in any new calls to the cloud provider?

Will enabling / using this feature result in increasing size or count of the existing API objects?

Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

Will enabling / using this feature result in non-negligible increase of resource usage in any components?

Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

Troubleshooting

How does this feature react if the API server is unavailable?

What are other known failure modes?

What steps should be taken if SLOs are not being met to determine the problem?

Implementation History

Drawbacks

Alternatives

Infrastructure Needed (Optional)