This topic describes the best practices required to build and deploy workloads at scale.
The following sections describe the configuration of the different-size applications used to derive scalability best practices.
The small application is the simplest configuration and consists of the following services and workloads:
- API Gateway workload
- Search workload with in-memory database
- Search processor workload
- Availability workload with in-memory database
- UI workload
- 3-node RabbitMQ cluster
The medium application includes all of the services of the small application and the following services and workloads:
- Notify workload
- Persistent database (MySQL or PostgreSQL)
The large application includes all of the services of the medium application and the following services and workloads:
- Crawler Service
- Redis
The following section describes the application configuration used to derive the scalability best practices.
Supply chains used:
- Out of the Box Supply Chain with Testing and Scanning (Build+Run)
- Out of the Box Supply Chain Basic + Out of the Box Supply Chain with Testing (Iterate)
Workload types: web, server, and worker
Kubernetes Distribution: Azure Kubernetes Service
Number of applications deployed concurrently: 50–55
| Application size | CPU Range | Memory Range | Workload CRs in Iterate | Workload CRs in Build+Run | Workload Transactions per second |
|---|---|---|---|---|---|
| Small | 500m - 700m | 3-5 GB | 4 | 5 | 4 |
| Medium | 700m - 1000m | 4-6 GB | NA | 6 | 4 |
| Large | 1000m - 1500m | 6-8 GB | NA | 7 | 4 |
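Each application instance in these tests is delivered as one or more workload custom resources of the types listed above (web, server, or worker). The following is a minimal sketch of what one such workload definition can look like; the workload name, namespace, labels, and Git repository URL are placeholder values and not part of the measured configuration:

```yaml
# Hypothetical example of a single "web" workload; adjust names and source to your application.
apiVersion: carto.run/v1alpha1
kind: Workload
metadata:
  name: search-server                  # placeholder workload name
  namespace: dev-team-1                # placeholder developer namespace
  labels:
    apps.tanzu.vmware.com/workload-type: web    # web, server, or worker
    app.kubernetes.io/part-of: search-app       # placeholder application grouping
spec:
  source:
    git:
      url: https://github.com/example/search-server   # placeholder repository
      ref:
        branch: main
```

An equivalent workload can also be created with the Tanzu CLI, for example `tanzu apps workload create search-server --type web --git-repo <REPO-URL> --git-branch main`.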
This section describes the cluster sizes used to deploy 1,000 workloads.
Node configuration: 4 vCPUs, 16 GB RAM, 120 GB disk
| Cluster Type / Workload Details | Shared Iterate Cluster | Build Cluster | Run Cluster 1 | Run Cluster 2 | Run Cluster 3 |
|:--- |:--- |:--- |:--- |:--- |:--- |
| No. of Namespaces | 300 | 333 | 333 | 333 | 333 |
| Small | 300 | 233 | 233 | 233 | 233 |
| Medium | | 83 | 83 | 83 | 83 |
| Large | | 17 | 17 | 17 | 17 |
| No. of Nodes | 90 | 60 | 135 | 135 | 135 |
The following table describes the resource limit changes that are required for components to support the scale configuration described in the previous table.
| Controller/Pod | CPU Requests/Limits | Memory Requests/Limits | Other changes | Build | Run | Iterate | Changes made in |
|:---|:---|:---|:---|:---|:---|:---|:---|
| AMR Observer | 200m/1000m | 2 Gi/3 Gi | n/a | Yes | Yes | No | `tap-values.yaml` |
| Build Service/kpack controller | 20m/100m | 1 Gi/2 Gi | n/a | Yes | No | Yes | `tap-values.yaml` |
| Scanning/scan-link | 200m/500m | 1 Gi/3 Gi | `SCAN_JOB_TTL_SECONDS_AFTER_FINISHED` set to 10800* | Yes | No | No | `tap-values.yaml` |
| Cartographer | 3000m/4000m | 10 Gi/10 Gi | In `tap-values.yaml`, change concurrency to 25. | Yes | Partial (only CPU) | Yes | `tap-values.yaml` |
| Cartographer conventions | 100m/100m | 20 Mi/1.8 Gi | n/a | Yes | Yes | Yes | `tap-values.yaml` |
| Namespace Provisioner | 100m/500m | 500 Mi/2 Gi | n/a | Yes | Yes | Yes | `tap-values.yaml` |
| Cnrs/knative-controller | 100m/1000m | 1 Gi/3 Gi | n/a | No | Yes | Yes | `tap-values.yaml` |
| Cnrs/net-contour | 40m/400m | 512 Mi/2 Gi | In `tap-values.yaml`, change the Contour envoy workload type from `Daemonset` to `Deployment`. | No | Yes | Yes | `tap-values.yaml` |
| Cnrs/activator | 300m/1000m | 5 Gi/5 Gi | n/a | No | Yes | No | `tap-values.yaml` |
| Cnrs/autoscaler | 100m/1000m | 2 Gi/2 Gi | n/a | No | Yes | No | `tap-values.yaml` |
| Services Toolkit Controller | 100m/200m | 750 Mi/1.5 Gi | n/a | No | Yes | Yes | overlay |
| tap-telemetry/tap-telemetry-informer | 100m/1000m | 100 Mi/2 Gi | n/a | Yes | No | Yes | `tap-values.yaml` |
| App SSO/App SSO Controller | 20m/500m | 512 Mi/2 Gi | n/a | No | Yes | Yes | `tap-values.yaml` |
- CPU is measured in millicores. m = millicore. 1000 millicores = 1 vCPU.
- Memory is measured in mebibytes and gibibytes. Mi = mebibyte. Gi = gibibyte.
- In the CPU Requests/Limits and Memory Requests/Limits columns, the changed values are bolded. Non-bolded values are the defaults set during a Tanzu Application Platform installation.
- In the CPU Requests/Limits and Memory Requests/Limits columns, some requests and limits are set to the same value so that the pod is scheduled on a node where the requested limit is available.

\* Set `SCAN_JOB_TTL_SECONDS_AFTER_FINISHED` only when there is an issue with scan pods being deleted before Cartographer can process them.
The following section provides examples of the changes required to the default limits to achieve scalability:
The default Cartographer concurrency limits are:
```yaml
cartographer:
  cartographer:
    concurrency:
      max_workloads: 2
      max_deliveries: 2
      max_runnables: 2
```

Edit `values.yaml` to scale the Cartographer concurrency limits for the node configuration of 4 vCPUs, 16 GB RAM, and 120 GB disk:

```yaml
cartographer:
  cartographer:
    concurrency:
      max_workloads: 25
      max_deliveries: 25
      max_runnables: 25
```
The default Cartographer resource limits are:

```yaml
resources:
  limits:
    cpu: 1
    memory: 1Gi
  requests:
    cpu: 500m
    memory: 512Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
# build-cluster
cartographer:
  cartographer:
    resources:
      limits:
        cpu: 4
        memory: 10Gi
      requests:
        cpu: 3
        memory: 10Gi
```

```yaml
# run-cluster
cartographer:
  cartographer:
    resources:
      limits:
        cpu: 4
        memory: 2Gi
      requests:
        cpu: 3
        memory: 1Gi
```
The default Cartographer conventions resource limits are:

```yaml
resources:
  limits:
    cpu: 100m
    memory: 256Mi
  requests:
    cpu: 100m
    memory: 20Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
cartographer:
  conventions:
    resources:
      limits:
        memory: 1.8Gi
```
The default Scanning (scan-link) resource limits are:

```yaml
resources:
  limits:
    cpu: 250m
    memory: 256Mi
  requests:
    cpu: 100m
    memory: 128Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
scanning:
  resources:
    limits:
      cpu: 500m
      memory: 3Gi
    requests:
      cpu: 200m
      memory: 1Gi
```
The default AMR Observer resource limits are:

```yaml
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 256Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
amr:
  observer:
    app_limit_cpu: 1000m
    app_limit_memory: 3Gi
    app_req_cpu: 200m
    app_req_memory: 2Gi
```
The default Build Service (kpack controller) resource limits are:

```yaml
resources:
  limits:
    memory: 1Gi
  requests:
    cpu: 20m
    memory: 1Gi
```

Edit `values.yaml` to scale the resource limits:

```yaml
buildservice:
  controller:
    resources:
      limits:
        memory: 2Gi
        cpu: 100m
      requests:
        memory: 1Gi
        cpu: 20m
```
The default Namespace Provisioner controller resource limits are:

```yaml
resources:
  limits:
    cpu: 500m
    memory: 100Mi
  requests:
    cpu: 100m
    memory: 20Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
namespace_provisioner:
  controller_resources:
    resources:
      limits:
        cpu: 500m
        memory: 2Gi
      requests:
        cpu: 100m
        memory: 500Mi
```
The default Cloud Native Runtimes knative-controller resource limits are:

```yaml
resources:
  limits:
    cpu: 1
    memory: 1000Mi
  requests:
    cpu: 100m
    memory: 100Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
cnrs:
  resource_management:
    - name: "controller"
      limits:
        cpu: 1000m
        memory: 3Gi
      requests:
        cpu: 100m
        memory: 1Gi
```
Change the Contour envoy workload type from `Daemonset` to `Deployment`:

```yaml
contour:
  envoy:
    workload:
      type: Deployment
      replicas: 3
```
The default net-contour controller resource limits are:

```yaml
resources:
  limits:
    cpu: 400m
    memory: 400Mi
  requests:
    cpu: 40m
    memory: 40Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
cnrs:
  resource_management:
    - name: "net-contour-controller"
      limits:
        cpu: 400m
        memory: 2Gi
      requests:
        cpu: 40m
        memory: 512Mi
```
The default Knative autoscaler resource limits are:

```yaml
resources:
  limits:
    cpu: 1
    memory: 1000Mi
  requests:
    cpu: 100m
    memory: 100Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
cnrs:
  resource_management:
    - name: "autoscaler"
      limits:
        cpu: 1000m
        memory: 2Gi
      requests:
        cpu: 100m
        memory: 2Gi
```
The default Knative activator resource limits are:

```yaml
resources:
  limits:
    cpu: 1
    memory: 600Mi
  requests:
    cpu: 300m
    memory: 60Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
cnrs:
  resource_management:
    - name: "activator"
      limits:
        cpu: 1000m
        memory: 5Gi
      requests:
        cpu: 300m
        memory: 5Gi
```
The default tap-telemetry-informer resource limits are:

```yaml
resources:
  limits:
    cpu: 1
    memory: 1000Mi
  requests:
    cpu: 100m
    memory: 100Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
tap_telemetry:
  limit_memory: 2Gi
```
The default App SSO controller resource limits are:

```yaml
resources:
  limits:
    cpu: 500m
    memory: 5000Mi
  requests:
    cpu: 20m
    memory: 100Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
appsso:
  resources:
    limits:
      memory: 1Gi
    requests:
      memory: 512Mi
```
The default Services Toolkit controller resource limits are:

```yaml
resources:
  limits:
    cpu: 200m
    memory: 500Mi
  requests:
    cpu: 100m
    memory: 100Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
services_toolkit:
  controller:
    resources:
      requests:
        cpu: "200m"
        memory: "750Mi"
      limits:
        cpu: "200m"
        memory: "1.5Gi"
```
The default Services Toolkit resource claims API server resource limits are:

```yaml
resources:
  limits:
    cpu: 120m
    memory: 500Mi
  requests:
    cpu: 100m
    memory: 100Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
services_toolkit:
  resource_claims_apiserver:
    resources:
      requests:
        cpu: "200m"
        memory: "750Mi"
      limits:
        cpu: "200m"
        memory: "1.5Gi"
```