This topic describes the best practices required to build and deploy workloads at scale.
The following sections describe the configuration of the different-size applications used to derive scalability best practices.
The small application is the simplest configuration and consists of the following services and workloads:
- API Gateway workload
- Search workload with in-memory database
- Search processor workload
- Availability workload with in-memory database
- UI workload
- 3-node RabbitMQ cluster
The medium application includes all of the services of the small application and the following services and workloads:
- Notify workload
- Persistent database (MySQL or PostgreSQL)
The large application includes all of the services of the medium application and the following services and workloads:
- Crawler Service
- Redis
The following section describes the application configuration used to derive the scalability best practices.
Supply chains used:
- Out of the Box Supply Chain with Testing and Scanning (Build+Run)
- Out of the Box Supply Chain Basic + Out of the Box Supply Chain with Testing (Iterate)
Workload types: web, server, and worker
Kubernetes Distribution: Azure Kubernetes Service
Number of applications deployed concurrently: 50–55
| Application size | CPU Range | Memory Range | Workload CRs in Iterate | Workload CRs in Build+Run | Workload Transactions per second |
|---|---|---|---|---|---|
| Small | 500m - 700m | 3-5 GB | 4 | 5 | 4 |
| Medium | 700m - 1000m | 4-6 GB | NA | 6 | 4 |
| Large | 1000m - 1500m | 6-8 GB | NA | 7 | 4 |
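Each application instance in these tests is delivered as one or more workload custom resources of the types listed above (web, server, or worker). The following is a minimal sketch of what one such workload definition can look like; the workload name, namespace, labels, and Git repository URL are placeholder values and not part of the measured configuration:

```yaml
# Hypothetical example of a single "web" workload; adjust names and source to your application.
apiVersion: carto.run/v1alpha1
kind: Workload
metadata:
  name: search-server                  # placeholder workload name
  namespace: dev-team-1                # placeholder developer namespace
  labels:
    apps.tanzu.vmware.com/workload-type: web    # web, server, or worker
    app.kubernetes.io/part-of: search-app       # placeholder application grouping
spec:
  source:
    git:
      url: https://github.com/example/search-server   # placeholder repository
      ref:
        branch: main
```

An equivalent workload can also be created with the Tanzu CLI, for example `tanzu apps workload create search-server --type web --git-repo <REPO-URL> --git-branch main`.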
This section describes the cluster sizes used to deploy 1,000 workloads.
Node configuration: 4 vCPUs, 16 GB RAM, 120 GB disk
| Cluster Type / Workload Details | Shared Iterate Cluster | Build Cluster | Run Cluster 1 | Run Cluster 2 | Run Cluster 3 |
|:--- |:--- |:--- |:--- |:--- |:--- |
| No. of Namespaces | 300 | 333 | 333 | 333 | 333 |
| Small | 300 | 233 | 233 | 233 | 233 |
| Medium | | 83 | 83 | 83 | 83 |
| Large | | 17 | 17 | 17 | 17 |
| No. of Nodes | 90 | 60 | 135 | 135 | 135 |
The following table describes the resource limit changes that are required for components to support the scale configuration described in the previous table.
| Controller/Pod | CPU Requests/Limits | Memory Requests/Limits | Other changes | Build | Run | Iterate | Changes made in |
|:---|:---|:---|:---|:---|:---|:---|:---|
| AMR Observer | 200m/1000m | 2 Gi/3 Gi | n/a | Yes | Yes | No | `tap-values.yaml` |
| Build Service/kpack controller | 20m/100m | 1 Gi/2 Gi | n/a | Yes | No | Yes | `tap-values.yaml` |
| Scanning/scan-link | 200m/500m | 1 Gi/3 Gi | `SCAN_JOB_TTL_SECONDS_AFTER_FINISHED` set to 10800* | Yes | No | No | `tap-values.yaml` |
| Cartographer | 3000m/4000m | 10 Gi/10 Gi | In `tap-values.yaml`, change concurrency to 25. | Yes | Partial (only CPU) | Yes | `tap-values.yaml` |
| Cartographer conventions | 100m/100m | 20 Mi/1.8 Gi | n/a | Yes | Yes | Yes | `tap-values.yaml` |
| Namespace Provisioner | 100m/500m | 500 Mi/2 Gi | n/a | Yes | Yes | Yes | `tap-values.yaml` |
| Cnrs/knative-controller | 100m/1000m | 1 Gi/3 Gi | n/a | No | Yes | Yes | `tap-values.yaml` |
| Cnrs/net-contour | 40m/400m | 512 Mi/2 Gi | In `tap-values.yaml`, change the Contour envoy workload type from `Daemonset` to `Deployment`. | No | Yes | Yes | `tap-values.yaml` |
| Cnrs/activator | 300m/1000m | 5 Gi/5 Gi | n/a | No | Yes | No | `tap-values.yaml` |
| Cnrs/autoscaler | 100m/1000m | 2 Gi/2 Gi | n/a | No | Yes | No | `tap-values.yaml` |
| Services Toolkit Controller | 100m/200m | 750 Mi/1.5 Gi | n/a | No | Yes | Yes | overlay |
| tap-telemetry/tap-telemetry-informer | 100m/1000m | 100 Mi/2 Gi | n/a | Yes | No | Yes | `tap-values.yaml` |
| App SSO/App SSO Controller | 20m/500m | 512 Mi/2 Gi | n/a | No | Yes | Yes | `tap-values.yaml` |
- CPU is measured in millicores. m = millicore. 1000 millicores = 1 vCPU.
- Memory is measured in mebibytes and gibibytes. Mi = mebibyte. Gi = gibibyte.
- In the CPU Requests/Limits and Memory Requests/Limits columns, the changed values are bolded. Non-bolded values are the defaults set during a Tanzu Application Platform installation.
- In the CPU Requests/Limits and Memory Requests/Limits columns, some requests and limits are set to the same value so that the pod is scheduled on a node where the requested limit is available.

\* Set `SCAN_JOB_TTL_SECONDS_AFTER_FINISHED` only when there is an issue with scan pods being deleted before Cartographer can process them.
The following section provides examples of the changes required to the default limits to achieve scalability:
The default Cartographer concurrency limits are:
```yaml
cartographer:
  cartographer:
    concurrency:
      max_workloads: 2
      max_deliveries: 2
      max_runnables: 2
```

Edit `values.yaml` to scale the Cartographer concurrency limits for the node configuration of 4 vCPUs, 16 GB RAM, and 120 GB disk:

```yaml
cartographer:
  cartographer:
    concurrency:
      max_workloads: 25
      max_deliveries: 25
      max_runnables: 25
```
The default Cartographer resource limits are:

```yaml
resources:
  limits:
    cpu: 1
    memory: 1Gi
  requests:
    cpu: 500m
    memory: 512Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
# build-cluster
cartographer:
  cartographer:
    resources:
      limits:
        cpu: 4
        memory: 10Gi
      requests:
        cpu: 3
        memory: 10Gi
```

```yaml
# run-cluster
cartographer:
  cartographer:
    resources:
      limits:
        cpu: 4
        memory: 2Gi
      requests:
        cpu: 3
        memory: 1Gi
```
The default Cartographer conventions resource limits are:

```yaml
resources:
  limits:
    cpu: 100m
    memory: 256Mi
  requests:
    cpu: 100m
    memory: 20Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
cartographer:
  conventions:
    resources:
      limits:
        memory: 1.8Gi
```
The default Scanning (scan-link) resource limits are:

```yaml
resources:
  limits:
    cpu: 250m
    memory: 256Mi
  requests:
    cpu: 100m
    memory: 128Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
scanning:
  resources:
    limits:
      cpu: 500m
      memory: 3Gi
    requests:
      cpu: 200m
      memory: 1Gi
```
The default AMR Observer resource limits are:

```yaml
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 256Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
amr:
  observer:
    app_limit_cpu: 1000m
    app_limit_memory: 3Gi
    app_req_cpu: 200m
    app_req_memory: 2Gi
```
The default Build Service (kpack controller) resource limits are:

```yaml
resources:
  limits:
    memory: 1Gi
  requests:
    cpu: 20m
    memory: 1Gi
```

Edit `values.yaml` to scale the resource limits:

```yaml
buildservice:
  controller:
    resources:
      limits:
        memory: 2Gi
        cpu: 100m
      requests:
        memory: 1Gi
        cpu: 20m
```
The default Namespace Provisioner controller resource limits are:

```yaml
resources:
  limits:
    cpu: 500m
    memory: 100Mi
  requests:
    cpu: 100m
    memory: 20Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
namespace_provisioner:
  controller_resources:
    resources:
      limits:
        cpu: 500m
        memory: 2Gi
      requests:
        cpu: 100m
        memory: 500Mi
```
The default Cloud Native Runtimes knative-controller resource limits are:

```yaml
resources:
  limits:
    cpu: 1
    memory: 1000Mi
  requests:
    cpu: 100m
    memory: 100Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
cnrs:
  resource_management:
    - name: "controller"
      limits:
        cpu: 1000m
        memory: 3Gi
      requests:
        cpu: 100m
        memory: 1Gi
```
Change the Contour envoy workload type from `Daemonset` to `Deployment`:

```yaml
contour:
  envoy:
    workload:
      type: Deployment
      replicas: 3
```
The default net-contour controller resource limits are:

```yaml
resources:
  limits:
    cpu: 400m
    memory: 400Mi
  requests:
    cpu: 40m
    memory: 40Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
cnrs:
  resource_management:
    - name: "net-contour-controller"
      limits:
        cpu: 400m
        memory: 2Gi
      requests:
        cpu: 40m
        memory: 512Mi
```
The default Knative autoscaler resource limits are:

```yaml
resources:
  limits:
    cpu: 1
    memory: 1000Mi
  requests:
    cpu: 100m
    memory: 100Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
cnrs:
  resource_management:
    - name: "autoscaler"
      limits:
        cpu: 1000m
        memory: 2Gi
      requests:
        cpu: 100m
        memory: 2Gi
```
The default Knative activator resource limits are:

```yaml
resources:
  limits:
    cpu: 1
    memory: 600Mi
  requests:
    cpu: 300m
    memory: 60Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
cnrs:
  resource_management:
    - name: "activator"
      limits:
        cpu: 1000m
        memory: 5Gi
      requests:
        cpu: 300m
        memory: 5Gi
```
The default tap-telemetry-informer resource limits are:

```yaml
resources:
  limits:
    cpu: 1
    memory: 1000Mi
  requests:
    cpu: 100m
    memory: 100Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
tap_telemetry:
  limit_memory: 2Gi
```
The default App SSO controller resource limits are:

```yaml
resources:
  limits:
    cpu: 500m
    memory: 5000Mi
  requests:
    cpu: 20m
    memory: 100Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
appsso:
  resources:
    limits:
      memory: 1Gi
    requests:
      memory: 512Mi
```
The default Services Toolkit controller resource limits are:

```yaml
resources:
  limits:
    cpu: 200m
    memory: 500Mi
  requests:
    cpu: 100m
    memory: 100Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
services_toolkit:
  controller:
    resources:
      requests:
        cpu: "200m"
        memory: "750Mi"
      limits:
        cpu: "200m"
        memory: "1.5Gi"
```
The default Services Toolkit resource claims API server resource limits are:

```yaml
resources:
  limits:
    cpu: 120m
    memory: 500Mi
  requests:
    cpu: 100m
    memory: 100Mi
```

Edit `values.yaml` to scale the resource limits:

```yaml
services_toolkit:
  resource_claims_apiserver:
    resources:
      requests:
        cpu: "200m"
        memory: "750Mi"
      limits:
        cpu: "200m"
        memory: "1.5Gi"
```