Skip to content

Latest commit

 

History

History
45 lines (42 loc) · 2.65 KB

scalability_validation_report.md

File metadata and controls

45 lines (42 loc) · 2.65 KB

This is the scalability validation report for release-1.8 written as per this template.

Validated large cluster performance under the following configuration:

  • Cloud-provider - GCE
  • No. of nodes - 5000
  • Node machine-type - n1-standard-1; OS - gci; Disk size - 50GB
  • Master machine-type - [auto-calculated]; OS - gci; Disk size - [auto-calculated]
  • Any non-default config used:
    • KUBE_ENABLE_CLUSTER_MONITORING=none
    • ENABLE_APISERVER_ADVANCED_AUDIT=false
    • APISERVER_TEST_ARGS=--max-requests-inflight=3000 --max-mutating-requests-inflight=1000
    • SCHEDULER_TEST_ARGS=--kube-api-qps=100
    • CONTROLLER_MANAGER_TEST_ARGS=--kube-api-qps=100 --kube-api-burst=100
    • TEST_CLUSTER_RESYNC_PERIOD=--min-resync-period=12h
    • TEST_CLUSTER_DELETE_COLLECTION_WORKERS=--delete-collection-workers=16
    • ENABLE_BIG_CLUSTER_SUBNETS=true
  • Any important test details:
    • Services disabled in load test
    • SLO used for 99%ile for api call latency:
      • For clusters with <= 500 nodes:
        • <= 1s for all calls
      • For clusters with > 500 nodes:
        • <= 1s for non-list calls
        • <= 5s for namespaced list calls
        • <= 10s for cluster-scoped list calls
  • <job-name, run#> of the validating run (to know other specific details from the logs): https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/36

Validated large cluster correctness under the following configuration:

  • Cloud-provider - GCE
  • No. of nodes - 5000
  • Node machine-type - g1-small; OS - gci; Disk size - 50GB
  • One special n1-standard-8 node (out of the 5k nodes) used for heapster
  • Master machine-type - [auto-calculated]; OS - gci; Disk size - [auto-calculated]
  • Any non-default config used:
    • KUBE_ENABLE_CLUSTER_MONITORING=standalone
    • ENABLE_APISERVER_ADVANCED_AUDIT=true
    • APISERVER_TEST_ARGS=--max-requests-inflight=1500 --max-mutating-requests-inflight=500
    • (rest same as above)
  • Any important test details:
    • A few e2es around external loadbalancer timing out due to GCE-side issues (#52495) - being fixed
    • A few e2es around heapster and stackdriver logging are flaking - need stabilization
  • <job-name, run#> of the validating run (to know other specific details from the logs): https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-correctness/13

Misc:

  • We're seeing a performance regression in density test wrt api call latency for delete pods - #51899