Website: https://www.cast.ai
A module to connect a GKE cluster to CAST AI.

- Requires Terraform 0.13+.
- Requires the `castai/castai`, `hashicorp/google`, and `hashicorp/helm` providers to be configured.
- For Phase 2 onboarding, credentials from the `terraform-gke-iam` module are required.
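A minimal sketch of the provider setup follows. It assumes the GKE cluster already exists and is read through the `google_container_cluster` data source. The variable names (`project_id`, `cluster_name`, `cluster_location`, `castai_api_token`) and the data source label `this` are hypothetical placeholders. The `helm` block uses helm provider 2.x syntax, where `kubernetes` is a nested block. Later major versions of the helm provider changed this, so check the syntax against the version you pin.

```hcl
# Hypothetical input variables; rename to match your configuration.
variable "project_id" { type = string }
variable "cluster_name" { type = string }
variable "cluster_location" { type = string }
variable "castai_api_token" { type = string }

# Short-lived access token of the identity running Terraform.
data "google_client_config" "default" {}

# Existing GKE cluster that will be connected to CAST AI.
data "google_container_cluster" "this" {
  name     = var.cluster_name
  location = var.cluster_location
  project  = var.project_id
}

provider "google" {
  project = var.project_id
}

provider "castai" {
  api_url   = "https://api.cast.ai"
  api_token = var.castai_api_token
}

# The module installs CAST AI components with helm_release resources,
# so the helm provider must be able to reach the cluster API server.
provider "helm" {
  kubernetes {
    host                   = "https://${data.google_container_cluster.this.endpoint}"
    token                  = data.google_client_config.default.access_token
    cluster_ca_certificate = base64decode(data.google_container_cluster.this.master_auth[0].cluster_ca_certificate)
  }
}
```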
```hcl
module "castai_gke_cluster" {
  source = "castai/gke-cluster/castai"

  project_id                 = var.project_id
  gke_cluster_name           = var.cluster_name
  gke_cluster_location       = module.gke.location # cluster region or zone
  gke_credentials            = module.castai_gke_iam.private_key
  delete_nodes_on_disconnect = var.delete_nodes_on_disconnect
  default_node_configuration = module.castai_gke_cluster.castai_node_configurations["default"]

  node_configurations = {
    default = {
      disk_cpu_ratio = 25
      subnets        = [module.vpc.subnets_ids[0]]
      tags = {
        "node-config" : "default"
      }
      max_pods_per_node = 110
      network_tags      = ["dev"]
      disk_type         = "pd-balanced"
    }
  }

  node_templates = {
    spot_tmpl = {
      configuration_id = module.castai_gke_cluster.castai_node_configurations["default"]
      should_taint     = true

      custom_labels = {
        custom-label-key-1 = "custom-label-value-1"
        custom-label-key-2 = "custom-label-value-2"
      }

      custom_taints = [
        {
          key   = "custom-taint-key-1"
          value = "custom-taint-value-1"
        },
        {
          key   = "custom-taint-key-2"
          value = "custom-taint-value-2"
        }
      ]

      constraints = {
        fallback_restore_rate_seconds = 1800
        spot                          = true
        use_spot_fallbacks            = true
        min_cpu                       = 4
        max_cpu                       = 100
        instance_families = {
          exclude = ["e2"]
        }
        compute_optimized_state = "disabled"
        storage_optimized_state = "disabled"
        is_gpu_only             = false
        architectures           = ["amd64"]
      }

      custom_instances_enabled                      = true
      custom_instances_with_extended_memory_enabled = true
    }
  }

  autoscaler_settings = {
    enabled                                 = true
    node_templates_partial_matching_enabled = false

    unschedulable_pods = {
      enabled = true

      headroom = {
        enabled           = true
        cpu_percentage    = 10
        memory_percentage = 10
      }

      headroom_spot = {
        enabled           = true
        cpu_percentage    = 10
        memory_percentage = 10
      }
    }

    node_downscaler = {
      enabled = true

      empty_nodes = {
        enabled = true
      }

      evictor = {
        aggressive_mode           = false
        cycle_interval            = "5m10s"
        dry_run                   = false
        enabled                   = true
        node_grace_period_minutes = 10
        scoped_mode               = false
      }
    }

    cluster_limits = {
      enabled = true

      cpu = {
        max_cores = 20
        min_cores = 1
      }
    }
  }
}
```
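For comparison, here is a minimal sketch that sets only the four inputs marked as required in the inputs table. Every other input falls back to its documented default. It reuses the `module.castai_gke_iam.private_key` reference from the example above and assumes a hypothetical `var.cluster_location` variable. Use it instead of the full example, not alongside it, because both declare `module "castai_gke_cluster"`.

```hcl
module "castai_gke_cluster" {
  source = "castai/gke-cluster/castai"

  # Required inputs only; all other inputs keep their defaults.
  project_id           = var.project_id
  gke_cluster_name     = var.cluster_name
  gke_cluster_location = var.cluster_location # region, or zone for zonal clusters
  gke_credentials      = module.castai_gke_iam.private_key
}
```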
Version 4.x.x changes:
- Removed the `custom_label` attribute from the `castai_node_template` resource. Use `custom_labels` instead.
Old configuration:
```hcl
module "castai-gke-cluster" {
  node_templates = {
    spot_tmpl = {
      custom_label = {
        key   = "custom-label-key-1"
        value = "custom-label-value-1"
      }
    }
  }
}
```
New configuration:
```hcl
module "castai-gke-cluster" {
  node_templates = {
    spot_tmpl = {
      custom_labels = {
        custom-label-key-1 = "custom-label-value-1"
      }
    }
  }
}
```
Version 5.x.x changes:
- Removed the `compute_optimized` and `storage_optimized` attributes from the `constraints` object of the `castai_node_template` resource. Use `compute_optimized_state` and `storage_optimized_state` instead.
Old configuration:
```hcl
module "castai-gke-cluster" {
  node_templates = {
    spot_tmpl = {
      constraints = {
        compute_optimized = false
        storage_optimized = true
      }
    }
  }
}
```
New configuration:
```hcl
module "castai-gke-cluster" {
  node_templates = {
    spot_tmpl = {
      constraints = {
        compute_optimized_state = "disabled"
        storage_optimized_state = "enabled"
      }
    }
  }
}
```
Version 6.3.x changes:
- Deprecated the `autoscaler_policies_json` attribute. Use `autoscaler_settings` instead.
Old configuration:
```hcl
module "castai-gke-cluster" {
  autoscaler_policies_json = <<-EOT
    {
      "enabled": true,
      "unschedulablePods": {
        "enabled": true
      },
      "nodeDownscaler": {
        "enabled": true,
        "emptyNodes": {
          "enabled": true
        },
        "evictor": {
          "aggressiveMode": false,
          "cycleInterval": "5m10s",
          "dryRun": false,
          "enabled": true,
          "nodeGracePeriodMinutes": 10,
          "scopedMode": false
        }
      },
      "nodeTemplatesPartialMatchingEnabled": false,
      "clusterLimits": {
        "cpu": {
          "maxCores": 20,
          "minCores": 1
        },
        "enabled": true
      }
    }
  EOT
}
```
New configuration:
```hcl
module "castai-gke-cluster" {
  autoscaler_settings = {
    enabled                                 = true
    node_templates_partial_matching_enabled = false

    unschedulable_pods = {
      enabled = true
    }

    node_downscaler = {
      enabled = true

      empty_nodes = {
        enabled = true
      }

      evictor = {
        aggressive_mode           = false
        cycle_interval            = "5m10s"
        dry_run                   = false
        enabled                   = true
        node_grace_period_minutes = 10
        scoped_mode               = false
      }
    }

    cluster_limits = {
      enabled = true

      cpu = {
        max_cores = 20
        min_cores = 1
      }
    }
  }
}
```
More usage examples are located in the CAST AI Terraform provider repository.
Requirements:

Name | Version |
---|---|
terraform | >= 0.13 |
castai | ~> 7.17 |
google | >= 2.49 |
helm | >= 2.0.0 |
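In a root module, the version constraints from the requirements table above can be pinned with a `terraform` block. This is a sketch that copies the table's constraints verbatim. You may want to tighten the `helm` constraint to the major version your provider configuration is written for.

```hcl
terraform {
  required_version = ">= 0.13"

  required_providers {
    castai = {
      source  = "castai/castai"
      version = "~> 7.17"
    }
    google = {
      source  = "hashicorp/google"
      version = ">= 2.49"
    }
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.0.0"
    }
  }
}
```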
Providers:

Name | Version |
---|---|
castai | ~> 7.17 |
helm | >= 2.0.0 |
null | n/a |
No modules.
Resources:

Name | Type |
---|---|
castai_autoscaler.castai_autoscaler_policies | resource |
castai_gke_cluster.castai_cluster | resource |
castai_node_configuration.this | resource |
castai_node_configuration_default.this | resource |
castai_node_template.this | resource |
castai_workload_scaling_policy.this | resource |
helm_release.castai_agent | resource |
helm_release.castai_cloud_proxy | resource |
helm_release.castai_cluster_controller | resource |
helm_release.castai_cluster_controller_self_managed | resource |
helm_release.castai_evictor | resource |
helm_release.castai_evictor_ext | resource |
helm_release.castai_evictor_self_managed | resource |
helm_release.castai_kvisor | resource |
helm_release.castai_kvisor_self_managed | resource |
helm_release.castai_pod_pinner | resource |
helm_release.castai_pod_pinner_self_managed | resource |
helm_release.castai_spot_handler | resource |
helm_release.castai_workload_autoscaler | resource |
helm_release.castai_workload_autoscaler_self_managed | resource |
null_resource.wait_for_cluster | resource |
Inputs:

Name | Description | Type | Default | Required |
---|---|---|---|---|
agent_values | List of YAML-formatted string values for the castai-agent Helm chart | list(string) | `[]` | no |
agent_version | Version of the castai-agent Helm chart. Defaults to latest. | string | `null` | no |
api_grpc_addr | CAST AI gRPC API address | string | `"api-grpc.cast.ai:443"` | no |
api_url | URL of an alternative CAST AI API to be used during development or testing | string | `"https://api.cast.ai"` | no |
autoscaler_policies_json | Optional JSON object to override CAST AI cluster autoscaler policies. Deprecated; use autoscaler_settings instead. | string | `null` | no |
autoscaler_settings | Optional autoscaler policy definitions that override the current autoscaler settings | any | `null` | no |
castai_api_token | Optional CAST AI API token created in the API Access keys section of console.cast.ai. Used only when wait_for_cluster_ready is set to true. | string | `""` | no |
castai_components_labels | Optional additional Kubernetes labels for CAST AI pods | map(any) | `{}` | no |
cloud_proxy_grpc_url_override | Override for the castai-cloud-proxy gRPC URL | string | `null` | no |
cloud_proxy_values | List of YAML-formatted strings with castai-cloud-proxy values | list(string) | `[]` | no |
cloud_proxy_version | Version of the castai-cloud-proxy Helm chart. Defaults to latest. | string | `null` | no |
cluster_controller_values | List of YAML-formatted string values for the cluster-controller Helm chart | list(string) | `[]` | no |
cluster_controller_version | Version of the castai-cluster-controller Helm chart. Defaults to latest. | string | `null` | no |
default_node_configuration | ID of the default node configuration | string | `""` | no |
default_node_configuration_name | Name of the default node configuration | string | `""` | no |
delete_nodes_on_disconnect | Optionally delete CAST AI-created nodes when the cluster is destroyed | bool | `false` | no |
evictor_ext_values | List of YAML-formatted strings with evictor-ext values | list(string) | `[]` | no |
evictor_ext_version | Version of the castai-evictor-ext chart. Defaults to latest. | string | `null` | no |
evictor_values | List of YAML-formatted string values for the evictor Helm chart | list(string) | `[]` | no |
evictor_version | Version of the castai-evictor chart. Defaults to latest. | string | `null` | no |
gke_cluster_location | Location of the cluster to be connected to CAST AI. Can be a region, or a zone for zonal clusters. | string | n/a | yes |
gke_cluster_name | Name of the cluster to be connected to CAST AI | string | n/a | yes |
gke_credentials | GCP service account credentials.json | string | n/a | yes |
grpc_url | gRPC endpoint used by pod-pinner | string | `"grpc.cast.ai:443"` | no |
install_cloud_proxy | Optional flag for installation of castai-cloud-proxy | bool | `false` | no |
install_security_agent | Optional flag for installation of the security agent (https://docs.cast.ai/product-overview/console/security-insights/) | bool | `false` | no |
install_workload_autoscaler | Optional flag for installation of the workload autoscaler (https://docs.cast.ai/docs/workload-autoscaling-configuration) | bool | `false` | no |
kvisor_controller_extra_args | Extra arguments for the kvisor controller. Optionally enable kvisor to lint Kubernetes YAML manifests, scan workload images, and check whether workloads pass CIS Kubernetes Benchmarks as well as NSA, WASP, and PCI recommendations. | map(string) | { | no |
kvisor_values | List of YAML-formatted string values for the kvisor Helm chart | list(string) | `[]` | no |
kvisor_version | Version of the kvisor chart. If not provided, the latest version is used. | string | `null` | no |
node_configurations | Map of GKE node configurations to create | any | `{}` | no |
node_templates | Map of node templates to create | any | `{}` | no |
pod_pinner_values | List of YAML-formatted string values for the pod-pinner Helm chart | list(string) | `[]` | no |
pod_pinner_version | Version of the pod-pinner Helm chart. Defaults to latest. | string | `null` | no |
project_id | The GCP project ID | string | n/a | yes |
self_managed | Whether upgrades of CAST AI components are managed by the customer; by default, upgrades are managed by the CAST AI central system. | bool | `false` | no |
spot_handler_values | List of YAML-formatted string values for the spot-handler Helm chart | list(string) | `[]` | no |
spot_handler_version | Version of the castai-spot-handler Helm chart. Defaults to latest. | string | `null` | no |
wait_for_cluster_ready | Wait for the cluster to be ready before finishing module execution. This option requires castai_api_token to be set. | bool | `false` | no |
workload_autoscaler_values | List of YAML-formatted strings with cluster-workload-autoscaler values | list(string) | `[]` | no |
workload_autoscaler_version | Version of the castai-workload-autoscaler Helm chart. Defaults to latest. | string | `null` | no |
workload_scaling_policies | Map of workload scaling policies to create | any | `{}` | no |
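The sketch below shows how several optional inputs from the table above can be combined. It assumes hypothetical variables `var.project_id`, `var.cluster_name`, `var.cluster_location`, and `var.castai_api_token`, and reuses `module.castai_gke_iam.private_key` from the usage example. `wait_for_cluster_ready` is paired with `castai_api_token` because the table states that the former requires the latter. The `team` label is a made-up illustration.

```hcl
module "castai_gke_cluster" {
  source = "castai/gke-cluster/castai"

  project_id           = var.project_id
  gke_cluster_name     = var.cluster_name
  gke_cluster_location = var.cluster_location
  gke_credentials      = module.castai_gke_iam.private_key

  # Block until the cluster is ready; this needs an API token.
  wait_for_cluster_ready = true
  castai_api_token       = var.castai_api_token

  # Optional components, all disabled by default.
  install_workload_autoscaler = true
  install_security_agent      = true

  # Extra Kubernetes labels applied to CAST AI pods (hypothetical value).
  castai_components_labels = {
    team = "platform"
  }
}
```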
Outputs:

Name | Description |
---|---|
castai_node_configurations | Map of node configuration IDs by name |
castai_node_templates | Map of node templates by name |
cluster_id | CAST AI cluster ID, which can be used to access cluster data through the API |
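The outputs above can be re-exported from a root module or passed to other resources. This sketch assumes the module instance is named `castai_gke_cluster` and that `node_configurations` contains a `default` key, as in the usage example. The output names themselves are hypothetical.

```hcl
# CAST AI cluster ID, e.g. for use with the CAST AI API.
output "castai_cluster_id" {
  description = "CAST AI cluster ID of the connected GKE cluster"
  value       = module.castai_gke_cluster.cluster_id
}

# ID of the node configuration created under the "default" key.
output "castai_default_node_configuration_id" {
  description = "ID of the default CAST AI node configuration"
  value       = module.castai_gke_cluster.castai_node_configurations["default"]
}
```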