Terraform module for connecting a GKE cluster to CAST AI

Requirements

Terraform 0.13+

Using the module

A module to connect a GKE cluster to CAST AI.

Requires castai/castai and hashicorp/google providers to be configured.
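A minimal sketch of that provider setup follows. The variable names (`castai_api_token`, `project_id`, `region`) are hypothetical and only illustrate where the values come from. The Providers table further down also lists helm; its configuration depends on how you authenticate to the cluster, so it is left out here.

```hcl
# Hypothetical input variables used only for this sketch.
variable "castai_api_token" {
  type        = string
  description = "CAST AI API token created in console.cast.ai"
}

variable "project_id" {
  type        = string
  description = "GCP project that hosts the GKE cluster"
}

variable "region" {
  type        = string
  description = "Default GCP region for the google provider"
}

# CAST AI provider, authenticated with an API access key.
provider "castai" {
  api_token = var.castai_api_token
}

# Google provider pointed at the project that owns the cluster.
provider "google" {
  project = var.project_id
  region  = var.region
}
```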

For Phase 2 onboarding, credentials from terraform-gke-iam are required.

module "castai_gke_cluster" {
  source = "castai/gke-cluster/castai"

  project_id           = var.project_id
  gke_cluster_name     = var.cluster_name
  gke_cluster_location = module.gke.location # cluster region or zone

  gke_credentials            = module.castai_gke_iam.private_key
  delete_nodes_on_disconnect = var.delete_nodes_on_disconnect
  autoscaler_policies_json   = var.autoscaler_policies_json

  default_node_configuration = module.castai_gke_cluster.castai_node_configurations["default"]

  node_configurations = {
    default = {
      disk_cpu_ratio = 25
      subnets        = [module.vpc.subnets_ids[0]]
      tags = {
        "node-config" : "default"
      }

      max_pods_per_node = 110
      network_tags      = ["dev"]
      disk_type         = "pd-balanced"

    }
  }
  node_templates = {
    spot_tmpl = {
      configuration_id = module.castai_gke_cluster.castai_node_configurations["default"]

      should_taint = true

      custom_labels = {
        custom-label-key-1 = "custom-label-value-1"
        custom-label-key-2 = "custom-label-value-2"
      }

      custom_taints = [
        {
          key   = "custom-taint-key-1"
          value = "custom-taint-value-1"
        },
        {
          key   = "custom-taint-key-2"
          value = "custom-taint-value-2"
        }
      ]

      constraints = {
        fallback_restore_rate_seconds = 1800
        spot                          = true
        use_spot_fallbacks            = true
        min_cpu                       = 4
        max_cpu                       = 100
        instance_families = {
          exclude = ["e2"]
        }
        compute_optimized_state = "disabled"
        storage_optimized_state = "disabled"
        is_gpu_only             = false
        architectures           = ["amd64"]
      }

      custom_instances_enabled                      = true
      custom_instances_with_extended_memory_enabled = true
    }
  }

  autoscaler_settings = {
    enabled                                 = true
    node_templates_partial_matching_enabled = false

    unschedulable_pods = {
      enabled = true

      headroom = {
        enabled           = true
        cpu_percentage    = 10
        memory_percentage = 10
      }

      headroom_spot = {
        enabled           = true
        cpu_percentage    = 10
        memory_percentage = 10
      }
    }

    node_downscaler = {
      enabled = true

      empty_nodes = {
        enabled = true
      }

      evictor = {
        aggressive_mode           = false
        cycle_interval            = "5m10s"
        dry_run                   = false
        enabled                   = true
        node_grace_period_minutes = 10
        scoped_mode               = false
      }
    }

    cluster_limits = {
      enabled = true

      cpu = {
        max_cores = 20
        min_cores = 1
      }
    }
  }
}
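The example above reads `gke_credentials` from `module.castai_gke_iam.private_key`. A rough sketch of how that companion IAM module might be declared is shown below. The registry source and the two inputs are assumptions based on the terraform-gke-iam module name, so check that module's own README before relying on them.

```hcl
# Assumed declaration of the companion IAM module (terraform-gke-iam).
# The source address and input names are assumptions, not taken from this README.
module "castai_gke_iam" {
  source = "castai/gke-iam/castai"

  project_id       = var.project_id
  gke_cluster_name = var.cluster_name
}
```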

Migrating from 3.x.x to 4.x.x

Version 4.x.x changed:

Removed custom_label attribute in castai_node_template resource. Use custom_labels instead.

Old configuration:

module "castai-gke-cluster" {
  node_templates = {
    spot_tmpl = {
      custom_label = {
        key = "custom-label-key-1"
        value = "custom-label-value-1"
      }
    }
  }
}

New configuration:

module "castai-gke-cluster" {
  node_templates = {
    spot_tmpl = {
      custom_labels = {
        custom-label-key-1 = "custom-label-value-1"
      }
    }
  }
}

Migrating from 4.x.x to 5.x.x

Version 5.x.x changed:

Removed compute_optimized and storage_optimized attributes in castai_node_template resource, constraints object. Use compute_optimized_state and storage_optimized_state instead.

Old configuration:

module "castai-gke-cluster" {
  node_templates = {
    spot_tmpl = {
      constraints = {
        compute_optimized = false
        storage_optimized = true
      }
    }
  }
}

New configuration:

module "castai-gke-cluster" {
  node_templates = {
    spot_tmpl = {
      constraints = {
        compute_optimized_state = "disabled"
        storage_optimized_state = "enabled"
      }
    }
  }
}

Migrating from 6.1.x to 6.3.x

Version 6.3.x changed:

Deprecated autoscaler_policies_json attribute. Use autoscaler_settings instead.

Old configuration:

module "castai-gke-cluster" {
  autoscaler_policies_json = <<-EOT
    {
        "enabled": true,
        "unschedulablePods": {
            "enabled": true
        },
        "nodeDownscaler": {
            "enabled": true,
            "emptyNodes": {
                "enabled": true
            },
            "evictor": {
                "aggressiveMode": false,
                "cycleInterval": "5m10s",
                "dryRun": false,
                "enabled": true,
                "nodeGracePeriodMinutes": 10,
                "scopedMode": false
            }
        },
        "nodeTemplatesPartialMatchingEnabled": false,
        "clusterLimits": {
            "cpu": {
                "maxCores": 20,
                "minCores": 1
            },
            "enabled": true
        }
    }
  EOT
}

New configuration:

module "castai-gke-cluster" {
  autoscaler_settings = {
    enabled                                 = true
    node_templates_partial_matching_enabled = false

    unschedulable_pods = {
      enabled = true
    }

    node_downscaler = {
      enabled = true

      empty_nodes = {
        enabled = true
      }

      evictor = {
        aggressive_mode           = false
        cycle_interval            = "5m10s"
        dry_run                   = false
        enabled                   = true
        node_grace_period_minutes = 10
        scoped_mode               = false
      }
    }

    cluster_limits = {
      enabled = true

      cpu = {
        max_cores = 20
        min_cores = 1
      }
    }
  }
}

Examples

Usage examples are located in the Terraform provider repo.

Requirements

Name	Version
terraform	>= 0.13
castai	~> 7.17
google	>= 2.49
helm	>= 2.0.0
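The constraints in this table could be pinned in a root module roughly as follows. This is a sketch: the version strings are copied from the table, while the `hashicorp/helm` source address is the standard registry address and not something stated in this README.

```hcl
terraform {
  required_version = ">= 0.13"

  required_providers {
    castai = {
      source  = "castai/castai"
      version = "~> 7.17"
    }
    google = {
      source  = "hashicorp/google"
      version = ">= 2.49"
    }
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.0.0"
    }
  }
}
```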

Providers

Name	Version
castai	~> 7.17
helm	>= 2.0.0
null	n/a

Modules

No modules.

Resources

Name	Type
castai_autoscaler.castai_autoscaler_policies	resource
castai_gke_cluster.castai_cluster	resource
castai_node_configuration.this	resource
castai_node_configuration_default.this	resource
castai_node_template.this	resource
castai_workload_scaling_policy.this	resource
helm_release.castai_agent	resource
helm_release.castai_cloud_proxy	resource
helm_release.castai_cluster_controller	resource
helm_release.castai_cluster_controller_self_managed	resource
helm_release.castai_evictor	resource
helm_release.castai_evictor_ext	resource
helm_release.castai_evictor_self_managed	resource
helm_release.castai_kvisor	resource
helm_release.castai_kvisor_self_managed	resource
helm_release.castai_pod_pinner	resource
helm_release.castai_pod_pinner_self_managed	resource
helm_release.castai_spot_handler	resource
helm_release.castai_workload_autoscaler	resource
helm_release.castai_workload_autoscaler_self_managed	resource
null_resource.wait_for_cluster	resource

Inputs

Name	Description	Type	Default	Required
agent_values	List of YAML formatted string values for agent helm chart	`list(string)`	`[]`	no
agent_version	Version of castai-agent helm chart. Default latest	`string`	`null`	no
api_grpc_addr	CAST AI GRPC API address	`string`	`"api-grpc.cast.ai:443"`	no
api_url	URL of alternative CAST AI API to be used during development or testing	`string`	`"https://api.cast.ai"`	no
autoscaler_policies_json	Optional json object to override CAST AI cluster autoscaler policies. Deprecated, use `autoscaler_settings` instead.	`string`	`null`	no
autoscaler_settings	Optional Autoscaler policy definitions to override current autoscaler settings	`any`	`null`	no
castai_api_token	Optional CAST AI API token created in console.cast.ai API Access keys section. Used only when `wait_for_cluster_ready` is set to true	`string`	`""`	no
castai_components_labels	Optional additional Kubernetes labels for CAST AI pods	`map(any)`	`{}`	no
cloud_proxy_grpc_url_override	Override for the castai-cloud-proxy gRPC URL	`string`	`null`	no
cloud_proxy_values	List of YAML formatted strings with castai-cloud-proxy values	`list(string)`	`[]`	no
cloud_proxy_version	Version of the castai-cloud-proxy Helm chart. Defaults to latest.	`string`	`null`	no
cluster_controller_values	List of YAML formatted string values for cluster-controller helm chart	`list(string)`	`[]`	no
cluster_controller_version	Version of castai-cluster-controller helm chart. Default latest	`string`	`null`	no
default_node_configuration	ID of the default node configuration	`string`	`""`	no
default_node_configuration_name	Name of the default node configuration	`string`	`""`	no
delete_nodes_on_disconnect	Optionally delete Cast AI created nodes when the cluster is destroyed	`bool`	`false`	no
evictor_ext_values	List of YAML formatted string with evictor-ext values	`list(string)`	`[]`	no
evictor_ext_version	Version of castai-evictor-ext chart. Default latest	`string`	`null`	no
evictor_values	List of YAML formatted string values for evictor helm chart	`list(string)`	`[]`	no
evictor_version	Version of castai-evictor chart. Default latest	`string`	`null`	no
gke_cluster_location	Location of the cluster to be connected to CAST AI. Can be region or zone for zonal clusters	`string`	n/a	yes
gke_cluster_name	Name of the cluster to be connected to CAST AI.	`string`	n/a	yes
gke_credentials	GCP Service account credentials.json	`string`	n/a	yes
grpc_url	gRPC endpoint used by pod-pinner	`string`	`"grpc.cast.ai:443"`	no
install_cloud_proxy	Optional flag for installation of castai-cloud-proxy	`bool`	`false`	no
install_security_agent	Optional flag for installation of security agent (https://docs.cast.ai/product-overview/console/security-insights/)	`bool`	`false`	no
install_workload_autoscaler	Optional flag for installation of workload autoscaler (https://docs.cast.ai/docs/workload-autoscaling-configuration)	`bool`	`false`	no
kvisor_controller_extra_args	Extra arguments for the kvisor controller. Optionally enable kvisor to lint Kubernetes YAML manifests, scan workload images and check if workloads pass CIS Kubernetes Benchmarks as well as NSA, OWASP and PCI recommendations.	`map(string)`	`{ "image-scan-enabled": "true", "kube-bench-enabled": "true", "kube-linter-enabled": "true" }`	no
kvisor_values	List of YAML formatted string values for kvisor helm chart	`list(string)`	`[]`	no
kvisor_version	Version of kvisor chart. If not provided, latest version will be used.	`string`	`null`	no
node_configurations	Map of GKE node configurations to create	`any`	`{}`	no
node_templates	Map of node templates to create	`any`	`{}`	no
pod_pinner_values	List of YAML formatted string values for pod-pinner helm chart	`list(string)`	`[]`	no
pod_pinner_version	Version of pod-pinner helm chart. Default latest	`string`	`null`	no
project_id	The project id from GCP	`string`	n/a	yes
self_managed	Whether upgrades of CAST AI components are managed by the customer; by default, upgrades are managed by the CAST AI central system.	`bool`	`false`	no
spot_handler_values	List of YAML formatted string values for spot-handler helm chart	`list(string)`	`[]`	no
spot_handler_version	Version of castai-spot-handler helm chart. Default latest	`string`	`null`	no
wait_for_cluster_ready	Wait for cluster to be ready before finishing the module execution, this option requires `castai_api_token` to be set	`bool`	`false`	no
workload_autoscaler_values	List of YAML formatted string with cluster-workload-autoscaler values	`list(string)`	`[]`	no
workload_autoscaler_version	Version of castai-workload-autoscaler helm chart. Default latest	`string`	`null`	no
workload_scaling_policies	Map of workload scaling policies to create	`any`	`{}`	no
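Several of the optional inputs above interact. For example, `wait_for_cluster_ready` only works when `castai_api_token` is set, and the `*_values` inputs take lists of YAML strings. The following sketch shows one way to combine them. The variables `cluster_location` and `castai_api_token`, the `replicaCount` chart value, and the `team` label are hypothetical placeholders, not values documented here.

```hcl
module "castai_gke_cluster" {
  source = "castai/gke-cluster/castai"

  # Required inputs.
  project_id           = var.project_id
  gke_cluster_name     = var.cluster_name
  gke_cluster_location = var.cluster_location # hypothetical variable
  gke_credentials      = module.castai_gke_iam.private_key

  # Block until the cluster is ready; this requires an API token.
  wait_for_cluster_ready = true
  castai_api_token       = var.castai_api_token # hypothetical variable

  # Optional components, both disabled by default.
  install_security_agent      = true
  install_workload_autoscaler = true

  # Helm values are passed as a list of YAML formatted strings.
  agent_values = [
    yamlencode({
      replicaCount = 2 # hypothetical chart value, for illustration only
    })
  ]

  # Extra Kubernetes labels applied to CAST AI pods.
  castai_components_labels = {
    team = "platform" # hypothetical label
  }
}
```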

Outputs

Name	Description
castai_node_configurations	Map of node configurations ids by name
castai_node_templates	Map of node template by name
cluster_id	CAST.AI cluster id, which can be used for accessing cluster data using API
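These outputs can be re-exported from a root module or passed to other resources. A short sketch, assuming the module is instantiated as `castai_gke_cluster` and defines a node configuration named `default`, as in the usage example:

```hcl
# Expose the CAST AI cluster ID, e.g. for API calls made outside Terraform.
output "castai_cluster_id" {
  value = module.castai_gke_cluster.cluster_id
}

# Look up the ID of a single node configuration by its name.
output "default_node_configuration_id" {
  value = module.castai_gke_cluster.castai_node_configurations["default"]
}
```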
