Do not delete / recreate cluster when changing node pool #424
Just dropping some notes here in case anyone else takes a look at this one...
--- a/digitalocean/resource_digitalocean_kubernetes_node_pool.go
+++ b/digitalocean/resource_digitalocean_kubernetes_node_pool.go
@@ -43,6 +43,9 @@ func nodePoolResourceSchema() map[string]*schema.Schema {
 		ForceNew: true,
 	}
 
+	// Size should force a new node pool but not a new cluster
+	s["size"].ForceNew = true
+
 	// remove the id when this is used in a specific resource
 	// not as a child
 	delete(s, "id")
@@ -65,7 +68,6 @@ func nodePoolSchema() map[string]*schema.Schema {
 		"size": {
 			Type:         schema.TypeString,
 			Required:     true,
-			ForceNew:     true,
 			ValidateFunc: validation.NoZeroValues,
 		},
Nodes are first drained before being deleted. So in theory, depending on the workload, this might not cause downtime if the new pool comes up first. Though we shouldn't make assumptions about what is running in the cluster.
See: hashicorp/terraform-plugin-sdk#459 and hashicorp/terraform-plugin-sdk@1e08e98#diff-5572654f34bded06e0d3e5eb9ed7d1bf
This will also complicate https://github.com/terraform-providers/terraform-provider-digitalocean/issues/303
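To make the intended behaviour concrete, here is a minimal sketch (the variable, resource name, size, and node count are illustrative assumptions, not taken from this issue): with ForceNew set only in nodePoolResourceSchema, changing size on a standalone digitalocean_kubernetes_node_pool should replace just that pool, not the cluster it belongs to.
variable "cluster_id" {
  type = string
}

resource "digitalocean_kubernetes_node_pool" "workers" {
  cluster_id = var.cluster_id # assumed to point at an existing cluster
  name       = "workers"
  size       = "s-2vcpu-4gb" # with the patch above, changing this should force a new pool, not a new cluster
  node_count = 2
}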
This is critical for our use case as well :-) Would it simplify anything to make the
I have just tested and the
Would love this feature as well
I'd LOVE this change, for real
Scaling the default node pool in general is a bit funky. If I change the size of the default node pool I don't get a full recreation, but a lot of these
since it can't connect to the node pool anymore. I tried creating a separate node pool with
I don't mind doing manual steps, like manually adding the new node pool and then removing the old one, but I couldn't find a proper way to tell the provider that this new node pool should be the default one.
It's a real buzz killer that you need to recreate the control plane to resize the default node pool. I'd like to be able to create a Kubernetes cluster without a default node pool; there is no way to perform stable updates with the current setup. This issue has been open for a very long time. Can someone from @digitalocean take a look at it?
As a workaround, for the Jenkins Infrastructure we went with a cluster that has a minimal node pool just for keeping the cluster safe, plus another autoscaled node pool that can be scaled and modified freely.
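A minimal sketch of that pattern (resource names, region, version, and sizes are illustrative assumptions, not the actual Jenkins Infrastructure configuration): keep a tiny placeholder pool inside the cluster resource, and manage the real workload pool as a separate, autoscaled resource that can be changed without touching the cluster.
resource "digitalocean_kubernetes_cluster" "this" {
  name    = "example"
  region  = "fra1"
  version = "1.22.8-do.0" # illustrative; pick a currently supported version

  node_pool {
    name       = "placeholder"
    size       = "s-1vcpu-2gb"
    node_count = 1
  }

  lifecycle {
    # optional: ignore drift on the placeholder pool so changes to it never force a cluster replacement
    ignore_changes = [node_pool]
  }
}

resource "digitalocean_kubernetes_node_pool" "workers" {
  cluster_id = digitalocean_kubernetes_cluster.this.id
  name       = "workers"
  size       = "s-4vcpu-8gb"
  auto_scale = true
  min_nodes  = 1
  max_nodes  = 5
}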
If somebody doesn't like to burn resources, I've figured out how to do it :). According to the docs (https://registry.terraform.io/providers/digitalocean/digitalocean/latest/docs/resources/kubernetes_node_pool):
Note: If the node pool has the terraform:default-node-pool tag, then it is a default node pool for an existing cluster. The provider will refuse to import the node pool in that case because the node pool is managed by the digitalocean_kubernetes_cluster resource and not by this digitalocean_kubernetes_node_pool resource.
So I ended up writing code like this:
resource "digitalocean_kubernetes_cluster" "cluster" {
  name          = var.name
  region        = var.region
  auto_upgrade  = true
  version       = data.digitalocean_kubernetes_versions.version.latest_version
  vpc_uuid      = var.vpc_uuid
  surge_upgrade = false

  node_pool {
    name       = format("%s-hacky", local.node_pool_name)
    size       = "s-1vcpu-1gb"
    node_count = 1
  }

  maintenance_policy {
    start_time = "04:00"
    day        = "monday"
  }

  lifecycle {
    ignore_changes = [
      node_pool.0
    ]
  }
}
resource "null_resource" "remove_cluster_node_pool" {
triggers = {
cluster_id = digitalocean_kubernetes_cluster.cluster.id
}
provisioner "local-exec" {
command = format("%s/remove-default-node-pool.sh %s", path.module, digitalocean_kubernetes_cluster.cluster.id)
}
}
resource "digitalocean_kubernetes_node_pool" "default_node_pool" {
cluster_id = digitalocean_kubernetes_cluster.cluster.id
name = local.node_pool_name
size = var.size
auto_scale = true
min_nodes = 1
max_nodes = 1
tags = local.tags
} which creates node pool together with default node pool and after that null resource runs a script #!/bin/bash
DOCTL_VERSION="1.70.0"
cluster_name="$1"
wget "https://github.com/digitalocean/doctl/releases/download/v${DOCTL_VERSION}/doctl-${DOCTL_VERSION}-linux-amd64.tar.gz"
tar xf "doctl-${DOCTL_VERSION}-linux-amd64.tar.gz"
if ./doctl --access-token $DIGITALOCEAN_TOKEN kubernetes cluster node-pool list "${cluster_name}" | grep -v 'hacky'; then
  node_pool_id=$(./doctl --access-token $DIGITALOCEAN_TOKEN kubernetes cluster node-pool list "${cluster_name}" | grep 'hacky' | awk '{print $1}')
  # remove the hacky placeholder pool
  ./doctl --access-token $DIGITALOCEAN_TOKEN kubernetes cluster node-pool delete "${cluster_name}" "${node_pool_id}" --force
fi
It's just important that your
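One ordering caveat with the snippet above (my observation, not part of the original comment): the null_resource only references the cluster, so Terraform may run the cleanup script before the replacement pool exists. A depends_on could make the ordering explicit; a minimal sketch, assuming the resource names used above:
resource "null_resource" "remove_cluster_node_pool" {
  # assumption: only remove the hacky pool after the real default pool has been created
  depends_on = [digitalocean_kubernetes_node_pool.default_node_pool]

  triggers = {
    cluster_id = digitalocean_kubernetes_cluster.cluster.id
  }

  provisioner "local-exec" {
    command = format("%s/remove-default-node-pool.sh %s", path.module, digitalocean_kubernetes_cluster.cluster.id)
  }
}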
Interesting, thanks very much for the detailed information @mkjmdski!
up
This would be preferable, as we may create a K8s cluster with some small shared-CPU instances, then need to scale up to general purpose nodes. |
I was able to work around this manually by:
I had some issues with the node_pool size being 0, but it may be because I renamed my node pool after creating it.
@ChiefMateStarbuck what is the status of this at the moment? There are currently a lot of issues regarding this problem.
When you create a Kubernetes cluster with one node pool, and then afterward change the node pool properties (e.g. size), it will delete the entire cluster. This is similar to this issue. If you believe this is a duplicate issue, feel free to close this issue.
This behaviour is problematic because it messes up any authentication you might have, among other things.
It would be great if this added a new node pool and then deleted the previous node pool, without deleting the cluster. This is possible via the DigitalOcean UI, though I don't know if there's a technical reason this behaviour isn't implemented or is hard to implement.
Thanks!
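For reference, a minimal configuration that reproduces the behaviour described above (name, region, version, and size are illustrative assumptions): changing size inside the inline node_pool block currently forces the whole digitalocean_kubernetes_cluster resource, and therefore the control plane, to be replaced.
resource "digitalocean_kubernetes_cluster" "example" {
  name    = "example"
  region  = "nyc1"
  version = "1.22.8-do.0" # illustrative; pick a currently supported version

  node_pool {
    name       = "default"
    size       = "s-1vcpu-2gb" # changing this today triggers a destroy/recreate of the entire cluster
    node_count = 2
  }
}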