
"file" provisioner doesn't detect changes when copying a directory #6065

stu-smith opened this issue Apr 7, 2016 · 8 comments

@stu-smith

If I use a "file" provisioner to copy a directory:

  provisioner "file" {
    source = "my-dir/"
    destination = "/home/ec2-user"

    connection {
      ...
    }
  }

Changes to files within that directory do not trigger a rebuild; in other words, terraform plan says there are no changes to make.

(Currently using 0.6.11)

@carlosonunez

carlosonunez commented Jan 2, 2017

+1. I've experienced this as well. I added a file provisioner to an aws_instance resource, and terraform didn't pick up the changes.

I think this is by design, as per the documentation on provisioners, but it would be nice to be able to have provisioners run without destroying infrastructure, for things that can't easily be managed by configuration management tools (CoreOS instances, for example).

@philippevk

philippevk commented Feb 21, 2017

Here's a module I use in my project as a workaround. It has the following dependencies:

  • jq
  • bash
  • md5sum, tar, cut, cat

usage.tf

module "myinstance__pathsync" {
  source = "./utils/pathsync"

  local_path = "some-dir/"
  remote_path = "/etc/some-dir"

  host = "${aws_instance.myinstance.public_ip}"
  user = "..."
  private_key = "..."
}

pathsync.tf

variable "local_path" { type = "string" }
variable "remote_path" { type = "string" }

variable "host" { type = "string" }
variable "user" { type = "string" }
variable "private_key" { type = "string" }

resource "null_resource" "provisioner_container" {
  triggers {
    host = "${var.host}"
    md5 = "${data.external.md5path.result.md5}"
  }
  connection {
    host = "${var.host}"
    user = "${var.user}"
    private_key = "${var.private_key}"
  }
  provisioner "file" {
    source = "${var.local_path}"
    destination = "${var.remote_path}"
  }
}

data "external" "md5path" {
  program = ["bash", "${path.module}/md5path.sh"]
  query = { path = "${var.local_path}" }
}

md5path.sh

#!/bin/bash
set -ueo pipefail
# Read the query JSON from stdin and extract the path (external data source protocol).
query_path=$(jq -r '.path')
# Tar the directory and hash the archive; the hash changes whenever the contents do.
md5=$(tar -cf - "$query_path" | md5sum | cut -d' ' -f1)
# Emit the result as a JSON object on stdout.
printf '{"md5":"%s"}' "$md5"
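For reference, the external data source passes its query as a JSON object on stdin and expects a flat JSON object of strings on stdout, so running the script by hand looks roughly like this (the hash shown is just illustrative):

echo '{"path":"some-dir/"}' | bash md5path.sh
{"md5":"3b5d5c3712955042212316173ccf37be"}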

@apparentlymart
Contributor

Hi all! Sorry this wasn't as easy as it could've been.

It is actually by design that the file provisioner does not continually update files on the remote host, since provisioners in Terraform are different from resources and run only during the initial creation of a resource, which then allows them to take actions that might be unsafe to re-run.

In the case of the file provisioner there are of course some things you can do with it that would be safe to re-run, so I can see why you'd expect it to behave in this way. @philippevk's workaround illustrates that it's possible to model this problem with resources, which is the most likely way this use-case would get addressed in Terraform but we could make it convenient by providing a first-class resource for it:

### Hypothetical example. Not valid yet! ###
resource "ssh_file_tree" "example" {
  host        = "${var.host}"
  user        = "${var.user}"
  private_key = "${var.private_key}"

  source_dir      = "${var.local_path}"
  destination_dir = "${var.remote_path}"
}

It is unfortunately not as simple as just adding the above resource, since as defined there it would have the same problem as the provisioner: it would run only once on creation. To fix that, we must have a way for the configuration to include some description of the contents of the files, as @philippevk did with the external script to take an MD5 of a tar archive. You can see this same problem in the design of the aws_s3_bucket_object resource, where one must explicitly set the etag to an MD5 hash of the object contents so that Terraform can detect when the contents have changed, since Terraform is currently able to look only directly at configuration when looking for differences.

So there's some work to do to meet this use-case in a convenient way, but it does seem like a valid use-case to me. I expect this would lead also to requests to take some action after the files are uploaded (such as to send SIGHUP to a process to re-read a config file), but it's less obvious how to do that safely.

In the meantime the workaround of using a null_resource trigger to force the re-run of the provisioner is a valid (though inconvenient) workaround. A variant on that would be to use the archive_file data source in place of the external script running tar, since that too can produce a hash of an archive (zip, in this case) resulting from a directory, with one notable difference that it must also redundantly write the file to disk as part of doing that.
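A rough sketch of that archive_file variant, substituting it for the external script in @philippevk's module (untested; the output path is arbitrary):

data "archive_file" "local_path" {
  type        = "zip"
  source_dir  = "${var.local_path}"
  output_path = "${path.module}/.local_path.zip"
}

resource "null_resource" "provisioner_container" {
  triggers {
    host = "${var.host}"
    md5  = "${data.archive_file.local_path.output_md5}"
  }

  connection {
    host        = "${var.host}"
    user        = "${var.user}"
    private_key = "${var.private_key}"
  }

  provisioner "file" {
    source      = "${var.local_path}"
    destination = "${var.remote_path}"
  }
}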

@Crapworks

@apparentlymart I recently hit this issue and was wondering if there has been any progress in the meantime. I am also not sure if my problem is the same one you addressed here:

I have a null_resource that has a file provisioner that copies a directory to the remote system (a bunch of docker compose files) and then runs the remote-exec provisioner to deploy the compose files to docker swarm.

If I run terraform apply again after changing the compose files, no changes are detected, as described above. I then manually taint the null_resource, which leads to only the remote-exec provisioner running; the files are NOT copied again. Is this the expected behaviour?

@apparentlymart
Contributor

Hi @Crapworks,

Any time a resource is planned for replacement (-/+ in the plan output) then applying it should cause all of its provisioners to run again. If you've seen Terraform run only one of several provisioners defined on the same resource then that does indeed sound like a bug, separate from what's described in this issue. Please feel free to open a new issue for it, and hopefully with a real configuration example we can reproduce it and see what's going on.
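For example, assuming the resource is addressed as null_resource.deploy (substitute your own resource address), forcing a replacement so that all of its provisioners run again looks like:

terraform taint null_resource.deploy
terraform apply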

@davewoodward

I tried using the archive_file data source suggested by @apparentlymart above to obtain a hash tied to the contents of a directory. My goal was to use the hash to trigger downstream resource updates as appropriate. Unfortunately, it looks like the only archive format supported by archive_file is zip, and zip produces a different hash for the same contents every time it is run.

@gouraharidas

I used the following trigger to execute the null_resource every time.

triggers { build_number = "${timestamp()}" }
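Note that timestamp() yields a new value on every plan, so this replaces the null_resource (and re-runs all of its provisioners) on every terraform apply, whether or not the files actually changed. In context, roughly (resource name and provisioner details illustrative):

resource "null_resource" "always_run" {
  triggers {
    build_number = "${timestamp()}"
  }

  provisioner "file" {
    source      = "my-dir/"
    destination = "/home/ec2-user"
    # connection { ... } as in the original example
  }
}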

@tv42

tv42 commented May 25, 2020

The workaround above can only show that the hash changed. For single files, if you do

resource "null_resource" "foo" {
  triggers = {
    bar = data.local_file.bar.content
  }
}

data "local_file" "bar" {
  filename = "quux"
}

you get a diff of the actual file contents at plan time.
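For whole directories, on Terraform 0.12 or later (where fileset() and filemd5() are available), a similar content-based trigger can be assembled by hashing every file in the directory; an untested sketch, with the directory name illustrative:

resource "null_resource" "foo" {
  triggers = {
    dir_hash = md5(join("", [
      for f in sort(fileset("${path.module}/my-dir", "**")) :
      filemd5("${path.module}/my-dir/${f}")
    ]))
  }
}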
