# Module-aware explicit dependencies (#17101)

Terraform currently allows the declaration of explicit inter-resource dependencies using `depends_on`:

```hcl
resource "example" "example1" {
}

resource "example" "example2" {
  depends_on = ["example.example1"]
}
```

The presence of the `depends_on` in the above example causes the graph builder to create a dependency edge from `example2` to `example1`, which ensures that `example1` is visited first during any graph traversal.

This mechanism does not generalize to other constructs within Terraform. In particular, it doesn't generalize to modules, since a module is not represented as a single node in the graph. Instead, each individual `variable` and `output` in a module is its own graph node, which allows us to optimize our parallelism by getting started on _some_ aspects of a module before all of the input variables are ready, and to begin processing resources that _depend_ on a module before all of its outputs are complete. Even though variables and outputs _are_ in the graph, we do not currently support referring to them in `depends_on`.

The following proposal describes a generalization of the `depends_on` mechanism to both resources and modules, with the goal of satisfying the use-cases discussed in #10462. It allows explicit dependencies on module variables and outputs, and provides a syntax that creates the _effect_ of an entire-module dependency.

---

## New addressing forms for `depends_on`

We currently allow references to managed and data resources in `depends_on`. To support dependencies on modules, we must extend this to support the following forms:

* `aws_instance.example` - managed resource dependency, as today
* `aws_instance.another_example[2]` - a particular instance of a managed resource with `count` set
* `data.template_file.example` - data resource dependency, as today
* `var.foo` - dependency on an input variable passed by a parent module
* `module.example.foo` - dependency on an output of a named child module
* `module.example` - dependency on an entire module

Our improved configuration language parser (which, at the time of writing, is in the process of being integrated into Terraform Core) allows us to improve the `depends_on` syntax through direct use of expressions, rather than requiring these references to be inside quoted strings:

```hcl
# DESIGN SKETCH: not yet implemented and may change before release

resource "example" "example2" {
  depends_on = [
    aws_instance.example,
    aws_instance.another_example[2],
    data.template_file.example,
    var.foo,
    module.example.foo,
    module.example,
  ]
}
```

This syntax will be used for the examples in the remainder of this proposal.

## Support `depends_on` as a `module` block argument

The above allows modules to be used as explicit dependencies, but we need to additionally support `depends_on` inside `module` blocks in order to allow _modules_ to have dependencies:

```hcl
# DESIGN SKETCH: not yet implemented and may change before release

module "example" {
  depends_on = [
    aws_instance.example,
  ]
}
```

## Depending on a Module Variable

At first glance, an explicit dependency on a `var.foo` expression feels a little strange: variables don't have externally-visible side-effects, so there seems to be little reason to depend on one without using its value.

However, allowing explicit dependencies on variables creates a mechanism for the author of a more-complex reusable module to create custom `depends_on`-like attributes that serve to block _subsets_ of the functionality of the module. For example:

```hcl
# DESIGN SKETCH: not yet implemented and may change before release

### in root module

module "database" {
}

module "app" {
  ami_id = "ami-1234"
  app_server_depends_on = [
    module.database,
  ]
}

### in module "app"

variable "ami_id" {
}

variable "app_server_depends_on" {
  default = []
}

resource "aws_security_group" "foo" {
  # Work on _this_ resource can begin immediately
  # ...
}

resource "aws_instance" "app_server" {
  ami = var.ami_id
  # ...

  # We can't create this resource until the caller tells us that it's
  # prepared some hidden dependencies.
  depends_on = [
    var.app_server_depends_on,
  ]
}
```

This makes it possible to create a re-usable module for deploying arbitrary applications (parameterized by an AMI to deploy, etc.) that can immediately create supporting resources, like the security group in this example, but defers creating the actual compute resources until some arbitrary, caller-defined dependencies have been dealt with. The caller knows that `ami-1234` expects to have a database available to it on boot, while the re-usable module has no direct knowledge of that database.

The _value_ of `app_server_depends_on` in the above example is not significant. Instead, we effectively pass the _dependencies_ of that expression through to the module by creating a transitive dependency relationship in the graph: `aws_instance.app_server` depends on `var.app_server_depends_on`, which in turn depends on `module.database`.
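Because only the dependencies of the expression matter, a caller could equally pass narrower references instead of a whole module. In this sketch, which would replace the root module's `module "app"` block above, the `endpoint` output of `module "database"` and the `aws_route53_record.database` resource are hypothetical names used only for illustration.

```hcl
# DESIGN SKETCH: not yet implemented and may change before release

### in root module

module "app" {
  ami_id = "ami-1234"

  # Hypothetical references: module "app" never uses these values.
  # Only their dependencies matter, so aws_instance.app_server inside the
  # module would wait for the database module's "endpoint" output and for
  # this DNS record, but nothing else in module "app" would.
  app_server_depends_on = [
    module.database.endpoint,
    aws_route53_record.database,
  ]
}
```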

## Depending on a Whole Module

As noted above, modules are not represented directly by graph nodes today, so whole-module dependencies (either as dependencies or dependents) require some new graph-building functionality.

The most likely user intent for a dependency of the form `module.example` is to wait until _everything_ in the module has completed before continuing. This behavior would have a severe impact on Terraform's ability to achieve parallelism though, and so this proposal suggests a compromise for when `depends_on` references a whole module: treat this as an alias for depending on each of the module's outputs, but not on any resources or nested modules.

![Terraform graph where null_resource.example depends on a nested module called "example", whose outputs depend on null_resource.example2 but not on null_resource.example3](https://user-images.githubusercontent.com/20180/34902152-9400045c-f7ca-11e7-855d-468396834f5a.png)

The biggest consequence of this compromise is that in the above example `null_resource.example` will block until `module.example.null_resource.example2` is complete, but will not wait for `module.example.null_resource.example3` because none of the module's outputs depend on that resource.

This consequence does, however, give the module author a measure of flexibility and control: if the author knows that the module performs a time-consuming operation that does not block access to the objects the caller will depend on, this can be expressed by making that operation _not_ be a dependency of the outputs. From the module _caller's_ perspective, the module can still be thought of as a black box, with the module author designing it such that all significant effects of the module are referenced in an output. In effect, the module author uses `output` blocks to define what it means for the module to be considered "complete".
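The scenario described for the graph above can also be written out as configuration. In this sketch the output name `id` is an assumption, since the text does not name the module's outputs. Under this proposal, `depends_on = [module.example]` would behave like listing every output of the module, so `null_resource.example` waits for `null_resource.example2` but not for `null_resource.example3`.

```hcl
# DESIGN SKETCH: not yet implemented and may change before release

### root module

module "example" {
}

resource "null_resource" "example" {
  # Treated as an alias for depending on every output of module.example;
  # with the single output below, this is equivalent to:
  #   depends_on = [module.example.id]
  depends_on = [
    module.example,
  ]
}

### module "example"

resource "null_resource" "example2" {
}

# Not referenced by any output, so a whole-module dependency on
# module.example does not wait for this resource.
resource "null_resource" "example3" {
}

# The outputs define what "complete" means for callers. This one depends
# on example2 only. The output name "id" is illustrative.
output "id" {
  value = null_resource.example2.id
}
```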

The improved configuration language, whose integration is in progress as we write this, allows passing the result of an entire module as a value into another module:

```hcl
# DESIGN SKETCH: not yet implemented and may change before release

### root module
module "example1" {
}
module "example2" {
  example1 = module.example1
}

### module example1

output "id" {
  value = "placeholder-id"
}

### module example2

variable "example1" {
}

resource "null_resource" "example" {
  triggers = {
    example1_id = var.example1.id
  }
}

```

This new usage creates an _implicit_ dependency between `module.example2.var.example1` and all of the outputs of `module.example1`, since they must all be complete before the language runtime can construct the value of `module.example1` to assign. This implicit usage further reinforces the idea that only the outputs are dependencies in this case, because that is what is necessary to construct the object value returned by `module.example1`.

## Whole-module `depends_on`

Using `depends_on` in a `module` block will also limit parallelism, but the impact is less severe in this case because the effect is under the direct control of the _caller_ module, and so its author can make a tradeoff to decide at what point the limited parallelism hurts enough to warrant more precise dependency handling:

```hcl
# DESIGN SKETCH: not yet implemented and may change before release

### root module

variable "baz" {
}

resource "null_resource" "example1" {
  triggers = {
    example = "hello"
  }
}

module "example" {
  foo = var.baz

  depends_on = [
    null_resource.example1,
  ]
}

### module "example"

variable "foo" {
}

resource "null_resource" "example2" {
  triggers = {
    foo = var.foo
  }
}

resource "null_resource" "example3" {
}

module "example2" {
}

### module "example2"

resource "null_resource" "example4" {
}
```

![](https://user-images.githubusercontent.com/20180/34902408-a936e91c-f7cf-11e7-8212-caf745cd33b9.png)

Dependencies _away_ from the module require the creation of a new "begin" graph node for the module that declares `depends_on`, which must then be a dependency of every resource in the module _and_ of any downstream modules. To reduce the number of graph edges, a "begin" node will be created for each of the downstream modules too, so that only one additional edge needs to be added _between_ the modules (to connect the "begin" nodes).

A "begin" graph node takes no action when visited during a walk and so just serves as an aggregation point to reduce the number of dependency edges. For a `module` block without `depends_on` the "begin" graph node can be safely optimized away, along with its incoming dependency edges, during graph construction.
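To illustrate when a "begin" node would be kept, consider two sibling `module` blocks, only one of which declares `depends_on`. The module names `network` and `app` are hypothetical, and the comments describe the graph construction proposed above rather than any existing Terraform behavior.

```hcl
# DESIGN SKETCH: not yet implemented and may change before release

### root module

# No depends_on: the "begin" node for module.network would be optimized
# away during graph construction, so its resources can start as soon as
# their own references are ready.
module "network" {
}

# Has depends_on: the "begin" node for module.app is kept. It depends on
# the outputs of module.network (the whole-module alias described earlier),
# every resource in module.app depends on it, and the "begin" node of any
# module nested inside module.app gets a single edge to it.
module "app" {
  depends_on = [
    module.network,
  ]
}
```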

## `depends_on` in other contexts

`depends_on` can be useful for any Terraform construct that causes externally-visible side-effects, as a means to influence the ordering of those side-effects.

Provider initialization also sometimes has side effects, such as reaching out to an external network service to begin a session or to validate credentials. `depends_on` could therefore also be useful in `provider` blocks, as described in #2430. However, providers are special in that they need to be instantiated in _all_ phases of Terraform's operation, and thus it is not always possible to force an ordering for provider initialization relative to resource creation as described in #4149. Implementation of `depends_on` for modules should not block on the implementation of "partial apply", but we should reserve the `depends_on` argument for `provider` blocks as part of implementing _this_ proposal to minimize the risk that a provider in the wild will introduce its own `depends_on` configuration argument that would then be in conflict.

`output`, `variable` and `locals` blocks do not have any externally-visible side-effects and so `depends_on` would not serve any useful purpose for these blocks; it is always safe to evaluate the corresponding graph nodes as soon as their implicit dependencies become ready.

`provisioner` blocks within managed resources are not currently represented as separate graph nodes, and so they are processed as part of a create action for their parent resource node.