Incorporate missing post plan terminal statuses to ensure plan is completely processed before attempting to confirm run #921

Uk1288 · 2023-06-06T14:39:23Z

Description

The changes in this PR checks potential terminal post plan statuses to ensure that a plan has been completely processed before proceeding to confirm the associated run. The terminal status changes depending on the post plan lifecycle that a run has. For example a plan can terminate at planned status or cost_estimated or policy_checked.

See issue for more details: #898

This PR depends on go-tfe PR#712

Testing plan

Workspace_run resource should create run as usual, the only way to produce the issue consistently is to force a run to remain in a status, e.g policy_checked, for a longer period of time such that workspace_run receives that status while polling the run. I achieved this by adding delays to each run state

External links

Blocked by go-tfe PR

Output from acceptance tests

$ TESTARGS="-run TestAccTFEWorkspaceRun" make testacc

...

brandonc · 2023-06-07T18:19:15Z

tfe/resource_tfe_workspace_run.go

+	tfe.RunQueuingApply:     true,
+}
+
+var policyOverridenStatuses = map[tfe.RunStatus]bool{


Suggested change

var policyOverridenStatuses = map[tfe.RunStatus]bool{

var policyOverriddenStatuses = map[tfe.RunStatus]bool{

brandonc · 2023-06-07T22:45:50Z

tfe/workspace_run_helpers.go

In general, I find your diagnosis and fix to be convincing... but I'm also finding the helpers to be difficult to read and reason about and feel like we should take a moment to simplify them. I identified these specific things that could make this module simpler:

syntactically, I think there are two problems with awaitRun:
a) nested switch statements-- I find it difficult to mentally retain a stack context once we reach that second nesting.
b) this function takes a lot of arguments, which is an indicator that it could be doing too much or the input data is not related enough

mixed responsibilities: there are interface{} and provider-specific args in many helpers. Example: meta interface{} and schema.ResourceData args could be replaced with strong types like ConfiguredClient, retry/retry backoff integers, workspace ID, etc -- The idea would be that the helpers aren't interpreting provider schema or interfaces so that narrows the scope of their input.

the main entry point, createWorkspaceRun is looong so it may be best to try and break it into conceptual stages. I don't have a strong feeling about how to improve it, yet, but just moving code to subroutines that reflect the actual run stages might be a first step.

Does any of this resonate with you? What are your thoughts?

yes, this makes sense to me. I broke down the methods into smaller ones to represent action steps. The only one I have not updated is the double switch statement because the method is now smaller and straight forward. If we still want to get rid of that, I can replace it with an if..else statement. WDYT?

Uk1288 · 2023-06-14T16:25:47Z

TO-DO: upgrade go-tfe after changes are good to go.

nfagerlund

This is getting there! I think you've split out the duties in a fairly legible way among these functions; some of the argument lists are still a bit unpleasant, but I think that's destiny, kind of, when the task is this annoying.

But I have some major concerns about mutating that top-scope planPendingStatuses map, and I don't think we should do it. I think what I'd suggest for a refactor goes something like this:

The root issue:

isPlanComplete() needs to inherently know about the list of plan terminal statuses.
The call to awaitRun() needs to pass in the list of plan pending statuses.
The actual lists of plan pending/terminal statuses depend on the run, plus whether hasPostPlanTaskStage (which is inherent in the run but has to be discovered by making additional requests and then remembered for later, and we don't want those tightly-scoped helper funcs hitting the client.)

What to do:

Make a func that takes (run, hasPostPlanTaskStage) and returns a fresh planPendingStatuses map. (Which you'll later pass to awaitRun().)
Make a func that takes (run, hasPostPlanTaskStage) and returns a fresh planTerminalStatuses map.
Make a func that takes planTerminalStatuses and returns an anonymous function that can be used as an isPlanComplete() func.
Then completely remove hasPostPlanTaskStage from the signatures of awaitRun and all the completion checker helper funcs, because isPlanComplete doesn't need it anymore -- it'll be a closure that already knows the things it currently needs hasPostPlanTaskStage in order to derive.

nfagerlund · 2023-06-15T23:33:58Z

tfe/workspace_run_helpers.go

+		planPendingStatuses[tfe.RunCostEstimated] = true
+		planPendingStatuses[tfe.RunPlanned] = true


Oh! Uh...... hmm. The planPendingStatuses map is defined at global scope in this package, which means that this is 🚨 mutating shared global state in the process of checking the run status. That freaks me out a bit! I can't remember whether the lifecycle of a provider server means we'll never be processing multiple runs concurrently, but this really feels like it's asking for trouble.

I DO like the logic of "sort out which statuses are actually pending or terminal based on the nature of the run we're working with," but I believe this requires each run to have its own list of pending statuses. (Or, like you already did with the terminal map, each invocation of the polling helper to have its own list.)

Right, mutating the global planPendingStatuses map raised questions for me as well, am glad that this comment validated my thought.

nfagerlund · 2023-06-26T22:10:09Z

So these revisions look pretty great. Once the required extra status is merged into go-tfe and the provider can build properly, hit me up if I'm around and I'll try and do a smoke test; I think I still have a config sitting around from the last time we jammed on this.

nfagerlund

All right!!! I smoke-tested this with the update to a released go-tfe version, and it works like a charm. We won't know for sure that this fixes the issue until we have the depth of randomness provided by the CI tests for some weeks, but I have a very good feeling about it at this point!!

We still need to fix the dang TFC runs API though 😆

This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [tfe](https://registry.terraform.io/providers/hashicorp/tfe) ([source](https://togithub.com/hashicorp/terraform-provider-tfe)) | required_provider | minor | `0.45.0` -> `0.46.0` | --- ### ⚠ Dependency Lookup Warnings ⚠ Warnings were logged while processing this repo. Please check the Dependency Dashboard for more information. --- ### Release Notes <details> <summary>hashicorp/terraform-provider-tfe (tfe)</summary> ### [`v0.46.0`](https://togithub.com/hashicorp/terraform-provider-tfe/blob/HEAD/CHANGELOG.md#v0460-July-3-2023) [Compare Source](https://togithub.com/hashicorp/terraform-provider-tfe/compare/v0.45.0...v0.46.0) FEATURES: - **New Resource**: `r/tfe_agent_pool_allowed_workspaces` restricts the use of an agent pool to particular workspaces, by [@hs26gill](https://togithub.com/hs26gill) [870](https://togithub.com/hashicorp/terraform-provider-tfe/pull/870) - `r/tfe_organization_token`: Add optional `expired_at` field to organization tokens, by [@juliannatetreault](https://togithub.com/juliannatetreault) (#&#8203[https://github.com/hashicorp/terraform-provider-tfe/pull/844](https://togithub.com/hashicorp/terraform-provider-tfe/pull/844)ll/844)) - `r/tfe_team_token`: Add optional `expired_at` field to team tokens, by [@juliannatetreault](https://togithub.com/juliannatetreault) (#&#8203[https://github.com/hashicorp/terraform-provider-tfe/pull/844](https://togithub.com/hashicorp/terraform-provider-tfe/pull/844)ll/844)) - `r/tfe_agent_pool`: Add attribute `organization_scoped` to set the scope of an agent pool, by [@hs26gill](https://togithub.com/hs26gill) [870](https://togithub.com/hashicorp/terraform-provider-tfe/pull/870) - `d/tfe_agent_pool`: Add attribute `organization_scoped` and `allowed_workspace_ids` to retrieve agent pool scope and associated allowed workspace ids, by [@hs26gill](https://togithub.com/hs26gill) [870](https://togithub.com/hashicorp/terraform-provider-tfe/pull/870) BUG FIXES: - `r/tfe_workspace_run`: Ensure `wait_for_run` correctly results in a fire-and-forget run when set to `false`, by [@lucymhdavies](https://togithub.com/lucymhdavies) (#&#8203[https://github.com/hashicorp/terraform-provider-tfe/pull/910](https://togithub.com/hashicorp/terraform-provider-tfe/pull/910)ll/910)) - `r/tfe_workspace_run`: Fix rare random run failures; adjust lists of expected run statuses to ensure that a plan is completely processed before attempting to apply it, by [@uk1288](https://togithub.com/uk1288) (#&#8203[https://github.com/hashicorp/terraform-provider-tfe/pull/921](https://togithub.com/hashicorp/terraform-provider-tfe/pull/921)ll/921)) - `r/tfe_notification_configuration`: Add support for missing "Check failed" Health Event notifications, by [@lucymhdavies](https://togithub.com/lucymhdavies) (#&#8203[https://github.com/hashicorp/terraform-provider-tfe/pull/927](https://togithub.com/hashicorp/terraform-provider-tfe/pull/927)ll/927)) - `r/tfe_registry_module`: Fix a bug that prevented users from being able to create a registry module using a github app, by [@dsa0x](https://togithub.com/dsa0x) (#&#8203[https://github.com/hashicorp/terraform-provider-tfe/pull/935](https://togithub.com/hashicorp/terraform-provider-tfe/pull/935)ll/935)) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Enabled. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate).  Co-authored-by: astronaut-panda[bot] <137164246+astronaut-panda[bot]@users.noreply.github.com>

Uk1288 requested a review from a team as a code owner June 6, 2023 14:39

Uk1288 force-pushed the TF-6735-fix-post-plan-ops-in-workspace-run-resource branch 2 times, most recently from dc9fe26 to 4ec3109 Compare June 6, 2023 14:51

Uk1288 mentioned this pull request Jun 6, 2023

Add missing pre apply statuses hashicorp/go-tfe#712

Merged

brandonc reviewed Jun 7, 2023

View reviewed changes

Uk1288 force-pushed the TF-6735-fix-post-plan-ops-in-workspace-run-resource branch 2 times, most recently from 368d0b3 to 321d64e Compare June 14, 2023 15:09

nfagerlund requested changes Jun 16, 2023

View reviewed changes

nfagerlund mentioned this pull request Jun 22, 2023

fix flaky test #940

Closed

Uk1288 mentioned this pull request Jun 26, 2023

Add pre_apply_completed status hashicorp/go-tfe#727

Merged

Uk1288 force-pushed the TF-6735-fix-post-plan-ops-in-workspace-run-resource branch from 321d64e to 25c1498 Compare June 26, 2023 18:36

Uk1288 added 5 commits June 29, 2023 09:57

incorporate various post plan terminal statuses

506aec4

update changelog

535297b

simplify methods to represent smaller steps

b2804e7

some plan pending and terminal cleanup

e461898

upgrade go-tfe to 1.29.0

2d12bfd

Uk1288 force-pushed the TF-6735-fix-post-plan-ops-in-workspace-run-resource branch from 25c1498 to 2d12bfd Compare June 29, 2023 14:00

nfagerlund approved these changes Jun 29, 2023

View reviewed changes

Uk1288 merged commit 98a029e into main Jun 29, 2023

Uk1288 deleted the TF-6735-fix-post-plan-ops-in-workspace-run-resource branch June 29, 2023 18:13

lucymhdavies mentioned this pull request Jul 4, 2023

tfe_workspace_run resource's wait behavior is unreliable #898

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorporate missing post plan terminal statuses to ensure plan is completely processed before attempting to confirm run #921

Incorporate missing post plan terminal statuses to ensure plan is completely processed before attempting to confirm run #921

Uk1288 commented Jun 6, 2023 •

edited

Loading

brandonc Jun 7, 2023

brandonc Jun 7, 2023

Uk1288 Jun 14, 2023

Uk1288 commented Jun 14, 2023 •

edited

Loading

nfagerlund left a comment •

edited

Loading

nfagerlund Jun 15, 2023

Uk1288 Jun 26, 2023

nfagerlund commented Jun 26, 2023

nfagerlund left a comment

	var policyOverridenStatuses = map[tfe.RunStatus]bool{
	var policyOverriddenStatuses = map[tfe.RunStatus]bool{

		planPendingStatuses[tfe.RunCostEstimated] = true
		planPendingStatuses[tfe.RunPlanned] = true

Incorporate missing post plan terminal statuses to ensure plan is completely processed before attempting to confirm run #921

Incorporate missing post plan terminal statuses to ensure plan is completely processed before attempting to confirm run #921

Conversation

Uk1288 commented Jun 6, 2023 • edited Loading

Description

Testing plan

External links

Output from acceptance tests

brandonc Jun 7, 2023

Choose a reason for hiding this comment

brandonc Jun 7, 2023

Choose a reason for hiding this comment

Uk1288 Jun 14, 2023

Choose a reason for hiding this comment

Uk1288 commented Jun 14, 2023 • edited Loading

nfagerlund left a comment • edited Loading

Choose a reason for hiding this comment

The root issue:

What to do:

nfagerlund Jun 15, 2023

Choose a reason for hiding this comment

Uk1288 Jun 26, 2023

Choose a reason for hiding this comment

nfagerlund commented Jun 26, 2023

nfagerlund left a comment

Choose a reason for hiding this comment

Uk1288 commented Jun 6, 2023 •

edited

Loading

Uk1288 commented Jun 14, 2023 •

edited

Loading

nfagerlund left a comment •

edited

Loading