-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorporate missing post plan terminal statuses to ensure plan is completely processed before attempting to confirm run #921
Incorporate missing post plan terminal statuses to ensure plan is completely processed before attempting to confirm run #921
Conversation
dc9fe26
to
4ec3109
Compare
tfe/resource_tfe_workspace_run.go
Outdated
tfe.RunQueuingApply: true, | ||
} | ||
|
||
var policyOverridenStatuses = map[tfe.RunStatus]bool{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var policyOverridenStatuses = map[tfe.RunStatus]bool{ | |
var policyOverriddenStatuses = map[tfe.RunStatus]bool{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I find your diagnosis and fix to be convincing... but I'm also finding the helpers to be difficult to read and reason about and feel like we should take a moment to simplify them. I identified these specific things that could make this module simpler:
- syntactically, I think there are two problems with awaitRun:
a) nested switch statements-- I find it difficult to mentally retain a stack context once we reach that second nesting.
b) this function takes a lot of arguments, which is an indicator that it could be doing too much or the input data is not related enough - mixed responsibilities: there are
interface{}
and provider-specific args in many helpers. Example: meta interface{} and schema.ResourceData args could be replaced with strong types like ConfiguredClient, retry/retry backoff integers, workspace ID, etc -- The idea would be that the helpers aren't interpreting provider schema or interfaces so that narrows the scope of their input. - the main entry point, createWorkspaceRun is looong so it may be best to try and break it into conceptual stages. I don't have a strong feeling about how to improve it, yet, but just moving code to subroutines that reflect the actual run stages might be a first step.
Does any of this resonate with you? What are your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this makes sense to me. I broke down the methods into smaller ones to represent action steps. The only one I have not updated is the double switch statement because the method is now smaller and straight forward. If we still want to get rid of that, I can replace it with an if..else
statement. WDYT?
368d0b3
to
321d64e
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is getting there! I think you've split out the duties in a fairly legible way among these functions; some of the argument lists are still a bit unpleasant, but I think that's destiny, kind of, when the task is this annoying.
But I have some major concerns about mutating that top-scope planPendingStatuses
map, and I don't think we should do it. I think what I'd suggest for a refactor goes something like this:
The root issue:
isPlanComplete()
needs to inherently know about the list of plan terminal statuses.- The call to
awaitRun()
needs to pass in the list of plan pending statuses. - The actual lists of plan pending/terminal statuses depend on the run, plus whether
hasPostPlanTaskStage
(which is inherent in the run but has to be discovered by making additional requests and then remembered for later, and we don't want those tightly-scoped helper funcs hitting the client.)
What to do:
- Make a func that takes (run, hasPostPlanTaskStage) and returns a fresh planPendingStatuses map. (Which you'll later pass to
awaitRun()
.) - Make a func that takes (run, hasPostPlanTaskStage) and returns a fresh planTerminalStatuses map.
- Make a func that takes planTerminalStatuses and returns an anonymous function that can be used as an
isPlanComplete()
func. - Then completely remove
hasPostPlanTaskStage
from the signatures of awaitRun and all the completion checker helper funcs, becauseisPlanComplete
doesn't need it anymore -- it'll be a closure that already knows the things it currently needshasPostPlanTaskStage
in order to derive.
planPendingStatuses[tfe.RunCostEstimated] = true | ||
planPendingStatuses[tfe.RunPlanned] = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! Uh...... hmm. The planPendingStatuses map is defined at global scope in this package, which means that this is 🚨 mutating shared global state in the process of checking the run status. That freaks me out a bit! I can't remember whether the lifecycle of a provider server means we'll never be processing multiple runs concurrently, but this really feels like it's asking for trouble.
I DO like the logic of "sort out which statuses are actually pending or terminal based on the nature of the run we're working with," but I believe this requires each run to have its own list of pending statuses. (Or, like you already did with the terminal map, each invocation of the polling helper to have its own list.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, mutating the global planPendingStatuses map raised questions for me as well, am glad that this comment validated my thought.
321d64e
to
25c1498
Compare
So these revisions look pretty great. Once the required extra status is merged into go-tfe and the provider can build properly, hit me up if I'm around and I'll try and do a smoke test; I think I still have a config sitting around from the last time we jammed on this. |
25c1498
to
2d12bfd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right!!! I smoke-tested this with the update to a released go-tfe version, and it works like a charm. We won't know for sure that this fixes the issue until we have the depth of randomness provided by the CI tests for some weeks, but I have a very good feeling about it at this point!!
We still need to fix the dang TFC runs API though 😆
This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [tfe](https://registry.terraform.io/providers/hashicorp/tfe) ([source](https://togithub.com/hashicorp/terraform-provider-tfe)) | required_provider | minor | `0.45.0` -> `0.46.0` | --- ### ⚠ Dependency Lookup Warnings ⚠ Warnings were logged while processing this repo. Please check the Dependency Dashboard for more information. --- ### Release Notes <details> <summary>hashicorp/terraform-provider-tfe (tfe)</summary> ### [`v0.46.0`](https://togithub.com/hashicorp/terraform-provider-tfe/blob/HEAD/CHANGELOG.md#v0460-July-3-2023) [Compare Source](https://togithub.com/hashicorp/terraform-provider-tfe/compare/v0.45.0...v0.46.0) FEATURES: - **New Resource**: `r/tfe_agent_pool_allowed_workspaces` restricts the use of an agent pool to particular workspaces, by [@​hs26gill](https://togithub.com/hs26gill) [870](https://togithub.com/hashicorp/terraform-provider-tfe/pull/870) - `r/tfe_organization_token`: Add optional `expired_at` field to organization tokens, by [@​juliannatetreault](https://togithub.com/juliannatetreault) (#​[https://github.com/hashicorp/terraform-provider-tfe/pull/844](https://togithub.com/hashicorp/terraform-provider-tfe/pull/844)ll/844)) - `r/tfe_team_token`: Add optional `expired_at` field to team tokens, by [@​juliannatetreault](https://togithub.com/juliannatetreault) (#​[https://github.com/hashicorp/terraform-provider-tfe/pull/844](https://togithub.com/hashicorp/terraform-provider-tfe/pull/844)ll/844)) - `r/tfe_agent_pool`: Add attribute `organization_scoped` to set the scope of an agent pool, by [@​hs26gill](https://togithub.com/hs26gill) [870](https://togithub.com/hashicorp/terraform-provider-tfe/pull/870) - `d/tfe_agent_pool`: Add attribute `organization_scoped` and `allowed_workspace_ids` to retrieve agent pool scope and associated allowed workspace ids, by [@​hs26gill](https://togithub.com/hs26gill) [870](https://togithub.com/hashicorp/terraform-provider-tfe/pull/870) BUG FIXES: - `r/tfe_workspace_run`: Ensure `wait_for_run` correctly results in a fire-and-forget run when set to `false`, by [@​lucymhdavies](https://togithub.com/lucymhdavies) (#​[https://github.com/hashicorp/terraform-provider-tfe/pull/910](https://togithub.com/hashicorp/terraform-provider-tfe/pull/910)ll/910)) - `r/tfe_workspace_run`: Fix rare random run failures; adjust lists of expected run statuses to ensure that a plan is completely processed before attempting to apply it, by [@​uk1288](https://togithub.com/uk1288) (#​[https://github.com/hashicorp/terraform-provider-tfe/pull/921](https://togithub.com/hashicorp/terraform-provider-tfe/pull/921)ll/921)) - `r/tfe_notification_configuration`: Add support for missing "Check failed" Health Event notifications, by [@​lucymhdavies](https://togithub.com/lucymhdavies) (#​[https://github.com/hashicorp/terraform-provider-tfe/pull/927](https://togithub.com/hashicorp/terraform-provider-tfe/pull/927)ll/927)) - `r/tfe_registry_module`: Fix a bug that prevented users from being able to create a registry module using a github app, by [@​dsa0x](https://togithub.com/dsa0x) (#​[https://github.com/hashicorp/terraform-provider-tfe/pull/935](https://togithub.com/hashicorp/terraform-provider-tfe/pull/935)ll/935)) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Enabled. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNS4xNTkuNCIsInVwZGF0ZWRJblZlciI6IjM1LjE1OS40IiwidGFyZ2V0QnJhbmNoIjoibWFpbiJ9--> Co-authored-by: astronaut-panda[bot] <137164246+astronaut-panda[bot]@users.noreply.github.com>
Description
The changes in this PR checks potential terminal post plan statuses to ensure that a plan has been completely processed before proceeding to confirm the associated run. The terminal status changes depending on the post plan lifecycle that a run has. For example a plan can terminate at
planned
status orcost_estimated
orpolicy_checked
.See issue for more details: #898
This PR depends on go-tfe PR#712
Testing plan
policy_checked
, for a longer period of time such that workspace_run receives that status while polling the run. I achieved this by adding delays to each run stateExternal links
Output from acceptance tests