azurerm_resource_group: Work around sporadic ARM eventual consistency issues (hashicorp#25758)

* `azurerm_resource_group`: Work around sporadic ARM eventual consistency issues

* Fix error logic, increase context timeout

* Added elaborate comment as to why this is needed

* Update internal/services/resource/resource_group_resource.go

Co-authored-by: Tom Bamford <tom@bamford.io>

* Lower TargetOccurence to 3 as discussed

* Wrap in IsNewResource

* comment formatting

---------

Co-authored-by: Tom Bamford <tom@bamford.io>
favoretti and manicminer authored May 1, 2024
1 parent 00d1938 commit 9e45ece
Showing 1 changed file with 45 additions and 0 deletions.
45 changes: 45 additions & 0 deletions internal/services/resource/resource_group_resource.go
@@ -93,6 +93,51 @@ func resourceResourceGroupCreateUpdate(d *pluginsdk.ResourceData, meta interface
		return fmt.Errorf("creating Resource Group %q: %+v", name, err)
	}

	// TODO: remove this once the ARM team confirms the issue is fixed on their end
	//
	// @favoretti: Working around a race condition in ARM's eventually consistent backend data storage.
	// Sporadically, the ARM API returns a successful creation response, followed by a 404 to a
	// subsequent `Get()`. Usually, seconds later, the storage is reconciled and the following Terraform
	// run fails with `RequiresImport`.
	//
	// Snippet from MSFT support:
	// The issue is related to replication of ARM data among regions. For example, another customer
	// has some requests going to East US and other requests to East US 2, and during the time it takes
	// to replicate between the two, they get 404s. The database account is a multi-master account with
	// session consistency, so write operations are replicated across regions asynchronously.
	// Session consistency only provides read-your-writes guarantees within the scope of a session, which
	// is either defined by the application (ARM) or by the SDK (in which case the session spans only
	// a single CosmosClient instance). Given that several of the reads returning 404 after the
	// creation of the resource group were done not only from a different ARM FD machine but even from
	// a different region, they were made outside of the session scope, so they were effectively eventually
	// consistent. The ARM team has worked in the past to make the multi-master model work transparently,
	// and I assume they will continue this work, as will our other teams working on the problem.
	if d.IsNewResource() {
		stateConf := &pluginsdk.StateChangeConf{ //nolint:staticcheck
			Pending:                   []string{"Waiting"},
			Target:                    []string{"Done"},
			Timeout:                   10 * time.Minute,
			MinTimeout:                4 * time.Second,
			ContinuousTargetOccurence: 3,
			Refresh: func() (interface{}, string, error) {
				rg, err := client.Get(ctx, name)
				if err != nil {
					if utils.ResponseWasNotFound(rg.Response) {
						return false, "Waiting", nil
					}
					return nil, "Error", fmt.Errorf("retrieving Resource Group: %+v", err)
				}

				return true, "Done", nil
			},
		}

		if _, err := stateConf.WaitForStateContext(ctx); err != nil {
			return fmt.Errorf("waiting for Resource Group %s to become available: %+v", name, err)
		}
	}

	resp, err := client.Get(ctx, name)
	if err != nil {
		return fmt.Errorf("retrieving Resource Group %q: %+v", name, err)