Create New Resource Group: Status=404 Code="ResourceGroupNotFound" #18268

pkirch · 2022-09-06T13:32:36Z

Is there an existing issue for this?

I have searched the existing issues

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

1.2.5

AzureRM Provider Version

3.15.1

Affected Resource(s)/Data Source(s)

azurerm_resource_group

Terraform Configuration Files

# Minimal config. Complete files linked here: https://github.com/agera-edc/MinimumViableDataspace/tree/ec1999cc7a8582407f7d089fb9396dde023e58bb/deployment/terraform/participant

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.1.0"
    }
  }

  backend "azurerm" {}
}

provider "azurerm" {
  features {
    key_vault {
      purge_soft_delete_on_destroy = true
    }
  }
}

resource "azurerm_resource_group" "participant" {
  name     = var.resource_group
  location = var.location
}

variable "resource_group" {
  default = "test-resource-group"
}

variable "location" {
  default = "northeurope"
}

Debug Output/Panic Output

Excerpt of error message. Full output: https://gist.github.com/pkirch/a674369d480389ce2ddd57f24499e5b2

azurerm_resource_group.participant: Creating...
╷
│ Error: retrieving Resource Group "rg-company1-mvd116": resources.GroupsClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceGroupNotFound" Message="Resource group 'rg-company1-mvd116' could not be found."
│ 
│   with azurerm_resource_group.participant,
│   on main.tf line 54, in resource "azurerm_resource_group" "participant":
│   54: resource "azurerm_resource_group" "participant" {
│ 
╵
Releasing state lock. This may take a few moments...
Error: Process completed with exit code 1.

Expected Behaviour

A new Azure resource group should be created reliably, without error, and should exist after creation.

Actual Behaviour

The deployment stopped with the error shown above.

The error occurred sporadically. Our logs show 12 failures in 662 runs (about 1.8%).
Failures happened only in a specific time window, from 2022-07-27 02:57 p.m. to 2022-07-28 09:49 a.m.

I expect this issue is hard to troubleshoot from the data given. However, we hope filing this issue helps others in case it happens sporadically again.

As the failures happened a few weeks ago, the Terraform and AzureRM provider versions stated above are the ones that were in use when the errors occurred.

Steps to Reproduce

We have a GitHub Actions workflow that executes the following commands (complete file):

      - name: 'Run terraform'
        id: runterraform
        run: |
          # Create backend.conf file to retrieve the remote terraform state during terraform init.
          echo '
            resource_group_name  = "${{ secrets.COMMON_RESOURCE_GROUP }}"
            storage_account_name = "${{ secrets.TERRAFORM_STATE_STORAGE_ACCOUNT }}"
            container_name       = "${{ secrets.TERRAFORM_STATE_CONTAINER }}"
            key                  = "${{ env.RESOURCES_PREFIX }}.tfstate"
          ' >> backend.conf
          terraform init -backend-config=backend.conf
          terraform apply -auto-approve

Important Factoids

No response

References

Issues that seem similar but were closed and/or fixed a long time ago.

jpmicrosoft · 2022-09-30T18:27:40Z

@pkirch It seems like a dependency issue.

Add a depends_on to the resources that depend on the resource group:
depends_on = [
  azurerm_resource_group.participant
]
Reference:
https://www.terraform.io/language/meta-arguments/depends_on

I hope this helps.

DizzyDeveloper · 2024-04-16T09:28:38Z

I am currently having the same issue with azure resource groups.

I get maybe one or two of these messages in the Terraform output:
azurerm_resource_group.default_resource_group: Creating...

Before I get the same 404 error as above:

Error: retrieving Resource Group "zvt4xdts-rg": resources.GroupsClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceGroupNotFound" Message="Resource group 'zvt4xdts-rg' could not be found."

The annoying thing, though, is that if I check the subscription, the resource group does actually exist, so it does get created.

It also doesn't happen repeatably. We have a variety of Terraform projects, each creating their own resource group(s), and this failure seems to happen arbitrarily for different projects.
In all cases the resource group does actually get created, but Terraform fails with the error above.

ggomes-agc · 2024-04-18T13:01:16Z

I am also experiencing this issue fairly consistently with azurerm 3.99 (I also tried various versions from 3.70 onward).

A debug run of the deployment shows that the resource is created. The final call to the API for validation responds with not found.
The provider then errors out and does not write to state.

A subsequent apply says that the resource exists and must be imported into state.

Not Working Deployment
2024-04-17T17:22:21.410Z [DEBUG] provider.terraform-provider-azurerm_v3.70.0_x5: AzureRM Request:
GET /subscriptions/3828d6d3-52e4-439a-8621-b528739f9fee/resourcegroups/GgPc-ggd-prod_paz_network-rg?api-version=2020-06-01 HTTP/1.1
HTTP/2.0 404 Not Found

2024-04-17T17:22:21.476Z [DEBUG] provider.terraform-provider-azurerm_v3.70.0_x5: AzureRM Request:
PUT /subscriptions/3828d6d3-52e4-439a-8621-b528739f9fee/resourcegroups/GgPc-ggd-prod_paz_network-rg?api-version=2020-06-01 HTTP/1.1
HTTP/2.0 201 Created

2024-04-17T17:22:21.615Z [DEBUG] provider.terraform-provider-azurerm_v3.70.0_x5: AzureRM Request:
GET /subscriptions/3828d6d3-52e4-439a-8621-b528739f9fee/resourcegroups/GgPc-ggd-prod_paz_network-rg?api-version=2020-06-01 HTTP/1.1
HTTP/2.0 404 Not Found
{"error":{"code":"ResourceGroupNotFound","message":"Resource group 'GgPc-ggd-prod_paz_network-rg' could not be found."}}: timestamp=2024-04-17T17:22:21.678Z

Our code deploys 12 resource groups via module calls. One or two of the RGs fail on most applies, but which RGs fail varies from run to run. We have about a 10% success rate on apply.

This code worked consistently a month ago.
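
The pattern in the debug output above (a PUT answered with 201, then a GET for the same URL answered with 404) can be searched for in a saved debug log. Below is a minimal Python sketch, assuming the log has the same shape as the excerpt: a request line followed by an `HTTP/2.0 <status>` line. The function name `find_create_then_404`, the `SAMPLE_LOG` text, and the `SUB_ID` / `example-rg` placeholders are made up for illustration and are not part of the provider or of Terraform.

```python
import re

# Request lines such as "PUT /subscriptions/.../resourcegroups/name?api-version=... HTTP/1.1".
REQUEST_RE = re.compile(r"^(GET|PUT)\s+(\S+)\s+HTTP/\d")
# Response status lines such as "HTTP/2.0 404 Not Found".
STATUS_RE = re.compile(r"^HTTP/\d(?:\.\d)?\s+(\d{3})\b")

# Hypothetical log text with placeholder IDs, shaped like the excerpt above.
SAMPLE_LOG = """\
GET /subscriptions/SUB_ID/resourcegroups/example-rg?api-version=2020-06-01 HTTP/1.1
HTTP/2.0 404 Not Found
PUT /subscriptions/SUB_ID/resourcegroups/example-rg?api-version=2020-06-01 HTTP/1.1
HTTP/2.0 201 Created
GET /subscriptions/SUB_ID/resourcegroups/example-rg?api-version=2020-06-01 HTTP/1.1
HTTP/2.0 404 Not Found
"""


def find_create_then_404(log_text):
    """Return URLs where a PUT answered 201 is later followed by a GET answered 404."""
    created = set()      # URLs that a PUT has successfully created
    suspicious = []      # URLs that returned 404 after being created
    pending = None       # (method, url) of the request still waiting for its status line
    for raw in log_text.splitlines():
        line = raw.strip()
        request = REQUEST_RE.match(line)
        if request:
            pending = (request.group(1), request.group(2))
            continue
        status = STATUS_RE.match(line)
        if status and pending:
            method, url = pending
            code = int(status.group(1))
            if method == "PUT" and code == 201:
                created.add(url)
            elif method == "GET" and code == 404 and url in created:
                suspicious.append(url)
            pending = None
    return suspicious


if __name__ == "__main__":
    for url in find_create_then_404(SAMPLE_LOG):
        print("created, then not found:", url)
```

The 404 on the first GET is not flagged, because nothing has been created at that point; only a 404 that arrives after a successful PUT for the same URL is reported.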

KealeyGR · 2024-04-18T13:21:05Z

I'm encountering a similar issue with azurerm version 3.95. Like you, I've tried various versions from 3.70 onward with no success. During deployment, the resource is created but the final API call for validation consistently returns 'not found,' resulting in errors and failure to write to state.

We're deploying multiple resource groups via module calls, and about 10% of the time, a few of these RGs consistently fail. Strangely, the failing RGs vary with each deployment. This behavior is inconsistent with what we experienced a month ago when the code worked reliably.

I'd appreciate any insights or suggestions on how to resolve this issue. Thanks!

dantape · 2024-04-18T14:12:16Z

I am using azurerm 3.83.0 and we started getting this a couple of days ago. Our build servers run Ubuntu, and our builds that create resource groups are not having much success. Sometimes the resource group is created and sometimes it isn't, but we are pretty consistently getting the NotFound error either way. When running locally on Windows, my latest configuration worked fine the first time. I am not sure yet whether there is any correlation with the OS; there is not enough data. With so many provider versions affected, I am thinking about opening a ticket with Microsoft: did one of their APIs change?

dingliu · 2024-04-18T15:01:41Z

We’re having this issue as well. When creating multiple resource groups in parallel, the creation of one resource group or another fails inconsistently. The error message is a 404: the resource group could not be found.

We have tried provider versions 3.54.0 to 3.99.0 and the issue persists.

The same code was working about two weeks ago. We have been experiencing this issue for the last two weeks.

eddieb96 · 2024-04-19T15:14:25Z

We are also having it while creating multiple resource groups in parallel. Tested provider version 3.100 and issue persists.

dantape · 2024-04-19T20:26:14Z

I enabled trace logging and was seeing some weird behavior. I was on terraform version 1.3.6. After updating to 1.8.1 I have not seen this issue again.

haodeon · 2024-04-21T22:17:46Z

Been getting this error for over a week now.

I tried upgrading Terraform to 1.8.1.

I am still having issues with resource group creation. I now get "Provider produced inconsistent result after apply" and "Root object was present, but now absent".

The resource group still gets created but doesn't get saved into state. To me it seems like the same issue with a different error message.

DizzyDeveloper · 2024-04-21T23:34:44Z

Hi @katbyte,
I am pinging you to see if we can get some input from one of the HashiCorp devs on this issue; we haven't heard from you on it yet. Just checking that you are aware it has picked up quite a bit over the last week.

chalecado · 2024-04-22T08:48:51Z

Happening for us too. We have had to disable Azure in our dev environment due to this issue.

alexpilon666 · 2024-04-22T18:43:10Z

Been having the exact same problem here inconsistently for the past couple of weeks, always using the latest AzureRM provider version available at the time.

We have a suite of tests that runs in CI in Azure DevOps and executes a set of tests (terraform init/plan/apply/destroy) when a PR is created that modifies one of our modules. This sometimes means that we have 30+ tests queued in our ADO pipeline. I just launched a test suite, and with 8 parallel jobs running (1 job == terraform init/plan/apply/destroy performed on one test suite), 5 failed immediately after trying to create the RG with the exact same issue. I now have 8 currently running tests that managed to create the resource group just fine and proceeded with the rest.

MikeSchiessl · 2024-04-24T08:31:25Z

This sometimes also happened to me while running identical TF code over and over.
There seems to be something wrong on the Azure side: we get a 404 from Azure even though the resource group has been created successfully, so Azure doesn't seem to be able to find it.

favoretti · 2024-04-25T15:03:22Z

Here's an update I got from MSFT support.

> I am an Azure support engineer on the ARM team, and I will be working with you on this case. I understand that you are seeing intermittent 404 errors over the last two weeks.
>
> Our product teams are aware of this problem and working on this issue. There is a lot of work spanning multiple services working on this behavior. I will keep you up to date with their progress.
>
> The issue is related to replication of ARM data among regions. For example, another customer has some requests going to East US and other requests to East US 2, and during the time it takes to replicate between the two, they get 404's. The database account is a multi-master account with session consistency - so, write operations will be replicated across regions asynchronously. Session consistency only guarantees read-your-writes guarantees within the scope of a session which is either defined by the application (ARM) or by the SDK (in which case the session spans only a single CosmosClient instance) - and given that several of the reads returning 404 after the creation of the resource group were done not only from a different ARM FD machine but even from a different region, they were made outside of the session scope - so, effectively eventually consistent. ARM team has worked in the past to make the multi-master model work transparently, and I assume they will continue this work as will our other teams working on the problem.
>
> I will keep you up to date. Thanks for reporting the issue.

Zezo0001 · 2024-04-25T15:04:58Z

@favoretti That response is inaccurate, because in our case we're working with ONLY one region and still see the error.

favoretti · 2024-04-25T15:11:07Z

> That response is inaccurate because in our case we're working with ONLY one region and noticing the error

The ARM load balancer will send you places. It's not related to the region where you are creating the resources.

alexpilon666 · 2024-04-25T15:11:14Z

@zoelfakar1 same here. Our use case for deploying tests/examples deploys everything in a single region, in a single subscription. The very first and essentially only pre-step of deploying our tests/examples generates a 4-character random string in Terraform and then creates the resource group. Once the resource group is created using a terraform apply -target module.helper (we created a helper module to manage this instead of repeating the code in all of our tests/examples), then we run a separate terraform apply to actually create everything in our test/example.

So in our case, we're not even trying to manage/create anything other than the resource group, and it still fails at least 20% of the time, whether we're running one or more tests/examples at a time in our CI pipeline.

ggomes-agc · 2024-04-25T15:13:14Z

Same for us. We are only deploying to Canada Central.

Zezo0001 · 2024-04-25T15:59:13Z

We have also noticed a delay in the resource group's creation in the Azure portal (i.e., it appears after the terraform apply finishes or errors out). That would explain why the final API call for validation returns "Resource group 'XXX-XXX' could not be found." (it did not exist yet at that point).

srjennings · 2024-04-25T19:19:49Z

+1, same issue.

favoretti · 2024-04-25T21:56:09Z

I'm working on a "fix" for this. So far it seems to work, I'm going to run an overnight test for this, after which we can discuss merging it upstream.

favoretti · 2024-04-25T22:07:28Z

To address the comments referring to "deployments to a single region": the ARM API itself is multi-region. Each request that the provider sends to the API can potentially create a new HTTP session, which means session consistency on the ARM backend won't help. The CreateOrUpdate() method in the resource sends a PUT request to the ARM API, which returns a success. To populate the resource data, CreateOrUpdate() then calls resourceResourceGroupRead(), which in turn calls the Get() client method. Because that call might end up in another session, it's not guaranteed that the request will land in the same region of the ARM API (this is unrelated to the region you're creating your resources in; management.azure.com is globally load balanced). If the eventually consistent database that backs Azure resources has not finished replicating the fact that your RG was created, that ARM API instance will rightfully respond with a 404: as far as it knows, the resource group doesn't exist.

As a consequence, the provider errors out. A subsequent TF run gives you an error that the resource group already exists and requires import, most likely because it takes just a couple more seconds for the data to be reconciled across the Azure backend databases.

The kludge I added just retries Get() on the resource until it finds it 5 times in a row, after which we can be fairly certain it's all good.

Hope this helps clarify the issue and attempted workaround.
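
The retry idea described above can be sketched in a few lines of Python. This is not the provider's Go code from the linked pull request. It is only an illustration of "retry the read until it succeeds N times in a row", run against a simulated read path. Only the count of 5 comes from the comment above; `NotFound`, `FlakyReader`, `wait_until_consistently_found`, `max_attempts` and `delay_seconds` are hypothetical names and parameters chosen for this example.

```python
import time


class NotFound(Exception):
    """Stands in for a 404 ResourceGroupNotFound response."""


def wait_until_consistently_found(get, required_successes=5, max_attempts=50, delay_seconds=0.0):
    """Call get() until it succeeds required_successes times in a row.

    Any NotFound resets the streak. Returns the last successful result, or
    raises TimeoutError if the streak is not reached within max_attempts calls.
    """
    streak = 0
    for _ in range(max_attempts):
        try:
            result = get()
        except NotFound:
            streak = 0  # a replica that has not caught up answered; start counting again
        else:
            streak += 1
            if streak >= required_successes:
                return result
        time.sleep(delay_seconds)
    raise TimeoutError(
        f"read did not succeed {required_successes} times in a row within {max_attempts} attempts"
    )


class FlakyReader:
    """Simulated read path: answers follow a fixed script, then always succeed."""

    def __init__(self, script):
        self.script = list(script)  # True = found, False = 404
        self.calls = 0

    def get(self):
        found = self.script[self.calls] if self.calls < len(self.script) else True
        self.calls += 1
        if not found:
            raise NotFound("Resource group could not be found.")
        return {"name": "example-rg"}


if __name__ == "__main__":
    reader = FlakyReader([False, True, True, False, True])
    print(wait_until_consistently_found(reader.get), "after", reader.calls, "reads")
```

Requiring a streak rather than a single success matters here because consecutive reads may be answered by different API instances, so one successful read does not show that every instance has caught up.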

haodeon · 2024-04-26T04:04:34Z

I suspect there has been a change to the ARM service. From my own experience and that of others who have commented on this issue, the recent problem started appearing on the 12th of April.

I agree it’s probably an eventual consistency problem. I wrote a Python script that loops through creating, reading and deleting resource groups and prints the response headers. Whenever a 404 is returned for Get resource group, the x-ms-routing-request-id shows that the request was served from a different region to the one the resource group is created in.
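
The header check described above might look roughly like the following. This is not the script from the comment, which is not shown in the thread. It is a small sketch that assumes the `x-ms-routing-request-id` value starts with a region name followed by a colon (for example `WESTEUROPE:<timestamp>:<id>`). The function names and the sample header values are hypothetical.

```python
def routing_region(headers):
    """Return the region prefix of x-ms-routing-request-id in lower case, or None."""
    for name, value in headers.items():
        if name.lower() == "x-ms-routing-request-id":
            # Assumed format: "<REGION>:<timestamp>:<id>"; keep only the first field.
            region = value.split(":", 1)[0].strip().lower()
            return region or None
    return None


def normalize_region(name):
    """Turn a display name such as 'West Europe' into 'westeurope'."""
    return name.replace(" ", "").lower()


def served_from_other_region(headers, resource_group_location):
    """True when the response names a routing region that differs from the RG location."""
    region = routing_region(headers)
    if region is None:
        return False  # no routing header, so nothing can be concluded
    return region != normalize_region(resource_group_location)


if __name__ == "__main__":
    # Made-up header value for illustration only.
    sample = {
        "x-ms-routing-request-id": "WESTEUROPE:20240426T040434Z:00000000-0000-0000-0000-000000000000"
    }
    print(routing_region(sample))
    print(served_from_other_region(sample, "northeurope"))
```

Logging this flag next to each 404 makes it easy to see whether the failures line up with reads served from another region.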

Upon discovering this I opened a support ticket with MS. This might be an intended change that the azurerm provider will need to adapt to. The provider code has not been touched for a long time, and the API version is fixed to a deprecated version of the Go SDK.

haodeon · 2024-04-28T23:01:06Z

Azure support got back to me and said the product group made a fix over the weekend.

favoretti · 2024-04-29T15:04:20Z

> Azure support got back to me and said the product group made a fix over the weekend.

They might have; however, that's unfortunately not the first time this issue has resurfaced. Also, my contacts have reported nothing about a fix yet :)

eddieb96 · 2024-05-14T07:27:20Z

Started seeing this again yesterday on the 3.100.0 Azure Terraform provider.

favoretti · 2024-05-14T07:43:30Z

> Started seeing this again yesterday on the 3.100.0 Azure Terraform provider.

Try 3.102.0 please - that's where my workaround got merged. Would be interested in hearing if it helps.

github-actions · 2024-06-14T02:07:02Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

pkirch added the bug label Sep 6, 2022

github-actions bot removed the bug label Sep 6, 2022

Amier3 added the question label Sep 8, 2022

ziyeqf mentioned this issue Oct 10, 2022

azurerm_resource_group - fix the issue failed when creating #18675

Closed

catriona-m added the v/3.x label Jul 19, 2023

rain-on mentioned this issue Apr 24, 2024

Pin versions of Terraform and the AzureRM Terraform provider OctopusDeploy/Calamari#1251

Closed

favoretti mentioned this issue Apr 25, 2024

azurerm_resource_group: Work around sporadic ARM eventual consistency issues #25758

Merged


manicminer linked a pull request Apr 29, 2024 that will close this issue

azurerm_resource_group: Work around sporadic ARM eventual consistency issues #25758

Merged


katbyte closed this as completed in #25758 May 1, 2024

github-actions bot locked as resolved and limited conversation to collaborators Jun 14, 2024
