Randomly fails to create google_sql_database_instance due to timing issue #13091

mogronalol · 2017-03-27T13:11:50Z

Hi there,

Terraform Version

Terraform v0.9.1

Affected Resource(s)

google_sql_database_instance

Terraform Configuration Files

provider "google" {
  credentials = "${file("account.json")}"
  region      = "europe-west1"
  project     = "${data.terraform_remote_state.project.project_id}"
}

resource "google_sql_database_instance" "master" {
  region           = "europe-west1"
  project          = "${data.terraform_remote_state.project.project_id}"
  database_version = "MYSQL_5_7"

  settings {
    tier              = "db-n1-standard-1"
    activation_policy = "ALWAYS"
  }
}

data "terraform_remote_state" "project" {
  backend = "gcs"
  config {
    path   = "projects/dev.tfstate"
    bucket = "application-cloud-tf-state"
  }
}

Debug Output

https://gist.github.com/mogronalol/c8c308e17b390023939d85ba9c5853a3

Expected Behavior

Successfully retrieve the instance creation operation, then block until it is complete.

Actual Behavior

Terraform gets a 404 because it issues the GET for the operation too quickly, before the operation exists.

The failing line is line 570 of terraform/builtin/providers/google/resource_sql_database_instance.go at commit bfdeae0:

err = sqladminOperationWait(config, op, "Create Instance")

Steps to Reproduce

terraform apply

The failure is intermittent. Adding a sleep before the same call (line 570 of terraform/builtin/providers/google/resource_sql_database_instance.go at bfdeae0) makes it go away almost entirely:

err = sqladminOperationWait(config, op, "Create Instance")

What doesn't make sense is that if I change SqlAdminOperationWaiter.RefreshFunc to retry on a 404 instead of failing, it still gets a 404 after five attempts. Yet a simple sleep before the first call to RefreshFunc means there is no 404, so I'm a little stumped.


paddycarver · 2017-03-27T19:25:39Z

Hey @mogronalol! Thanks for reporting the bug. I've been chasing this for the past few weeks (#12436 was written specifically to help track this down) so it's always good to have an extra datapoint. I'm pretty sure this is an upstream bug, and I'll work with Google to get it addressed. (Opening a bug on https://issuetracker.google.com is on my to-do list.)

If you don't mind, I'd love to know how many SQL instances are defined in your config. (Does it reliably work on the config you provided? Or only sometimes?) I have a hunch that the error is more likely to occur the more instances you define in your config, but can't corroborate that yet.

Definitely interested in getting this resolved, either upstream, through a workaround on our end, or both.

mogronalol · 2017-03-27T21:49:00Z

I only have a single instance defined.

One thing I've noticed since raising this is that if I manually create the project my template works fine. But, if the project is created with Terraform, I get this weird race condition.

I know that sounds strange, but I'm wondering whether creating the project through the upstream API leaves it in some sort of invalid state.

It's just strange because I spent a good part of today implementing a fix by retrying on a 404, but that unfortunately did not work.

It's really painful, as these timing bugs are always hard to diagnose. Adding print statements to try to figure it out was impossible because the failure wasn't deterministic.

paddycarver · 2017-03-27T22:15:42Z

Thanks for the extra info. I've filed this as issue 36656107 in the Google issue tracker. I'll keep an eye on it, and see if they can suggest any workarounds. :) Sorry for the trouble!

ghost · 2020-04-11T02:17:49Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

grubernaut added bug provider/google-cloud labels Mar 27, 2017

grubernaut assigned paddycarver Mar 27, 2017

paddycarver added the upstream label Mar 27, 2017

danawillow mentioned this issue Jun 7, 2017: provider/google: Add an additional delay when checking for sql operations #15170 (Merged)

stack72 closed this as completed in #15170 Jun 9, 2017

bradgignac mentioned this issue Jun 29, 2017: SQL Database Instance fails with Not Found hashicorp/terraform-provider-google#167 (Closed)

ghost locked and limited conversation to collaborators Apr 11, 2020
