Randomly fails to create google_sql_database_instance due to timing issue #13091

Closed
mogronalol opened this issue Mar 27, 2017 · 4 comments · Fixed by #15170

Comments

@mogronalol
mogronalol commented Mar 27, 2017

Hi there,

Terraform Version

Terraform v0.9.1

Affected Resource(s)

  • google_sql_database_instance

Terraform Configuration Files

provider "google" {
  credentials = "${file("account.json")}"
  region      = "europe-west-1"
  project = "${data.terraform_remote_state.project.project_id}"
}

resource "google_sql_database_instance" "master" {
  region = "europe-west1"
  project = "${data.terraform_remote_state.project.project_id}"
  database_version = "MYSQL_5_7"

  settings {
    tier = "db-n1-standard-1"
    activation_policy = "ALWAYS"
  }
}

data "terraform_remote_state" "project" {
  backend = "gcs"
  config {
    path = "projects/dev.tfstate"
    bucket = "application-cloud-tf-state"
  }
}

Debug Output

https://gist.github.com/mogronalol/c8c308e17b390023939d85ba9c5853a3

Expected Behavior

Successfully retrieve the instance creation operation, then block until it is complete.

Actual Behavior

A 404 because it does a GET too quickly, so the operation does not exist yet.

err = sqladminOperationWait(config, op, "Create Instance")
is the line which fails.

Steps to Reproduce

  1. terraform apply

This does not always fail. It will pretty much never fail if you add a sleep before this line:

err = sqladminOperationWait(config, op, "Create Instance")

What does not make sense is that if I change SqlAdminOperationWaiter.RefreshFunc to retry on a 404 instead of failing, it still returns a 404 after five attempts. But a simple sleep before calling RefreshFunc for the first time means that there is no 404, so I'm a little stumped.

@paddycarver
Contributor

Hey @mogronalol! Thanks for reporting the bug. I've been chasing this for the past few weeks (#12436 was written specifically to help track this down) so it's always good to have an extra datapoint. I'm pretty sure this is an upstream bug, and I'll work with Google to get it addressed. (Opening a bug on https://issuetracker.google.com is on my to-do list.)

If you don't mind, I'd love to know how many SQL instances are defined in your config. (Does it reliably work on the config you provided? Or only sometimes?) I have a hunch that the error is more likely to occur the more instances you define in your config, but can't corroborate that yet.

Definitely interested in getting this resolved, either upstream, through a workaround on our end, or both.

@mogronalol
Author

I only have a single instance defined.

One thing I've noticed since raising this is that if I manually create the project my template works fine. But, if the project is created with Terraform, I get this weird race condition.

I know that sounds strange but I am wondering if project generation with the upstream API is producing some sort of invalid project state.

It's just strange because I spent a good part of today implementing a fix by retrying on a 404, but that unfortunately did not work.

It's really painful, as these timing bugs are always hard to diagnose. Adding print statements to try and figure it out was impossible because the failure wasn't deterministic.

@paddycarver
Contributor

Thanks for the extra info. I've filed this as issue 36656107 in the Google issue tracker. I'll keep an eye on it, and see if they can suggest any workarounds. :) Sorry for the trouble!

@ghost

ghost commented Apr 11, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 11, 2020