Add retry handling when a request's connection is reset by peer #10715
Comments
We saw something similar but with an |
This is also happening for us when working with s3 buckets
|
This is also happening for us with a cloudfront distribution:
This is happening with a simple @davegallant maybe this issue could be renamed to be more generic since it's not only a problem for IAM instance profiles. |
This is happening to us frequently (using both 0.12.29 and 0.13): `Error: Error retrieving DB Instances: RequestError: send request failed caused by: Post "https://rds.us-east-1.amazonaws.com/": read tcp 10.98.196.183:58901->52.119.197.147:443: read: connection reset by peer` |
We're seeing this as well. The specific case for us just now was on the Interestingly for us, this happened whilst trying to investigate "hangs" during a terraform plan/apply cycle which seem somewhat related. In that case, TF would hit some kind of network issue, then not bother retrying for around 15 minutes, but would then retry and succeed. Still digging into this as I'm not sure whether this is a provider or TF issue. |
The same with aws_vpc_endpoint datasource when I run either plan or apply. |
I think this is more serious than a simple retry needed. I had an apply wedge badly today due to this error and it looks like this causes Terraform to lose track of resources it has created. I had to manually hunt down and destroy EC2 instances it built but didn't save into the state to unwedge it. |
This is still an issue with Terraform v0.14.8. I have a project which manages some cross region resources and the us-west-1 ones are failing somewhat regularly while us-east-1 (~20 minutes from my house) is rock-solid.
|
Also encountered this issue with ACM:
|
Having the same "connection reset by peer" issue during state checks on ElasticIPs. Have checked the AWS Service Health & Personal service health dashboards, both show all services up in the region this is running, us-west-2. Haven't seen any examples of EIP failures when searching, so noting here:
Similar error in Security Group state check, about an hour after the above error. AWS Service & Personal health dashboards both show VPC & EC2 services are healthy:
|
Same here using Terraform
Also the same thing here. |
Seeing the same error with ACM, it started happening this evening |
Seeing the same error with IAM as well, using
|
Suddenly seeing the same error with CloudWatch log groups and SQS queues. The equivalent AWS CLI commands work fine. [ UPDATE ] After more digging I found someone suggesting network issues. I dropped off the corporate VPN and everything works; I reconnected to the VPN and it fails. It was working a few days ago. Will go and bash Corporate IT and see what they have to say for themselves |
Hi all 👋 Just letting you know that this issue is featured on this quarter's roadmap. If a PR exists to close the issue, a maintainer will review it and either make changes directly or work with the original author to get the contribution merged. If you have written a PR to resolve the issue, please ensure the "Allow edits from maintainers" box is checked. Thanks for your patience; we are looking forward to getting this merged soon! |
Hi @breathingdust, any update on the progress? |
FTR, this also happens when the wait loop is waiting for a resource to change state (e.g. become active). Here is an example for an MSK cluster:
on ../../../../../cluster/cluster.tf line 19, in resource "aws_msk_cluster" "msk_cluster": |
This would be very useful. I'm running into this issue with the msk module. |
This is very annoying! We have a large Terraform project, and roughly 1 in 10 times a plan or apply fails with a (random) network error. Such an error should be retried silently! As it stands, we lose a lot of time waiting for nothing... Is there any progress on the issue? |
Hi everyone, The error The AWS SDK for Go v1 did not retry this error, but the AWS SDK for Go v2 does retry it. As of version 5.73.0 all AWS services in the provider have been implemented using the AWS SDK for Go v2 (except for If you still encounter this error in the provider version 5.73.0 or later, please create a new issue. |
Warning This issue has been closed, meaning that any additional comments are hard for our team to see. Please assume that the maintainers will not see them. Ongoing conversations amongst community members are welcome, however, the issue will be locked after 30 days. Moving conversations to another venue, such as the AWS Provider forum, is recommended. If you have additional concerns, please open a new issue, referencing this one where needed. |
Community Note
Terraform Version
Terraform v0.12.10
Affected Resource(s)
Terraform Configuration Files
Debug Output
N/A
Panic Output
N/A
Expected Behavior
It would be nice if there was a retry mechanism implemented for this resource since it is only doing a read.
Actual Behavior
Steps to Reproduce
terraform apply
Important Factoids
Does not look like there is any retry logic when reading an IAM instance profile:
https://github.com/terraform-providers/terraform-provider-aws/blob/98b8b848ca94031b20c3e626c9d40484e3af80de/aws/resource_aws_iam_instance_profile.go#L287-L305
An example of retrying within the same file:
https://github.com/terraform-providers/terraform-provider-aws/blob/98b8b848ca94031b20c3e626c9d40484e3af80de/aws/resource_aws_iam_instance_profile.go#L163-L175
References
None