Add retry handling when a request's connection is reset by peer #10715

davegallant · 2019-11-01T17:01:42Z

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

Terraform v0.12.10

Affected Resource(s)

aws_iam_instance_profile

Terraform Configuration Files

data "aws_iam_role" "my_role" {
  name = "0f9f1e2t-instance"
}

Debug Output

N/A

Panic Output

N/A

Expected Behavior

It would be nice if a retry mechanism were implemented for this resource, since it is only doing a read.

Actual Behavior

Error: Error reading IAM instance profile 0f9f1e2t-instance: RequestError: send request failed

caused by: Post https://iam.amazonaws.com/: read tcp 172.17.0.2:36404->59.133.22.207:443: read: connection reset by peer

Steps to Reproduce

terraform apply

Important Factoids

It does not look like there is any retry logic when reading an IAM instance profile:

https://github.com/terraform-providers/terraform-provider-aws/blob/98b8b848ca94031b20c3e626c9d40484e3af80de/aws/resource_aws_iam_instance_profile.go#L287-L305

An example of retrying within the same file:
https://github.com/terraform-providers/terraform-provider-aws/blob/98b8b848ca94031b20c3e626c9d40484e3af80de/aws/resource_aws_iam_instance_profile.go#L163-L175

References

None


camlow325 · 2019-11-12T19:31:33Z

We saw something similar, but with an aws_iam_account_alias data source instead. In that case, at least, Terraform appeared to retry the failed API call some number of times (up to the value configured for max_retries on the AWS provider instance) when the request failed due to an i/o timeout. When a connection reset by peer failure occurred, though, like the one mentioned in this issue, no further retries were attempted. Would it make sense to apply the more generic API retry handling to connection reset by peer errors as well?

analogrithems · 2020-06-16T21:20:39Z

This is also happening for us when working with S3 buckets:

Error: error getting S3 Bucket CORS configuration: RequestError: send request failed caused by: Get https://example-config-us-west-2-prod-sandbox.s3.us-west-2.amazonaws.com/?cors=: read tcp 192.168.208.3:57182->52.218.237.161:443: read: connection reset by peer

fattybenji · 2020-07-28T09:03:14Z

This is also happening for us with a cloudfront distribution:

Error: RequestError: send request failed
 caused by: Get https://cloudfront.amazonaws.com/<date>/distribution/<id>: read tcp <ip>:<port>-><ip>:443: read: connection reset by peer

This is happening with a simple plan in CI, so a retry logic would be nice too.

@davegallant maybe this issue could be renamed to be more generic since it's not only a problem for IAM instance profiles.

gaspo53 · 2020-08-12T22:57:49Z

This is happening to us frequently (using both 0.12.29 and 0.13):

Error: Error retrieving DB Instances: RequestError: send request failed

19:26:37 | caused by: Post "https://rds.us-east-1.amazonaws.com/": read tcp 10.98.196.183:58901->52.119.197.147:443: read: connection reset by peer

mattburgess · 2020-08-14T10:31:19Z

We're seeing this as well. The specific case for us just now was on the ec2.eu-west-2.amazonaws.com service that was being reached via a VPC endpoint. But it also happens quite frequently for us on calls to services that have to traverse through our Internet Proxy because VPC endpoints aren't available for those services (or the services exist in a different region to our CI tooling).

Interestingly for us, this happened whilst trying to investigate "hangs" during a terraform plan/apply cycle which seem somewhat related. In that case, TF would hit some kind of network issue, then not bother retrying for around 15 minutes, but would then retry and succeed.

Still digging into this as I'm not sure whether this is a provider or TF issue.

lagrianitis · 2020-12-17T17:53:18Z

The same happens with the aws_vpc_endpoint data source when I run either plan or apply.

ag-TJNII · 2021-02-03T17:03:59Z

I think this is more serious than a simple retry needed. I had an apply wedge badly today due to this error and it looks like this causes Terraform to lose track of resources it has created. I had to manually hunt down and destroy EC2 instances it built but didn't save into the state to unwedge it.

acdha · 2021-03-23T21:57:30Z

This is still an issue with Terraform v0.14.8. I have a project which manages some cross region resources and the us-west-1 ones are failing somewhat regularly while us-east-1 (~20 minutes from my house) is rock-solid.

Error: Error retrieving list of aggregate authorizations: RequestError: send request failed
caused by: Post https://config.us-west-1.amazonaws.com/: read tcp …->176.32.118.187:443: read: connection reset by peer

Error: RequestError: send request failed
caused by: Post https://logs.us-west-1.amazonaws.com/: read tcp …->52.119.176.231:443: read: connection reset by peer

aws_config_aggregate_authorization
aws_cloudwatch_log_group

dimisjim · 2021-04-22T07:05:49Z

Also encountered this issue with ACM:

Error: error listing tags for ACM Certificate (arn:aws:acm:eu-west-1:<accID>:certificate/<certID>): RequestError: send request failed
caused by: Post "https://acm.eu-west-1.amazonaws.com/": read tcp <privIp>:34550->54.239.33.223:443: read: connection reset by peer

mbijon · 2021-05-24T19:17:00Z

Having the same "connection reset by peer" issue during state checks on Elastic IPs. I have checked the AWS Service Health and Personal service health dashboards; both show all services up in the region this is running in, us-west-2.

Haven't seen any examples of EIP failures when searching, so noting here:

module.bastion.aws_eip.default[0]: Refreshing state... [id=eipalloc-xxxxxx]
╷
│ Error: RequestError: send request failed
│ caused by: Post "https://ec2.us-west-2.amazonaws.com/": read tcp 192.168.86.33:51541->54.xxxx:443: read: connection reset by peer
│ 
│ Error: RequestError: send request failed
│ caused by: Post "https://ec2.us-west-2.amazonaws.com/": read tcp 192.168.86.33:51524->54.xxxx:443: read: connection reset by peer

Similar error in Security Group state check, about an hour after the above error. AWS Service & Personal health dashboards both show VPC & EC2 services are healthy:

│ Error: Error authorizing security group rule type egress: RequestError: send request failed
│ caused by: Post "https://ec2.us-west-2.amazonaws.com/": read tcp 192.168.86.33:51771->54.xxxx:443: read: connection reset by peer
│ 
│   on main.tf line 526, in resource "aws_security_group_rule" "egress_sec_to_webresource":
│  526: resource "aws_security_group_rule" "egress_sec_to_webresource" {

davi5e · 2021-06-29T21:53:23Z

> Also encountered this issue with ACM:

Same here using Terraform v1.0.1 and aws v3.47.0...

> it looks like this causes Terraform to lose track of resources it has created

Also the same thing here.

BNMetrics · 2021-06-29T22:27:41Z

Seeing the same error with ACM, it started happening this evening
Terraform v1.0.0, aws v3.44.0
Region: us-east-2

jiashuChen · 2021-07-24T13:46:05Z

Seeing the same error with IAM as well, using
Terraform v1.0.1 and aws provider v3.51.0
Region: ap-southeast-2

│ Error: error deleting IAM Role (IAM-ROLE-NAME): RequestError: send request failed
│ caused by: Post "https://iam.amazonaws.com/": read tcp IP:PORT->DIFFERENT_IP:PORT: read: connection reset by peer

idharper · 2022-01-21T12:54:29Z

Suddenly seeing the same error with CloudWatch log groups and SQS queues.
TF v 1.0.1 aws provider v 3.73.0
Region: us-west-2 and us-east-1

aws cli equivalent commands work fine.

[ UPDATE ] After more digging, I found someone suggesting network issues. I dropped off the corporate VPN and everything worked; reconnected to the VPN and it failed again (it had been working fine for a few days). Will go and bash corporate IT and see what they have to say for themselves.

breathingdust · 2022-02-03T14:17:17Z

Hi all 👋 Just letting you know that this issue is featured on this quarter's roadmap. If a PR exists to close the issue, a maintainer will review it and either make changes directly or work with the original author to get the contribution merged. If you have written a PR to resolve the issue, please ensure the "Allow edits from maintainers" box is checked. Thanks for your patience, and we are looking forward to getting this merged soon!

wenqiglantz-agi · 2022-12-07T00:20:35Z

Hi @breathingdust, any update on the progress?

mgusiew-guide · 2023-01-09T17:30:21Z

FTR, this also happens when the wait loop is waiting for a resource to change state (e.g. become active). Here is an example for an MSK cluster:

TestClusterConfig 2023-01-09T12:51:51Z logger.go:66: Error: waiting for MSK Cluster (arn:aws:kafka:xxx) create: RequestError: send request failed
TestClusterConfig 2023-01-09T12:51:51Z logger.go:66: caused by: Get "https://kafka.us-west-2.amazonaws.com/api/v2/clusters/xxx": read tcp xxx:55924->xxx:443: read: connection reset by peer
TestClusterConfig 2023-01-09T12:51:51Z logger.go:66:
TestClusterConfig 2023-01-09T12:51:51Z logger.go:66: on ../../../../../cluster/cluster.tf line 19, in resource "aws_msk_cluster" "msk_cluster":
TestClusterConfig 2023-01-09T12:51:51Z logger.go:66: 19: resource aws_msk_cluster msk_cluster {
TestClusterConfig 2023-01-09T12:51:51Z logger.go:66:
TestClusterConfig 2023-01-09T12:51:51Z logger.go:66:
TestClusterConfig 2023-01-09T12:51:51Z retry.go:99: Returning due to fatal error: FatalError{Underlying: error while running command: exit status 1;
Error: waiting for MSK Cluster (arn:aws:kafka:xxx) create: RequestError: send request failed
caused by: Get "https://kafka.us-west-2.amazonaws.com/api/v2/clusters/xxx": read tcp xxx:55924->xxx:443: read: connection reset by peer

on ../../../../../cluster/cluster.tf line 19, in resource "aws_msk_cluster" "msk_cluster":
19: resource aws_msk_cluster msk_cluster {

}

BrianLovelace128 · 2023-09-13T00:13:20Z

This would be very useful. I'm running into this issue with the msk module.

choeflake · 2024-07-12T13:48:39Z

This is very annoying! We have a large Terraform project, and when doing a plan or apply, 1 out of 10 times we get a (random) network error. Such an error should be retried silently! As it is, we lose a lot of time waiting for nothing...

Is there any progress on the issue?

gdavison · 2024-10-25T22:03:15Z

Hi everyone,

The error connection reset by peer is returned when a network connection is not closed cleanly by the other end of the connection, for example when the service at the other end crashes.

The AWS SDK for Go v1 did not retry this error, but the AWS SDK for Go v2 does retry it. As of version 5.73.0 all AWS services in the provider have been implemented using the AWS SDK for Go v2 (except for simpledb, which is not supported), so this error should now be retried by the provider.

If you still encounter this error in the provider version 5.73.0 or later, please create a new issue.

github-actions · 2024-10-25T22:03:26Z

Warning

This issue has been closed, meaning that any additional comments are hard for our team to see. Please assume that the maintainers will not see them.

Ongoing conversations amongst community members are welcome, however, the issue will be locked after 30 days. Moving conversations to another venue, such as the AWS Provider forum, is recommended. If you have additional concerns, please open a new issue, referencing this one where needed.

ghost added the service/iam Issues and PRs that pertain to the iam service. label Nov 1, 2019

github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Nov 1, 2019

davegallant changed the title ~~Missing retry logic when reading IAM Instance Profile~~ IAM Instance Profile - Add retry logic when reading Nov 1, 2019

obourdon mentioned this issue Dec 18, 2019

Add possibility to retry/delay/timeout on data sources #11342

Closed

davegallant changed the title ~~IAM Instance Profile - Add retry logic when reading~~ Add retry handling when a request's connection is reset by peer Oct 2, 2020

lagrianitis mentioned this issue Dec 17, 2020

aws_workspace_workspaces plan causes intermittent "RequestError: send request failed, read: connection reset by peer" #13843

Closed

jiashuChen mentioned this issue Jul 24, 2021

WIP: Automatically retry when encounter connection reset by peer error from aws api #20300

Closed

justinretzolk added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Dec 9, 2021

gdavison added provider Pertains to the provider itself, rather than any interaction with AWS. and removed service/iam Issues and PRs that pertain to the iam service. labels Jan 15, 2024

gdavison closed this as completed Oct 25, 2024
