-
Notifications
You must be signed in to change notification settings - Fork 9.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Automatically retry when encounter connection reset by peer error from aws api #20300
Conversation
Thank you for your contribution! 🚀 Please note that typically Go dependency changes are handled in this repository by dependabot or the maintainers. This is to prevent pull request merge conflicts and further delay reviews of contributions. Remove any changes to the Additional details:
|
@@ -114,3 +115,15 @@ func RetryConfigContext(ctx context.Context, delay time.Duration, delayRand time | |||
// more likely to be useful | |||
return resultErr | |||
} | |||
|
|||
func RetryOnConnectionResetByPeer(timeout time.Duration, f resource.RetryFunc) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main Change
@@ -97,3 +98,55 @@ func TestRetryConfigContext_error(t *testing.T) { | |||
t.Fatal("timeout") | |||
} | |||
} | |||
|
|||
func TestRetryOnConnectionResetByPeer(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unit Test
Question for the author: the AWS GoSDK has retry configured by default for all client operations. The retry mechanism will retry 3 times by default and use exponential back off with jitter. I’m curious as to how this PR will be any different? I would suggest modifying the default policy for the SDK rather than implement additional retry logic, specifically because that logic does not include exponential backoff with jitter. https://aws.github.io/aws-sdk-go-v2/docs/configuring-sdk/retries-timeouts/ Additionally, TCP reset could be happening for a number of reasons. I use Terraform all the time, running the Acceptance Tests, 20 threads at a time, and I’ve never gotten a TCP reset error. Have you checked your network route to AWS? A proxy, machine-in-the-middle, or AWS API throttling are all potential sources of TCP reset. I would recommend checking your CloudTrail logs for your account to see if you are being throttled. |
Thanks. I'll close the current pull requests. |
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. |
Community Note
Relates #10715
Draft PR, I'm not sure if this is the most correct way to solve this issue, raising as a draft, feedback will be appreciated. The goal of this pull request is to resolve to have a global wrapper function to deal with connection reset by peer error from aws api, which seems to impact multiple aws region and services.
Furthermore, this helps resolve the issue with
resource.Retry
being marked deprecated.https://github.com/hashicorp/terraform-plugin-sdk/blob/main/helper/resource/wait.go#L66-L69
Lastly, this pull request now seems to have more than 100 labels automatically applied to it, which has caused verification to fail. I'm not sure how to resolve it. Some help will be appreciated.
NOTE: although there are many files changes, the core of the change is here in this file:
terraform-provider-aws/aws/internal/tfresource/retry.go
Lines 119 to 129 in 03ea02c