HADOOP-16499. S3A retry policy to be exponential #1246
Conversation
-exponential retry policy used
-retry interval = 1s
-retry attempts = 8
-docs updated
Change-Id: I48b17af34a50b7291f2c360ee74dbce7540556b4
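A hedged sketch of how these two settings could be applied through a Hadoop Configuration; the key names `fs.s3a.retry.interval` and `fs.s3a.retry.limit` follow the S3A documentation, the values mirror this commit message, and the class is invented for illustration.

```java
import org.apache.hadoop.conf.Configuration;

/** Sketch: applying the retry settings described in the commit message. */
public final class S3ARetrySettingsSketch {
  private S3ARetrySettingsSketch() {
  }

  public static Configuration withExponentialRetry() {
    Configuration conf = new Configuration();
    // Base interval between attempts; grows exponentially per retry.
    conf.set("fs.s3a.retry.interval", "1s");
    // Number of retry attempts before giving up.
    conf.setInt("fs.s3a.retry.limit", 8);
    return conf;
  }
}
```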
…d were generating failures
Change-Id: Id3692c29584d08749e1a78f4c95c3298a61672b7
A test run has shown that some tuning is needed on the tests which simulate failures.
Change-Id: Idb034b81c12246bcb69de050ac91643c6ac34327
Latest run against Ireland with S3Guard and versioning.
Change-Id: I2c62aceb344a7bb531860651babce9ff85486df5
For reviewers: the reason I moved off the …
Two other tests timed out on a test run.
💔 -1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
Tuning some tests to cope with shorter timeouts by setting the timeouts to ranges they are happy with. For ITestS3AInconsistency that means we want S3Guard to time out eventually. All the various values are now configured in one place so their requirements are visible.
Change-Id: I8fffc95d79a74f371b2ff78271e58530857b73cd
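As a rough illustration of keeping the values in one place, a hypothetical constants holder; the names and values here are invented for illustration, not the patch's actual ones.

```java
/** Hypothetical grouping of the test timing values in one place. */
public interface TestTimingConstants {
  /** Overall test timeout, in seconds (assumed default). */
  int TEST_TIMEOUT_SECONDS = 20;

  /** How long ITestS3AInconsistency lets S3Guard wait before timing out, in ms. */
  int INCONSISTENCY_WAIT_MILLIS = 2_000;

  /** Retry limit for tests which simulate failures and must fail fast. */
  int FAIL_FAST_RETRY_LIMIT = 2;
}
```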
🎊 +1 overall
This message was automatically generated.
It turns out there are even more tests which expect connectivity/read exceptions to retry for less than the test timeout, i.e. the default of 20s. Shrinking the retry time to a couple of seconds makes the tests faster.
It turns out that with a long retry interval, various tests which were expecting failures, be it in network setup or in the OOB tests, were using the default retry times. Extending them means the tests take longer, sometimes so much that they time out. This patch cranks back the retry intervals on more tests so that they fail much faster than before, which is actually an improvement. Considered but not done: changing the defaults in AbstractS3ATestBase; it seems better to use the real defaults and identify those tests which expect to retry a lot.
Change-Id: I1e9bac752907a6ced8a0bc7c37c95006dd8d6e98
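A minimal sketch of the kind of per-test tuning described here: shrink the retry settings in the test's configuration so an expected failure surfaces well inside the test timeout. Key names follow the S3A documentation; the helper class and the values are illustrative assumptions, not the patch's exact ones.

```java
import org.apache.hadoop.conf.Configuration;

/** Sketch: tighten retry settings for tests which expect failures. */
public final class FailFastRetryTuning {
  private FailFastRetryTuning() {
  }

  public static Configuration tighten(Configuration conf) {
    // Short interval and few attempts: an expected network/OOB failure
    // now surfaces in well under a second rather than the default window.
    conf.set("fs.s3a.retry.interval", "10ms");
    conf.setInt("fs.s3a.retry.limit", 2);
    return conf;
  }
}
```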
Final test run: all good. The only failure was in {{test_300_MetastorePrune}}, which is HADOOP-16481, with PR #1209.
🎊 +1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
One question for reviewers: what are good initial values? With 500ms and 7 attempts, that's 64s of delay, with 32s being the final pause. I don't want to massively expand the time to fail beyond 1m, as this timeout also handles things like the time to give up on network errors, and we don't want to block for vast amounts of time there.
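To make the arithmetic concrete, a small sketch assuming plain doubling with no jitter (the real policy may randomize the pauses): 500ms doubling over 7 attempts gives pauses of 0.5s up to 32s, roughly 64s in total.

```java
/** Sketch: total delay for exponential backoff, assuming plain doubling. */
public final class BackoffDelaySketch {
  public static void main(String[] args) {
    long pauseMs = 500;   // initial interval under discussion
    int attempts = 7;     // retry attempts under discussion
    long totalMs = 0;
    for (int attempt = 1; attempt <= attempts; attempt++) {
      totalMs += pauseMs;
      System.out.printf("attempt %d: pause %d ms%n", attempt, pauseMs);
      pauseMs *= 2;       // exponential growth
    }
    // Prints 63500 ms: ~64s of delay, with the final pause being 32s.
    System.out.printf("total: %d ms%n", totalMs);
  }
}
```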
Change seems good to me, +1 on it.
Looks reasonable. I'm running the tests now; I'll be back with the results soon.
Thx, I will wait for your test run before committing.
Running tests against Ireland, I got the following error:
Otherwise nothing else. This failure is intermittent and it will not show on re-run.
Seems good to me, +1; will commit soon.
+1; committing.
HADOOP-16499. S3A retry policy to be exponential. Contributed by Steve Loughran.
-exponential retry policy used
-retry interval = 1s
-retry attempts = 8
-docs updated
tested s3 ireland
Change-Id: I48b17af34a50b7291f2c360ee74dbce7540556b4