-
Notifications
You must be signed in to change notification settings - Fork 10.5k
Default to 3 local retries #40503
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Default to 3 local retries #40503
Conversation
|
Hi @HaoK. If this is not a tell-mode PR, please make sure to follow the instructions laid out in the servicing process document. |
|
@dougbu this is probably worth a shot for servicing helix runs, always retry 3 times for failures universally |
|
Sorry, I'm not going to approve this without buy-in on ignoring flakiness in our servicing tests from @mkArtakMSFT @adityamandaleeka @rafikiassumani-msft and hopefully @Pilchie. This is a heavy hammer and, as you said elsewhere, it's not supported in the release/5.0 branch (or did I misread something❔). Another option would be to make it our policy to backport all test-only changes as far as they're relevant. Leads, see also discussion in #40467 |
|
I'm sure it is supported in release/5.0 as its a helix feature, but I don't think the retry stuff was ever ported to 5.0, so it'd require that. Just to be clear, I'm not pushing for this, this was just in response to your comment
This would get you what you want (more green servicing builds), without whack-a-mole in servicing for tests |
Do you mean our retry stuff wasn't ported to release/5.0 in this repo❔ We definitely don't have an eng/test-configuration.json file. In any case, Helix SDK features vary between the Arcade branches. My question was whether their down-level branch contained the support. Reading https://github.com/dotnet/arcade/blob/release/5.0/src/Microsoft.DotNet.Helix/Sdk/tools/Microsoft.DotNet.Helix.Sdk.MonoQueue.targets (which is where test-configuration.json is found in release/6.0), the answer seems to be "no". We'd either need to ask for the feature to be backported (probably easily justified) or choose another option.
No worries. I'll admit I wasn't positive how strongly you felt about this option and appreciate the clarification❕ |
|
Personally, I'm in favor of both approaches:
Both together would reduce risk of retries masking real issues, while also helping get PRs through servicing branches. |
To clarify: Increase retries for all Helix work items or use retries similarly to what we have in 'main' (possibly w/ additions for servicing-specific test flakiness)❔ |
|
Do we need this PR open @HaoK❔ |
|
No, I'll send out an email for discussion with some ideas about tweaking our quarantine process soon |
|
@HaoK should we wait for an update here (given problems w/ retries in 'main') before merging❔ |
|
What problems with retries in main are you referring to? IIS test issues are orthogonal to retrying by 3 times by default |
The IIS test issues are exactly what I was referring to. The problems there are intertwined because the impact of retries when they fail will simply be increased Helix work item times. If #41283 wasn't backported to release/6.0 or all of the affected IIS tests are quarantined here, I'm fine. Please let me know. |
|
Yeah the dispose changes weren't backported anywhere so we should be fine just defaulting to 3 local retries |
This should help with any low frequency test flakiness