Skip to content

Conversation

@HaoK
Copy link
Member

@HaoK HaoK commented Mar 2, 2022

This should help with any low frequency test flakiness

@ghost ghost added the area-infrastructure Includes: MSBuild projects/targets, build scripts, CI, Installers and shared framework label Mar 2, 2022
@ghost ghost added this to the 6.0.x milestone Mar 2, 2022
@ghost
Copy link

ghost commented Mar 2, 2022

Hi @HaoK. If this is not a tell-mode PR, please make sure to follow the instructions laid out in the servicing process document.
Otherwise, please add tell-mode label.

@HaoK HaoK requested a review from dougbu March 2, 2022 21:57
@HaoK
Copy link
Member Author

HaoK commented Mar 2, 2022

@dougbu this is probably worth a shot for servicing helix runs, always retry 3 times for failures universally

@dougbu
Copy link
Contributor

dougbu commented Mar 3, 2022

Sorry, I'm not going to approve this without buy-in on ignoring flakiness in our servicing tests from @mkArtakMSFT @adityamandaleeka @rafikiassumani-msft and hopefully @Pilchie. This is a heavy hammer and, as you said elsewhere, it's not supported in the release/5.0 branch (or did I misread something❔).

Another option would be to make it our policy to backport all test-only changes as far as they're relevant.

Leads, see also discussion in #40467

@HaoK
Copy link
Member Author

HaoK commented Mar 3, 2022

I'm sure it is supported in release/5.0 as its a helix feature, but I don't think the retry stuff was ever ported to 5.0, so it'd require that. Just to be clear, I'm not pushing for this, this was just in response to your comment

we need to keep on top of flaky test fixes in our servicing branches because the window the branches are open doesn't leave us much time for retrying PRs and the impact of flakiness here can be delayed releases.

This would get you what you want (more green servicing builds), without whack-a-mole in servicing for tests

@dougbu
Copy link
Contributor

dougbu commented Mar 3, 2022

I'm sure it is supported in release/5.0 as its a helix feature, but I don't think the retry stuff was ever ported to 5.0, so it'd require that.

Do you mean our retry stuff wasn't ported to release/5.0 in this repo❔ We definitely don't have an eng/test-configuration.json file.

In any case, Helix SDK features vary between the Arcade branches. My question was whether their down-level branch contained the support. Reading https://github.com/dotnet/arcade/blob/release/5.0/src/Microsoft.DotNet.Helix/Sdk/tools/Microsoft.DotNet.Helix.Sdk.MonoQueue.targets (which is where test-configuration.json is found in release/6.0), the answer seems to be "no". We'd either need to ask for the feature to be backported (probably easily justified) or choose another option.

Just to be clear, I'm not pushing for this, this was just in response to your comment

No worries. I'll admit I wasn't positive how strongly you felt about this option and appreciate the clarification❕

@Pilchie
Copy link
Member

Pilchie commented Mar 3, 2022

Personally, I'm in favor of both approaches:

  1. When making flaky test fixes in main, spend the time to see if they also apply in servicing branches and can be easily ported there. If so, then port them.
  2. Increase retries for servicing branches to help get PR builds through in the limited windows we have.

Both together would reduce risk of retries masking real issues, while also helping get PRs through servicing branches.

@dougbu
Copy link
Contributor

dougbu commented Mar 3, 2022

Increase retries for servicing branches to help get PR builds through in the limited windows we have.

To clarify: Increase retries for all Helix work items or use retries similarly to what we have in 'main' (possibly w/ additions for servicing-specific test flakiness)❔

@dougbu
Copy link
Contributor

dougbu commented May 5, 2022

Do we need this PR open @HaoK

@HaoK
Copy link
Member Author

HaoK commented May 5, 2022

No, I'll send out an email for discussion with some ideas about tweaking our quarantine process soon

@HaoK HaoK closed this May 5, 2022
@dougbu dougbu removed this from the 6.0.x milestone May 12, 2022
@HaoK HaoK reopened this Jul 6, 2022
@HaoK HaoK marked this pull request as ready for review July 6, 2022 16:57
@HaoK HaoK requested a review from a team as a code owner July 6, 2022 16:57
@dougbu
Copy link
Contributor

dougbu commented Aug 3, 2022

@HaoK should we wait for an update here (given problems w/ retries in 'main') before merging❔

@HaoK
Copy link
Member Author

HaoK commented Aug 3, 2022

What problems with retries in main are you referring to? IIS test issues are orthogonal to retrying by 3 times by default

@dougbu
Copy link
Contributor

dougbu commented Aug 3, 2022

What problems with retries in main are you referring to? IIS test issues are orthogonal to retrying by 3 times by default

The IIS test issues are exactly what I was referring to. The problems there are intertwined because the impact of retries when they fail will simply be increased Helix work item times. If #41283 wasn't backported to release/6.0 or all of the affected IIS tests are quarantined here, I'm fine. Please let me know.

@HaoK
Copy link
Member Author

HaoK commented Aug 3, 2022

Yeah the dispose changes weren't backported anywhere so we should be fine just defaulting to 3 local retries

@dougbu dougbu merged commit f46133f into release/6.0 Aug 3, 2022
@dougbu dougbu deleted the haok/6retry branch August 3, 2022 19:40
@dougbu dougbu added the tell-mode Indicates a PR which is being merged during tell-mode label Aug 3, 2022
@wtgodbe wtgodbe added this to the 6.0.9 milestone Aug 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-infrastructure Includes: MSBuild projects/targets, build scripts, CI, Installers and shared framework tell-mode Indicates a PR which is being merged during tell-mode

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants