
Handling of Skipped Tests in Helix framework #7286

@miniksa

Description


We have been using the Skipped state for two situations in our code:

  1. The test dynamically decides it cannot run because of conditions only detectable at runtime in the operating system.
  2. The test is broken or flaky for some reason and we want to turn it off temporarily until we can fix it in a work item.

Unfortunately, with the adoption of the Helix scripts from Microsoft-UI-XAML, this isn't fully compatible.
This is because the Helix scripts, when run on the Helix machines, have retry logic in them. They take anything that is NOT a pass and re-run it up to 10 times. They store all the subresults and bucket tests into three categories:

  1. Pass - Just flat out passed.
  2. Skipped - Flaky/unreliable; had some mix of pass/fail across the runs.
  3. Fail - Just flat out failed.

Then, in the final post-processing steps for test results, anything marked Skipped is evaluated against the pass/fail threshold and converted to either Pass or Fail based on whether its "Pass" count lands above or below the threshold.

This makes our usage of Skipped do the following:

  1. The scripts see the 'Skipped' out of TAEF as !Passed and mark the test for rerun.
  2. They rerun the 'Skipped' test up to 10 times, see !Passed each time, and mark each run as a Fail.
  3. The whole flaky mix is temporarily reported as 'Skipped' on the build machine.
  4. In the post-processing steps, the subruns of the 'Skipped' test are evaluated; every one of them is !Passed (and thus a Fail), so the test is marked as Fail.

So, in summary, the scripts right now only understand two things: Pass and Fail. They also reuse Skipped as an intermediate counter state, only to evaluate it further into Pass/Fail later.

That doesn't work for us.

To complete the PR where I brought Helix online, I instead set the TestAttribute "Ignore" to "True" on tests that need to be disabled temporarily for some reason. I did this only on the UIA/LocalTests, because they're the only ones affected by running inside Helix with the retry scripts. The Unit/Feature tests that run on the build machine (and use a small subset of the scripts just to report their test results to AzDO) are fine reporting Skipped as they are.

On talking with the Microsoft UI XAML team: for tests that dynamically decide whether they can run, they chose instead to report "Pass" and exit early as necessary.

Overall, I think this is what we need to do going forward:

  1. For tests that dynamically decide whether they can run, we need to either:
    • Convert them to report Pass early and accept that.
    • Teach HelixTestHelpers.cs and the retry logic how to handle either the Skipped or NotRun state (and update the relevant tests as appropriate) to pass that all the way through and report up to AzDO without rerunning them when it's intentionally skipped (and without post-processing them into a binary Pass/Fail state.)
  2. For tests that are broken or flaky and need turning off with a filed work item, just set Ignore=True as the test attribute. (This also allows the te.exe command-line argument /runIgnoredTests to be used to run them when attempting to fix them up.)

Metadata



    Labels

    Area-Build (Issues pertaining to the build system, CI, infrastructure, meta); Issue-Task (It's a feature request, but it doesn't really need a major design.); Needs-Tag-Fix (Doesn't match tag requirements); Product-Conhost (For issues in the Console codebase); Product-Conpty (For console issues specifically related to conpty); Product-Terminal (The new Windows Terminal.)
