HELIX failures investigations #41563

Merged
merged 22 commits into from
May 13, 2022
3018652
Update template baseline test to use original file name
DamianEdwards May 6, 2022
3c2b8d0
Comment out template baseline test namespace declaration
DamianEdwards May 8, 2022
f038bb5
Comment out template warning checks
DamianEdwards May 9, 2022
ad357c5
Trigger matrix helix builds for this branch
DamianEdwards May 9, 2022
82217ae
Enable more tracing & dump collect in helix test runner
DamianEdwards May 9, 2022
64e1e78
Disable test parallelization for project template tests
DamianEdwards May 9, 2022
70a7049
Update Web API template test to ensure unique project key
DamianEdwards May 9, 2022
7082f66
Remove parallelism from template tests
DamianEdwards May 9, 2022
3872c14
Fix Blazor WASM template tests
DamianEdwards May 9, 2022
19d6783
Fix xunit warning
DamianEdwards May 9, 2022
05edb44
Bump test timeout to 60m to see if that helps
HaoK May 10, 2022
643c693
Print test timeout message & prepare for clean-up
DamianEdwards May 10, 2022
0d0dd5b
Ignore process exit code when printing CTS timeout
DamianEdwards May 11, 2022
039e163
Add timestamps to helix runner console logs & bump helix timeout for …
DamianEdwards May 11, 2022
bc1d6a1
Skip LocalDb template tests on non-Windows platforms
DamianEdwards May 11, 2022
0614b27
Actually skip tests
DamianEdwards May 11, 2022
94e1d63
Update test cmd args in TestRunner.cs
DamianEdwards May 11, 2022
e44846b
Split up long-running theory to avoid test timeouts
DamianEdwards May 11, 2022
e06b8d7
Remove template test locks & skip template tests on Debian11Arm
DamianEdwards May 12, 2022
9bff3cc
Revert helix matrix config
DamianEdwards May 12, 2022
0f5be08
PR feedback
DamianEdwards May 12, 2022
7cc446e
PR feedback
DamianEdwards May 12, 2022
8 changes: 6 additions & 2 deletions eng/tools/HelixTestRunner/ProcessUtil.cs
@@ -4,6 +4,7 @@
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Globalization;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;
@@ -79,7 +80,7 @@ public static async Task<ProcessResult> RunAsync(
Action<int>? onStart = null,
CancellationToken cancellationToken = default)
{
Console.WriteLine($"Running '{filename} {arguments}'");
PrintMessage($"Running '{filename} {arguments}'");
using var process = new Process()
{
StartInfo =
@@ -151,7 +152,7 @@ public static async Task<ProcessResult> RunAsync(

process.Exited += (_, e) =>
{
Console.WriteLine($"'{process.StartInfo.FileName} {process.StartInfo.Arguments}' completed with exit code '{process.ExitCode}'");
PrintMessage($"'{process.StartInfo.FileName} {process.StartInfo.Arguments}' completed with exit code '{process.ExitCode}'");
if (throwOnError && process.ExitCode != 0)
{
processLifetimeTask.TrySetException(new InvalidOperationException($"Command {filename} {arguments} returned exit code {process.ExitCode} output: {outputBuilder.ToString()}"));
@@ -206,4 +207,7 @@ public static async Task<ProcessResult> RunAsync(

return await processLifetimeTask.Task;
}

public static void PrintMessage(string message) => Console.WriteLine($"{DateTime.UtcNow.ToString("O", CultureInfo.InvariantCulture)} {message}");
public static void PrintErrorMessage(string message) => Console.Error.WriteLine($"{DateTime.UtcNow.ToString("O", CultureInfo.InvariantCulture)} {message}");
}
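Because `PrintMessage` prefixes each line with a round-trip ("O") UTC timestamp, the console log can be parsed to measure how long each phase took. The following is a minimal sketch, assuming log lines of the form `<timestamp> <message>`. The `LogTimestamps` class and its `TryParseTimestamp` method are hypothetical names used for illustration and are not part of this PR; the sample timestamps are made up.

```csharp
using System;
using System.Globalization;

// Sample lines in the "<timestamp> <message>" shape that PrintMessage emits.
// The messages appear in Program.cs in this PR; the timestamps are invented.
var start = "2022-05-12T00:00:00.0000000Z Start running tests";
var end = "2022-05-12T00:40:30.5000000Z Running tests complete";

if (LogTimestamps.TryParseTimestamp(start, out var t0) &&
    LogTimestamps.TryParseTimestamp(end, out var t1))
{
    Console.WriteLine($"Elapsed: {t1 - t0}");
}

// Hypothetical helper, not part of HelixTestRunner.
static class LogTimestamps
{
    // Reads the leading "O"-formatted timestamp (everything before the first space).
    public static bool TryParseTimestamp(string line, out DateTime timestampUtc)
    {
        timestampUtc = default;
        var space = line.IndexOf(' ');
        if (space <= 0)
        {
            return false;
        }

        // RoundtripKind preserves the trailing "Z", so the result has Kind == Utc.
        return DateTime.TryParseExact(
            line.Substring(0, space),
            "O",
            CultureInfo.InvariantCulture,
            DateTimeStyles.RoundtripKind,
            out timestampUtc);
    }
}
```

Using the invariant culture on both the writing and the parsing side keeps the format stable regardless of the locale of the Helix machine.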
17 changes: 12 additions & 5 deletions eng/tools/HelixTestRunner/Program.cs
@@ -2,6 +2,7 @@
// The .NET Foundation licenses this file to you under the MIT license.

using System;
using System.Globalization;
using System.Threading.Tasks;

namespace HelixTestRunner;
@@ -28,7 +29,7 @@ static async Task Main(string[] args)
}
else
{
Console.WriteLine("Playwright install skipped.");
ProcessUtil.PrintMessage("Playwright install skipped.");
}
}

@@ -38,23 +39,29 @@ static async Task Main(string[] args)
{
if (!await runner.CheckTestDiscoveryAsync())
{
Console.WriteLine("RunTest stopping due to test discovery failure.");
ProcessUtil.PrintMessage("RunTest stopping due to test discovery failure.");
Environment.Exit(1);
return;
}

ProcessUtil.PrintMessage("Start running tests");
var exitCode = await runner.RunTestsAsync();
ProcessUtil.PrintMessage("Running tests complete");

ProcessUtil.PrintMessage("Uploading test results");
runner.UploadResults();
Console.WriteLine($"Completed Helix job with exit code '{exitCode}'");
ProcessUtil.PrintMessage("Test results uploaded");

ProcessUtil.PrintMessage($"Completed Helix job with exit code '{exitCode}'");
Environment.Exit(exitCode);
}

Console.WriteLine("Tests were not run due to previous failures. Exit code=1");
ProcessUtil.PrintMessage("Tests were not run due to previous failures. Exit code=1");
Environment.Exit(1);
}
catch (Exception e)
{
Console.WriteLine($"HelixTestRunner uncaught exception: {e.ToString()}");
ProcessUtil.PrintMessage($"HelixTestRunner uncaught exception: {e.ToString()}");
Environment.Exit(1);
}
}
117 changes: 64 additions & 53 deletions eng/tools/HelixTestRunner/TestRunner.cs

Large diffs are not rendered by default.

@@ -29,7 +29,7 @@ public BlazorServerTemplateTest(ProjectFactoryFixture projectFactory)
[InlineData(BrowserKind.Chromium)]
public async Task BlazorServerTemplateWorks_NoAuth(BrowserKind browserKind)
{
var project = await CreateBuildPublishAsync("blazorservernoauth" + browserKind);
var project = await CreateBuildPublishAsync();

await using var browser = BrowserManager.IsAvailable(browserKind) ?
await BrowserManager.GetBrowserInstance(browserKind, BrowserContextInfo) :
@@ -83,9 +83,9 @@ await BrowserManager.GetBrowserInstance(browserKind, BrowserContextInfo) :
[Theory(Skip = "https://github.com/dotnet/aspnetcore/issues/30882")]
[MemberData(nameof(BlazorServerTemplateWorks_IndividualAuthData))]
[SkipOnHelix("https://github.com/dotnet/aspnetcore/issues/30825", Queues = "All.OSX")]
public async Task BlazorServerTemplateWorks_IndividualAuth(BrowserKind browserKind, bool useLocalDB)
public async Task BlazorServerTemplateWorks_IndividualAuth(BrowserKind browserKind)
{
var project = await CreateBuildPublishAsync("blazorserverindividual" + browserKind + (useLocalDB ? "uld" : ""));
var project = await CreateBuildPublishAsync();

var browser = !BrowserManager.IsAvailable(browserKind) ?
null :
@@ -187,5 +187,5 @@ private async Task TestBasicNavigation(IPage page)
[InlineData("SingleOrg", new string[] { "--called-api-url \"https://graph.microsoft.com\"", "--called-api-scopes user.readwrite" })]
[InlineData("SingleOrg", new string[] { "--calls-graph" })]
public Task BlazorServerTemplate_IdentityWeb_BuildAndPublish(string auth, string[] args)
=> CreateBuildPublishAsync("blazorserveridweb" + Guid.NewGuid().ToString().Substring(0, 10).ToLowerInvariant(), auth, args);
=> CreateBuildPublishAsync(auth, args);
}
@@ -24,12 +24,12 @@ public BlazorTemplateTest(ProjectFactoryFixture projectFactory)

public abstract string ProjectType { get; }

protected async Task<Project> CreateBuildPublishAsync(string projectName, string auth = null, string[] args = null, string targetFramework = null, bool serverProject = false, bool onlyCreate = false)
protected async Task<Project> CreateBuildPublishAsync(string auth = null, string[] args = null, string targetFramework = null, bool serverProject = false, bool onlyCreate = false)
{
// Additional arguments are needed. See: https://github.com/dotnet/aspnetcore/issues/24278
Environment.SetEnvironmentVariable("EnableDefaultScopedCssItems", "true");

var project = await ProjectFactory.GetOrCreateProject(projectName, Output);
var project = await ProjectFactory.CreateProject(Output);
if (targetFramework != null)
{
project.TargetFramework = targetFramework;
@@ -27,7 +27,7 @@ public BlazorWasmTemplateTest(ProjectFactoryFixture projectFactory)
[InlineData(BrowserKind.Chromium)]
public async Task BlazorWasmStandaloneTemplate_Works(BrowserKind browserKind)
{
var project = await CreateBuildPublishAsync("blazorstandalone" + browserKind);
var project = await CreateBuildPublishAsync();

// The service worker assets manifest isn't generated for non-PWA projects
var publishDir = Path.Combine(project.TemplatePublishDir, "wwwroot");
@@ -63,7 +63,7 @@ private static async Task<IPage> NavigateToPage(IBrowserContext browser, string
[InlineData(BrowserKind.Chromium)]
public async Task BlazorWasmHostedTemplate_Works(BrowserKind browserKind)
{
var project = await CreateBuildPublishAsync("blazorhosted" + BrowserKind.Chromium, args: new[] { "--hosted" }, serverProject: true);
var project = await CreateBuildPublishAsync(args: new[] { "--hosted" }, serverProject: true);

var serverProject = GetSubProject(project, "Server", $"{project.ProjectName}.Server");

@@ -111,7 +111,7 @@ private static async Task AssertCompressionFormat(AspNetProcess aspNetProcess, s
[InlineData(BrowserKind.Chromium)]
public async Task BlazorWasmStandalonePwaTemplate_Works(BrowserKind browserKind)
{
var project = await CreateBuildPublishAsync("blazorstandalonepwa", args: new[] { "--pwa" });
var project = await CreateBuildPublishAsync(args: new[] { "--pwa" });

await BuildAndRunTest(project.ProjectName, project, browserKind);

@@ -146,7 +146,7 @@ public async Task BlazorWasmStandalonePwaTemplate_Works(BrowserKind browserKind)
[InlineData(BrowserKind.Chromium)]
public async Task BlazorWasmHostedPwaTemplate_Works(BrowserKind browserKind)
{
var project = await CreateBuildPublishAsync("blazorhostedpwa", args: new[] { "--hosted", "--pwa" }, serverProject: true);
var project = await CreateBuildPublishAsync(args: new[] { "--hosted", "--pwa" }, serverProject: true);

var serverProject = GetSubProject(project, "Server", $"{project.ProjectName}.Server");

@@ -226,13 +226,12 @@ public Task BlazorWasmHostedTemplate_IndividualAuth_Works_WithLocalDB(BrowserKin
public Task BlazorWasmHostedTemplate_IndividualAuth_Works_WithOutLocalDB(BrowserKind browserKind)
=> BlazorWasmHostedTemplate_IndividualAuth_Works(browserKind, false);

private async Task<Project> CreateBuildPublishIndividualAuthProject(BrowserKind browserKind, bool useLocalDb)
private async Task<Project> CreateBuildPublishIndividualAuthProject(bool useLocalDb)
{
// Additional arguments are needed. See: https://github.com/dotnet/aspnetcore/issues/24278
Environment.SetEnvironmentVariable("EnableDefaultScopedCssItems", "true");

var project = await CreateBuildPublishAsync("blazorhostedindividual" + browserKind + (useLocalDb ? "uld" : ""),
args: new[] { "--hosted", "-au", "Individual", useLocalDb ? "-uld" : "" });
var project = await CreateBuildPublishAsync(args: new[] { "--hosted", "-au", "Individual", useLocalDb ? "-uld" : "" });

var serverProject = GetSubProject(project, "Server", $"{project.ProjectName}.Server");

@@ -274,7 +273,7 @@ private async Task<Project> CreateBuildPublishIndividualAuthProject(BrowserKind

private async Task BlazorWasmHostedTemplate_IndividualAuth_Works(BrowserKind browserKind, bool useLocalDb)
{
var project = await CreateBuildPublishIndividualAuthProject(browserKind, useLocalDb: useLocalDb);
var project = await CreateBuildPublishIndividualAuthProject(useLocalDb: useLocalDb);

var serverProject = GetSubProject(project, "Server", $"{project.ProjectName}.Server");

@@ -376,7 +375,7 @@ public TemplateInstance(string name, params string[] arguments)
[Theory(Skip = "https://github.com/dotnet/aspnetcore/issues/37782")]
[MemberData(nameof(TemplateData))]
public Task BlazorWasmHostedTemplate_AzureActiveDirectoryTemplate_Works(TemplateInstance instance)
=> CreateBuildPublishAsync(instance.Name, args: instance.Arguments, targetFramework: "netstandard2.1");
=> CreateBuildPublishAsync(args: instance.Arguments, targetFramework: "netstandard2.1");

protected async Task BuildAndRunTest(string appName, Project project, BrowserKind browserKind, bool usesAuth = false)
{
97 changes: 96 additions & 1 deletion src/ProjectTemplates/README.md
@@ -36,7 +36,7 @@ To build the ProjectTemplates, use one of:

### Test

#### Running ProjectTemplate tests:
#### Running ProjectTemplate tests

To run ProjectTemplate tests, first ensure the ASP.NET localhost development certificate is installed and trusted.
Otherwise, you'll get a test error "Certificate error: Navigation blocked".
@@ -62,6 +62,101 @@ Then, use one of:
previous step, it is NOT advised that you install templates created on your local machine using just
`dotnet new -i [nupkgPath]`.

#### Conditional tests & skipping test platforms

Review comment (Member): How much of this is specific to template tests? Should it be documented in another location instead?

Reply (Member Author): Yeah, @HaoK said the same thing in a comment on a commit before I rebased (so it got lost from here), but I'll do that shuffle once this is in main.

Reply (Contributor): Maybe docs/Helix.md for most of this❔

Individual test methods can be decorated with attributes that prevent them from running on certain platforms. For the skip attributes to take effect, the test must use `[ConditionalFact]` or `[ConditionalTheory]` rather than `[Fact]` or `[Theory]`:

``` csharp
[ConditionalFact]
[OSSkipCondition(OperatingSystems.Linux | OperatingSystems.MacOSX)]
[SkipOnHelix("cert failure", Queues = "All.OSX;" + HelixConstants.Windows10Arm64)]
public async Task MvcTemplate_SingleFileExe()
{
    // ...
}
```

An entire test project can be configured to skip specific platforms using the `<SkipHelixQueues>` property in the project's .csproj file, e.g.:

```xml
<SkipHelixQueues>
$(HelixQueueArmDebian11);
</SkipHelixQueues>
```

Skipped tests should include details explaining why they're skipped, or better yet a link to an issue, either as a string argument to the attribute or as a code comment.

#### Test timeouts

When tests are run as part of the CI infrastructure, a number of different timeouts can impact whether tests pass or not.

##### Helix job timeout

When test jobs are queued to the Helix infrastructure, a timeout value is passed along with each job. The entire Helix job, i.e. the job running on a single queue, must complete within that timeout. The default value is set in [eng\targets\Helix.props](eng/targets/Helix.props):

```xml
<HelixTimeout>00:45:00</HelixTimeout>
```

This value is printed by the Helix runner at the beginning of the console log, formatted in seconds, e.g.:

```log
Console log: 'ProjectTemplates.Tests--net7.0' from job b2f6fbe0-4dbe-45fa-a123-9a8c876d5923 (ubuntu.1804.armarch.open) using docker image mcr.microsoft.com/dotnet-buildtools/prereqs:debian-11-helix-arm64v8-20211001171229-97d8652 on ddvsotx2l137
running $HELIX_CORRELATION_PAYLOAD/scripts/71557bd7f20e49fbbaa81cc79bd57fd6/execute.sh in /home/helixbot/work/C08609D9/w/B3D709E1/e max 2700 seconds
```
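
To relate the `<HelixTimeout>` value to the `max 2700 seconds` figure in the log above, the conversion is plain `TimeSpan` arithmetic. This is a small illustrative sketch, not code taken from the Helix infrastructure:

```csharp
using System;
using System.Globalization;

// <HelixTimeout> uses the hh:mm:ss form, which TimeSpan can parse directly.
var helixTimeout = TimeSpan.Parse("00:45:00", CultureInfo.InvariantCulture);

// 45 minutes * 60 = 2700 seconds, the figure shown in the console log.
Console.WriteLine($"max {helixTimeout.TotalSeconds} seconds");
```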

Note that some test projects might override this value in their project file. Some Helix queues are also slower than others, so the same test job might complete within the timeout on one queue but exceed it on another (the ARM queues are particularly prone to being slower than their AMD/Intel counterparts).

##### Helix runner timeout

The [Helix test runner](eng/tools/HelixTestRunner) launches the process that actually runs the tests within a Helix job. When doing so, it configures its own timeout, which is 5 minutes less than the Helix job timeout. For example, if the Helix job timeout is 45 minutes, the Helix test runner process timeout is 40 minutes.

If this timeout is exceeded, the Helix test runner captures a dump of the test process, terminates it, and prints a message in the console log, e.g.:

```log
2022-05-12T00:27:28.8279660Z Non-quarantined tests exceeded configured timeout: 40m.
```
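
The relationship between the job timeout and the runner timeout can be sketched as follows. This is illustrative only: the `RunnerTimeouts` class, its `FromJobTimeout` method, and the `SafetyMargin` field are hypothetical names, and the actual logic in `TestRunner.cs` is not shown in this diff. The `Task.Delay` call merely stands in for a long-running test process.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var jobTimeout = TimeSpan.FromMinutes(45);
var runnerTimeout = RunnerTimeouts.FromJobTimeout(jobTimeout);
Console.WriteLine($"Runner timeout: {runnerTimeout.TotalMinutes}m");

// A CancellationTokenSource created with a delay cancels its token once the delay elapses,
// which is one common way to enforce a deadline on an awaited operation.
// A short delay is used here so the sample finishes quickly.
using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(100));
try
{
    await Task.Delay(Timeout.InfiniteTimeSpan, cts.Token); // stand-in for the test process
}
catch (OperationCanceledException)
{
    Console.WriteLine("Deadline exceeded: a runner would capture a dump and terminate the process here.");
}

// Hypothetical helper, not part of HelixTestRunner.
static class RunnerTimeouts
{
    // 5-minute margin, as described in the README text above.
    public static readonly TimeSpan SafetyMargin = TimeSpan.FromMinutes(5);

    public static TimeSpan FromJobTimeout(TimeSpan jobTimeout)
    {
        if (jobTimeout <= SafetyMargin)
        {
            throw new ArgumentOutOfRangeException(nameof(jobTimeout), "Job timeout must exceed the safety margin.");
        }

        return jobTimeout - SafetyMargin;
    }
}
```

Keeping the runner's deadline shorter than the job's gives the runner time to collect dumps and upload results before Helix itself ends the job.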

##### Helix runner `dotnet test` timeout

When running in Helix, a test hang timeout, e.g. `dotnet test --blame-hang-timeout 15m`, is configured in [eng\tools\HelixTestRunner\TestRunner.cs](eng/tools/HelixTestRunner/TestRunner.cs):

```csharp
public async Task<int> RunTestsAsync()
{
...
var commonTestArgs = $"test {Options.Target} --diag:{diagLog} --logger xunit --logger \"console;verbosity=normal\" " +
"--blame-crash --blame-hang-timeout 15m";
```

This timeout applies to each individual `[Fact]` or `[Theory]`. Note that for `[Theory]` this timeout is **not** reset for each instance of the theory, i.e. the entire `[Theory]` must run within the specified timeout.

If this timeout is triggered, a message will be printed to the `vstest.datacollector.[date-time-stamp].log` file for the test run, e.g.:

```log
19:54:18.888, 4653892436493, datacollector.dll, The specified inactivity time of 15 minutes has elapsed. Collecting hang dumps from testhost and its child processes
```

**Note:** If a `[Theory]` test method takes more than a few seconds per case, it's a good idea to split its cases across several test methods. This helps keep the total runtime of each `[Theory]` below the configured timeout, e.g.:

``` csharp
[ConditionalTheory]
[SkipOnHelix("https://github.com/dotnet/aspnetcore/issues/28090", Queues = HelixConstants.Windows10Arm64 + HelixConstants.DebianArm64)]
[InlineData("IndividualB2C", null)]
[InlineData("IndividualB2C", new[] { ArgConstants.UseProgramMain })]
[InlineData("IndividualB2C", new[] { ArgConstants.CalledApiUrlGraphMicrosoftCom, ArgConstants.CalledApiScopesUserReadWrite })]
[InlineData("IndividualB2C", new[] { ArgConstants.UseProgramMain, ArgConstants.CalledApiUrlGraphMicrosoftCom, ArgConstants.CalledApiScopesUserReadWrite })]
public Task MvcTemplate_IdentityWeb_IndividualB2C_BuildsAndPublishes(string auth, string[] args) => MvcTemplateBuildsAndPublishes(auth: auth, args: args);

[ConditionalTheory]
[SkipOnHelix("https://github.com/dotnet/aspnetcore/issues/28090", Queues = HelixConstants.Windows10Arm64 + HelixConstants.DebianArm64)]
[InlineData("SingleOrg", null)]
[InlineData("SingleOrg", new[] { ArgConstants.UseProgramMain })]
[InlineData("SingleOrg", new[] { ArgConstants.CalledApiUrlGraphMicrosoftCom, ArgConstants.CalledApiScopesUserReadWrite })]
[InlineData("SingleOrg", new[] { ArgConstants.UseProgramMain, ArgConstants.CalledApiUrlGraphMicrosoftCom, ArgConstants.CalledApiScopesUserReadWrite })]
[InlineData("SingleOrg", new[] { ArgConstants.CallsGraph })]
[InlineData("SingleOrg", new[] { ArgConstants.UseProgramMain, ArgConstants.CallsGraph })]
public Task MvcTemplate_IdentityWeb_SingleOrg_BuildsAndPublishes(string auth, string[] args) => MvcTemplateBuildsAndPublishes(auth: auth, args: args);
```
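
As a rough way to decide how many cases a single `[Theory]` can hold, the timeout can be divided by the expected duration of one case. The sketch below takes the README's statement that the whole `[Theory]` must finish within the timeout at face value. The 3 minutes per case is a made-up figure, not a measurement, and the `TheoryBudget` helper is a hypothetical name used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var hangTimeout = TimeSpan.FromMinutes(15); // matches --blame-hang-timeout 15m
var perCase = TimeSpan.FromMinutes(3);      // assumed per-case duration, for illustration only
var cases = Enumerable.Range(1, 10).Select(i => $"case{i}").ToList();

var maxPerTheory = TheoryBudget.MaxCases(hangTimeout, perCase);
var groups = TheoryBudget.Split(cases, maxPerTheory);

Console.WriteLine($"At most {maxPerTheory} cases per theory, so {groups.Count} theories are needed.");

// Hypothetical helper, not part of the aspnetcore test infrastructure.
static class TheoryBudget
{
    // Largest number of cases whose combined duration stays below the timeout (never less than 1).
    public static int MaxCases(TimeSpan timeout, TimeSpan perCase)
    {
        if (perCase <= TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException(nameof(perCase));
        }

        var fit = (int)(timeout.Ticks / perCase.Ticks);

        // If the cases would exactly fill the timeout, drop one to leave headroom.
        if (fit > 0 && perCase.Ticks * fit == timeout.Ticks)
        {
            fit--;
        }

        return Math.Max(fit, 1);
    }

    // Splits the cases into consecutive groups of at most `size` items.
    public static List<List<T>> Split<T>(IReadOnlyList<T> items, int size)
    {
        var result = new List<List<T>>();
        for (var i = 0; i < items.Count; i += size)
        {
            result.Add(items.Skip(i).Take(size).ToList());
        }

        return result;
    }
}
```

In practice, per-case durations vary by queue, so any such estimate should leave generous headroom for the slower ARM queues.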

## More Information

For more information, see the [ASP.NET Core README](../../README.md).