Networking stress tests moved out of Hosted pool #35011

Merged

alnikola
Contributor

Fixes #34780

@ghost

ghost commented Apr 15, 2020

Tagging subscribers to this area: @ViktorHofer
Notify danmosemsft if you want to be subscribed.

@davidsh davidsh added this to the 5.0 milestone Apr 15, 2020
@ghost

ghost commented Apr 15, 2020

Tagging subscribers to this area: @dotnet/ncl
Notify danmosemsft if you want to be subscribed.

@davidsh davidsh added the test-bug Problem in test source code (most likely) label Apr 15, 2020
@alnikola
Contributor Author

/azp list

@alnikola
Contributor Author

/azp run runtime-libraries stress-http

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@alnikola
Contributor Author

/azp run runtime-libraries stress-ssl

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@safern
Member

safern commented Apr 15, 2020

So the BYOC Unix pools don't have PowerShell in them. I spoke with @alnikola, and the most natural approach is to follow what we do for all our build scripts: provide both a Unix and a Windows script, so that devs don't have to install extra dependencies on their machines and each platform gets native support. I helped @alnikola by creating the Unix scripts. PTAL

cc: @eiriktsarpalis

@safern
Member

safern commented Apr 15, 2020

/azp run runtime-libraries stress-http

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@safern
Member

safern commented Apr 15, 2020

/azp run runtime-libraries stress-ssl

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@safern
Member

safern commented Apr 15, 2020

/azp run runtime-libraries stress-http

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@safern
Member

safern commented Apr 15, 2020

/azp run runtime-libraries stress-ssl

@alnikola
Contributor Author

I found the cause of the current failures in master and opened PR #38027 to fix it.

As a matter of fact, that PR fixes only the Linux tests. On NanoServer, the tests fail with actual HttpClient errors detected by the stress test application. I see a lot of TaskCanceledExceptions thrown in the client container while it makes calls to the server, so this might be an infrastructure issue caused by VM performance degradation. I will check whether the same errors appear in this PR, which runs on a different queue.

@alnikola
Copy link
Contributor Author

@MattGal I ran the commands from http.yml that build the Linux tests locally on my Windows machine and confirmed that step 6 completes successfully (see below). Since I updated the base SDK image to .NET 5.0, it now uses Build Engine version 16.7.0, whereas previously it used Build Engine version 16.3, so the older version could have been causing that hang. Let's wait and see whether the build succeeds on CI (the SDK image version update was pushed yesterday).

Step 6/10 : RUN dotnet build -c $CONFIGURATION
 ---> Running in 935dd7d2147a
Microsoft (R) Build Engine version 16.7.0-preview-20310-07+ee1c9fd0c for .NET
Copyright (C) Microsoft Corporation. All rights reserved.

  Determining projects to restore...
  Restored /app/HttpStress.csproj (in 3.24 sec).
  You are using a preview version of .NET. See: https://aka.ms/dotnet-core-preview
  HttpStress -> /app/bin/Release/netcoreapp3.0/HttpStress.dll

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:00:06.41

@alnikola
Contributor Author

/azp run runtime-libraries stress-http

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

- Windows OS version set to 1809
@alnikola
Contributor Author

/azp run runtime-libraries stress-http

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@alnikola
Contributor Author

/azp run runtime-libraries stress-ssl

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@alnikola
Contributor Author

alnikola commented Jun 17, 2020

@MattGal I fixed the container builds on both Windows and Linux for HttpStress, but only the Windows build for SslStress. On Linux, the SslStress container build cannot execute SslStress/run-docker-compose.sh and fails with "Permission denied". This command is defined in ssl.yml, so the failure seems to happen on the host OS. Could you please check why the test runner process doesn't have the required permissions?

This is the last issue preventing me from completing this PR. The HttpStress tests are currently failing due to actual errors in the scenario, and our team will investigate and prioritize them accordingly, but I have to complete this PR first.

@MattGal
Member

MattGal commented Jun 17, 2020

I'll take a look and see if I have anything useful to note. Do note that the user running on these build agents has some options to unblock itself (I'll ping you on Teams about the specifics).

@safern
Member

safern commented Jun 17, 2020

On Linux SslStress container build cannot execute SslStress/run-docker-compose.sh and fails with Permission denied. This command is defined in ssl.yml.

@alnikola we need to fix the permissions of the added script (chmod +x). I will push that now.

I will close the PR and trigger the pipelines manually from a branch to avoid wasting resources (runtime, runtime-live-build, runtime-perf build, etc.).
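A minimal sketch of the kind of permission fix discussed above, assuming bash and git are available. `chmod +x` sets the executable bit in the working tree, while `git update-index --chmod=+x` records mode 100755 in the index so the bit survives a fresh clone on the Linux agents; a script added from a Windows machine often lacks that bit because git there does not track it by default. The helper name `make_executable` is hypothetical and is not taken from the actual fix.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: mark a script executable on disk and, if the file is
# tracked by git, also in the index (mode 100755) so build agents that clone
# the repository get an executable file.
make_executable() {
    local script="$1"
    local dir base
    dir="$(dirname "$script")"
    base="$(basename "$script")"

    # Working-tree fix: avoids "Permission denied" when the script is run directly.
    chmod +x "$script"

    # Index fix: only attempted when the file is tracked in a git work tree;
    # ls-files --error-unmatch exits non-zero for untracked files or non-repos.
    if git -C "$dir" ls-files --error-unmatch "$base" >/dev/null 2>&1; then
        git -C "$dir" update-index --chmod=+x "$base"
    fi
}

# Usage with the script named in this thread:
# make_executable SslStress/run-docker-compose.sh
```

The change to the index still has to be committed and pushed, like any other change, before CI picks it up.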

@safern safern closed this Jun 17, 2020
@safern
Member

safern commented Jun 17, 2020

Test build with chmod +x in the .sh file: https://dev.azure.com/dnceng/public/_build/results?buildId=692516&view=results

@safern
Member

safern commented Jun 17, 2020

@alnikola since I closed the PR, it seems I can't push to your branch; however, the Linux SSL build now appears to be fixed, and only the Windows one is failing. This is the commit that fixes it, so that you can cherry-pick it: f3466ad

@alnikola alnikola reopened this Jun 18, 2020
@alnikola
Contributor Author

@safern I repeated the chmod +x command on my branch and reopened this PR. Additionally, I'd like to clarify that the HttpStress Windows build is failing due to actual issues detected by the stress test. We will investigate them once this PR is merged.

@alnikola
Contributor Author

/azp run runtime-libraries stress-http

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@alnikola
Contributor Author

/azp run runtime-libraries stress-ssl

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@alnikola
Contributor Author

@MattGal @safern Strangely, after I merged this into master, the Windows container builds started failing on docker-compose with a NativeCommandError, without reporting any further details, in both runtime-libraries http-stress and runtime-libraries ssl-stress.
Could you please check what is going on? Could it be that docker-compose is not properly configured on some agents?

For reference, these are the pipeline runs I did on this PR right before merging. As you can see, Build HttpStress and Build SslStress completed successfully.

@MattGal
Member

MattGal commented Jun 18, 2020

I'm looking, but maybe we can hold off on declaring that "some machines are not properly configured" until this has worked a single time in master after your commit :)? All we know is that it was tried once and failed.

All this error is telling us is that the thing you invoked wrote to standard error. We get nothing else, and with the way it's being invoked, I'm pretty sure we're not even getting its stderr.

Example of docker-compose stderr from a local PowerShell ISE session:

docker-compose : --build-arg is only supported when services are specified for API version < 1.25. Please use a Compose file version > 2.2 or specify which services to build.
At D:\scratch\compose\somestuff.ps1:10 char:1
+ docker-compose --file "$COMPOSE_FILE" build $BUILD_ARGS.Split()
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (--build-arg is ...vices to build.:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

That top line may be missed by the console host. Since you don't log any of the arguments passed to the command, I'd recommend logging the actual command you're running and all the other values we can't see here; a few Write-Output calls can really explain things.

My first guess is that PRs have a different value for BUILD_CONFIGURATION than post-merge CI builds.
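Along the lines suggested above, here is a minimal bash sketch for the Unix counterpart script (the Windows script would use Write-Output at the same points): print the values the command depends on, distinguishing unset from empty, then print the exact shell-quoted command line before running it. The helper names `log_vars` and `run_logged` are hypothetical, and `COMPOSE_FILE`, `BUILD_ARGS` and `BUILD_CONFIGURATION` are taken from the snippet and comment above, not from the actual pipeline scripts.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print NAME=value for each variable name given, or note that it is unset.
# Uses bash indirect expansion (${!name}) to look a variable up by its name;
# the "+set" form avoids a nounset error when the variable does not exist.
log_vars() {
    local name
    for name in "$@"; do
        if [ -n "${!name+set}" ]; then
            printf '%s=%q\n' "$name" "${!name}"
        else
            printf '%s is unset\n' "$name"
        fi
    done
}

# Print the exact command line (each argument shell-quoted), then execute it,
# so the log shows what was really invoked and how the arguments were split.
run_logged() {
    printf 'Executing:'
    printf ' %q' "$@"
    printf '\n'
    "$@"
}

# Hypothetical usage; $BUILD_ARGS is left unquoted on purpose so it is split
# into separate arguments, mirroring $BUILD_ARGS.Split() in the PowerShell snippet:
# log_vars COMPOSE_FILE BUILD_ARGS BUILD_CONFIGURATION
# run_logged docker-compose --file "$COMPOSE_FILE" build $BUILD_ARGS
```

Logging the variables and the final argument list separately makes it easy to tell an unexpected value (such as a different BUILD_CONFIGURATION) apart from an argument-splitting problem.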

@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
Labels
area-System.Net test-bug Problem in test source code (most likely)
Development

Successfully merging this pull request may close these issues.

Out of disk space building on Docker NanoServer
9 participants