
attempt to fix up EnterpriseTests #65981


Merged
wfurt merged 12 commits into dotnet:main from enterpriseTests on Mar 3, 2022

Conversation

wfurt (Member) commented Mar 1, 2022

Contributes to https://github.com/dotnet/core-eng/issues/15594. It seems like the container cannot resolve outside addresses.
There may be cleaner ways to fix this, but as a stop-gap this seems to work: there is already a test setup script, so if the lookup fails it adds a public server.
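
A minimal sketch of the kind of fallback described above, assuming the container's setup script can edit /etc/resolv.conf; the probe hostname and resolver address here are illustrative, not the PR's actual code:

#!/usr/bin/env bash
# Hypothetical stop-gap: if the container cannot resolve outside addresses,
# append a public DNS server so test setup can reach external hosts.
if ! nslookup www.microsoft.com > /dev/null 2>&1; then
  echo "External DNS lookup failed; adding a public resolver as a fallback."
  echo "nameserver 8.8.8.8" >> /etc/resolv.conf
fi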

@ghost ghost added the area-System.Net label Mar 1, 2022
@ghost ghost assigned wfurt Mar 1, 2022
ghost commented Mar 1, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

null

Author: wfurt
Assignees: wfurt
Labels: area-System.Net
Milestone: -

wfurt (Member, Author) commented Mar 1, 2022

The change itself seems to work, e.g. the build is running again. But it times out, as the new machines/pool seem to be a bit slower. My attempts to fix the timeout are failing so far. Let me know if you have any ideas @ulisesh @MattGal

MattGal (Member) commented Mar 1, 2022

The change itself seems to work, e.g. the build is running again. But it times out, as the new machines/pool seem to be a bit slower. My attempts to fix the timeout are failing so far. Let me know if you have any ideas @ulisesh @MattGal

Taking a peek.

MattGal (Member) commented Mar 1, 2022

@wfurt looking at your build, if you just set a higher timeout it'd probably pass. As for why it's so slow, I'd guess it's the many, many instances of this retry:

  Retrying 'FindPackagesByIdAsync' for source 'https://pkgs.dev.azure.com/dnceng/9ee6d478-d288-47f7-aacc-f6e6d082ae6d/_packaging/c9f8ac11-6bd8-4926-8306-f075241547f7/nuget/v3/flat2/microsoft.net.compilers.toolset/index.json'.
  The HTTP request to 'GET https://pkgs.dev.azure.com/dnceng/9ee6d478-d288-47f7-aacc-f6e6d082ae6d/_packaging/c9f8ac11-6bd8-4926-8306-f075241547f7/nuget/v3/flat2/microsoft.net.compilers.toolset/index.json' has timed out after 100000ms.

... which I'd venture could be addressed by some judicious pruning of the nuget.config file.
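
One way to do that pruning from the command line is the dotnet NuGet CLI; this is only a sketch, and the source name below is a placeholder rather than an entry from the actual nuget.config:

# see which feeds restore is hitting, then drop the ones that keep timing out
dotnet nuget list source --configfile NuGet.config
dotnet nuget remove source "unreachable-internal-feed" --configfile NuGet.config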

wfurt (Member, Author) commented Mar 1, 2022

Do you know how to set the overall timeout @MattGal? I made one attempt and it broke the build, e.g. the pipeline would not even start.

MattGal (Member) commented Mar 1, 2022

Do you know how to set the overall timeout @MattGal? I made one attempt and it broke the build, e.g. the pipeline would not even start.

Oh! I thought we were talking about the NuGet issue. I would convert this YAML to use stages and jobs, then follow the instructions here:
https://docs.microsoft.com/en-us/azure/devops/pipelines/process/phases?view=azure-devops&tabs=yaml

The way you're doing it isn't super well supported anymore, and it's possible that just putting timeoutInMinutes: 90 at the top would work, but if you want a format that is "officially supported", the one-time conversion seems worth it and can be part of this PR. There are lots of examples to be found in the dotnet org on GitHub.

jobs:
- job: myJob
  # cancel the job if it runs longer than this many minutes
  timeoutInMinutes: 10
  pool:
    vmImage: 'ubuntu-latest'
  steps:
  - bash: echo "Hello world"

@wfurt wfurt requested review from MattGal, ulisesh and a team March 3, 2022 16:25
@wfurt wfurt marked this pull request as ready for review March 3, 2022 16:26
wfurt (Member, Author) commented Mar 3, 2022

Test failures seem unrelated (a mix of infrastructure and product issues). The enterprise-linux pipeline finished successfully:

https://dev.azure.com/dnceng/public/_build/results?buildId=1642471&view=logs&j=47c231e8-52e2-5eb6-8574-66afcfcee82a

rzikm (Member) left a comment

LGTM

@wfurt wfurt merged commit 9174037 into dotnet:main Mar 3, 2022
@wfurt wfurt deleted the enterpriseTests branch March 3, 2022 22:16
agocke (Member) commented Mar 3, 2022

Was this the cause of the docker pull failures, e.g. in https://dev.azure.com/dnceng/public/_build/results?buildId=1643557&view=logs&j=8679535e-9046-505f-8bbe-da251e73ecbd&t=ab7958ed-5297-5b46-f99a-06ce433e067d

wfurt (Member, Author) commented Mar 3, 2022

no, I don't think so. This change touched only code used in the enterprise-linux path.

MattGal (Member) commented Mar 4, 2022

Was this the cause of the docker pull failures, e.g. in https://dev.azure.com/dnceng/public/_build/results?buildId=1643557&view=logs&j=8679535e-9046-505f-8bbe-da251e73ecbd&t=ab7958ed-5297-5b46-f99a-06ce433e067d

Unless you accidentally pasted the wrong link, this isn't a docker pull failure. (Regardless, those are mostly mcr.microsoft.com having a bad minute or two.)

While it is a docker work item, this failure is:

urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='dev.azure.com', port=443): Max retries exceeded with url: /dnceng/_apis (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x04A42DA8>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))

reference query:

// Find the work item IDs for the failing test within this Helix job,
// then pull the logs for just those work items.
let logsIWant=
WorkItems
| where JobName == "90026d98-4db1-4199-a444-1d69b8aafa14"
| where FriendlyName == "System.Xml.Linq.Properties.Tests"
| project WorkItemId;
Logs
| where WorkItemId in (logsIWant)

There are lots of retries for this "talking to AzDO" stage, but sometimes it just doesn't work well. It's also important to note that the "talking to AzDO" stage occurs on the host, not inside the container, so we can't blame Docker for this one.
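
For anyone digging into this later, a quick way to check that distinction is to probe the endpoint from the host itself (a sketch only; the retry counts and timeout are arbitrary):

# probe the AzDO endpoint from the Helix host, with a few retries,
# to confirm the failures are host-level rather than anything inside the container
curl --silent --show-error --retry 3 --retry-delay 5 --max-time 30 \
  --output /dev/null --write-out 'HTTP %{http_code}\n' \
  https://dev.azure.com/dnceng/_apis

If that fails on the host while the container itself is healthy, the problem is host-level connectivity to dev.azure.com rather than Docker.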

agocke (Member) commented Mar 4, 2022

I think there were two failures in there: the Mono llvmfull job failed with:

##[warning]Docker pull failed with exit code 1, back off 5.667 seconds before retry.
/usr/bin/docker pull mcr.microsoft.com/dotnet-buildtools/prereqs:centos-7-20210714125435-9b5bbc2
Error response from daemon: Get "https://mcr.microsoft.com/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
##[error]Docker pull failed with exit code 1

wfurt (Member, Author) commented Mar 4, 2022

Getting more clean runs on that pipeline: https://dev.azure.com/dnceng/public/_build?definitionId=690&_a=summary
Got a 100% pass on the unrelated PR #65876.

@karelz karelz added this to the 7.0.0 milestone Apr 8, 2022
akoeplinger pushed a commit to akoeplinger/runtime that referenced this pull request May 5, 2022
* attempt to fix up EnterpriseTests

* fix text

* increase timeout on build step

* fix timeout

* add jobs

* update job

* fix formatting

* fix name

* update server check

* restore original resolver before tests

* fix path

* experiment

(cherry picked from commit 9174037)
@ghost ghost locked as resolved and limited conversation to collaborators May 8, 2022