Skip to content

Add test leg to the PR build to run libraries tests on iOS x64 Simulators #37476

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 94 commits into from
Sep 10, 2020

Conversation

premun
Copy link
Member

@premun premun commented Jun 5, 2020

Forked from #37057

Adds the iOS test leg that will execute selected BCL tests on an iOS x64 Simulator (13.5) using the OSX.1015.Amd64.Open queue which is our on-prem Mac Mini infrastructure (60 machines).

To prepare the Helix job, we are using the Arcade Helix SDK, more specifically the XHarness Helix SDK which uses the XHarness tool to control the test run.

@ghost
Copy link

ghost commented Jun 5, 2020

Tagging subscribers to this area: @ViktorHofer
Notify danmosemsft if you want to be subscribed.

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 37476 in repo dotnet/runtime

@premun premun force-pushed the prvysoky/use-helix-sdk branch from 71b4ee2 to be7b233 Compare June 5, 2020 09:40
Copy link
Member

@akoeplinger akoeplinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@directhex
Copy link
Contributor

https://helix.dot.net/api/2019-06-17/jobs/ba701e37-fa40-4a78-92d9-729043b36d9c/workitems/Microsoft.Extensions.Configuration.Json.Tests/console

Timed out installing the test app?


dotnet xharness ios test --app="$APP_BUNDLE" \
--targets=$TARGET \
dotnet xharness ios test \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we support filtering tests or any xunit parameters? Like -notrait Category=Outerloop? If so, this is wrong and should use RunScriptCommand in the MSBuild side to set this command.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally yes, but we first need a way to flow arbitrary command line arguments to the iOS/Android test apps and that will need some changes in xharness and the iOS/Android test runners, I don't think we need to block the PR on that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. We just need to be conscious that PR validation is running outerloop tests for iOS as well, which might be causing the slow runs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah no, we added handling for that already with

<WriteLinesToFile File="$(PublishDir)xunit-excludes.txt" Lines="$(_withoutCategories.Replace(';', '%0dcategory='))" />
which is read by the testrunner so Outerloop and failing tests should already be excluded.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sounds good. Maybe we should do it how we do it in other places passing them as an argument but that is in the future, no need to block this PR.

@premun
Copy link
Member Author

premun commented Sep 7, 2020

https://helix.dot.net/api/2019-06-17/jobs/ba701e37-fa40-4a78-92d9-729043b36d9c/workitems/Microsoft.Extensions.Configuration.Json.Tests/console

Timed out installing the test app?

Looks like it happened again but on the very same machine (dci-macpro-06). I will investigate what's up

@premun
Copy link
Member Author

premun commented Sep 8, 2020

I took the problematic machine offline for investigations so let's see whether the rest goes green now.
I will have time to investigate next week.

@premun
Copy link
Member Author

premun commented Sep 8, 2020

Alright, looks everything is green. So I wonder if we want to check this in or not. The reason is that sometimes the iOS build times out when it is on one of the slow agents (or whatever the reason is), so this will bring flakey-ness. On the other hand, I am not sure whether this is not happening already then?

@safern
Copy link
Member

safern commented Sep 8, 2020

I think if we don't check this in we won't find those flaky issues and fix them. However, the question I have is, do we have the capacity yet to enable this on all PRs and CI?

@premun
Copy link
Member Author

premun commented Sep 9, 2020

The capacity for the iOS tests itself is OK. The tests only take 6 minutes right now. We have 59 machines for that and there's 155 of work items so the test itself is usually super short. The capacity for building the iOS tests is coming from AzDO and I am not sure how are we doing there. I will try to find out.

@safern
Copy link
Member

safern commented Sep 9, 2020

Building the product and tests took 36 mins in last build which is not bad.

@directhex
Copy link
Contributor

Practically speaking, what blockers remain on this PR?

@premun
Copy link
Member Author

premun commented Sep 9, 2020

@directhex Practically speaking, what blockers remain on this PR?

So, from my point of view (and please consider that this is my first and only interaction with dotnet/runtime so far and I don't have experience in how the overall PR builds behave):

  • Building the iOS apps times out every now and then when a "slow agent" is processing the build. A re-run is needed. I am not sure whether this causes problems with other legs already since it doesn't seem to be tied with this particular leg but..
  • I don't know how often it happens on regular basis that you need to re-run a leg in a PR build so I don't know what is the "regular flakey-ness".
  • This week one machine somehow went bad and installation of apps started to hang. I took it offline for investigations but I am not sure about the root cause still and I am not sure this won't happen again to bring more flakey-ness. (but as @safern noted, maybe the only way to find out is to check it in and observe).
  • I am not sure how to approach the question of capacity. I personally think we are fine for iOS but again - maybe the only way to see is to check it in.

So.. I don't know how flakey the PR builds in dotnet/runtime are already and whether we won't bring a lot of pain by checking this in, but I guess this is up to you guys. I don't know whether we shouldn't just flag the leg as not required and observe first or what is the regular procedure when adding new platforms, but I just don't have experience with this repository to decide this.

Other than that, the iOS tests part seems quite stable and fast, so..

@safern
Copy link
Member

safern commented Sep 9, 2020

Building the iOS apps times out every now and then when a "slow agent" is processing the build. A re-run is needed. I am not sure whether this causes problems with other legs already since it doesn't seem to be tied with this particular leg but..

We're already seeing this in other OSX, iOS legs and it seems to be an AzDo bug which seems to be under investigation so I wouldn't be concerned by this and also seems like it has somewhat being mitigated.

This week one machine somehow went bad and installation of apps started to hang. I took it offline for investigations but I am not sure about the root cause still and I am not sure this won't happen again to bring more flakey-ness. (but as @safern noted, maybe the only way to find out is to check it in and observe).

I still stand my point the best way to mitigate issues is to check-in and investigate issues that come show up.

I am not sure how to approach the question of capacity. I personally think we are fine for iOS but again - maybe the only way to see is to check it in.

Given the number of machines you shared above and the time it is taking to process work items it seems fine. We won't know this with certainty until we check it in, and as we enable more test projects and run on more PRs if we hit an issue we can always disable the leg as we did for a couple of days for WASM until we got more capacity.

Overall, this LGTM and we should try checking it in, specially since master is already 6.0.

@steveisok
Copy link
Member

@premun Please give your approval on the PR and we can merge.

@premun premun merged commit bb921d6 into dotnet:master Sep 10, 2020
TARGET_ARCH=$2
TARGET=
SCHEME_SDK=
[[RunCommands]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to get rid of the runcommands templating mechanism as having individual runner scripts for different targets (ios/android/wasm) seems to be sufficient. Doing that in #39923. That will basically undo this change here. Any objections?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I support that change as we should be able to define it via MSBuild.

Copy link
Member

@steveisok steveisok Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objections from me.

@directhex - any objections?

@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
@premun premun deleted the prvysoky/use-helix-sdk branch January 26, 2021 13:03
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants