CoreCLR test suite optimization proposal: support for test project grouping #54512

Open
@trylek

Problem description

The current CoreCLR Pri1 test set has over 10K individual test projects. This is beyond the means of a single msbuild execution and is mitigated by partitioning the test projects into subgroups. Today at least three such partitionings exist (partitioning during test build, partitioning into XUnit wrappers, partitioning for Helix execution). While @echesakov did his best to make the Helix partitioning as good as possible, the entire logic adds enormous complexity to the test system, complicates developer ramp-up and is a constant cause of developer complaints. The 10K separate apps also mean 10K .NET Core runtime startups, incurring enormous testing cost; it's not hard to imagine that the repeated .NET Core runtime initializations take an equal or greater amount of time than the actual test code execution.

Caveat - we don't yet have any hard data to substantiate this claim. I'm working on figuring out how to produce it in some form.

Ideal state

As I personally heard in presentations by @jaredpar and @stephentoub, perf optimization of Roslyn and libraries tests that took place several years ago involved the reduction of the number of separate test apps as a key step. I believe we should take the same route in CoreCLR testing: in bulk testing (local or lab Pri0 / Pri1 testing) we should run fewer than 1K test apps, ideally fewer than 500. Once that happens, we should be able to remove all the partitioning goo and just run the tests one by one, both locally and in Helix.

Downsides, challenges and problems to solve

Today, about 3/4 of the test suite corresponds to the JIT unit tests - a search in my runtime repo clone under src\tests\JIT for *.csproj/ilproj yields 7312 matches. If we're serious about this effort, we must tackle JIT tests first. According to the proposed ideal state, we should strive to reduce the number of separate apps to about 300~400. I think that roughly corresponds to two subdirectory levels under JIT (e.g. Methodical\divrem) but I have yet to provide more precise numbers.
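To put more precise numbers on the two-level estimate, one could count the project files per group. The following is only a minimal sketch: the helper name `count_by_group`, the `depth` parameter, and every sample path except `i4div_cs_do.csproj` are hypothetical, and it assumes the project list has already been collected (e.g. by a recursive search for *.csproj / *.ilproj under src\tests\JIT).

```python
from collections import Counter
from pathlib import PureWindowsPath

PROJECT_SUFFIXES = {".csproj", ".ilproj"}


def count_by_group(project_paths, root="JIT", depth=2):
    """Count test projects per group, where a group is the first `depth`
    directory levels under `root` (e.g. Methodical\\divrem)."""
    counts = Counter()
    for raw in project_paths:
        path = PureWindowsPath(raw)  # accepts both \ and / separators
        if path.suffix.lower() not in PROJECT_SUFFIXES:
            continue  # not a test project file
        parts = path.parts
        if root not in parts:
            continue  # outside the subtree of interest
        start = parts.index(root) + 1
        dirs = parts[start:-1]  # directories between root and the file
        group = "\\".join(dirs[:depth]) if dirs else "(root)"
        counts[group] += 1
    return counts


# Hypothetical sample; only the first path appears in the proposal text.
sample = [
    r"JIT\Methodical\divrem\div\i4div_cs_do.csproj",
    r"JIT\Methodical\divrem\rem\i4rem_cs_do.csproj",
    r"JIT\Methodical\cctor\misc\example.ilproj",
    r"JIT\Methodical\readme.txt",
]
counts = count_by_group(sample)
for group, n in sorted(counts.items()):
    print(f"{group}: {n}")
```

The number of distinct keys in the result is the number of group apps that a two-level grouping would produce, and the per-key counts show how uneven the groups would be.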

While the test aggregation is expected to solve a known set of problems (test system complexity caused by the partitioning systems, performance of test build and execution), it has the potential to introduce a new set of problems that we should plan for and fix or mitigate as part of the proposal. In particular, a larger number of tests being run as a single app can complicate debugging, profiling, TTT analysis, and JIT dump analysis; moreover, a runtime and / or hard crash in one test tears down the subsequent tests in an aggregated test app, reducing test coverage in the presence of failures.

The counter-arguments clearly highlight sets of tests that are unsuitable for aggregation - typically interop tests where the individual tests sometimes tamper with the machine state (e.g. by registering COM classes), perhaps also the GC tests that are often lengthy and / or have the potential to tear down the app like in the case of negative OOM tests.

Even in cases where the test aggregation is expected to be benign, e.g. in the case of the JIT methodical tests, we still need to address the question of aggregation hampering developer productivity, typically in various diagnostic scenarios. @AndyAyersMS proposed a dual system where the tests would be aggregated by default in bulk testing but the developer could explicitly request the build of a single test case to mitigate the aforementioned complications.

Proposed solution

I have yet to make any real experiments in this space but it seems to me that we might be able to solve much of this puzzle by introducing group projects. My initial thinking is that, for a particular test project, e.g. JIT\Methodical\divrem\div\i4div_cs_do.csproj, we would use a new property to declare that the test is a part of the test group project, say, JIT\Methodical\divrem\divrem_do.csproj (JIT tests often come in groups that require different optimization flags so that would need preserving in the groupings). Hopefully it should be possible to tweak msbuild to normally build just the group projects. These would need to use either some form of code generators or reflection to run all the relevant test “cases” represented by the grouped projects, but that should no longer blow up msbuild as we could easily build the individual group projects serially.
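The control flow of such a group app could look roughly like the sketch below. It is written in Python purely to illustrate the idea and is not the proposed implementation: `run_group`, the test case names, and the failure code 101 are hypothetical, and the sketch assumes that each test case exposes an entry point returning 100 on success, as CoreCLR test executables do.

```python
PASS_EXIT_CODE = 100  # assumed pass convention for each test case entry point


def run_group(test_cases):
    """Run each (name, entry_point) pair in-process and collect failures."""
    failures = {}
    for name, entry_point in test_cases:
        try:
            result = entry_point()
        except Exception as exc:  # catchable failure: record it and keep going
            failures[name] = f"exception: {exc!r}"
            continue
        if result != PASS_EXIT_CODE:
            failures[name] = f"exit code {result}"
    return failures


# Hypothetical test cases standing in for the grouped projects.
def i4div_cs_do():
    return 100 if 7 // 2 == 3 else 101


def i4rem_broken():
    return 1 // 0  # raises, simulating a failing test case


failures = run_group([
    ("div/i4div_cs_do", i4div_cs_do),
    ("rem/i4rem_broken", i4rem_broken),
])
# The group app passes only if every test case passed (101 is arbitrary).
group_exit_code = PASS_EXIT_CODE if not failures else 101
print(failures, group_exit_code)
```

A catchable failure in one case does not stop the remaining cases here, but a hard crash of the process would still tear down everything after it, which is the coverage concern raised above.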

I already have a work item on adding a new command-line option to src\tests\build.cmd/sh to let developers build just a particular test project or project subtree. It should be trivial to consolidate this option with the proposed project grouping such that in bulk testing we’d end up with just the group projects whereas targeted local scenarios would end up producing a single-test executable (as before) with the caveat that trying to build the entire tree in this “separate” mode would likely trigger an msbuild OOM or some other failure.

Proposed sequencing

  1. I’m going to perform at least a series of local experiments to measure how much of the running time of the individual tests is coming from runtime initialization vs. actual test code execution and I’ll share them on this issue thread. I have yet to see whether this approach can be easily applied in the lab. Locally it might suffice to tweak R2RTest to use ETW mode to monitor at which point Main got executed.

  2. Assuming the perf experiments do confirm a perf win in test grouping (especially for tiny tests like the JIT unit tests) and we agree on this proposal in some form, I’ll look into implementing its basic underpinnings in the CoreCLR test build / execution infra scripts and I’ll test the approach on a small suite of JIT tests.

  3. Once the PR per (2) is merged in, we can trigger a “quality-week-like” combined effort to apply the technique to additional CoreCLR test areas. At this point we would still be using the pre-existing infrastructure including the XUnit wrappers and test partitionings; we’d just gradually reduce the number of test apps being run. (The proposed conservative approach doesn’t address actual test code merging, i.e. the test build time win will likely be smaller, if any. This is further aggravated by the fact that many of the JIT unit tests come in the form of IL source code.)

  4. The work per (3) should yield gradually accumulating benefits in the form of reduced total CoreCLR test running time, both locally and in the lab. Once the work advances enough that we get under the envisioned 1K test projects, we can proceed to experimenting with removal of the test partitionings. At that point we may also be able to consider removing the Pri0 / Pri1 distinction and always run all the tests.

Thanks

Tomas

/cc @dotnet/runtime-infrastructure
