-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Improve Windows test build performance by building projects in groups - 4x less memory 326% faster build. #17161
Conversation
test Windows_NT x64 Checked jitincompletehwintrinsic test Windows_NT x86 Checked jitincompletehwintrinsic test Ubuntu x64 Checked jitincompletehwintrinsic |
Performance comparison between priority 1 build and test runs - time elapsed from build start is on the left: With MSBuild cache workaround:
Without MSBuild cache workaround:
|
Test build time goes from 165 min to 30 min? Very nice indeed. It will be interesting to see what this does on Linux once it's working. I wonder if we can express the partitioning a bit more organically so that as new test directories are added they don't get inadvertently left out. |
After investigating the reasons for slow build I think it would be very helpful if all test projects structure will be rewritten to use large projects with batch build MSBuild feature what could reduce build time even further - my guess is that to half of what we have with this workaround ~15 - 20 min. MSBuild batch build feature is used now as well but in a different way. If we use this path than it would be much easier to add tests and keep them in the build system. But it is a complete rewrite of current > 12k projects 😄 |
@RussKeldorph @AndyAyersMS @weshaggard @jkotas PTAL PR is ready for review - it affects Windows only but improvements are so large and better than expected that it could be worth merging without Unix test build port which anyway is not used in CI as all tests are built on Windows. |
Most of the jit test project files are simple and very stylized. So there's a good chance we can write a program to revise the project files as needed. For instance there is some simple rewriting in CheckProjects. |
Agree, this work has to be done programmatically or with scripting. |
Looks like it would be trivial to enable for Linux too. I do not have time before ZBB though |
True and I have the same time problem |
Oh man this is awesome. |
set __msbuildWrn=/flp1:WarningsOnly;LogFile="%__BuildWrn%";Append=!__AppendToLog! | ||
set __msbuildErr=/flp2:ErrorsOnly;LogFile="%__BuildErr%";Append=!__AppendToLog! | ||
|
||
set TestBuildSlice=%%G |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason you aren't passing this as a property instead of depending on environment variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both are possible but for me it is more clear code if it remains environment variable which like property passed on MSBuild commandline will be available during project build, however, commandline is already 233 characters long so I decided to not pass it that way while having very same effect.
If that style is not expected in build scripts I am open to any changes which will suit the team
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking at the command line to try and figure out how you split these and I didn't find it and then realized it was happening because of the env variable. I generally try to avoid env variables in build scripts/projects because it is harder to reproduce and diagnose. In this case I don't have a strong opinion though it is just a general principle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is general principle and you have had problem decoding usage of variable than I think it should be passed via command line. Would it be fine to fixup this in follow PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be fine to fixup this in follow PR?
Yes that is fine with me.
While the performance improvments look great and I'm cool with working around this issue I share the worry with @AndyAyersMS about tests getting missed because we don't maintain this file and include them. The maintenance of this worries me some. |
One approach would be to base the bucket on a few characters of a hash of the project name. I would use 256 buckets based on two characters of the md5. Maybe the test build project could be built and assign the groups. Or perhaps the test groups could be included? |
Ideally it we could always wildcard all the test projects and then do the partitioning dynamically. Perhaps a target that generates this partition group and then does the build. |
@weshaggard @AndyAyersMS @sdmaclea The current groups were created by hand after tedious testing to balance load for each group. I entirely agree that we need better mechanism for group creation and the solution which I would prefer could be based on splitting in MSBuild collection of all globbed projects into equal "sized" (by build time) groups and use them during build. The hard part here is that projects need to be evaluated to a level where MSBuild would be aware to which priority they belong. There are still other optimizations possible including one I have discussed with @AndyAyersMS where we could reduce number of projects by roughly 3 orders of magnitude from > 12k to 16 and build individual executables and startup scripts using batch build. Then getting project built would require instead of creating new project just inclusion of files into existing large project. Another solution, albeit less efficient, could be based on programmatic or script based generation of of dirs.proj file once new project with tests is added.
Every startup of the whole MSBuild node tree is quite time consuming not only due to CLR and MSBuild startup costs but due to first evaluation of the whole common projects structure, globbing etc. We do not know currently where optimum lies. For instance 16 groups for inner loop build is too much. Anyway, if developer adds new test he/she should always check if it is active - tests are run. If we provide clear documentation and instructions even in this shape inclusion of new projects should make anyone with basic knowledge of MSBuild capable to include project into build. |
This is one option I have been thinking about. However, the most tempting one is to rewrite whole project system from > 12k projects down to number of groups we need. This should further significantly decrease build time. |
Due to ZBB and my everyday work assignments I am very tight on time during current two weeks. I hope you could agree to commit this PR in current or close to current state and I would follow with improvements after 8th April. In current state we reduce test time for all HW intrinsic work from 4,5 to less than 2h - we run jitstress jobs on every PR we work on. |
@4creators We really appreciate all the efforts you put in on improving our infrastructure.
I don't have any concerns with this going in but I don't actively work in this repo so others should chime in. @RussKeldorph @AndyAyersMS @BruceForstall any issues with this going in? |
This is a tremendous improvement @4creators . thank you. |
This is pretty ugly, but it will save so many resources that it's hard to resist taking it immediately. Hopefully we get time to refactor the test projects or find some better fix soon. @rainersigwald FYI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this, @4creators.
Yes I think we can take this as is and work on a more comprehensive fix later. @4creators thanks much. Can you open a follow up issue for a revised version that addresses the concerns raised above? |
@AndyAyersMS Issue #17282 tracks the code review feedback. |
@4creators It looks like this change caused a failure in the ARM Pri-1 build with these failures:
It looks like the tests\src\hosting directory doesn't get built at all? |
@BruceForstall Investigating |
@4creators I've already submitted a PR that should fix this: #17322 |
…fore file Fixes #17503 The error is caused by both: 1. Unnecessary usage of 'managed_test_build' semaphore file which is incorrectly set after /t:BatchRestorePackages build target and prevents managed test build which is invoked after semaphore alredy exists 2. Masked by the above error is a wrong condition in dirs.proj non-windows test build which was introduced by PR dotnet#17161 and which prevented unix build due to missing dotnet#17161 group build port to unix
Ports dotnet#17161 to linux
Ports dotnet#17161 to linux
* Split unix test builds in slices Ports #17161 to linux * Address review feedback
* Split unix test builds in slices Ports dotnet#17161 to linux * Address review feedback
Fixes #14594 and is a part of #17149
The build time improvement for priority 1 tests: 02:09:09.700 (2 hours 9 minutes 9 seconds .700) on 03:46:42.325 original time and now takes 01:37:32.619 what means that build and test is ~132% faster.
Test build time improved from 02:45:09:518 to 00:38:47.199 what is ~326% faster.