-
Notifications
You must be signed in to change notification settings - Fork 689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve NuGet-based MSBuild project SDK resolver #4522
Conversation
/cc @davkean I'm accessing |
@jeffkl |
I don't see how my changes would have affected those tests, I've re-run them yet again... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
/// <summary> | ||
/// Represents an <see cref="EventSource" /> for the NuGet-based MSBuild project SDK resolver. | ||
/// </summary> | ||
[EventSource(Name = "Microsoft-NuGet-SdkResolver")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bear in mind for traces, Perf DDRITs, feedback bugs, targeted tracing etc to capture this, you'll need to opt them into this provider, or alternatively, reuse "Microsoft-VisualStudio-Common". The latter I'm less unclear of how that works to avoid clashes, etc.
@jeffkl Can you share out a zip/merged trace with the above changes? I want to see how much time is spent blocked on evaluation. |
My test scenario is 100 projects that each randomly reference one of five MSBuild project SDKs. The packages that represent the SDKs are hosted on a local feed so its not technically hitting the network (NuGet does have to check compatibility and unzip them). I do the following:
WithPackageRestore.PerfViewData.etl.zip I also did the steps again without clearing the NuGet global packages folder: WithoutPackageRestore.PerfViewData.etl.zip I see 500 total project evaluations and the NuGet SDK resolver is called 195 times but only does 5 unique look ups (the others receive a cached value and return almost immediately). There is some waiting for package restore since I'm using a semaphore to ensure only one restore is happening at a time: It looks like the first package restore is slower than the rest, I'm assuming because of JIT: |
/// <param name="logger">A <see cref="ILogger" /> to use when logging messages.</param> | ||
/// <param name="sdkLogger">A <see cref="NuGetSdkLogger" /> to use when logging errors or warnings.</param> | ||
/// <returns>An <see cref="SdkResult" /> representing the details of the package if it was found or errors if any occured.</returns> | ||
private SdkResult ResolveSdk(string id, NuGetVersion nuGetVersion, SdkResolverContext context, SdkResultFactory factory, ISettings settings, VersionFolderPathResolver versionFolderPathResolver, ILogger logger, NuGetSdkLogger sdkLogger) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the new method that does package restore
@jeffkl Can you give me a trace with these options turned on after clearing the cache? That trace is missing CPU/thread time. |
@davkean Okay, I ran it with the following: The file is too big to upload to GitHub so I shared it with you on OneDrive, hopefully it doesn't take too long to download it. If so, I can try putting it on a VM that you can remote into. |
@jeffkl I took a look:
This looks great, and happy for this to go in. I think we still need to solve a few problems before we call this done done:
|
The packages are on a local feed so NuGet is just finding the
Yeah not surprising. There's another problem I was looking into a few years ago, the SDK resolvers are initialized via
Woohoo, that is fast! I did optimize loading of
100% agree. I have a few ideas in this regard:
I'll send you a link to another perfview ZIP with a network download where I did the following:
Keep in mind Speedtest.net says I'm getting 180Mbps down with a 6ms ping, so NuGet reportedly took 2.1 sec to download Arcade and 631ms to download CentralPackageVersions. |
This is a different project, right? So I can't compare evaluation time. This trace shows what I'm worried about. Now you haven't introduced this, but this lock contention is just an insane amount of block time. This is a whole lot of threads sitting there not making progress both with the resolver and restore itself (both are intermixed so we would need to tease them apart). When the UI thread blocks on one of these evaluations, it will hang: I'm away for next week, but when I get back this is a treasure trove of optimizations that we can make to this. However, due to the design of how SDK resolution works,. it is going to be completely unavoidable to some of these locks, and hence I cannot yet recommend that folks use this in VS scenarios. |
One more thing in this trace that I've noticed, is the impact this is having on other VS features. Because we're seeing more blocked time, more thread pool threads are being fired up, this is resulting in more work items getting worked on at the same, which results in more blocked time in features that we don't see elsewhere: For example, I never see the above in any real world solution (everything in set_HasAllInformation is blocked time). |
@davkean I'll schedule some time we can discuss this! |
211e309
to
03f48de
Compare
03f48de
to
90442c4
Compare
This PR has been automatically marked as stale because it has no activity for 7 days. It will be closed if no further activity occurs within another 7 days of this comment. If it is closed, you may reopen it anytime when you're ready again, as long as you don't delete the branch. |
This PR has been automatically marked as stale because it has no activity for 7 days. It will be closed if no further activity occurs within another 7 days of this comment. If it is closed, you may reopen it anytime when you're ready again, as long as you don't delete the branch. |
Bug
Regression? Last working version:
Description
I've decided to make one big change to the NuGet-based MSBuild project SDK resolver in order to fix the major problems which was a lot easier to test all at once.
Concurrency NuGet/Home#11443, NuGet/Home#8652
The original design of the SDK resolver assumed it would only be called once per SDK by MSBuild but we've since discovered that its being called a lot more. Particularly, in Visual Studio, the SDK resolver is being called a lot for design-time builds which leads to thread starvation during solution load. This is because the locking happens very late in the APIs, when a restore is taking place. This means a lot of threads are all trying to restore the same package, one of them finishes, and the other notice there's nothing to do after they did quite a bit to get that far.
To resolve this, I've added a
ConcurrentDictionary<LibraryIdentity, Lazy<SdkResult>>
to ensure that the same SDK is never resolved more than once per process. ForMSBuild.exe
this works the same as before but for Visual Studio it greatly improves SDK resolution times as some overhead isn't needed (loadingglobal.json
andNuGet.Config
/ISettings
).To further avoid thread starvation, I also added a
SempahoreSlim
around downloading of a package to ensure that the SDK resolver is never trying to do this work more than once at a time.How packages are restored NuGet/Home#10935, NuGet/Home#7777
In the original implementation, I made a new API around generating a fake project on the fly so a
DependencyGraphSpec
could be passed toRestoreRunner
. This is a lot of overhead to restore a single project.The new API uses a
RemoteWalkContext
to resolve the package and anIPackageDownloader
to download the package. Under the covers, these are the same classes used byRestoreRunner
without the other stuff.Logging NuGet/Home#11445
I added some low importance/verbose messages to help with debugging the SDK resolver. I also added an
EventSource
which lets you create traces with start/stop information. Here's an example of the events:In a solution with 5 SDKs and 100 projects, I can see the SDK resolver called 204 times:
And with this change, the logic to resolve an SDK really only happens once per SDK:
PR Checklist
PR has a meaningful title
PR has a linked issue.
Described changes
Tests
Documentation