Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the no-op hashing performance #10068

Open
nkolev92 opened this issue Sep 25, 2020 · 8 comments
Open

Improve the no-op hashing performance #10068

nkolev92 opened this issue Sep 25, 2020 · 8 comments
Assignees
Labels
Area:RestoreNoOp The PackageReference no-op Functionality:Restore Priority:2 Issues for the current backlog. Tenet:Performance Performance issues Type:Bug

Comments

@nkolev92
Copy link
Member

Currently the no-op hash is calculated by creating a json string of the dependency graph spec for said project.
This involves a lot of duplicate + using jsonwriter for hash is not ideal.

This issue tracks the investigation for improving that hashing performance.

I discussed with @cristinamanum and @zivkan

The suggestion is move to a normal hashing function instead of using the package specwriter.
The challenge is that we don't know if the package spec is mutated during restore.

Given that here are the options.

  • Consider setting a Locked property, to disable mutation after a certain point.
  • Use an intermediate class that can generate the hashcode on demand (this will require coordination to ensure the hash generation is synchronized).
  • Restore Runner before invoking the restore command can prepopulate the hash code for all the package specs.

This is expected to be a long investigation, because while perf is one goal, we need to ensure this doesn't get regressed in the future.

I did prototype these changes and it showed a significant improvement for large slns with deep project graphs, up to 2-4x faster hash generation. The hash generation right now is about ~30% of the time spent in the restore part (not dg spec generation).

So cmd could see numbers like 7s to 6.8s, but VS will relatively better perf, example, 1.1s -> 900ms.

@nkolev92 nkolev92 added Area:RestoreNoOp The PackageReference no-op Functionality:Restore Tenet:Performance Performance issues Type:DCR Design Change Request Type:Bug Priority:1 High priority issues that must be resolved in the current sprint. and removed Type:DCR Design Change Request labels Sep 25, 2020
@nkolev92 nkolev92 added this to the Sprint 177 - 2020.09.28 milestone Sep 25, 2020
@nkolev92
Copy link
Member Author

Setting the estimate to 30-ish, but likely I will only spend 5 on it in S177.

@nkolev92
Copy link
Member Author

Moving to S178.

@nkolev92
Copy link
Member Author

deprioritizing for now.

@nkolev92 nkolev92 added Priority:2 Issues for the current backlog. and removed Priority:1 High priority issues that must be resolved in the current sprint. labels Jul 1, 2021
@KirillOsenkov
Copy link

KirillOsenkov commented Nov 16, 2023

@KirillOsenkov
Copy link

I'd love to see this prioritized because it does seem like we could speed up restore significantly.

@nkolev92
Copy link
Member Author

nkolev92 commented Mar 1, 2024

In this branch, I'm caching the hash generation for the package spec https://github.com/NuGet/NuGet.Client/compare/dev-nkolev92-storePackageSpecHashes.

There's another change for the algorithm itself: NuGet/NuGet.Client#5655.

@KirillOsenkov
Copy link

@jeffkl

@nkolev92
Copy link
Member Author

nkolev92 commented Mar 5, 2024

I rebased my branch to get NuGet/NuGet.Client@89f1251.

I'll probably need @jeffkl or @GenelleM to rerun the no-op tests with my change to see how much of a dent they're making.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area:RestoreNoOp The PackageReference no-op Functionality:Restore Priority:2 Issues for the current backlog. Tenet:Performance Performance issues Type:Bug
Projects
None yet
3 participants