Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No-op restore caches and replays NU1900 #13615

Open
KirillOsenkov opened this issue Jul 10, 2024 · 15 comments
Open

No-op restore caches and replays NU1900 #13615

KirillOsenkov opened this issue Jul 10, 2024 · 15 comments
Assignees
Labels

Comments

@KirillOsenkov
Copy link

If you have a restore that failed with NU1900: Error occurred while getting package vulnerability data: Unable to load the service index for source...

You can then make a change to the NuGet Credential Provider, add the proper token, and re-run restore, it will still fail with the same error NU1900.

This is because the cache file is valid (since nothing changed on disk), so it doesn't attempt to re-restore.

This was an absolute nightmare to investigate (because when NuGet prints 401 you still think the problem is with auth, but really it doesn't even attempt to re-auth because it just plays back the no-op restore).

We also have warnings as errors, so the warning is logged as an error and fails the build.

@dotnet-policy-service dotnet-policy-service bot added the missing-required-type The required type label is missing. label Jul 10, 2024
@KirillOsenkov
Copy link
Author

I think we shouldn't be caching failed restores at all

@donnie-msft donnie-msft added Type:Engineering product/infrastructure work/not a customer bug/feature/DCR Triage:Untriaged Type:DCR Design Change Request Functionality:Restore Area:Authentication and removed missing-required-type The required type label is missing. Type:Engineering product/infrastructure work/not a customer bug/feature/DCR Triage:Untriaged labels Jul 11, 2024
@nkolev92
Copy link
Member

nkolev92 commented Jul 15, 2024

No-op should not be caching failed restores: https://github.com/NuGet/NuGet.Client/blob/dev/src/NuGet.Core/NuGet.Commands/RestoreCommand/RestoreCommand.cs#L806C21-L806C30, which leads to https://github.com/NuGet/NuGet.Client/blob/38ab39a6691e6e5d54e29055d9e258b8b5d57220/src/NuGet.Core/NuGet.ProjectModel/CacheFile.cs#L43, so I'm guessing there's probably something else going.

How did you conclude it was a no-op caching issue?

@nkolev92 nkolev92 added WaitingForCustomer Applied when a NuGet triage person needs more info from the OP Type:Bug and removed Type:DCR Design Change Request labels Jul 15, 2024
@KirillOsenkov
Copy link
Author

I debugged and saw that it considers everything up-to-date and doesn't attempt to rerun the restore

@dotnet-policy-service dotnet-policy-service bot added WaitingForClientTeam Customer replied, needs attention from client team. Do not apply this label manually. and removed WaitingForCustomer Applied when a NuGet triage person needs more info from the OP labels Jul 15, 2024
@KirillOsenkov
Copy link
Author

KirillOsenkov commented Jul 15, 2024

When you say noop should be caching failed restores, that's what I think is happening, but this is wrong behavior. If the restore failed, we should delete the cache.

@nkolev92
Copy link
Member

nkolev92 commented Jul 15, 2024

When you say noop should be caching failed restores, that's what I think is happening, but this is wrong behavior. If the restore failed, we should delete the cache.

My bad, typo. It should not.
The cache file has a status in it, which tells us not to cache it.

I debugged and saw that it considers everything up-to-date and doesn't attempt to rerun the restore

Might need help getting to this ourselves, since I'm wondering why the code I linked wasn't working.

@dotnet-policy-service dotnet-policy-service bot added WaitingForCustomer Applied when a NuGet triage person needs more info from the OP and removed WaitingForClientTeam Customer replied, needs attention from client team. Do not apply this label manually. labels Jul 15, 2024
@KirillOsenkov
Copy link
Author

Try injecting a network error (like 401 access denied to an azdo feed with a wrong PAT), then restore again. It will play back the 401 but won't even attempt to hit the network.

@dotnet-policy-service dotnet-policy-service bot added WaitingForClientTeam Customer replied, needs attention from client team. Do not apply this label manually. and removed WaitingForCustomer Applied when a NuGet triage person needs more info from the OP labels Jul 15, 2024
@nkolev92
Copy link
Member

cc @zivkan in case anything rings a bell.

I'm wondering if there's something we're messing up on the audit front, since Audit is not supposed to fail restore by default (hence NU1900), but I don't think we're doing anything custom. it's originally a warning getting logged.

@dotnet-policy-service dotnet-policy-service bot added WaitingForCustomer Applied when a NuGet triage person needs more info from the OP and removed WaitingForClientTeam Customer replied, needs attention from client team. Do not apply this label manually. labels Jul 15, 2024
@nkolev92 nkolev92 added WaitingForClientTeam Customer replied, needs attention from client team. Do not apply this label manually. Triage:Investigate and removed WaitingForCustomer Applied when a NuGet triage person needs more info from the OP labels Jul 15, 2024
@KirillOsenkov
Copy link
Author

we may have WarningsAsErrors set

Audit and check for vulnerabilities does ring a bell, I think I saw something while debugging

@KirillOsenkov
Copy link
Author

And I did see the NU1900 warning being converted to error after it's been played back, so maybe the restore is considered successful because it doesn't know the warning will later be converted to an error by MSBuild? This would explain what we're seeing

@nkolev92 nkolev92 added Category:Quality Week Issues that should be considered for quality week and removed WaitingForClientTeam Customer replied, needs attention from client team. Do not apply this label manually. labels Jul 22, 2024
@nkolev92 nkolev92 self-assigned this Jul 22, 2024
@nkolev92 nkolev92 added the Priority:2 Issues for the current backlog. label Oct 3, 2024
@nkolev92
Copy link
Member

nkolev92 commented Nov 1, 2024

I had a really tough time reproducing this.

I tried a bunch of scenarios:

  • Successful restore with all packages on disk and then I add a bad soruce.
  • Successful restore with a real private source, and then I tried to provide bad credentials.
  • Failed restore with all packages on disk and then provide bad creds.

I couldn't really get a true repro.

Then I looked into recent changes in the code, and NuGet/NuGet.Client@f4a6118#diff-df853d83ba471f11c087ec99cfc8ea2f1b59dd456de4f01199e643480d697011 stood out, because it changes up the error handling with Audit in particular.

I think this is likely fixed.

@KirillOsenkov

If by any chance you do hit this again, please get the content of the obj/* directory, so we can see the status of restore when restore is run again.

@nkolev92 nkolev92 closed this as not planned Won't fix, can't repro, duplicate, stale Nov 1, 2024
@KirillOsenkov
Copy link
Author

OK sounds good, I'll look into it further if I ever see this again.

@KirillOsenkov
Copy link
Author

@nkolev92 I just saw this again, for a project called restore.proj (not a .csproj).

NuGet was just playing back the NU1900 without hitting the network, but as soon as I cleared the obj directory it hit the network again and fixed itself.

I can't reopen this bug, can you help me reopen?

@KirillOsenkov
Copy link
Author

I forgot to stash the obj contents, but this time I'll remember, I'm seeing it every few days or so

@KirillOsenkov
Copy link
Author

I can give you the obj contents after a successful restore, but I doubt that will help (you want the cached NU1900, right?)

@KirillOsenkov
Copy link
Author

for some reason it's specific to package vulnerability data (AuditChecker/AuditUtility?)

@nkolev92 nkolev92 reopened this Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants