
[Feature]: Make retry count and backoff behavior configurable for Nuget Clients #11027

Closed
MattGal opened this issue Jul 14, 2021 · 7 comments · Fixed by NuGet/NuGet.Client#4220
Labels
Area:Protocol (Client/Server protocol and the code around it) · help wanted (considered good issues for community contributions) · Priority:3 (issues under consideration; with enough upvotes, will be reconsidered for the backlog) · Type:Feature

Comments


MattGal commented Jul 14, 2021

NuGet Product(s) Involved

NuGet.exe, dotnet.exe

The Elevator Pitch

Users often know when they are using NuGet package feeds heavily, and certain responses above HTTP 500 (such as 503) are often circuit-breaker responses to excessive traffic, which additional requests only make worse. Providing users with a way to change their retry interval, the linearity of their retry timing, and their number of retries would let them tune their NuGet clients for their particular scenarios.

Additional Context and Details

We routinely hit HTTP 503s from heavy use of Azure DevOps NuGet feeds; when this happens, it's usually because the endpoint is overloaded. After a chat with @zivkan, he helpfully pointed me at HttpRetryHandlerRequest.cs, which nicely matched the logging I was seeing.

The way this file is authored today, someone using .NET Core (e.g. dotnet restore ...) to do NuGet restores will always try the same package three times in a row, with a 200 ms delay between attempts, for every active package source in the nuget.config file.

It would be nice to be able to configure longer (and ideally, exponentially backed-off) retry intervals, or perhaps even disable retries entirely. Specific handling of HTTP 503 could be useful too.
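
As a sketch of what I mean (illustrative only, not the actual HttpRetryHandler code; the RETRY_COUNT and RETRY_DELAY_MS environment variables are made-up placeholder names, not real NuGet settings):

```csharp
// Illustrative sketch only: not NuGet's HttpRetryHandler. The environment
// variable names below are hypothetical placeholders for whatever knobs
// NuGet might expose.
using System;
using System.Net.Http;
using System.Threading.Tasks;

static class RetryingClient
{
    static readonly HttpClient Http = new HttpClient();

    public static async Task<HttpResponseMessage> GetWithRetryAsync(string url)
    {
        // Today NuGet effectively hard-codes 3 tries and a 200 ms delay.
        int maxTries = int.TryParse(
            Environment.GetEnvironmentVariable("RETRY_COUNT"), out var tries) ? tries : 3;
        int baseDelayMs = int.TryParse(
            Environment.GetEnvironmentVariable("RETRY_DELAY_MS"), out var delay) ? delay : 200;

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                HttpResponseMessage response = await Http.GetAsync(url);

                // Retry only 5xx responses; everything else goes back to the caller.
                if ((int)response.StatusCode < 500 || attempt >= maxTries)
                    return response;

                response.Dispose();
            }
            catch (HttpRequestException) when (attempt < maxTries)
            {
                // Transport-level failures (connection reset, DNS, etc.) are retried too;
                // on the last attempt the exception propagates to the caller.
            }

            // Exponential backoff: 200 ms, 400 ms, 800 ms, ... with the default base delay.
            await Task.Delay(baseDelayMs * (1 << (attempt - 1)));
        }
    }
}
```

Something along these lines, however NuGet chooses to surface the settings, would let us back off much more gently when a feed starts returning 503s.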

heng-liu added the Area:Protocol label and removed the Triage:Untriaged label on Jul 15, 2021

zkat commented Jul 19, 2021

Adding configuration can be fairly costly for customers and can lead to unexpected issues as things change. As such, I'm wondering whether it would work for your case if NuGet added support for the Retry-After HTTP header, which is designed precisely for this use case. That's a discussion you would need to have with Azure as well, since it would require changes to both the service and the NuGet client.
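
To sketch what I mean (illustrative only, not an existing NuGet code path), honoring Retry-After mostly amounts to reading the server-requested delay off the 503 response instead of waiting a fixed 200 ms:

```csharp
// Illustrative sketch only: reading a server-requested Retry-After delay.
// Not NuGet's implementation; the service has to actually send the header.
using System;
using System.Net.Http;

static class RetryAfterHelper
{
    // Returns the delay the server asked for, or the fallback if the header is absent.
    public static TimeSpan GetRetryDelay(HttpResponseMessage response, TimeSpan fallback)
    {
        var retryAfter = response.Headers.RetryAfter;
        if (retryAfter == null)
            return fallback;

        // Retry-After can be a number of seconds ("Retry-After: 120")...
        if (retryAfter.Delta.HasValue)
            return retryAfter.Delta.Value;

        // ...or an absolute HTTP date ("Retry-After: Wed, 21 Jul 2021 07:28:00 GMT").
        if (retryAfter.Date.HasValue)
        {
            TimeSpan untilDate = retryAfter.Date.Value - DateTimeOffset.UtcNow;
            return untilDate > TimeSpan.Zero ? untilDate : TimeSpan.Zero;
        }

        return fallback;
    }
}
```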

zkat added the help wanted and Priority:3 labels on Jul 19, 2021

MattGal commented Jul 19, 2021

Thanks for the info, @zkat. Asking Azure DevOps package feeds to do this is well within the realm of things we can change, and as an interested party on the .NET engineering services team, I'd gladly facilitate those conversations.

Is there any hope for shorter-term relief, such as making the retry delay greater than 200 ms?


zkat commented Jul 19, 2021

Unfortunately, due to the stage we're at in our net6/dev17 release cycle, we're really trying to avoid more impactful changes like this one, so unless it involves a server-requested delay change, we don't think it would be a good idea to do this as a quick patch.

That said, thanks for starting this conversation with the Azure DevOps folks! I think this would be a cool feature to have and would be very helpful in general. I'm also going to tag in @NuGet/gallery-team in case they'd also be open to implementing this on the main feed!

@phil-hodgson

Do we need it to be configurable, or instead, can we just improve the baked-in retry strategy?


jmyersmsft commented Aug 23, 2021

Retry-After won't help the specific scenario that's killing @MattGal's team, though: it's a socket error, not an HTTP error.

EDIT: Ah, I see he specifically mentioned 503s here, but the most recent incident wasn't 503s at all; it was connection resets.
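
To make the distinction concrete (a sketch only, not NuGet code): a 503 arrives as a response whose headers a Retry-After policy can inspect, whereas a connection reset surfaces as an exception before there is any response at all, so it has to be handled separately:

```csharp
// Sketch only: classifying the two failure modes. A Retry-After policy only
// ever sees the first kind; connection resets show up as exceptions.
using System;
using System.Net.Http;
using System.Net.Sockets;

static class FailureClassification
{
    // HTTP-level throttling: a response exists and may carry Retry-After.
    public static bool IsServerThrottle(HttpResponseMessage response) =>
        (int)response.StatusCode == 503;

    // Transport-level failure: "connection reset by peer" typically surfaces as an
    // HttpRequestException (or IOException) wrapping a SocketException.
    public static bool IsConnectionFailure(Exception ex)
    {
        for (Exception e = ex; e != null; e = e.InnerException)
        {
            if (e is SocketException)
                return true;
        }
        return false;
    }
}
```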


MattGal commented Aug 23, 2021

@jmyersmsft Correct, that was long before your insight; this week I'm looking into prepping a PR with an eye toward making both failure modes more reliable in dnceng.

@jmyersmsft

Oh wow, I also totally missed that the dates are all way back in July, whoops.
