-
-
Notifications
You must be signed in to change notification settings - Fork 9.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: improve proxy handling #5893
Conversation
…oxy-Authorization on initial request
Note: there is even more worthwhile performance tuning to be done here, most notably - |
@@ -251,13 +251,25 @@ def resolve_redirects(self, resp, req, stream=False, timeout=None, | |||
url = self.get_redirect_target(resp) | |||
yield resp | |||
|
|||
def rebuild_auth(self, prepared_request, response): | |||
def rebuild_auth(self, prepared_request, response, proxies): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a functionally public API. We can't change the signature like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh my bad, I thought this was internal. I'll look into for an alternative approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can just extract this change to a different (private?) function and call it before rebuild_auth
. WDYT?
@@ -633,7 +634,8 @@ def send(self, request, **kwargs): | |||
kwargs.setdefault('stream', self.stream) | |||
kwargs.setdefault('verify', self.verify) | |||
kwargs.setdefault('cert', self.cert) | |||
kwargs.setdefault('proxies', self.rebuild_proxies(request, self.proxies)) | |||
if 'proxies' not in kwargs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know this is correct. You've added no tests. Why do you feel this change is necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sigmavirus24 this change is needed because without it we are calling to re-build proxies from the environment even when proxies have been set/provided.
In one use test case I have setup - the time to "retrieve" a cached connection (using the cache control adapter) 10,000 times goes from 9.741 seconds and 25212240 function calls to 9902240 function calls & 6.169 seconds after applying this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sigmavirus24 the change in this line is twofold:
- Previously, if
proxies
was already set onkwargs
, therebuild_proxies
method was called, but thekwargs
were not updated (due tosetdefault
). By adding this check we're avoiding unnecessary calls. rebuild_proxies
was removing theProxy-Authorization
header, and this behavior was files as a regression here: Manually set Proxy-Authorization header is always removed in 2.26 #5888. It no longer does that.
I intend to add tests, but as per the contribution guidelines I thought I'd get feedback as soon as I can to avoid unnecessary work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for starting on this @omermizr. I think the logic on point 1 is sound, we shouldn't have passed rebuild_proxies()
as an argument in setdefault to begin with, regardless of #5888.
As for point 2, I'm hesitant to remove the stripping behavior directly from rebuild_proxies
. Given how often people subclass Session
, someone's developed something this breaks. So, as an alternative I think we should think about breaking this into two separate functions. One that performs our proxy building and one that handles our proxy auth.
At a very high level, rebuild_proxies
and rebuild_auth
need to functionally stay the same. I'm thinking the logic for rebuild_proxies
without auth changes can get moved to a new build_proxies
method. The Proxy-Authorization
logic is already decoupled from the proxy workflow, so that can get moved into a rebuild_proxy_auth
method. For this first pass, we should leave the logic untouched unless there's something mandatory.
At that point we'd rewire things to effectively be:
def rebuild_proxies(self, prepared_request, proxies):
new_proxies = self.build_proxies(prepared_request, proxies)
self.rebuild_proxy_auth(prepared_request, new_proxies)
return new_proxies
...
def send(self, request, **kwargs):
...
if 'proxies' not in kwargs:
kwargs['proxies'] = self.build_proxies(request, self.proxies)
@sigmavirus24 please disagree with me here if you think I'm on the wrong track. We may also want to consider making these utility functions, or private on the Session. I don't know how I feel about adding new interface surface. If this is a non-starter, we may want to consider simply reverting #5681 for now and going back to the drawing board.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to address the performance regression, it'd be enough to change the setdefault. I didn't want to do that because it would've introduced inconsistency with how the proxy auth header is handled (stripping the header would depend on whether 'proxies' is passed to the function or not).
The way I see it we have a few options:
- Only change the setdefault and document the inconsistency or create a new issue to address it later.
- Break the code up into separate functions like @nateprewitt suggested here.
- Add a boolean parameter with a default argument to the rebuild_proxies function. That way we'll have better control over the proxy auth header handling, and we won't break the API.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to do that because it would've introduced inconsistency with how the proxy auth header is handled (stripping the header would depend on whether 'proxies' is passed to the function or not).
Yep, agreed, I don't think we want to have this inconsistency in the API as it's going to surprise most users, and is likely the wrong behavior when proxies
aren't supplied. I think that eliminates option 1. The use of rebuild_proxies
where it is now is frankly just wrong to begin with.
For option 2, I've thought some more, and am leaning towards having the functionality potentially moved out to utils, so we're not adding new methods to Session. We may only need to move the "build" logic out and can leave the auth portion inline for rebuild_proxies
. If we don't do that we would need to effectively duplicate the first part of the function elsewhere which would lead to further drift as each function evolves. Session already has a number of issues with not behaving like the standard requests flow, so whatever change we make should minimize that as much as possible.
I don't know if option 3 works because we're changing the signature on a public API which is what we wanted to avoid. Anyone who's already extended this function, and is not using keyword arguments, will likely see unexpected breakage from that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've posted #5924 as a talking point for what a potential solution for #5888 may look like. Please let me know if you have any thoughts.
As for #5891, the behavior of checking the environment for proxy settings before resolving for send
was the primary purpose of #5681. That's a fairly expensive process though when there's a significant number of variables, which was an unintended side effect. The new behavior is correcting what I'd deem a bug but at the cost of performance. I'm not sure I see a straight forward way to mitigate that and maintain backwards compatibility yet.
For the time being, setting proxies
on the Session or disabling trust_env
are the two escape hatches for the performance impact. This is something we would likely want to consider revisiting a breakage in a new major version. I'm happy to review the performance idea here but we'll need significantly more testing to ensure we're not breaking functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed your PR, basically LGTM.
I think you actually handled the most important performance regression (at least for my flow).
Now, while the escape hatches are ok, I don't think that's really a good way to move forward considering the default behavior is now less performant. Given that requests
is such a ubiquitous library, making the defaults less performant will surely cause a lot of degradations for a lot of python libraries and a lot of services.
Having said that, I think the simple fix where we check if proxies
is already set should be good enough for most use cases, and you already did that in your PR 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's really a good way to move forward considering the default behavior is now less performant. Given that requests is such a ubiquitous library, making the defaults less performant will surely cause a lot of degradations for a lot of python libraries and a lot of services.
Thanks for taking a look! Completely agree, Requests has a pretty broad and diverse user base. It's always a difficult balancing act of usability and correctness that works for everyone. Hopefully we've arrived at one of the better outcomes for this case 😄
I don't believe this refactoring is what nate or I had in mind and it breaks a public API without adding additional testing |
As a user of the library, I found similar & also noticed that setting |
Like I said, I'll try to avoid breaking the public API (I obviously don't want to introduce breaking changes). |
|
||
proxy = environ_proxies.get(scheme, environ_proxies.get('all')) | ||
environ_proxies = get_environ_proxies(url, no_proxy=no_proxy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes here were done mainly to avoid calling should_bypass_proxies
if not necessary because it is more of a bottleneck than get_environ_proxies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO this should be inside / under the self.trust_env
code path.
…y not rebuilding proxies if proxies have been supplied." This reverts commit 3c32ae8.
I would like to point out a solution for the proxy problem. I have been applying this solution every time I need to build an venv behind proxy. I propose the following change at line: Line 530 in 913880c
Replace it by: proxies = proxies or self.proxies This change will make this hierarchy Also discussed here: #5735 (comment) |
@nateprewitt Sure thing! Thanks for addressing this 😃 |
This is my attempt to resolve #5891 as well as #5888.