-
-
Notifications
You must be signed in to change notification settings - Fork 31.6k
http: support HTTP[S]_PROXY environment variables in fetch #57165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Review requested:
|
I believe this will be a notable change. |
The
notable-change
Please suggest a text for the release notes if you'd like to include a more detailed summary, then proceed to update the PR description with the text or a link to the notable change suggested text comment. Otherwise, the commit will be placed in the Other Notable Changes section. |
Depends on #57236 @mcollina why is this semver-major? I think if we make it behind another environment variable first (NODE_USE_HTTP_PROXY in this PR, can change to a different name if a better one comes up), this is still semver-minor, and only another PR that makes it enabled by default and have to be disabled explicitly would be semver-major? |
The HTTP_PROXY env is used also by other programs, so it's relatively possible that it's set in an environment, making it a behavior change. |
Would the intention be to limit this to just undici/fetch? That is, would this also work for |
Given how the current http.Agent work, I'm not entirely sure it's possible to enable it by default without breaking some part of the ecosystem unexpectedly(see the monkeypatching that nock does, as well as observability providers). |
I think a probable path forward would be:
3 and 4 can be done independently. In the worst case, we stop at 2, but at least it still allows users being trapped by firewalls and not always having control over the code making requests to opt into a solution that solves their headaches by setting the environment variable, which might be good enough for many already. I think 3 might not be that risky, 4 will need some investigation, but we can use 2 to let users try it out and explore any possible interop problems with existing libraries. So I expect 2 would need a bit more time to fully develop but since it's opt-in as an experimental feature it seems pretty reasonable. |
Rebased after #57236 landed. If the plan proposed in #57165 (comment) looks good I'll move on adding docs and more tests. |
setGlobalDispatcher(envHttpProxyAgent); | ||
// TODO(joyeecheung): handle http/https global agents and perhaps Agent constructor | ||
// behaviors. | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity... What does the error look like if the proxy is misconfigured? That is, if I set HTTP_PROXY
to an invalid value and the dispatcher is not able to actually make the connection? It would be helpful to ensure that the error gives enough indication for the user to know that the issue is the proxy config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on how invalid the value is - typically this just times out, just like what happens if you send the request to an invalid server. Or if the domain name is invalid then you get a DNS error, etc. I am not sure if it makes sense to pre-emptively check for anything, however. Sending requests to an invalid proxy server is not too different from sending requests without proxy to an invalid server, in essence. As far as I know, other command line tools that support these environment variables would not really check the validity of the proxy - if the checking process is not part of the typical protocol (in the case of HTTP proxies, I don't think there is one), then performing a separate check can also lead to a TOCTOU problem. The proxy server being invalid during process startup does not mean that it won't be valid when the application actually makes the request.
effc51c
to
a8cc0f2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Landed in 5a7b7d2 |
I have a WIP in https://github.com/joyeecheung/node/tree/http-agent-proxy to support http agents (for that one I am following the behavior of curl instead of undici, I feel that we should fix the undici behavior and align with curl ultimately), though I also noticed certain things that should better be done for sanity of the code (e.g. reusing the parsing of the env var somehow with undici), going to open a tracking issue as follow-up... |
@joyeecheung for that implementation I would avoid adding this behavior to the main http.Agent and create a new one |
Does it have to be separate? I think when guarded by the environment variable, it doesn’t even matter all that much. From my testing, if guarded by an env var it works other patching and overidding. And with the way the agents are structured, keeping an optional path inside is much better than creating an extra subclass (which also is how e.g. ca and all the other agent options work, I think configuring it from the option bag would lead to way cleaner code than trying to subclass). |
I'm really skeptical about the long term maintainability of that solution, and how it would play out with the rest of the ecosystem that relies on patching the Agent etc. That's one of the reasons why we went for a completely different design in undici. |
From what I can tell, patching it directly is much simpler than subclassing, due to how the http agents are designed. Maybe it's different for undici because undici agents have a different subclassing pattern. But when I look at existing proxy agent implementations in npm, I think >50% of the code is basically fighting against Node.js's internals for the sake of inserting logic into the default behavior while maintaining some invariants in Node.js, but then it seems the extra logic put into place to mimic the invariants maintained by Node.js can still fail in various edge cases... For example, see https://github.com/TooTallNate/proxy-agents/blob/536aaa5165b888c548cbfa8095801c05478605fe/packages/http-proxy-agent/src/index.ts#L135 and almost the entire purpose of agent-base: https://github.com/TooTallNate/proxy-agents/blob/536aaa5165b888c548cbfa8095801c05478605fe/packages/agent-base/src/index.ts Whereas maintaining an optional path simply removes the need of all these patching and produces much simpler code, from what I have implemented so far (it basically just comes down to parsing the configuration + additional code to modify the requests as what RFC 7230 says, there isn't much code that I would consider as "fighting Node.js internals" when the optional paths are directly in internals) |
Ok! |
When enabled, Node.js parses the `HTTP_PROXY`, `HTTPS_PROXY` and `NO_PROXY` environment variables during startup, and tunnels requests over the specified proxy. This currently only affects requests sent over `fetch()`. Support for other built-in `http` and `https` methods is under way. PR-URL: #57165 Refs: nodejs/undici#1650 Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
When enabled, Node.js parses the `HTTP_PROXY`, `HTTPS_PROXY` and `NO_PROXY` environment variables during startup, and tunnels requests over the specified proxy. This currently only affects requests sent over `fetch()`. Support for other built-in `http` and `https` methods is under way. PR-URL: #57165 Refs: nodejs/undici#1650 Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
When enabled, Node.js parses the `HTTP_PROXY`, `HTTPS_PROXY` and `NO_PROXY` environment variables during startup, and tunnels requests over the specified proxy. This currently only affects requests sent over `fetch()`. Support for other built-in `http` and `https` methods is under way. PR-URL: #57165 Refs: nodejs/undici#1650 Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
When enabled, Node.js parses the `HTTP_PROXY`, `HTTPS_PROXY` and `NO_PROXY` environment variables during startup, and tunnels requests over the specified proxy. This currently only affects requests sent over `fetch()`. Support for other built-in `http` and `https` methods is under way. PR-URL: #57165 Refs: nodejs/undici#1650 Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Do we want this on v22.x? I've added
baking-for-lts
|
Underrated feature |
When
NODE_USE_ENV_PROXY=1
is set, Node.js parses theHTTP_PROXY
,HTTPS_PROXY
andNO_PROXY
environment variables during startup, andtunnels requests over the specified proxy.
This currently only affects requests sent over
fetch()
. Support forother built-in
http
andhttps
methods is under way.This only tries to support
HTTP_PROXY
etc. environment variables directly instead of exposing the lower-level dispatcher setter and EnvHttpProxyAgent, but I think for many users, having the environment variables supported out of the box is already enough or more important to solve their headaches, while the lower-level utilities are less essential - many other language runtimes/command line tools only support these environment variables out of the box, without necessarily providing more utilities for further customization, and that seems enough for most people who are behind a proxy already.The idea is that we can first make this handling opt-in via another environment variable (
NODE_USE_ENV_PROXY
here, or we can use others if there are better names). And in the future we can enable this flag by default, as this is still opt-in under well-known environment variablesHTTP_PROXY
. I think some investigation is needed to make sure that enabling it by default would not step on the toes of other libraries/tools already recognizing these environment variables and do their own thing.This requires a patch in undici to make the EnvHttpProxyAgent usable from the Node.js bundle, which I included locally in this PR and a locally built bundle for demonstration purpose. I opened a separate PR to undici to see what's the best way to patch it nodejs/undici#4064Patch merged into undici and rolled into the main branch.This is only an initial implementation. The high-level plan is:
fetch
behindNODE_USE_ENV_PROXY
as semver-minor (what this PR does)http.request
etc. behindNODE_USE_ENV_PROXY
as semver-minor (likely just adding in the socket/header rewrite in the default implementation ofhttp.Agent
constructor, behindNODE_USE_ENV_PROXY
, so that custom agents also get this behavior in their default implementation when the environment variable is passed in)NODE_USE_ENV_PROXY=0
http.request
as semver-major (this has more interop risk with existing libraries patching the global agent, so we may not end up doing it and may end up always requiring NODE_USE_ENV_PROXY)Some TODOs that will need to be fixed in undici for
fetch
:Refs: nodejs/undici#1650