Skip to content

http: support HTTP[S]_PROXY environment variables in fetch #57165

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 20, 2025

Conversation

joyeecheung
Copy link
Member

@joyeecheung joyeecheung commented Feb 21, 2025

When NODE_USE_ENV_PROXY=1 is set, Node.js parses the HTTP_PROXY,
HTTPS_PROXY and NO_PROXY environment variables during startup, and
tunnels requests over the specified proxy.

This currently only affects requests sent over fetch(). Support for
other built-in http and https methods is under way.


This only tries to support HTTP_PROXY etc. environment variables directly instead of exposing the lower-level dispatcher setter and EnvHttpProxyAgent, but I think for many users, having the environment variables supported out of the box is already enough or more important to solve their headaches, while the lower-level utilities are less essential - many other language runtimes/command line tools only support these environment variables out of the box, without necessarily providing more utilities for further customization, and that seems enough for most people who are behind a proxy already.

The idea is that we can first make this handling opt-in via another environment variable (NODE_USE_ENV_PROXY here, or we can use others if there are better names). And in the future we can enable this flag by default, as this is still opt-in under well-known environment variables HTTP_PROXY. I think some investigation is needed to make sure that enabling it by default would not step on the toes of other libraries/tools already recognizing these environment variables and do their own thing.

This requires a patch in undici to make the EnvHttpProxyAgent usable from the Node.js bundle, which I included locally in this PR and a locally built bundle for demonstration purpose. I opened a separate PR to undici to see what's the best way to patch it nodejs/undici#4064 Patch merged into undici and rolled into the main branch.

This is only an initial implementation. The high-level plan is:

  • Support proxy environment variables in fetch behind NODE_USE_ENV_PROXY as semver-minor (what this PR does)
  • Support it in http.request etc. behind NODE_USE_ENV_PROXY as semver-minor (likely just adding in the socket/header rewrite in the default implementation of http.Agent constructor, behind NODE_USE_ENV_PROXY, so that custom agents also get this behavior in their default implementation when the environment variable is passed in)
  • Enabling it by default for fetch as semver-major (this has very little interop risk with existing libraries, potentially), but can still be disabled using NODE_USE_ENV_PROXY=0
  • Enabling it by default for http.request as semver-major (this has more interop risk with existing libraries patching the global agent, so we may not end up doing it and may end up always requiring NODE_USE_ENV_PROXY)

Some TODOs that will need to be fixed in undici for fetch:

Refs: nodejs/undici#1650

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/security-wg

@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Feb 21, 2025
@joyeecheung joyeecheung changed the title http: support http proxy for fetch under NODE_USE_ENV_PROXY http: support HTTP_PROXY environment variables Feb 21, 2025
@joyeecheung
Copy link
Member Author

cc @mcollina @ronag (maybe there are others that I am forgetting, but you held the floor most of the time in the session in the London summit about this :))

@joyeecheung joyeecheung changed the title http: support HTTP_PROXY environment variables http: support HTTP[S]_PROXY environment variables Feb 21, 2025
@legendecas
Copy link
Member

I believe this will be a notable change.

@legendecas legendecas added the notable-change PRs with changes that should be highlighted in changelogs. label Feb 21, 2025
Copy link
Contributor

The notable-change PRs with changes that should be highlighted in changelogs. label has been added by @legendecas.

Please suggest a text for the release notes if you'd like to include a more detailed summary, then proceed to update the PR description with the text or a link to the notable change suggested text comment. Otherwise, the commit will be placed in the Other Notable Changes section.

@mcollina mcollina added the semver-major PRs that contain breaking changes and should be released in the next major version. label Feb 21, 2025
@joyeecheung
Copy link
Member Author

Depends on #57236

@mcollina why is this semver-major? I think if we make it behind another environment variable first (NODE_USE_HTTP_PROXY in this PR, can change to a different name if a better one comes up), this is still semver-minor, and only another PR that makes it enabled by default and have to be disabled explicitly would be semver-major?

@mcollina
Copy link
Member

mcollina commented Mar 2, 2025

why is this semver-major? I think if we make it behind another environment variable first (NODE_USE_HTTP_PROXY in this PR, can change to a different name if a better one comes up), this is still semver-minor, and only another PR that makes it enabled by default and have to be disabled explicitly would be semver-major?

The HTTP_PROXY env is used also by other programs, so it's relatively possible that it's set in an environment, making it a behavior change.
Putting it behind a flag/API/Node-specific env variable would make it non-breaking as you proposd.

@jasnell
Copy link
Member

jasnell commented Mar 2, 2025

Would the intention be to limit this to just undici/fetch? That is, would this also work for require('http').get(...), etc. I'm perfectly fine saying that this should be limited to undici/fetch but want to clarify the intent.

@mcollina
Copy link
Member

mcollina commented Mar 2, 2025

Given how the current http.Agent work, I'm not entirely sure it's possible to enable it by default without breaking some part of the ecosystem unexpectedly(see the monkeypatching that nock does, as well as observability providers).

@joyeecheung
Copy link
Member Author

joyeecheung commented Mar 3, 2025

I think a probable path forward would be:

  1. Make it work with fetch behind NODE_USE_ENV_PROXY as semver-minor
  2. Making it work with http.request etc. behind NODE_USE_ENV_PROXY as semver-minor
  3. Enabling it by default for fetch as semver-major
  4. Enabling it by default for http.request as semver-major

3 and 4 can be done independently. In the worst case, we stop at 2, but at least it still allows users being trapped by firewalls and not always having control over the code making requests to opt into a solution that solves their headaches by setting the environment variable, which might be good enough for many already. I think 3 might not be that risky, 4 will need some investigation, but we can use 2 to let users try it out and explore any possible interop problems with existing libraries. So I expect 2 would need a bit more time to fully develop but since it's opt-in as an experimental feature it seems pretty reasonable.

@joyeecheung joyeecheung removed the semver-major PRs that contain breaking changes and should be released in the next major version. label Mar 4, 2025
@joyeecheung
Copy link
Member Author

Rebased after #57236 landed. If the plan proposed in #57165 (comment) looks good I'll move on adding docs and more tests.

setGlobalDispatcher(envHttpProxyAgent);
// TODO(joyeecheung): handle http/https global agents and perhaps Agent constructor
// behaviors.
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity... What does the error look like if the proxy is misconfigured? That is, if I set HTTP_PROXY to an invalid value and the dispatcher is not able to actually make the connection? It would be helpful to ensure that the error gives enough indication for the user to know that the issue is the proxy config.

Copy link
Member Author

@joyeecheung joyeecheung Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on how invalid the value is - typically this just times out, just like what happens if you send the request to an invalid server. Or if the domain name is invalid then you get a DNS error, etc. I am not sure if it makes sense to pre-emptively check for anything, however. Sending requests to an invalid proxy server is not too different from sending requests without proxy to an invalid server, in essence. As far as I know, other command line tools that support these environment variables would not really check the validity of the proxy - if the checking process is not part of the typical protocol (in the case of HTTP proxies, I don't think there is one), then performing a separate check can also lead to a TOCTOU problem. The proxy server being invalid during process startup does not mean that it won't be valid when the application actually makes the request.

@joyeecheung joyeecheung added the semver-minor PRs that contain new features and should be released in the next minor version. label Mar 6, 2025
@joyeecheung joyeecheung force-pushed the http-proxy branch 2 times, most recently from effc51c to a8cc0f2 Compare March 10, 2025 18:21
@joyeecheung
Copy link
Member Author

Updated to reuse the existing certificate (agent8) instead of adding a new fixture. Can you take a look again? @jasnell @mcollina

@joyeecheung joyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Mar 18, 2025
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Mar 18, 2025
@nodejs-github-bot
Copy link
Collaborator

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mcollina mcollina added the commit-queue Add this label to land a pull request using GitHub Actions. label Mar 20, 2025
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Mar 20, 2025
@nodejs-github-bot nodejs-github-bot merged commit 5a7b7d2 into nodejs:main Mar 20, 2025
58 checks passed
@nodejs-github-bot
Copy link
Collaborator

Landed in 5a7b7d2

@joyeecheung
Copy link
Member Author

joyeecheung commented Mar 21, 2025

I have a WIP in https://github.com/joyeecheung/node/tree/http-agent-proxy to support http agents (for that one I am following the behavior of curl instead of undici, I feel that we should fix the undici behavior and align with curl ultimately), though I also noticed certain things that should better be done for sanity of the code (e.g. reusing the parsing of the env var somehow with undici), going to open a tracking issue as follow-up...

@mcollina
Copy link
Member

@joyeecheung for that implementation I would avoid adding this behavior to the main http.Agent and create a new one http.ProxyAgent.

@joyeecheung
Copy link
Member Author

for that implementation I would avoid adding this behavior to the main http.Agent and create a new one http.ProxyAgent.

Does it have to be separate? I think when guarded by the environment variable, it doesn’t even matter all that much. From my testing, if guarded by an env var it works other patching and overidding. And with the way the agents are structured, keeping an optional path inside is much better than creating an extra subclass (which also is how e.g. ca and all the other agent options work, I think configuring it from the option bag would lead to way cleaner code than trying to subclass).

@mcollina
Copy link
Member

Does it have to be separate? I think when guarded by the environment variable, it doesn’t even matter all that much. From my testing, if guarded by an env var it works other patching and overidding. And with the way the agents are structured, keeping an optional path inside is much better than creating an extra subclass (which also is how e.g. ca and all the other agent options work, I think configuring it from the option bag would lead to way cleaner code than trying to subclass)

I'm really skeptical about the long term maintainability of that solution, and how it would play out with the rest of the ecosystem that relies on patching the Agent etc. That's one of the reasons why we went for a completely different design in undici.

@joyeecheung
Copy link
Member Author

joyeecheung commented Mar 21, 2025

I'm really skeptical about the long term maintainability of that solution, and how it would play out with the rest of the ecosystem that relies on patching the Agent etc. That's one of the reasons why we went for a completely different design in undici.

From what I can tell, patching it directly is much simpler than subclassing, due to how the http agents are designed. Maybe it's different for undici because undici agents have a different subclassing pattern. But when I look at existing proxy agent implementations in npm, I think >50% of the code is basically fighting against Node.js's internals for the sake of inserting logic into the default behavior while maintaining some invariants in Node.js, but then it seems the extra logic put into place to mimic the invariants maintained by Node.js can still fail in various edge cases...

For example, see https://github.com/TooTallNate/proxy-agents/blob/536aaa5165b888c548cbfa8095801c05478605fe/packages/http-proxy-agent/src/index.ts#L135 and almost the entire purpose of agent-base: https://github.com/TooTallNate/proxy-agents/blob/536aaa5165b888c548cbfa8095801c05478605fe/packages/agent-base/src/index.ts

Whereas maintaining an optional path simply removes the need of all these patching and produces much simpler code, from what I have implemented so far (it basically just comes down to parsing the configuration + additional code to modify the requests as what RFC 7230 says, there isn't much code that I would consider as "fighting Node.js internals" when the optional paths are directly in internals)

@mcollina
Copy link
Member

Ok!

@aduh95 aduh95 added the dont-land-on-v23.x PRs that should not land on the v23.x-staging branch and should not be released in v23.x. label Mar 25, 2025
RafaelGSS pushed a commit that referenced this pull request Apr 1, 2025
When enabled, Node.js parses the `HTTP_PROXY`, `HTTPS_PROXY` and
`NO_PROXY` environment variables during startup, and tunnels requests
over the specified proxy.

This currently only affects requests sent over `fetch()`. Support for
other built-in `http` and `https` methods is under way.

PR-URL: #57165
Refs: nodejs/undici#1650
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
RafaelGSS pushed a commit that referenced this pull request Apr 1, 2025
When enabled, Node.js parses the `HTTP_PROXY`, `HTTPS_PROXY` and
`NO_PROXY` environment variables during startup, and tunnels requests
over the specified proxy.

This currently only affects requests sent over `fetch()`. Support for
other built-in `http` and `https` methods is under way.

PR-URL: #57165
Refs: nodejs/undici#1650
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
RafaelGSS pushed a commit that referenced this pull request May 1, 2025
When enabled, Node.js parses the `HTTP_PROXY`, `HTTPS_PROXY` and
`NO_PROXY` environment variables during startup, and tunnels requests
over the specified proxy.

This currently only affects requests sent over `fetch()`. Support for
other built-in `http` and `https` methods is under way.

PR-URL: #57165
Refs: nodejs/undici#1650
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
RafaelGSS pushed a commit that referenced this pull request May 2, 2025
When enabled, Node.js parses the `HTTP_PROXY`, `HTTPS_PROXY` and
`NO_PROXY` environment variables during startup, and tunnels requests
over the specified proxy.

This currently only affects requests sent over `fetch()`. Support for
other built-in `http` and `https` methods is under way.

PR-URL: #57165
Refs: nodejs/undici#1650
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
@aduh95 aduh95 added the baking-for-lts PRs that need to wait before landing in a LTS release. label May 6, 2025
@aduh95
Copy link
Contributor

aduh95 commented May 6, 2025

Do we want this on v22.x? I've added baking-for-lts PRs that need to wait before landing in a LTS release. in case we get some feedback on 24.x first

@realtimetodie
Copy link

Underrated feature

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
baking-for-lts PRs that need to wait before landing in a LTS release. dont-land-on-v23.x PRs that should not land on the v23.x-staging branch and should not be released in v23.x. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. notable-change PRs with changes that should be highlighted in changelogs. semver-minor PRs that contain new features and should be released in the next minor version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants