Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow http_archive and http_file to use a credential-helper executable #15013

Closed
adam-azarchs opened this issue Mar 10, 2022 · 19 comments
Closed
Labels
P2 We'll consider working on this in future. (Assignee optional) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: feature request

Comments

@adam-azarchs
Copy link
Contributor

Description of the problem / feature request:

It would be very useful if, in addition to the existing .netrc support, the http_archive and http_file repository rules could be configured to use a credential helper executable.

Feature requests: what underlying problem are you trying to solve with this feature?

The most important use case for this, at least for our team, but I suspect for others as well, would be downloading release artifacts from private git repositories.

Assuming one has git credentials set up properly, one can run e.g.

$ printf 'protocol=https\nhost=github.com\n' | git credential fill | sed -n 's/^password=//p'

to get the required authorization token (which may be a PAT, or an oauth token, depending on configuration; either will work).

Alternatively, if one is using the gh cli tool,

$ gh auth status  -t |& sed -n 's/.*Token: //p'

Other examples of tools which might be useful for this sort of thing:

In most of these cases it's possible to take the provided token and put it in one's .netrc, but that's neither convenient nor particularly secure.

What operating system are you running Bazel on?

linux

What's the output of bazel info release?

release 5.0.0

Have you found anything relevant by searching the web?

This is maybe tangentially related to #14372

@oquenchil oquenchil added type: feature request untriaged team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website labels Mar 16, 2022
@meteorcloudy meteorcloudy added P3 We're not considering working on this, but happy to review a PR. (No assignee) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. and removed untriaged team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website labels Mar 22, 2022
@guw
Copy link
Contributor

guw commented May 16, 2022

I think this support is fundamental from a security perspective because it allows to avoid storing security credentials in a file. I'd appreciate if this can be prioritized a bit higher and become part of the roadmap.

FWIW, any solution should be deeply integrated into Bazel so that other functionality such as #13709 doesn't require additional work to enable it.

@adam-azarchs
Copy link
Contributor Author

To be a little bit more clear about my feature request, I wasn't suggesting that arbitrary shell snippets should be supported necessarily (because that might be complicated) - more complicated stuff like I was showing for git credential fill support would, I'd expect, be in a wrapper script of some kind.

API can be a little tricky to work out the ideal form. My first idea would be simply some executable path (either absolute or workspace-relative like --workspace_status_command) and given the target hostname as the argument. However, a couple of issues, and a couple of possible solutions to those issues:

  1. There are some cases where you might need the full URL. Not most cases, however, so I'd be somewhat reluctant to force the more common cases to introduce a wrapper script to deal with extracting the hostname from the URL.
    1. Pass both hostname and full URL as arguments, to simplify wrapper script implementation by allowing it to just choose whichever one it's interested in. It's likely that whatever API there is, the vast majority of use cases will require at least a simple wrapper script to adapt whatever credential helper to it.
  2. "workspace relative" gets a muddy if the call to the repository rule is via a repository macro defined in another workspace (as is common for the vast majority of repos which haven't started transitioning to bzlmod).
    1. Always treat the workspace-relative path as being relative to the root workspace. If you're using transitive imports which require authorization, it's very likely you control both repos and can figure out what to do about it.
    2. Introduce a concept of "named" credential helpers, which can be registered globally, similar to toolchains. They might then have starlark implementation functions. Then e.g. netrc handling becomes just another credential helper (though it doesn't use an executable helper). It would also allow users flexibility in e.g. some using gh auth status while others might use git credential fill. This would be the most general solution, but also probably a much bigger project than simply adding a parameter to repository_ctx.download.

@guw
Copy link
Contributor

guw commented May 16, 2022

FWIW, I would be happy if Bazel just implements the same protocol that git credentials helper is using. That would open the door for a lot of secure storage options.

@Yannic
Copy link
Contributor

Yannic commented Jun 8, 2022

@adam-azarchs
Copy link
Contributor Author

I think there's some tweaks that can make it work, but I'm not sure it makes sense to have a single "jack of all trades" --credential_helper, for a few reasons:

  1. The way we'd want to go about obtaining credentials for downloading repository artifacts from https might be quite different from how we'd do so for RBE/BES, so merging them into a single tool might not make sense. It might be best to have separate flags for those use cases (though see point 3 below, which would also maybe resolve that issue).
  2. Many (possibly even most) URLs might not need any kind of authentication. If you have a lot of dependencies (which you do, if you import a few rule sets and their dependencies), invoking the credential helper before every download might result in a non-trivial performance impact. I think we'd need to allow workspace rules to have at least a boolean auth_required parameter (defaulting to false) for repository_ctx.download and related methods.
  3. I can imagine a lot of situations where someone might provide a bazel-compatible credential helper binary that knows how to handle authentication for a particular vendor, e.g. someone might make a github auth helper. If there's only a single --credential_helper flag, then the user would need to wrap that in a "merged" helper that could delegate to the others, and pretty soon you end up with a long chain of subprocess calls.
    1. This could be mitigated by allowing multiple accumulated flags like e.g. --credential_helper=https://github.com/=github_helper.sh --credential_helper=https://s3-us-west-2.amazonaws.com/=s3_helper.sh or something like that, based on prefix.
      1. or maybe regexp. Prefix would be safer, but on the other hand one might want to be able to match ^https://s3-.*\.amazonaws\.com/.
      2. A compromise would be prefix glob, e.g. https://s3-*.amazonaws.com/.
    2. This would also mitigate the risk that a buggy credential helper implementation might pass credentials to the wrong service, e.g. handing a GitHub token over to AWS or something like that.

Ideally, the protocol should be

  1. Sufficiently simple that the helper can be implemented with a shell script without being too easy to accidentally introduce security problems. git credential fill is a pretty simple protocol that meets that criteria. Anything involving json, less so, especially because there is a temptation to do something like echo "{\"password\": \"$(thing)\"} without sanitizing the output of thing by e.g. passing it through jq.
  2. Ideally, it should match exactly with the protocol for some existing commonly-used credential helper, so that in that common use case the user wouldn't need to write their own helper. git credential fill I've already mentioned. Your proposal more closely mirrors docker's, which might also be fine, but it's important to recognize that any deviation from that protocol immediately introduces a requirement for a custom wrapper.

@adam-azarchs
Copy link
Contributor Author

I would also probably drop the part about non-absolute paths foo being converted to foo-bazel-credential-helper. I would allow (root) workspace-relative paths, and also searching in PATH, but I wouldn't arbitrarily append a suffix onto what the user passed in. Verbosity for that command line shouldn't be a big deal, as I'd expect that in almost every case the user would be configuring it in some .bazelrc file rather than passing it on the command line explicitly.

@guw
Copy link
Contributor

guw commented Jun 9, 2022

  1. Many (possibly even most) URLs might not need any kind of authentication. If you have a lot of dependencies (which you do, if you import a few rule sets and their dependencies), invoking the credential helper before every download might result in a non-trivial performance impact. I think we'd need to allow workspace rules to have at least a boolean auth_required parameter (defaulting to false) for repository_ctx.download and related methods.

In an enterprise environment it should never be a rule author deciding if something is put behind a protected URL or not. It needs to be at the discretion of the enterprise users/customers deciding whether to protect URLs or not. That's why I'm a big supporter for a central solution in Bazel without any opt-in/opt-out from rules.

I was thinking of the credential helpers to be more globally scoped (eg., an Mac OS Keychain credential helper) rather than being target scoped (AWS vs. GitHub). But I can see the benefit of allowing multiple for different domains.

+1 on keeping the protocol simple and allowing for re-use of existing credential helpers.

@adam-azarchs
Copy link
Contributor Author

In an enterprise environment it should never be a rule author deciding if something is put behind a protected URL or not. It needs to be at the discretion of the enterprise users/customers deciding whether to protect URLs or not. That's why I'm a big supporter for a central solution in Bazel without any opt-in/opt-out from rules.

To be more clear, my assumption would be that rules would pass that parameter in from an attribute, e.g.

http_archive(
    name = "foo",
    requires_auth = True,
    urls = ["https://secret.example.com"],
)

So not the rule author's decision, but the decision of whomever is selecting the URL from which to download, which is the appropriate place to be making such a decision. My note about making it a parameter to repository_ctx.download is simply because that would have to be how it was implemented. There's definitely a reasonable argument for turning it on by default in the repository_ctx call (so we don't need every rule author to plumb it through immediately), though I'd still probably leave it off by default in the corresponding rule attribute for the same reason. My main point was that from a performance perspective it's important to permit an opt-out mechanism.

While in an enterprise context (like the one I'm operating in) there will certainly be a significant number of private dependencies, I'd expect most organizations would either use public archives where possible (e.g. for anything under https://github.com/bazelbuild, protocolbuffers, grpc, etc) or would simply vendor everything in. I can't see many organizations maintaining private forks of e.g. bazel-skylib, rules_go, etc., and rules archives like that pull in a lot of other public transitive dependencies. A reasonably large repo can easily end up fetching hundreds of remote repos, and so a few tens of milliseconds for each credential helper process invocation can add up. For context here, our org's primary monorepo build fetches close to 500 archives from public URLs, plus less than 10 from private sources. That's not considering the repos which would be fetched by bazel fetch but which aren't fetched during a normal build, nor is it counting dependencies fetched by yarn_install or npm_install rules. Right now we're fetching the private sources we need via git, because that at least does work with credential helpers, though doing so not ideal in other ways,

I wouldn't want to limit scoping to just being by domain. For example with github you'd probably want to scope credentials to orgs your enterprise manages, while not consuming your account API rate limits for public repos. Likewise for s3/gcs, you'd only need auth on private buckets.

Also worth noting for the record, bazel should never be sending credentials to an unauthenticated/unencrypted URL. Not that it's a good idea to download from such places anyway.

@Yannic
Copy link
Contributor

Yannic commented Jun 10, 2022

Adjusted the proposal to allow workspace-relative credential helpers and specifying which domains to apply a helper to. PTAL (bazelbuild/proposals#264).

While I agree that re-using the protocol of an existing credential helper like git credential fill or docker would be nice, I don't think these protocols are sufficiently flexible. At least these two are primarily focused on providing a user-name/password pair for authentication, and I haven't found any existing protocol that would allow more. That seems to be sufficient for their use case, but we've seen cases where people were using custom headers (e.g., x-my-corp-auth: foo) or more than one header for authentication. Existing protocols seems to not support that, so I don't think re-using is a good idea. I can definitely see someone providing adapters for common helpers though (maybe even shipped as part of Bazel).

@Yannic
Copy link
Contributor

Yannic commented Jun 22, 2022

To be more clear, my assumption would be that rules would pass that parameter in from an attribute, e.g.

http_archive(
    name = "foo",
    requires_auth = True,
    urls = ["https://secret.example.com"],
)

I don't think that's feasible or desirable. There are companies that require that all dependencies are downloaded from an internal mirror, and that requires authentication. Requiring auth to be opt-in per-target seems very much counter-productive for that use case (not that it's easy to achieve today, but we shouldn't make it harder).

@adam-azarchs
Copy link
Contributor Author

I don't think that's feasible or desirable. There are companies that require that all dependencies are downloaded from an internal mirror, and that requires authentication. Requiring auth to be opt-in per-target seems very much counter-productive for that use case (not that it's easy to achieve today, but we shouldn't make it harder).

If you try to download from somewhere other than that internal mirror, you definitely wouldn't want bazel to send your authentication credentials there.

Also I'm not suggesting require opt-in per target. I'm suggesting that if you really do want a helper to match everything, you need to be explicit about that by providing an empty or wildcard url match for that helper. If we're imagining a situation where some proxy server does url rewriting along the way to redirect things to the internal mirror, then the worst case scenario is your build fails because you didn't provide authentication headers and you have to fix your flags appropriately, which is a much less bad scenario than accidentally sending your credentials to some unexpected URL.

@Yannic
Copy link
Contributor

Yannic commented Jun 22, 2022

Ok, I think we can call that resolved then? The proposal now allows to specify helpers that only match certain domains (optionally including all subdomains).

@guw
Copy link
Contributor

guw commented Jun 23, 2022

@Yannic Bazel has built in support for downloader rewriter config. Should there be a word in the spec that this is supported and the credential helper applies to the rewritten (final) URL target?

@adam-azarchs
Copy link
Contributor Author

Recently, for our internal use cases, I implemented a git_http_archive rule. It's more or less a copy/paste of the bundled http_archive rule with a modified _get_auth:

Implementation of get_auth
load(
    "@bazel_tools//tools/build_defs/repo:utils.bzl",
    "read_netrc",
    "read_user_netrc",
    "use_netrc",
)

def _get_git_auth(ctx, host):
    result = ctx.execute(
        [ctx.path(ctx.attr._git_auth_script), host],
        quiet = True,
    )
    if result.return_code:
        fail("Failed to get git credentials: " + result.stderr)
    return result.stdout.strip()

def _use_git_auth(ctx, urls, auth):
    """Augment an auth dictionary with values from the git credential helper.

    Args:
      ctx: The repository context object.
      urls: a list of URLs.
      auth: The auth dictionary produced from the .netrc by `use_netrc`.

    Returns:
      dict suitable as auth argument for ctx.download; more precisely, the dict
      will map all URLs where the netrc file provides login and password to a
      dict containing the corresponding login, password and optional authorization pattern,
      as well as the mapping of "type" to "basic" or "pattern".
    """
    # See the implementation for use_netrc, which produces the input dict:
    # https://github.com/bazelbuild/bazel/blob/2e8458b7810eab7829fc7d28af5c45b9af91ed7c/tools/build_defs/repo/utils.bzl#L339-L387

    patterns = ctx.attr.auth_patterns
    git_creds = {}
    for url in urls:
        schemerest = url.split("://", 1)
        if len(schemerest) < 2:
            continue
        if schemerest[0] != "https":
            # For other protocols, bazel currently does not support
            # authentication. So ignore them.  Also, don't send
            # credentials over unencrypted channels.
            continue
        host = schemerest[1].split("/")[0].split(":")[0]
        if host in patterns:
            if url in auth:
                auth_dict = auth[url]
            else:
                auth_dict = {
                    "type": "pattern",
                    "pattern": patterns[host],
                }
            pattern = auth_dict["pattern"]
            if "<password>" in pattern and "password" not in auth_dict:
                if host in git_creds:
                    auth_dict["password"] = git_creds[host]
                else:
                    pw = _get_git_auth(ctx, host)
                    auth_dict["password"] = pw
                    git_creds = pw
                auth[url] = auth_dict
    return auth

def get_auth(ctx, urls):
    """Given the list of URLs obtain the correct auth dict.

    See [bazel source][] for how this is used.

    [bazel source]: https://github.com/bazelbuild/bazel/blob/5.2.0/src/main/java/com/google/devtools/build/lib/bazel/repository/starlark/StarlarkRepositoryContext.java#L1186-L1195

    Args:
        ctx: The repository context.
        urls: A list of URLs.

    Returns:
        A dictionary of auth information.
    """

    # Mostly copied from https://github.com/bazelbuild/bazel/blob/5.2.0/tools/build_defs/repo/http.bzl

    if ctx.attr.netrc:
        netrc = read_netrc(ctx, ctx.attr.netrc)
    else:
        netrc = read_user_netrc(ctx)
    return _use_git_auth(ctx, urls, use_netrc(netrc, urls, ctx.attr.auth_patterns))

Implementing this helped to clarify in my mind what I wanted out of this feature. The approach above works pretty well, but has two main downsides:

  1. The rule needs to execute git credential fill even if the archive is
    already present in the repository cache, which has a performance cost.
  2. Credentials cannot be shared between invocations of the rule. This is actually not a big deal if the first issue is addressed, since for most builds all of the archives will already be cached.

My primary objective with this feature request is solving the first problem, in a way that would barely affect the current starlark API at all, by allowing the auth dictionary to take a delegate function. That is, instead of

pw = _get_git_auth(ctx, host)
auth_dict["password"] = pw

allow something like

auth_dict["password"] = _get_git_auth

so that the downloader would only execute _get_git_auth in the event that it was actually required.

@Yannic 's proposal is a good one, because it also solves a couple of other issues:

  1. Passing credentials in headers other than Authorization.
  2. Setting other required headers (e.g. Content-Type, which is necessary for example when downloading release artifacts from github APIs).

That said, there's something to be said for a less invasive change. There is, of course, room for both.

@robin-wayve
Copy link
Contributor

@Yannic have the credentialHelper changes in Bazel 5.3 been hooked up to http_archive and http_file? I am trying to find out for myself in the meantime.

@robin-wayve
Copy link
Contributor

maybe this comment answers my question #15856 (comment)

@tjgq
Copy link
Contributor

tjgq commented Nov 23, 2022

To state it clearly: credential helpers aren't wired up to repository fetching yet. We are working on adding it in time for 6.0.0.

@tjgq tjgq added P2 We'll consider working on this in future. (Assignee optional) and removed P3 We're not considering working on this, but happy to review a PR. (No assignee) labels Mar 21, 2023
@polasek
Copy link

polasek commented May 12, 2023

@tjgq is there any update on this? I am assuming this hasn’t made it into 6.0.0 given the status of the issue?

@tjgq
Copy link
Contributor

tjgq commented May 12, 2023

Coincidentally, I'm working on this as we speak. I'm planning to submit #18173 or something close to it today or early next week.

tjgq pushed a commit to tjgq/bazel that referenced this issue May 12, 2023
After this change, credential helpers will be used to fetch credentials for
repository fetching (rctx.download and rctx.download_and_extract), taking
precedence over a .netrc file or an `auth` parameter provided from Starlark.

Also improve the documentation for credential helper related flags.

Fixes bazelbuild#15013.
tjgq pushed a commit to tjgq/bazel that referenced this issue May 12, 2023
After this change, credential helpers will be used to fetch credentials for
repository fetching (rctx.download and rctx.download_and_extract), which
take precedence over the `auth` parameter.

Also improve the documentation for credential helper related flags.

Fixes bazelbuild#15013.
tjgq pushed a commit to tjgq/bazel that referenced this issue May 15, 2023
After this change, credential helpers will be used to fetch credentials for
repository fetching (rctx.download and rctx.download_and_extract), which
take precedence over the `auth` parameter.

Also improve the documentation for credential helper related flags.

Fixes bazelbuild#15013.
tjgq pushed a commit to tjgq/bazel that referenced this issue May 15, 2023
After this change, credential helpers will be used to fetch credentials for
repository fetching (rctx.download and rctx.download_and_extract), which
take precedence over the `auth` parameter.

Also improve the documentation for credential helper related flags.

Fixes bazelbuild#15013.
tjgq pushed a commit to tjgq/bazel that referenced this issue May 15, 2023
After this change, credential helpers will be used to fetch credentials for
repository fetching (rctx.download and rctx.download_and_extract), which
take precedence over the `auth` parameter.

Also improve the documentation for credential helper related flags.

Fixes bazelbuild#15013.
tjgq pushed a commit to tjgq/bazel that referenced this issue May 15, 2023
After this change, credential helpers will be used to fetch credentials for
repository fetching (rctx.download and rctx.download_and_extract), which
take precedence over the `auth` parameter.

Also improve the documentation for credential helper related flags.

Fixes bazelbuild#15013.
tjgq pushed a commit to tjgq/bazel that referenced this issue May 15, 2023
After this change, credential helpers will be used to fetch credentials for
repository fetching (rctx.download and rctx.download_and_extract), which
take precedence over the `auth` parameter.

Also improve the documentation for credential helper related flags.

Fixes bazelbuild#15013.
tjgq pushed a commit to tjgq/bazel that referenced this issue May 15, 2023
After this change, credential helpers will be used to fetch credentials for
repository fetching (rctx.download and rctx.download_and_extract), which
take precedence over the `auth` parameter.

Also improve the documentation for credential helper related flags.

Fixes bazelbuild#15013.
tjgq pushed a commit to tjgq/bazel that referenced this issue May 15, 2023
After this change, credential helpers will be used to fetch credentials for
repository fetching (rctx.download and rctx.download_and_extract), which
take precedence over the `auth` parameter.

Also improve the documentation for credential helper related flags.

Fixes bazelbuild#15013.
tjgq added a commit to tjgq/bazel that referenced this issue May 15, 2023
After this change, credential helpers will be used to fetch credentials for
repository fetching (rctx.download and rctx.download_and_extract), which
take precedence over the `auth` parameter.

Also improve the documentation for credential helper related flags.

Fixes bazelbuild#15013.
tjgq added a commit to tjgq/bazel that referenced this issue May 15, 2023
After this change, credential helpers will be used to fetch credentials for
repository fetching (rctx.download and rctx.download_and_extract), which
take precedence over the `auth` parameter.

Tests that need a credential helper are skipped on Windows for now, as
otherwise the credential helper would have to be reimplemented in Batch
or Powershell.

Also improve the documentation for credential helper related flags.

Fixes bazelbuild#15013.
tjgq added a commit to tjgq/bazel that referenced this issue May 15, 2023
After this change, credential helpers will be used when available to obtain
credentials for repository fetching, taking precedence over the `auth`
parameter to rctx.download and rctx.download_and_extract.

Tests that need a credential helper are skipped on Windows for now, as
otherwise the credential helper would have to be reimplemented in Batch
or Powershell.

Also improve the documentation for credential helper related flags.

Fixes bazelbuild#15013.
tjgq added a commit to tjgq/bazel that referenced this issue May 16, 2023
After this change, credential helpers will be used when available to obtain
credentials for repository fetching, taking precedence over the `auth`
parameter to rctx.download and rctx.download_and_extract.

Tests that need a credential helper are skipped on Windows for now, as
otherwise the credential helper would have to be reimplemented in Batch
or Powershell.

Also improve the documentation for credential helper related flags.

Fixes bazelbuild#15013.
tjgq added a commit to tjgq/bazel that referenced this issue May 17, 2023
After this change, credential helpers will be used when available to obtain
credentials for repository fetching, taking precedence over the `auth`
parameter to rctx.download and rctx.download_and_extract.

Tests that need a credential helper are skipped on Windows for now, as
otherwise the credential helper would have to be reimplemented in Batch
or Powershell.

Also improve the documentation for credential helper related flags.

Fixes bazelbuild#15013.

Closes bazelbuild#18173.

PiperOrigin-RevId: 532454935
Change-Id: Ia3be8c21e001a36160f3d1df858593f8fb846055
tjgq added a commit to tjgq/bazel that referenced this issue May 19, 2023
After this change, credential helpers will be used when available to obtain
credentials for repository fetching, taking precedence over the `auth`
parameter to rctx.download and rctx.download_and_extract.

Tests that need a credential helper are skipped on Windows for now, as
otherwise the credential helper would have to be reimplemented in Batch
or Powershell.

Also improve the documentation for credential helper related flags.

Fixes bazelbuild#15013.

Closes bazelbuild#18173.

PiperOrigin-RevId: 532454935
Change-Id: Ia3be8c21e001a36160f3d1df858593f8fb846055
iancha1992 pushed a commit that referenced this issue May 19, 2023
After this change, credential helpers will be used when available to obtain
credentials for repository fetching, taking precedence over the `auth`
parameter to rctx.download and rctx.download_and_extract.

Tests that need a credential helper are skipped on Windows for now, as
otherwise the credential helper would have to be reimplemented in Batch
or Powershell.

Also improve the documentation for credential helper related flags.

Fixes #15013.

Closes #18173.

PiperOrigin-RevId: 532454935
Change-Id: Ia3be8c21e001a36160f3d1df858593f8fb846055
fweikert pushed a commit to fweikert/bazel that referenced this issue May 25, 2023
After this change, credential helpers will be used when available to obtain
credentials for repository fetching, taking precedence over the `auth`
parameter to rctx.download and rctx.download_and_extract.

Tests that need a credential helper are skipped on Windows for now, as
otherwise the credential helper would have to be reimplemented in Batch
or Powershell.

Also improve the documentation for credential helper related flags.

Fixes bazelbuild#15013.

Closes bazelbuild#18173.

PiperOrigin-RevId: 532454935
Change-Id: Ia3be8c21e001a36160f3d1df858593f8fb846055
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: feature request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants