Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bzlmod: allow repos to be visible to other module extensions #19301

Closed
tyler-french opened this issue Aug 22, 2023 · 38 comments
Closed

Bzlmod: allow repos to be visible to other module extensions #19301

tyler-french opened this issue Aug 22, 2023 · 38 comments
Assignees
Labels
area-Bzlmod Bzlmod-specific PRs, issues, and feature requests P1 I'll work on this now. (Assignee required) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: feature request

Comments

@tyler-french
Copy link

Description of the feature request:

In certain situations, there exists a need to have a module extension be able to have visibility of repositories that are created by other module extensions.

Let's take go_sdk module extension for example in rules_go.

I have some repo declared locally in the main module, say for example @go_sdk_linux_amd64 is the name of the repo.

Now, I want to have the go_sdk module extension load this with something that is similar to the WORKSPACE repository rule go_wrap_sdk: (https://github.com/bazelbuild/rules_go/blob/master/go/private/sdk.bzl#L372-L391)

The issue here, is that if I make a tag (for example go_sdk.wrap(name = "@go_sdk_linux_amd64"), the build will fail because the module extension go_deps doesn't see the repo @go_sdk_linux_amd64

ERROR: An error occurred during the fetch of repository 'rules_go~override~go_sdk~go_sdk':
   Traceback (most recent call last):
        File "/home/user/.cache/bazel/_bazel_tfrench/b97476d719d716accead0f2d5b93104f/external/rules_go~override/go/private/sdk.bzl", line 366, column 26, in _go_wrap_sdk_impl
                goroot = str(ctx.path(root_file).dirname)
Error in path: Unable to load package for @go_sdk_linux_amd64//:README.md: The repository '@go_sdk_linux_amd64' could not be resolved: Repository '@go_sdk_linux_amd64' is not defined
ERROR: <builtin>: fetching go_wrap_sdk_rule rule //:rules_go~override~go_sdk~go_sdk: Traceback (most recent call last):
        File "/home/user/.cache/bazel/_bazel_tfrench/b97476d719d716accead0f2d5b93104f/external/rules_go~override/go/private/sdk.bzl", line 366, column 26, in _go_wrap_sdk_impl
                goroot = str(ctx.path(root_file).dirname)
Error in path: Unable to load package for @go_sdk_linux_amd64//:README.md: The repository '@go_sdk_linux_amd64' could not be resolved: Repository '@go_sdk_linux_amd64' is not defined
...
FAILED: Build did NOT complete successfully (77 packages loaded, 786 targets configured)

Which category does this issue belong to?

No response

What underlying problem are you trying to solve with this feature?

The ability to use a repo that is declared in the main module in a call to a module extension that is externally declared.

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

release 6.3.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

N/A non-public code

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@shahms
Copy link

shahms commented Aug 25, 2023

FWIW, I encountered a similar situation when attempting to modularize Kythe. We depend on https://github.com/google/brotli from both C++ and Go. The Go module also depends on the C++ library, but that library can't be handled directly by Go. In order to accomplish this, I had to patch the BUILD file to use the canonical repo path to the non-modular C++ repository. Were that repository to be a modular dependency, this gets really fiddly and flaky.

Similarly, any kind of shared dependency that needs to bridge external package managers becomes challenging without someway of making that dependency known and available to the language-specific module extension.

@fmeum
Copy link
Collaborator

fmeum commented Aug 25, 2023

While this doesn't help in the general case, in the Go case you can just add the Go module with patches to the BCR as a Bazel module and go_deps will pick it up as a Go dependency automatically.

But I fully agree we need something more flexible and generally applicable to add the repo mapping entries for patches.

@Wyverald
Copy link
Member

The issue here, is that if I make a tag (for example go_sdk.wrap(name = "@go_sdk_linux_amd64"), the build will fail because the module extension go_deps doesn't see the repo @go_sdk_linux_amd64

This can work as long as the name attribute is of type label, not string. If the MODULE.bazel in which you're writing that tag has visibility to a repo with the apparent name "go_sdk_linux_amd64", that label would be properly translated into one with a canonical repo name which the go_sdk extension can freely use. I recommend (re-)reading https://bazel.build/external/overview#concepts (especially the two items about repo names) to understand more fully how this works.

Similar for @shahms, as long as you make sure the tag attribute has type "label", you should be able to pass that label around (in a capability-passing fashion, almost).

@fmeum
Copy link
Collaborator

fmeum commented Aug 27, 2023

Similar for @shahms, as long as you make sure the tag attribute has type "label", you should be able to pass that label around (in a capability-passing fashion, almost).

@shahms use case, which is similar to bazelbuild/rules_python#48 (comment), also requires a way to reference this label from (patched) BUILD files.

I can think of three ways to make this work:

  1. The brotli dependency is patched into gazelle as a bazel_dep so that the repositories generated by go_deps can reference it under a stable, friendly name. This requires patching both gazelle as well as the go_deps-generated repo. If brotli were fetched by some other module extension, the required bazel_dep and extension usage for that repo would also need to be patched into gazelle. Compared to WORKSPACE, this requires patching an additional repository.
  2. go_deps needs to be extended with some kind of a label_keyed_string_dict attribute that matches canonical repo names to user-provided stable identifiers and textually replaces the latter with the former in all patch files (e.g. replace @brotli with @@<brotli's canonical name>). This would make existing patches work with Bzlmod, but textually replacing labels in patch files may turn out to be brittle.
  3. Instead of textually replacing apparent with canonical labels, use the label_keyed_string_dict to generate a .bzl file with a patch_label function that is visible to all go_deps-generated repos and that patches can wrap their apparent labels into to get a canonical label string. Compared to WORKSPACE, this would make patches longer as they would also need to patch in a load, but isn't brittle like 2. Users would still need to add tags such as go_deps.patch_dependencies(repo_names = {"@brotli": "brotli"}).

@fmeum
Copy link
Collaborator

fmeum commented Aug 28, 2023

I found a fourth option that I currently like best:
4. Provide a repo_override tag on the module extension that accepts a repo_name and a list of targets and creates special repo named repo_name that contains aliases to all the targets under the same respective packages. Since this is an actual repo called repo_name, all other extension repos will see it under this name. Only downside is that the user has to manually list all the relevant targets.

@meteorcloudy meteorcloudy added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Sep 5, 2023
@Wyverald Wyverald added the area-Bzlmod Bzlmod-specific PRs, issues, and feature requests label Sep 6, 2023
@Wyverald
Copy link
Member

Wyverald commented Sep 6, 2023

So IIUC the capability is there (using label-typed attributes), just the usability is really bad because the user needs to somehow translate a BUILD file patch containing single-@ labels to a patch containing double-@ labels. Is that correct?

Assuming that it is, I'm sympathetic to the concern and agree that we should have a better story for this use case.

But first I wanted to make sure we're aware of a specific technical challenge, which is that any "repo mapping injection" we do is subject to conflicts. Suppose foo and bar are both using gazelle to bring in go-brotli, and would like to inject cc-brotli, we could have a conflict on our hands if foo and bar present different cc-brotlis to gazelle. We could of course say that only the root module can perform injections, or that conflicts are the user's business, but it's something we need to acknowledge.


Now, going over @fmeum's suggestions:

  1. This seems extremely ugly to me, and then again could only be done by the root module, and it can only work if you use a non-registry override on gazelle (single_version_override wouldn't work because we don't apply the patch until we fetch the repo, and the MODULE.bazel file of gazelle is obtained from the registry, not from the [patched] fetch).

  2. This is indeed too brittle IMO (and suffers from the injection conflict problem).

  3. Makes writing a BUILD patch very unintuitive (and suffers from the injection conflict problem).

  4. This is essentially a "repo alias" in its most ideal form, which ends up being a really obvious way to work around the deficiencies of the current repo mapping rules -- oh I can't map A to B? let me just create a C which is an alias of B and map A to C instead... at that point I'd rather fix it from Bazel's side. (And... you guessed it... this suffers from the injection conflict problem...)

The "reverse use_repo"/"make_visible_to" idea brought up in the other thread also suffers from the injection conflict problem, but has the advantage of being more ergonomic -- no funny label business.


To mitigate the injection conflict problem, I wonder if we can limit the scope of an injected repo mapping to select repos generated by the extension. Essentially this would be bringing back the repo_mapping faux-attribute on repo rules to all extension impl functions, but with semantics more akin to additional_repo_mappings. Then the user can pass labels into the extension, and the extension can selectively make those available to repos it generates (and decide what to do when two injections conflict).

But wait, isn't that #17493 (comment), where you said this was infeasible?? Well, not quite -- the repo_mapping attribute proposed in that issue changes the repo mapping of the Bazel modules using the extension, not the repos generated by the extension. By contrast, additional_repo_mappings only affects the repo mapping of extension-generated repos, which is already only computable after extensions run, so slapping some extra entries on there isn't a huge deal.

This approach still requires the user to "expose" a repo to an extension via a label (a capability), but then the extension can decide smartly what to do with that capability. Admittedly usability still isn't great; we could repurpose the "make_visible_to" idea to improve usability (module_ctx.modules[0].exports?), but I'd like to hear more thoughts first.

@meteorcloudy
Copy link
Member

In the internal team discussion we had on how to pin a module extension generated repo, instead of using --override_repository, I had the following proposal of a repo_override directive in MODULE.bazel which can override the repospec and repo mapping of a given repo.

# Assuming I want to pin repo foo and bar in the following MODULE.bazel file

# MODULE.bazel
bazel_dep(name = "foo", version = "1.0")

ext = use_extension("//:extension.bzl", "my_extension")
ext.bar(name = "bar", ...)
use_repo(ext, "bar")

# After vendoring foo and bar, copy then out to <source dir>/third_party (or some arbitrary dir name)

# Introduce repo_override directive in MODULE.bazel and append the following content to MODULE.bazel

repo_override(
  repo_name = "foo", # The apparent repo name that's introduced by bazel_dep or use_repo
  repo_class = "local_repository",
  # Can it just be `path = "./third_party/foo",` ?
  attrs = {"path" = "./third_party/foo"},
)

repo_override(
  repo_name = "bar",
  repo_class = "local_repository",
  attrs = {"path": "./third_party/bar"},
  # If the overridden version requires additional repo mappings, users can fix them here.
  # The key is the apparent repo name seen by bar, and the value is the apparent repo name seen by the root module.
  additional_repo_mapping = {"baz": "my_barz"}
)

# repo_override will only be available in the root module like other overrides.
# Besides solving the pinning problem for vendor mode, it also solves the problem of overriding a module extension introduced repo.

If we only use the repo mapping functionality alone, for the go use case, it would be:

repo_override(
  repo_name = "go_sdk",
  # The first `go_sdk_linux_amd64` is the apparent repo name seen by `go_sdk`
  # The second `go_sdk_linux_amd64` is the **apparent repo name** seen by the root module
  additional_repo_mapping = {"go_sdk_linux_amd64": "go_sdk_linux_amd64"}
)

@Wyverald Wyverald added this to the 7.1.0 release blockers milestone Nov 16, 2023
@iancha1992 iancha1992 removed this from the 7.1.0 release blockers milestone Nov 29, 2023
@iancha1992
Copy link
Member

@bazel-io fork 7.1.0

@Wyverald
Copy link
Member

This still needs some design work, so it's unlikely to happen for 7.1.0. Punting to 7.2.0 for now.

@Wyverald
Copy link
Member

Wyverald commented Aug 5, 2024

OK, so it looks like the biggest "hole" in this new idea is that the overridden repo is still accessible through its original canonical name.

This hole is rather hard to plug, since we'd need some sort of repo-level alias (#19927) for this to work. (The idea being that the overridden repo's canonical name aliases to the overriding repo's canonical name.) While this is easy to do on a surface level (just make a symlink), the proper fix for this is nigh impossible (involves making labels compare equal while cognizant of repo-level aliases).

We could work around this issue in the "use_repo from rules_foo" case by making use_repo "smart" (as in, have it be aware of override_repo directives). This will probably be good for 90% of cases. For the remaining 10% where people splice repo names, we could fix #19055 and call it a day.


This technique can't add to the repo mapping of a module.

While that's true, I'm still kind of sympathetic to the "patching MODULE.bazel with single_version_override" use case. Assuming we ever support that, it would be a "feature not a bug" that we have a single way to do this.

@fmeum
Copy link
Collaborator

fmeum commented Aug 6, 2024

Does that mean we have this plan?

  1. Resolve Provide module extensions with a way to reference extension repos #19055 and document that canonical repo names should never be constructed by hand. If rulesets don't adopt it right away, it's not the end of the world as it only leads to issues when users repo_override repos that have canonical labels constructed for them by other repos in the extension.
  2. Implement repo_override including smart use_repo.
  3. Make single_version_override able to patch module files, either by fetching patched modules eagerly or by magically patching only the module file.

This looks good to me (and doable in time for 7.4.0/8.0.0)

@Wyverald
Copy link
Member

Wyverald commented Aug 6, 2024

SGTM.

One last bikeshed for the road: I'd prefer the name override_repo over repo_override. override_repo has a nice symmetry with use_repo, and repo_override is too similar to archive_override & co.

@fmeum
Copy link
Collaborator

fmeum commented Aug 8, 2024

Some edge cases that we should keep in mind when implementing this:

ext = use_repo("@other_module//:ext.bzl", "ext")
use_repo(ext, "foo")
# Overriding one repo in an extension with another repo from that extension.
override_repo(ext, bar = "foo")
ext = use_repo("//:ext.bzl", "ext")
use_repo(ext, "foo")
ext2 = use_repo("//:ext2.bzl", "ext2")
use_repo(ext2, "bar")
# Overriding a repo in an extension defined by the main repo.
override_repo(ext, foo = "bar")
ext = use_repo("@other_module//:ext.bzl", "ext")
use_repo(ext, "foo")
ext2 = use_repo("@other_module//:ext2.bzl", "ext2")
use_repo(ext2, "bar")
ext3 = use_repo("@other_module//:ext3.bzl", "ext3")
use_repo(ext3, "baz")
# Overriding a repo with an overridden repo.
override_repo(ext2, bar = "foo")
override_repo(ext3, baz = "bar")
ext = use_repo("@other_module//:ext.bzl", "ext")
use_repo(ext, "foo")
ext2 = use_repo("@other_module//:ext2.bzl", "ext2")
use_repo(ext2, "bar")
ext3 = use_repo("@other_module//:ext3.bzl", "ext3")
use_repo(ext3, "baz")
# Overriding a repo with an overridden repo in a way that forms a cycle.
override_repo(ext2, bar = "foo")
override_repo(ext3, baz = "bar")
override_repo(ext, foo = "baz")

override_repo can also be used to create situations in which different apparent names map to the same canonical name in an extension's repo mapping, which is not something that could happen before. We could make this an error or just live with it.

@fmeum fmeum self-assigned this Aug 21, 2024
@meteorcloudy meteorcloudy added P1 I'll work on this now. (Assignee required) and removed P2 We'll consider working on this in future. (Assignee optional) labels Aug 27, 2024
copybara-service bot pushed a commit that referenced this issue Sep 24, 2024
Work towards #19301
Fixes #23580

RELNOTES: `override_repo` and `inject_repo` can be used to override and inject repos in module extensions.

Closes #23534.

PiperOrigin-RevId: 678139661
Change-Id: Iea7caca949c00e701f056c1037e273fee9740e93
copybara-service bot pushed a commit that referenced this issue Sep 24, 2024
Work towards #19301

RELNOTES: Patches to the module file in `single_version_override` are now effective as long as the patch file lies in the root module.

Closes #23536.

PiperOrigin-RevId: 678347809
Change-Id: I6a3c4664e55cff90c8e42795f95d8ef211348ca9
@Wyverald
Copy link
Member

@fmeum can this be closed now?

@fmeum fmeum closed this as completed Sep 24, 2024
fmeum added a commit to fmeum/bazel that referenced this issue Sep 25, 2024
Work towards bazelbuild#19301

RELNOTES: Patches to the module file in `single_version_override` are now effective as long as the patch file lies in the root module.

Closes bazelbuild#23536.

PiperOrigin-RevId: 678347809
Change-Id: I6a3c4664e55cff90c8e42795f95d8ef211348ca9
fmeum added a commit to fmeum/bazel that referenced this issue Sep 25, 2024
Work towards bazelbuild#19301

RELNOTES: Patches to the module file in `single_version_override` are now effective as long as the patch file lies in the root module.

Closes bazelbuild#23536.

PiperOrigin-RevId: 678347809
Change-Id: I6a3c4664e55cff90c8e42795f95d8ef211348ca9
fmeum added a commit to fmeum/bazel that referenced this issue Sep 25, 2024
Work towards bazelbuild#19301

RELNOTES: Patches to the module file in `single_version_override` are now effective as long as the patch file lies in the root module.

Closes bazelbuild#23536.

PiperOrigin-RevId: 678347809
Change-Id: I6a3c4664e55cff90c8e42795f95d8ef211348ca9
github-merge-queue bot pushed a commit that referenced this issue Sep 25, 2024
Work towards #19301

RELNOTES: Patches to the module file in `single_version_override` are
now effective as long as the patch file lies in the root module.

Closes #23536.

PiperOrigin-RevId: 678347809
Change-Id: I6a3c4664e55cff90c8e42795f95d8ef211348ca9

Closes #23605
@fmeum
Copy link
Collaborator

fmeum commented Sep 26, 2024

There are some aspects of override_repo and inject_repo that I'm not happy with yet. Maybe we can improve their API in time for the release. Both override_repo and inject_repo can result in multiple apparent names resolving to the same repo. This is not the case with Bzlmod if you don't use them and can be confusing.

  1. For inject_repo, we should probably disallow use_repoing the inject repo altogether:
local_repository(
    name = "zlib",
    ...
)

foo_deps = use_extension(..., "foo_deps")
inject_repo(foo_deps, com_github_zlib = "zlib")
# Disallow this:
use_repo(foo_deps, "com_github_zlib")

In this example, instead of having zlib and com_github_zlib in scope as aliases of each other, it would be cleaner to just refer to zlib throughout.

  1. For override_repo, the situation is more complicated (I'm using googleapis to stand in for some more complex module that contains targets for multiple languages):
local_repository(
    name = "googleapis",
    ...
)

bar_deps = use_extension(..., "bar_deps")
override_repo(bar_deps, "googleapis")
use_repo(bar_deps, bar_googleapis = "googleapis")

foo_deps = use_extension(..., "foo_deps")
override_repo(foo_deps, com_github_google_googleapis = "googleapis")
use_repo(foo_deps, "com_github_googleapis")

Assuming that googleapis is considered a direct dep by both bar_deps and foo_deps, bazel mod tidy would enforce an import of googleapis from bar_deps, which leads to an error that the user can resolve manually by using the keyword form.

Since bar_googleapis and com_github_googleapis are just aliases for googleapis, it would be much cleaner if we only had googleapis in scope and users would consistently reference that (the overriding repo). We could teach bazel mod tidy to ignore or even remove overridden repos, resulting in this module file:

bazel_dep(
    name = "googleapis",
    dev_dependency = True,
    ...
)

bar_deps = use_extension(..., "bar_deps")
override_repo(bar_deps, "googleapis")

foo_deps = use_extension(..., "foo_deps")
override_repo(foo_deps, com_github_google_googleapis = "googleapis")

This looks much nicer, but it's flawed in a subtle way: When the module is not the root module, the overrides won't be applied and all references to @googleapis will fail, even if the override isn't essential to any functionality provided by the module to its consumers. Note that this wasn't a problem with inject_repo since the injected repo can't be referenced in any way in a setting where the module isn't the root module.

I first thought that we could prohibit use_repo for overridden repos just as for injected repos, but additionally turn override_repo(bar_deps, A = "B") into use_repo(bar_deps, B = "A") in a non-root module context. But that fails in the situation above as googleapis overrides two different repos.

I don't see a better solution than having bazel mod tidy neither add nor remove use_repo arguments for overridden repos. Users can then either use the overriding repo throughout for a cleaner repo mapping or, if needed in more complex cases, can use_repo to get a name that can also be referenced in a non-root module context.

Curious to hear your thoughts about this!

fmeum added a commit to fmeum/bazel that referenced this issue Oct 10, 2024
Work towards bazelbuild#19301
Fixes bazelbuild#23580

RELNOTES: `override_repo` and `inject_repo` can be used to override and inject repos in module extensions.

Closes bazelbuild#23534.

PiperOrigin-RevId: 678139661
Change-Id: Iea7caca949c00e701f056c1037e273fee9740e93

(cherry picked from commit 46341b1)
fmeum added a commit to fmeum/bazel that referenced this issue Oct 10, 2024
Work towards bazelbuild#19301
Fixes bazelbuild#23580

RELNOTES: `override_repo` and `inject_repo` can be used to override and inject repos in module extensions.

Closes bazelbuild#23534.

PiperOrigin-RevId: 678139661
Change-Id: Iea7caca949c00e701f056c1037e273fee9740e93

(cherry picked from commit 46341b1)
fmeum added a commit to fmeum/bazel that referenced this issue Oct 10, 2024
Work towards bazelbuild#19301
Fixes bazelbuild#23580

RELNOTES: `override_repo` and `inject_repo` can be used to override and inject repos in module extensions.

Closes bazelbuild#23534.

PiperOrigin-RevId: 678139661
Change-Id: Iea7caca949c00e701f056c1037e273fee9740e93

(cherry picked from commit 46341b1)
fmeum added a commit to fmeum/bazel that referenced this issue Oct 10, 2024
Work towards bazelbuild#19301
Fixes bazelbuild#23580

RELNOTES: `override_repo` and `inject_repo` can be used to override and inject repos in module extensions.

Closes bazelbuild#23534.

PiperOrigin-RevId: 678139661
Change-Id: Iea7caca949c00e701f056c1037e273fee9740e93

(cherry picked from commit 46341b1)
fmeum added a commit to fmeum/bazel that referenced this issue Oct 10, 2024
Work towards bazelbuild#19301
Fixes bazelbuild#23580

RELNOTES: `override_repo` and `inject_repo` can be used to override and inject repos in module extensions.

Closes bazelbuild#23534.

PiperOrigin-RevId: 678139661
Change-Id: Iea7caca949c00e701f056c1037e273fee9740e93

(cherry picked from commit 46341b1)
github-merge-queue bot pushed a commit that referenced this issue Oct 10, 2024
Work towards #19301
Fixes #23580

RELNOTES: `override_repo` and `inject_repo` can be used to override and
inject repos in module extensions.

Closes #23534.

PiperOrigin-RevId: 678139661
Change-Id: Iea7caca949c00e701f056c1037e273fee9740e93

(cherry picked from commit
46341b1)

Fixes #23724
Fixes #23799

Also includes:
* Disallow importing injected repos
(8472c9d)

---------

Co-authored-by: Xùdōng Yáng <wyverald@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Bzlmod Bzlmod-specific PRs, issues, and feature requests P1 I'll work on this now. (Assignee required) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: feature request
Projects
Status: Done
Development

No branches or pull requests