Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: --show-rulekey #7962

Open
linzhp opened this issue Apr 5, 2019 · 53 comments
Open

Feature request: --show-rulekey #7962

linzhp opened this issue Apr 5, 2019 · 53 comments
Labels
not stale Issues or PRs that are inactive but not considered stale P2 We'll consider working on this in future. (Assignee optional) team-Core Skyframe, bazel query, BEP, options parsing, bazelrc type: feature request

Comments

@linzhp
Copy link
Contributor

linzhp commented Apr 5, 2019

Description of the problem / feature request:

Buck has the concept of rule keys, which we can obtain by running buck targets --show-rulekey //.... We'd love to see similar feature in Bazel.

Feature requests: what underlying problem are you trying to solve with this feature?

We need to see what targets are changed, thus need to rebuilt and test, from one revision to another. In order to keep master green at scale, we need to build and test diffs in parallel. Getting the rule key for each target help us decide what diffs are independent and safe to build and test in parallel.

@irengrig irengrig added team-Rules-Server Issues for serverside rules included with Bazel type: feature request untriaged labels Apr 8, 2019
@vitarb
Copy link

vitarb commented Apr 15, 2019

To give a little more context, this feature would be extremely useful and will help to produce a list of changed targets between two commits on CI allowing correctly scoped builds (instead of //...) and also allowing other pre-land optimizations. As previous discussion suggests there is no easy way to detect changes in the build graph when it comes to modifications in starlark files (WORKSPACE, BUILD, *.bzl, etc)
Although it's possible to implement this logic outside of bazel, it would be prone to either false positives or false negatives and can also be rather slow.

For example I've built reasonably working prototype that:

  1. Runs bazel query deps(//...), parses it's output and builds in-memory representation of the dependency graph.
  2. Finds a list of changed files (using git) and for each:
    a) for source files - finds all reverse dependencies in the build graph (easy case).
    b) for starlark files - parses the file using bazel's starlark parser at both revisions, hashes all expressions and finds which ones changed.
  3. Translates each changed call expression or assignment into bazel path that it can be referenced from (e.g. //src/foo:foo for BUILD files) and (@foo for WORKSPACE file) and find all reverse dependencies of that marking them as changed.
  4. Native rules are just assignments in the starlark file and are not represented as rule-inputs in the build graph so they require special logic to handle.

Although it works fast and requires only one bazel query it may lead to possible false negatives:

  1. It might be not able to handle some edge cases when there is no connection between starlark definitions and dependency graph (for example if you change go version defined in the go toolchain it won't detect any changes as toolchain info is not present in the dependency graph, work-around for that would be extracting go version into a variable and using it as a tag in the io_bazel_rules_go which comes as a dependency for all go rules)
  2. Identifying if given starlark rule has changed requires handling various scenarios in the starlark syntax such as macros, assignments and call statements. It's relatively easy to miss some edge case.

We've also considered another approach, which relies on query of rbuildfiles for each changed starlark file but what makes it not acceptable is that any change to the WORKSPACE file would trigger pretty much full repo rebuild as all third-party dependencies are defined there. In addition to false positives it also has scalability issues as it needs to issues a separate query for each changed starlark file which runs ~0.5 sec, meaning that reasonably sized refactoring affecting 1000s of build files can be computing changed targets for 10s of minutes.

Are there any reasons preventing us from adding this feature into bazel core and printing checksum next to each rule in the dependency graph?

@Globegitter
Copy link

I agree imo that should be part of bazel core - @vitarb is there any chance you can share the prototype you have mentioned? As you may have seen we are interested in a performant solution to this and apart from including this into bazel core itself (which I hope will happen at some point) this imo seems like the most general version and least prone to errors, even if it still has some edge cases.

@vitarb
Copy link

vitarb commented Apr 17, 2019

I agree imo that should be part of bazel core - @vitarb is there any chance you can share the prototype you have mentioned? As you may have seen we are interested in a performant solution to this and apart from including this into bazel core itself (which I hope will happen at some point) this imo seems like the most general version and least prone to errors, even if it still has some edge cases.

@Globegitter, let me refine it a little bit and I will try to share something.
Meanwhile one my colleagues suggested alternative implementation that computes target hashes by hashing workspace source files and bazel's query output which has all rule parameters in the proto format. This approach currently covers all major scenarios including changes to files in the repo, WORKSPACE changes, BUILD file changes and changes to native rules/starlark files.
What I like about this approach is that it doesn't require processing starlark files and evaluating hashes for them, which is the most complex and error prone part of the solution suggested above.

On the other hand downsides are:

  • It requires at least two queries (on old and new revision)
  • Requires hashing of the entire source tree twice.
  • Changes to non-starlark rules can not be detected easily.

Since most of these issues can be either mitigated by proper cashing or other improvements it seems to be a pretty good approach overall and can lead to a concise and VCS independent solution.

At the same time having bazel do same would probably be more efficient.

@lberki lberki added team-Core Skyframe, bazel query, BEP, options parsing, bazelrc and removed team-Rules-Server Issues for serverside rules included with Bazel labels May 6, 2019
@lberki
Copy link
Contributor

lberki commented May 6, 2019

Bouncing over to the Core people. However, given that rule keys are not how Bazel works (analysis phase caching relies on Java object identity alone and it's only actions that have hashes, which are a function of the contents of their input files), I wouldn't hold my breath.

@purkhusid
Copy link

@vitarb You don't happen to have anything that you can share? I've been looking for something that solves this problem but I've found no concrete solutions. Pants has a flag --changed-parent=<git hash here> that takes care of this but something similar is very much needed for Bazel

@purkhusid
Copy link

@lberki Would it be possible to get someone from the core team to comment on this issue? I'm wondering if there are some major technical hurdles that need to be solved to be able to add a digest for each target in either query or aquery.

aquery already has a action key in it's output. I wonder if it would be possible to add a digest per action that as well? I'm willing to take a stab at it if it's something that could possibly be accepted into Bazel.

@lberki
Copy link
Contributor

lberki commented Nov 11, 2019

/cc @ericfelly

I would love this to happen, but it requires a lot of starts to align

@ericfelly
Copy link
Contributor

It's sounds like this is a general request for finding the affected targets given a change to the repo. Is that right? Is the rule key stuff an actual requirement, or just one possible approach?

@haxorz

@haxorz
Copy link
Contributor

haxorz commented Nov 11, 2019

[@purkhusid ] a digest for each target in either query or aquery... aquery already has a action key in it's output

to elaborate on @lberki 's response, buck's https://buck.build/concept/rule_keys.html are transitive but bazel's keys are non-transitive.

the usefulness of transitive digests is, i presume, that you'd have this useful property for deciding when to rebuild test targets:
If TRANSITIVE_DIGEST(//foo:bar) is v1 at source state A
And TRANSITIVE_DIGEST(//foo:bar) is v2 at source state B
Then v1 = v2 iff "//foo:bar doesn't need to be rebuilt at source state B"

[@ericfelly ] It's sounds like this is a general request for finding the affected targets given a change to the repo. Is that right? Is the rule key stuff an actual requirement, or just one possible approach?

yes, that's my assessment too.

@purkhusid
Copy link

@ericfelly @haxorz In my case it's to figure out what artifacts changed between two commits so that we can deploy only changed artifacts. So the rule key is just one way to do this, but I think there are multiple ways to achieve this.

I'm not familiar enough with Bazel internals to find the most idiomatic way to achieve this within bazel but I can suggest a few.

  • Include the digest for each action in aquery
    • This way you could do a aquery at 2 revisions and compre each action in each target
  • Include the digest for for all actions in BEP
    • This way you could just output the BEP events and compare at two revisions. But actions that are cached would also have to be shown in the output.
  • Add the digests to bazel dump
  • Add some way to dump the whole build graph including the digests
    • Then you could just compare the graphs at 2 revisions

Some way to achieve this would fill a much needed gap for Bazel. In my case it would solve what deployment targets I should run and for large repos where bazel build //... is too slow it would provide a way to only run changed targets.

@haxorz
Copy link
Contributor

haxorz commented Nov 12, 2019

@purkhusid but can you please confirm/deny the transitive part of my previous comment?

i can see two general approaches to a "determine affected targets at source version A" oracle that lives outside* of bazel:

(1) use specially-crafted repo-scale bazel query invocations at versions pred(A) and A, driven by the source diff between pred(A) and A, to determine all the affected targets at version A.

(2) have bazel dump a transitive hash of every target in the repo at versions pred(A) and A, then compare hashes pairwise. treat any target T whose hash has changed as affected at version A.

am i missing something, or have i concisely summarized the two approaches?

notes:

  • @vitarb's first comment describes an attempt to do (1).
  • a good implementation of (1) uses a non-transitive hash to compute "directly affected targets". the idea is to hash all of the target's attribute values and also its .bzl rule implementation -- this way comparisons are cheaper.
  • i am attempting to confirm that @purkhusid's et al. has some something like (2) in mind.

there are tradeoffs between these two general approaches. (2) unconditionally does repo-scale work, since it unconditionally computes and dumps a [transitive] hash of every target in the repo. contrast that with (1), which does basically no work for trivial changes. but the downside of (1) is the worst-case amount of work is perhaps higher since a repo-scale bazel query is perhaps more expensive than a mythical transitive hash dump of every target [eh, not necessarily].

* i say "lives outside bazel" because bazel's incrementality engine is ofc its own oracle but in @vitarb's first comment they say they don't want to unconditionally run bazel test //... at every source version. this is presumably because bazel doesn't scale infinitely.

@purkhusid
Copy link

purkhusid commented Nov 12, 2019

@haxorz These 2 options do pretty much summarize it.

Option (2) does sound like the most user friendly way of doing this. All attempts at (1) that I've seen so far require that you jump through various hoops and usually end up being very complicated as can be seen here for example: https://groups.google.com/forum/#!msg/bazel-discuss/I9udqWIcEdI/iczVgWLOBQAJ

Going the (2) route would make this more native to bazel and a whole lot easier to do. But my guess is that creating transitive keys for each action/rule is a non-trivial task?

@haxorz
Copy link
Contributor

haxorz commented Nov 12, 2019

@purkhusid But my guess is that creating transitive keys for each action/rule is a non-trivial task?

correct, both in terms of amount of code that would need to be written and also in terms of the runtime cost of that code. expanding on the latter, i think we'd not want this code to run by default (and when it runs we don't want to store the full results inside of the bazel server either); e.g. going with your aquery idea, we could add a flag --compute_and_dump_transitive_hashes to aquery.

@robbertvanginkel
Copy link
Contributor

I work with @linzhp and @vitarb, for some context on why we use use the list of changed targets between revisions and how it is implemented:

  • Don't run bazel test //..., but a subset. Running less targets is useful while we're still working to get features like Build without the bytes and remote execution up and running.
  • Figure out artifacts that have changed between revisions that need to be published (containers, documentation, etc).
  • A service that merges requests into the master branch uses the changed targets (and graph) information to determine which changes don't intersect, and therefore can be merged independently.

There's some more ideas, such as figuring out which targets changed directly vs which ones changed because a dependency changed, and test directly affected targets with more expensive features like msan or a race detector etc.

Our current implementation is like the mentioned 2):

  • For each revision, run a bazel query --output=proto --order_output=full "//external:all-targets + deps(//...:all-targets)
  • Compute hashes for each target. The hash contains: target's attributes (filter a few that contain machine specific paths) and hashes of the target's inputs (so the hashes are transitive, which because of using --order_output=full is actually a pretty straightforward loop over the output proto). Hashes for file dependencies either come from disk or an input like git ls-tree --full-tree -r. The proto output contains a $rule_implementation_hash attribute, so it takes into account rule changes.
  • Diff the hashes as needed and write out lists/graphs of filtered changed targets. Depending on what use case the list would feed into, we'd filter by tags, target type, etc.

We chose this approach over 1) as it was more straightforward to implement and predict the performance and accuracy. The approach mentioned in 1) gets particularly tricky when dealing with changes to the workspace and rule implementations.

We ended up choosing query over aquery because a few reasons: aquery was still experimental at the time of implementation, performance of query being better than aquery and internal systems that needed to know about a graph of unconfigured targets. Although this has mostly been fine for us, we have run into some edge cases where valid bazel build graphs were not easily hash-able at the unconfigured target level. Specifically we had a dependency where target //a would depend on target //b for darwin, but target //b would depend on //a when configured for linux. While there was some debate and ideas for how to account for that, we ended up just fixing the "cycle" in the dependency. Going with an aquery based solution is likely the more correct approach.

I'd love to see this supported in bazel at some point. One caveat of our approach is that our list of targets is getting rather long and unfortunately bazel doesn't have support for invocation with argfiles, so we use a bit of a bazelrc workaround to feed the list into a build step (#8609 (comment)). Meanwhile I'll see if we can make the separate go code we use for the above open source.

@ericfelly
Copy link
Contributor

Thank you for that context.

One relevant fact that I failed to mention is that we do have nascent / experimental code in place for top-down caching : c5c078c

I imagine we could come up with some interface to expose the action sketches if this is the sort of thing that would facilitate your work here.

@purkhusid
Copy link

@robbertvanginkel It would be pretty awesome if your approach could be open sourced while Bazel does not have the tools needed available. We are in the early stages of our Bazel adoption and this is our biggest pain point at the moment. We would gladly help with making it more robust.

@ericfelly Is this some form of the transitive keys that @haxorz talked about?

@ericfelly
Copy link
Contributor

Yes these are a form of transitive keys. We don't have an interface which exposes them directly. What would you like it to look like?

@purkhusid
Copy link

Is it possible to expose it in aquery via a flag?

@ericfelly
Copy link
Contributor

@meisterT do you think the action sketches could be exposed in aquery? how difficult would that be?

@meisterT
Copy link
Member

I am not yet familiar with action sketches. When are they computed?
cc @joeleba who is picking up some aquery work

@ericfelly
Copy link
Contributor

They are currently computed when you have top-down caching (experimental feature) enabled. See ActionSketchFunction.

What you could do is, when you run the analysis phase, launch the ActionSketchFunction for each action you come across. Then you could expose the action sketch in the output of aquery.

@mmlac
Copy link
Contributor

mmlac commented Dec 9, 2019

We have a similar use-case. The question we want to answer is simply “what targets changed since the last master commit” to determine which need to be deployed.

What form factor that output has is not as important (diff a set of hashes between two git commits straight in bazel, a list of targets + hash as build artifact and then a manual diff, etc).

While hashing the deployment files works as a workaround, a query answering the question “what targets changed between commit A and B” seems very useful.

@purkhusid
Copy link

@robbertvanginkel @linzhp @vitarb Could you elaborate on what part of the query output you use to calculate the hash for each target? I'm taking a look at doing something similar but I'm not so sure what parts of the proto output I should be interested in.

@tinder-maxwellelliott
Copy link

I feel like Google Search engine is pushing all of us to this very post to talk about how to fix something that Bazel should probably have 🤣

I have taken a stab at the SHA generation for the whole graph based on the Proto output (as described by @robbertvanginkel) in the following Gist. I did have some questions though about this approach

  1. What attributes from the Rule should be part of the SHA we create? Right now I use all of them (except location), not sure if that is right

Looping back here one last time! The algorithm in this Gist. is working well in our CI systems have not seen it miss anything yet. Good luck to anyone else trying to implement this by hand.

@purkhusid
Copy link

@robbertvanginkel @linzhp Has there been any luck with open sourcing what your have at Uber? I took a stab at this myself but I have a feeling that I might have missed some edge cases: https://github.com/purkhusid/biff

@rohansingh
Copy link

@purkhusid I tried out biff just now and I'm impressed, it seems to work out of the box. I haven't thrown any weird edge cases at it but it seems to be working for most "normal" commits.

@purkhusid
Copy link

@purkhusid I tried out biff just now and I'm impressed, it seems to work out of the box. I haven't thrown any weird edge cases at it but it seems to be working for most "normal" commits.

@rohansingh Awesome! I've been meaning to put some more time into it and add some tests to validate that it does the right thing. If you have any improvements you would like to add to it then don't hesitate to send a PR/create an issue.

@janakdr
Copy link
Contributor

janakdr commented Sep 15, 2020

Internally at Google, we use something more like @haxorz 's (1). For that reason, I think it's unlikely that we'll prioritize exposing such a transitive hash. However, the action sketch mentioned by @ericfelly seems like it could be a good option in concert with aquery. It's basically all open-sourced already, so PRs to integrate it seem reasonable, either from Google or other contributors.

@janakdr janakdr added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Sep 15, 2020
@tinder-maxwellelliott
Copy link

I finally got around to open sourcing our target selection system, seen here https://github.com/Tinder/bazel-diff. It is a ready to go CLI system to allow you to perform target selection on massive codebases (handles massive Bazel query argument lists, and massive Bazel Protobuf's via the streamed-proto output option)

@haxorz
Copy link
Contributor

haxorz commented Sep 24, 2020

@tinder-maxwellelliott Cool! I skimmed through your code and it LGTM!

Fyi: You would have been bit by #12086 (specifically at https://github.com/Tinder/bazel-diff/blob/master/src/main/java/com/bazel-diff/BazelRule.java#L24), so I just wanted to make sure you're aware of that bug (and its fix!).

@tinder-maxwellelliott
Copy link

@tinder-maxwellelliott Cool! I skimmed through your code and it LGTM!

Fyi: You would have been bit by #12086 (specifically at https://github.com/Tinder/bazel-diff/blob/master/src/main/java/com/bazel-diff/BazelRule.java#L24), so I just wanted to make sure you're aware of that bug (and its fix!).

Basically we cannot rely on rule_implementation_hash until a new Bazel release? Unsure what action I can take right now in the repo to resolve this

@haxorz
Copy link
Contributor

haxorz commented Sep 24, 2020

Yes, using rule_implementation_hash (aka skylark_environment_hash_code in build.proto-land) will give you false negatives wrt the "has this target been directly affected?" decision (using my lingo in #7962 (comment)). There's no action you can take in your code (I guess you could be super-duper conversative and pretend that all targets of bzl-defined rules are always "directly affected" but that seems bad). We were bitten by the bug internally at Google too :(

The bug occurs only in the situation described in the issue title. Maybe that doesn't happen in the codebase in which you use your bazel-diff tool? It certainly happened in @linzhp 's codebase (hence why they filed the issue), and it certainly happened in Google's internal codebase.

The bug was introduced in commit 9f2cab5 on 12 May. I don't know offhand which Bazel release first included that commit, nor do I know which release will first include the fix. I can look up that info for you if you want.

@rohansingh
Copy link

Bazel v3.3.0 is the first release that included the bug.

@linzhp
Copy link
Contributor Author

linzhp commented Sep 24, 2020

Yes, the rule_implementation_hash issue is still affecting Uber's code base. Any time line when the fix can be released?

@meisterT
Copy link
Member

cc @aiuto for the 3.5.1 patch release and @laurentlb for the 3.6.0 release

@codesuki
Copy link

codesuki commented Apr 6, 2021

To save people from digging, it seems the issue @haxorz mentioned is fixed in 3.7.
#12086 (comment)

@github-actions
Copy link

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label May 10, 2023
@github-actions
Copy link

This issue has been automatically closed due to inactivity. If you're still interested in pursuing this, please reach out to the triage team (@bazelbuild/triage). Thanks!

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 24, 2023
@keithl-stripe
Copy link

this would still be useful. can we reopen?

@keithl-stripe
Copy link

cc @bazelbuild/triage

@sgowroji sgowroji removed the stale Issues or PRs that are stale (no activity for 30 days) label May 24, 2023
@sgowroji sgowroji reopened this May 24, 2023
@keertk keertk added the not stale Issues or PRs that are inactive but not considered stale label Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
not stale Issues or PRs that are inactive but not considered stale P2 We'll consider working on this in future. (Assignee optional) team-Core Skyframe, bazel query, BEP, options parsing, bazelrc type: feature request
Projects
None yet
Development

No branches or pull requests