SkyframeExecutor (7.x): Cannot invoke "java.lang.Throwable.getMessage()" because "cause" is null #23170

Open
jeffalder opened this issue Jul 31, 2024 · 9 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Configurability platforms, toolchains, cquery, select(), config transitions team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: bug

@jeffalder

Description of the bug:

I added a load statement to one of my .bzl files:

load("@contrib_rules_jvm//java:defs.bzl", "java_junit5_test")

and running bazel info gave me this:

FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.NullPointerException: Cannot invoke "java.lang.Throwable.getMessage()" because "cause" is null
	at com.google.devtools.build.lib.analysis.config.InvalidConfigurationException.<init>(InvalidConfigurationException.java:53)
	at com.google.devtools.build.lib.skyframe.SkyframeExecutor.createBuildConfigurationKey(SkyframeExecutor.java:1794)
	at com.google.devtools.build.lib.skyframe.SkyframeExecutor.getConfiguration(SkyframeExecutor.java:1733)
	at com.google.devtools.build.lib.runtime.commands.InfoCommand.lambda$exec$0(InfoCommand.java:158)
	at com.google.common.base.Suppliers$NonSerializableMemoizingSupplier.get(Suppliers.java:181)
	at com.google.devtools.build.lib.runtime.commands.InfoCommand.exec(InfoCommand.java:215)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:664)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:244)
	at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:573)
	at com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$1(GrpcServerImpl.java:644)
	at io.grpc.Context$1.run(Context.java:566)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

In 8.0-pre, this was fixed in c3c21d9, but the fix doesn't appear to have been backported to 7.x (the problematic code is still present in 7.3.0).
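The stack trace shows the InvalidConfigurationException constructor calling getMessage() on a null cause. The sketch below reproduces that failure mode with two hypothetical exception classes and shows a null-safe alternative. UnsafeConfigException and SafeConfigException are illustrative names, not Bazel classes, and this is not the actual 7.x code or the c3c21d9 change.

```java
// Hypothetical stand-in for an exception whose message is taken from its cause.
final class UnsafeConfigException extends Exception {
  UnsafeConfigException(Throwable cause) {
    // Dereferences cause unconditionally, so a null cause throws
    // NullPointerException before this exception is even constructed.
    super(cause.getMessage(), cause);
  }
}

// Null-safe variant: uses a fallback message when there is no cause to describe.
final class SafeConfigException extends Exception {
  static final String NO_CAUSE = "configuration evaluation failed without an exception";

  SafeConfigException(Throwable cause) {
    super(cause != null ? cause.getMessage() : NO_CAUSE, cause);
  }
}
```

A fallback message only avoids the crash; it still does not say what actually went wrong, so the underlying error result needs to be inspected separately.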

The commit that fixed it doesn't link a PR or issue, so I can't tell why it was fixed this way. I'm willing to attempt a backport, but I don't know whether it's the right fix.

I wish I could provide a repro case, but this project is overwhelming and it's my first real exposure to Bazel.

Which category does this issue belong to?

Java Rules

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I really don't know, sorry. I can work on it, but I don't know which parts of my build are required to trigger this, and this is my first day working on anything Bazel-related. I'm hoping the commit where it was already fixed can provide some context.

Which operating system are you running Bazel on?

macOS 14.6

What is the output of bazel info release?

release 7.0.2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

This is a private repo, unfortunately, so this doesn't provide useful debugging information.

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

No response

Have you found anything relevant by searching the web?

Just the commit where it was fixed.

Any other information, logs, or outputs that you want to share?

I'm willing to troubleshoot on my end or provide more detailed logs, but I'm not sure how to get that information. Running with --verbose_failures and --batch didn't produce any different output.

@jeffalder
Author

As a follow-up: the fixed code isn't very helpful, because e can only be null at that point. If e were not null, it would have been thrown by line 1964 at the latest. Is there anything else in the evaluation result that could be logged, or could an exception be constructed from the non-null values in the evaluation result?

@fmeum
Collaborator

fmeum commented Jul 31, 2024

@katre

@katre
Member

katre commented Jul 31, 2024

I agree with your analysis about the exception always being null (and I'm sorry I didn't notice that earlier). It probably means that there is some sort of Skyframe error present: the other type of error that creates ErrorInfo appears to be a Skyframe dependency cycle.

You can try checking for this and calling CyclesReporter.reportCycles, see other uses for examples.
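Following that suggestion, here is a minimal sketch, assuming a simplified error result that holds either an exception or one or more dependency cycles. EvalError and ErrorDescriber are hypothetical names, and cycles are plain strings here; real code would work with Skyframe's ErrorInfo and hand cycles to CyclesReporter.reportCycles instead of building a message by hand.

```java
import java.util.List;

// Hypothetical, simplified error result: an exception, a list of cycles, or (unexpectedly) neither.
record EvalError(Exception exception, List<List<String>> cycles) {}

final class ErrorDescriber {
  // Prefer the exception's message; otherwise describe the first cycle
  // instead of dereferencing a null exception.
  static String describe(EvalError error) {
    if (error.exception() != null) {
      return error.exception().getMessage();
    }
    if (error.cycles() != null && !error.cycles().isEmpty()) {
      return "dependency cycle: " + String.join(" -> ", error.cycles().get(0));
    }
    return "evaluation failed without an exception or a cycle";
  }
}
```

The design choice is simply to branch on which kind of error is present, rather than assuming an exception always exists.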

I'm happy to review a PR for either master or the 7.3 branch to address this.

@jeffalder
Author

@katre Thanks for the quick response! I'll try to poke at some local mods maybe this afternoon (I'm on US/Pacific time, 11:30am at the moment).

Cycles are about, say, rule A depending (transitively) on rule B while rule B (transitively) depends on rule A? I don't think there should be a cycle here: the added load statement loads a contrib package, and it was added to a deps.bzl file in our code. It doesn't seem like that should create a cycle, but I could easily be wrong!

Also, do you accept changes to 7.x? Which base branch should I use for that?

@katre
Member

katre commented Jul 31, 2024

Skyframe cycle detection is about skyframe nodes: ConfiguredTargetValue is one, but there are also nodes for things like ToolchainResolutionValue and PackageLookupValue.

Loaded bzl files do become Starlark values (see BzlLoadValue), so this is plausible.
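To illustrate that a cycle can run through nodes of different kinds, here is a textbook depth-first cycle search over a small hypothetical graph whose string keys are loosely modeled on the key types that appear in the logs later in this thread (the WORKSPACE file, bzl loads, repository lookups). This is not Skyframe's cycle detection; CycleFinder and the string keys are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CycleFinder {
  // Returns the first cycle reachable from start, in dependency order, or an empty list.
  static List<String> findCycle(Map<String, List<String>> deps, String start) {
    List<String> cycle = new ArrayList<>();
    dfs(deps, start, new ArrayList<>(), new HashSet<>(), new HashSet<>(), cycle);
    return cycle;
  }

  private static boolean dfs(Map<String, List<String>> deps, String node, List<String> path,
      Set<String> onPath, Set<String> done, List<String> cycle) {
    if (onPath.contains(node)) {
      // Back edge: the cycle is the part of the current path starting at node.
      cycle.addAll(path.subList(path.indexOf(node), path.size()));
      return true;
    }
    if (done.contains(node)) {
      return false; // already fully explored without finding a cycle
    }
    path.add(node);
    onPath.add(node);
    for (String dep : deps.getOrDefault(node, List.of())) {
      if (dfs(deps, dep, path, onPath, done, cycle)) {
        return true;
      }
    }
    path.remove(path.size() - 1);
    onPath.remove(node);
    done.add(node);
    return false;
  }
}
```

For example, with edges WORKSPACE -> //bazel:deps.bzl -> @@apple_linters//:defs.bzl -> REPOSITORY_DIRECTORY:@@apple_linters -> WORKSPACE, findCycle(graph, "WORKSPACE") returns those four keys in order. The last edge is an assumption standing in for "looking up a WORKSPACE-defined repository requires the WORKSPACE file".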

@jeffalder
Author

OK, it was indeed marked as a cycle. Here is some custom logging I added to identify the issue:

key class com.google.devtools.build.lib.packages.WorkspaceFileValue$WorkspaceFileKey is [/Users/jeffalder/repos/the-project]/[WORKSPACE], 5
key class com.google.devtools.build.lib.skyframe.BzlLoadValue$KeyForWorkspace is KeyForWorkspace{label=//bazel:deps.bzl, isBuildPrelude=false}
key class com.google.devtools.build.lib.skyframe.BzlLoadValue$KeyForWorkspace is KeyForWorkspace{label=//bazel/java:deps.bzl, isBuildPrelude=false}
key class com.google.devtools.build.lib.skyframe.BzlLoadValue$KeyForWorkspace is KeyForWorkspace{label=@@contrib_rules_jvm//java:defs.bzl, isBuildPrelude=false}
key class com.google.devtools.build.lib.skyframe.BzlLoadValue$KeyForWorkspace is KeyForWorkspace{label=@@contrib_rules_jvm//java/private:spotbugs.bzl, isBuildPrelude=false}
key class com.google.devtools.build.lib.skyframe.BzlLoadValue$KeyForWorkspace is KeyForWorkspace{label=@@apple_rules_lint//lint:defs.bzl, isBuildPrelude=false}
key class com.google.devtools.build.lib.skyframe.BzlLoadValue$KeyForWorkspace is KeyForWorkspace{label=@@apple_linters//:defs.bzl, isBuildPrelude=false}
key class com.google.devtools.build.lib.skyframe.ContainingPackageLookupValue$Key is CONTAINING_PACKAGE_LOOKUP:@@apple_linters//
key class com.google.devtools.build.lib.skyframe.PackageLookupValue$Key is PACKAGE_LOOKUP:@@apple_linters//
key class com.google.devtools.build.lib.rules.repository.RepositoryDirectoryValue$Key is REPOSITORY_DIRECTORY:@@apple_linters

I use IDEA on this project, which allows clicking into the various bzl files called out above. The problem is the last file -- @@apple_linters//:defs.bzl, which IDEA cannot find. When I explore the /tmp directory where @@apple_rules_lint//lint:defs.bzl is, I see it in external/apple_rules_lint/lint/defs.bzl. I don't see any folder for external/apple_linters.

However, I had previously read the section on repository names, so I figured it was an alias of some sort. Indeed:

apple_rules_lint/MODULE.bazel:

linter = use_extension("//lint:extensions.bzl", "linter")

use_repo(linter, "apple_linters")

I read the README; when I tried to add the steps into WORKSPACE, the build still failed, just with a shorter cycle:

key class com.google.devtools.build.lib.packages.WorkspaceFileValue$WorkspaceFileKey is [/Users/jeffalder/repos/the-project]/[WORKSPACE], 2
key class com.google.devtools.build.lib.skyframe.BzlLoadValue$KeyForWorkspace is KeyForWorkspace{label=@@apple_rules_lint//lint:repositories.bzl, isBuildPrelude=false}
key class com.google.devtools.build.lib.skyframe.ContainingPackageLookupValue$Key is CONTAINING_PACKAGE_LOOKUP:@@apple_rules_lint//lint
key class com.google.devtools.build.lib.skyframe.PackageLookupValue$Key is PACKAGE_LOOKUP:@@apple_rules_lint//lint
key class com.google.devtools.build.lib.rules.repository.RepositoryDirectoryValue$Key is REPOSITORY_DIRECTORY:@@apple_rules_lint

This is where I'm stuck. I know that I can load this file elsewhere, so it seems to be a timing issue of some sort.

apple-rules-lint is on the latest release, 0.3.2 (though that is from 2022). rules-jvm is one release behind, at 0.26.0.

@katre
Member

katre commented Aug 1, 2024

@Wyverald Can you help figure what's happening during loading these bzl files?

@Wyverald
Member

Wyverald commented Aug 1, 2024

From the first cycle report, it looks like you never defined the apple_linters repo in your WORKSPACE. You're probably missing a macro call.

The second cycle report is similar -- you didn't define the apple_rules_lint repo before trying to load from it.

The MODULE.bazel file in apple_rules_lint is irrelevant, because you don't seem to be using Bzlmod to load these at all. (I'm guessing your MODULE.bazel file is empty, or at least doesn't mention any of the repos here.)
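Both cycle reports come down to the same ordering rule in WORKSPACE mode: a repository must be defined by an earlier statement before anything is loaded from it. A minimal sketch of that check follows, assuming a hypothetical, already-parsed list of statements. WorkspaceStmt and WorkspaceOrderCheck are illustrative, not Bazel APIs, and loads from the main repository are not modeled.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical pre-parsed WORKSPACE statement: either defines an external repo or loads from one.
record WorkspaceStmt(boolean definesRepo, String repo) {
  static WorkspaceStmt define(String repo) { return new WorkspaceStmt(true, repo); }
  static WorkspaceStmt loadFrom(String repo) { return new WorkspaceStmt(false, repo); }
}

final class WorkspaceOrderCheck {
  // Walks statements top to bottom and returns the first repository that is
  // loaded from before any statement has defined it.
  static Optional<String> firstUseBeforeDefinition(List<WorkspaceStmt> stmts) {
    Set<String> defined = new HashSet<>();
    for (WorkspaceStmt s : stmts) {
      if (s.definesRepo()) {
        defined.add(s.repo());
      } else if (!defined.contains(s.repo())) {
        return Optional.of(s.repo());
      }
    }
    return Optional.empty();
  }
}
```

With a load from apple_rules_lint placed before the statement that defines apple_rules_lint, the check reports apple_rules_lint, which matches the second cycle report; swapping the two statements makes it pass.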

@jeffalder
Author

Thanks, @Wyverald, that largely pointed me in the right direction. I couldn't find any MODULE.bazel file in the repo.

The apple_linters repo is created and initialized by calling lint_setup, and unfortunately it doesn't appear that lint_setup can be called more than once.

The junit5.bzl file in contrib rules imports java_test for its definition of java_junit5_test; this version of java_test appends all configured linting rules to test invocations, and to do so, linting must already be set up.

Unfortunately, the junit5.bzl file also defines the artifacts we need. If I could import just the artifact definitions, this would fix my problem. Alas, to the copy-paste I go.
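Why importing java_junit5_test pulls in linting can be seen by following load edges transitively. This sketch walks a load graph breadth-first and collects every external repository it reaches. The edges in the usage example are copied from the cycle output later in this thread, while LoadClosure and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class LoadClosure {
  // Extracts the repository name from a label such as "@@apple_linters//:defs.bzl".
  // Returns null for main-repository labels such as "//bazel:deps.bzl".
  static String repoOf(String label) {
    if (!label.startsWith("@")) {
      return null;
    }
    String stripped = label.replaceFirst("^@+", "");
    int slashes = stripped.indexOf("//");
    return slashes < 0 ? stripped : stripped.substring(0, slashes);
  }

  // Breadth-first walk over load edges; returns every external repository reached, sorted.
  static Set<String> reachableRepos(Map<String, List<String>> loads, String start) {
    Set<String> seen = new HashSet<>();
    Set<String> repos = new TreeSet<>();
    Deque<String> queue = new ArrayDeque<>(List.of(start));
    while (!queue.isEmpty()) {
      String label = queue.poll();
      if (!seen.add(label)) {
        continue; // already visited
      }
      String repo = repoOf(label);
      if (repo != null) {
        repos.add(repo);
      }
      queue.addAll(loads.getOrDefault(label, List.of()));
    }
    return repos;
  }
}
```

Starting from //bazel/java:deps.bzl with the edges deps.bzl -> junit5.bzl -> library.bzl -> spotbugs.bzl -> @@apple_rules_lint//lint:defs.bzl -> @@apple_linters//:defs.bzl, reachableRepos returns apple_linters, apple_rules_lint and contrib_rules_jvm, so any file that loads junit5.bzl needs apple_linters to be defined first.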

apple_rules_lint should probably have a README entry saying that if @@apple_linters is not defined, you must first call lint_setup(). Unfortunately, given the apparent lack of maintenance on the project over the last two years, I'm not sure that will ever happen.

It would be nice if we could backport the logging fix into the 7.x branch, in case someone else runs into this. Would you accept a PR to do that?

Even better, it would be nice if the cycle reporter reported the entire chain leading to where the repository was required.

For this code:

        // Debug logging: print the class and value of every SkyKey in each detected cycle.
        error.getCycleInfo().forEach((CycleInfo cycleInfo) -> {
          cycleInfo.getCycle().forEach((SkyKey key) -> {
            System.err.println("class: " + key.getClass() + " value: " + key);
          });
        });

I get this output:

class: class com.google.devtools.build.lib.packages.WorkspaceFileValue$WorkspaceFileKey value: [/Users/jeffalder/repos/the-project]/[WORKSPACE], 5
class: class com.google.devtools.build.lib.skyframe.BzlLoadValue$KeyForWorkspace value: KeyForWorkspace{label=//bazel:deps.bzl, isBuildPrelude=false}
class: class com.google.devtools.build.lib.skyframe.BzlLoadValue$KeyForWorkspace value: KeyForWorkspace{label=//bazel/java:deps.bzl, isBuildPrelude=false}
class: class com.google.devtools.build.lib.skyframe.BzlLoadValue$KeyForWorkspace value: KeyForWorkspace{label=@@contrib_rules_jvm//java/private:junit5.bzl, isBuildPrelude=false}
class: class com.google.devtools.build.lib.skyframe.BzlLoadValue$KeyForWorkspace value: KeyForWorkspace{label=@@contrib_rules_jvm//java/private:library.bzl, isBuildPrelude=false}
class: class com.google.devtools.build.lib.skyframe.BzlLoadValue$KeyForWorkspace value: KeyForWorkspace{label=@@contrib_rules_jvm//java/private:spotbugs.bzl, isBuildPrelude=false}
class: class com.google.devtools.build.lib.skyframe.BzlLoadValue$KeyForWorkspace value: KeyForWorkspace{label=@@apple_rules_lint//lint:defs.bzl, isBuildPrelude=false}
class: class com.google.devtools.build.lib.skyframe.BzlLoadValue$KeyForWorkspace value: KeyForWorkspace{label=@@apple_linters//:defs.bzl, isBuildPrelude=false}
class: class com.google.devtools.build.lib.skyframe.ContainingPackageLookupValue$Key value: CONTAINING_PACKAGE_LOOKUP:@@apple_linters//
class: class com.google.devtools.build.lib.skyframe.PackageLookupValue$Key value: PACKAGE_LOOKUP:@@apple_linters//
class: class com.google.devtools.build.lib.rules.repository.RepositoryDirectoryValue$Key value: REPOSITORY_DIRECTORY:@@apple_linters

But the cycle reporter only tells me this much:

ERROR: Failed to load Starlark extension '@@apple_linters//:defs.bzl'.
Cycle in the workspace file detected. This indicates that a repository is used prior to being defined.
The following chain of repository dependencies lead to the missing definition.
 - @@apple_linters
This could either mean you have to add the '@@apple_linters' repository with a statement like `http_archive` in your WORKSPACE file (note that transitive dependencies are not added automatically), or move an existing definition earlier in your WORKSPACE file.

Could the cycle reporter be improved to report the trace of how we ended up expecting that repository definition? As with exception stack traces, knowing the full context is extremely useful.
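A minimal sketch of the kind of report being requested: given the keys of a workspace cycle in order (as printed by the debugging snippet above), print the whole chain in stack-trace style, ending at the repository that was never defined. The key strings, the method name formatChain, and the output wording are hypothetical; Bazel's actual cycle reporter works on SkyKey objects, not strings.

```java
import java.util.List;

final class CycleChainReport {
  // Formats an ordered cycle as a chain: the first key is where evaluation started,
  // and each following line is one dependency step; the last entry is where the
  // chain ends (in this issue, the repository that was never defined).
  static String formatChain(List<String> cycle) {
    StringBuilder sb = new StringBuilder("Dependency chain leading to the missing repository:\n");
    for (int i = 0; i < cycle.size(); i++) {
      sb.append(i == 0 ? "  " : "  -> ").append(cycle.get(i)).append('\n');
    }
    return sb.toString();
  }
}
```

For example, formatChain(List.of("WORKSPACE", "//bazel:deps.bzl", "@@apple_linters")) yields the header line, then WORKSPACE, then one "-> " line for each of the two remaining keys.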

hvadehra added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. and removed team-Rules-Java Issues for Java rules labels Aug 5, 2024
shs96c pushed a commit to apple/apple_rules_lint that referenced this issue Aug 7, 2024
In troubleshooting bazelbuild/bazel#23170, I had to dig deep into this code to understand what was happening. Bazel's default error does not apply to this situation, and Bazel's own debugging is also unhelpful. In fact, per the issue, in 7.x, you get `java.lang.NullPointerException: Cannot invoke "java.lang.Throwable.getMessage()" because "cause" is null`.

I think this change would at least make it easier for folks to identify this issue in their own builds.
meteorcloudy added P2 We'll consider working on this in future. (Assignee optional) team-Configurability platforms, toolchains, cquery, select(), config transitions and removed untriaged P2 We'll consider working on this in future. (Assignee optional) labels Aug 13, 2024