NPE while collecting worker metrics #18545

Open
pettermahlen opened this issue May 31, 2023 · 0 comments
Labels: help wanted, P3, team-Local-Exec, type: bug


Description of the bug:

We occasionally see this error in CI:

java.lang.NullPointerException: Null lastCallTime
      at com.google.devtools.build.lib.worker.AutoValue_WorkerMetric_WorkerStat.<init>(AutoValue_WorkerMetric_WorkerStat.java:21)
      at com.google.devtools.build.lib.worker.WorkerMetric$WorkerStat.create(WorkerMetric.java:51)
      at com.google.devtools.build.lib.worker.WorkerMetricsCollector.collectMetrics(WorkerMetricsCollector.java:224)
      at com.google.devtools.build.lib.profiler.CollectLocalResourceUsage.run(CollectLocalResourceUsage.java:179)

It appears to be caused by a race between this method:

  public void registerWorker(WorkerMetric.WorkerProperties properties) {
    int workerId = properties.getWorkerId();

    workerIdToWorkerProperties.putIfAbsent(workerId, properties);
    workerLastCallTime.put(workerId, Instant.ofEpochMilli(clock.currentTimeMillis()));
  }

and the iteration over the keys of the workerIdToWorkerProperties map in collectMetrics: the workerLastCallTime map may not yet have an entry for a worker ID that is already present in the workerIdToWorkerProperties map.
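To make the window concrete, here is a minimal single-threaded sketch that replays the suspected interleaving step by step. It is not Bazel code: the class name, the use of ConcurrentHashMap, and the String placeholder standing in for WorkerMetric.WorkerProperties are all assumptions made for illustration. Only the order of the two writes is taken from registerWorker above.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RegisterRaceSketch {
  // Hypothetical stand-ins for the two maps in WorkerMetricsCollector.
  // The map implementation and the String value type are assumptions.
  static final Map<Integer, String> workerIdToWorkerProperties = new ConcurrentHashMap<>();
  static final Map<Integer, Instant> workerLastCallTime = new ConcurrentHashMap<>();

  // Mimics the collector: iterate over the property keys and look up each last call time.
  static List<Integer> workerIdsMissingLastCallTime() {
    List<Integer> missing = new ArrayList<>();
    for (Integer workerId : workerIdToWorkerProperties.keySet()) {
      if (workerLastCallTime.get(workerId) == null) {
        missing.add(workerId);
      }
    }
    return missing;
  }

  // Replays the interleaving in one thread: the collector runs between the two writes.
  static List<Integer> replayRace() {
    workerIdToWorkerProperties.clear();
    workerLastCallTime.clear();
    int workerId = 1;

    // First write of registerWorker, in the original order.
    workerIdToWorkerProperties.putIfAbsent(workerId, "properties");
    // The collector observes the maps here, before the second write.
    List<Integer> observed = workerIdsMissingLastCallTime();
    // Second write of registerWorker.
    workerLastCallTime.put(workerId, Instant.ofEpochMilli(System.currentTimeMillis()));

    return observed;
  }

  public static void main(String[] args) {
    System.out.println("Missing during the window: " + replayRace());
    System.out.println("Missing afterwards: " + workerIdsMissingLastCallTime());
  }
}
```

A null looked up during that window would then reach WorkerStat.create, which matches the "Null lastCallTime" message in the stack trace.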

Since d31dd09, this should no longer be a problem on master, but we're running an older version of Bazel.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

No easy repro.

Which operating system are you running Bazel on?

linux, windows, macos

What is the output of bazel info release?

release 6.1.0-ec97d6a

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

We apply some local patches, but they should not affect this behaviour.

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

d233c89 introduced the race.

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

This patch probably fixes the issue by ensuring that the workerLastCallTime map gets populated before the workerIdToWorkerProperties map:

diff --git a/src/main/java/com/google/devtools/build/lib/worker/WorkerMetricsCollector.java b/src/main/java/com/google/devtools/build/lib/worker/WorkerMetricsCollector.java
index b301f6db93..ecee7b4965 100644
--- a/src/main/java/com/google/devtools/build/lib/worker/WorkerMetricsCollector.java
+++ b/src/main/java/com/google/devtools/build/lib/worker/WorkerMetricsCollector.java
@@ -263,8 +263,8 @@ public class WorkerMetricsCollector {
   public void registerWorker(WorkerMetric.WorkerProperties properties) {
     int workerId = properties.getWorkerId();

-    workerIdToWorkerProperties.putIfAbsent(workerId, properties);
     workerLastCallTime.put(workerId, Instant.ofEpochMilli(clock.currentTimeMillis()));
+    workerIdToWorkerProperties.putIfAbsent(workerId, properties);
   }

   private synchronized MetricsWithTime updateLastCollectMetrics(

We're going to include that in our local patches, and should be able to confirm whether it fixes the issue.
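As a rough check of the reasoning behind the patch, the sketch below registers workers in the patched order and samples the collector's view between the two writes. It is an illustration only: the class and method names are hypothetical, the String value stands in for WorkerMetric.WorkerProperties, and the maps are assumed to be ConcurrentHashMap instances. The argument only carries over to real concurrent execution if the actual maps give similar visibility guarantees.

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PatchedOrderSketch {
  // Hypothetical stand-ins for the two maps; types are assumptions, not Bazel's.
  static final Map<Integer, String> workerIdToWorkerProperties = new ConcurrentHashMap<>();
  static final Map<Integer, Instant> workerLastCallTime = new ConcurrentHashMap<>();

  // Counts worker IDs visible in the properties map that lack a last call time.
  static long countMissing() {
    return workerIdToWorkerProperties.keySet().stream()
        .filter(id -> !workerLastCallTime.containsKey(id))
        .count();
  }

  // Registers a worker in the patched order and samples the collector's view
  // both between the two writes and after them.
  static long registerPatchedAndSample(int workerId) {
    workerLastCallTime.put(workerId, Instant.now());
    long missingBetweenWrites = countMissing();
    workerIdToWorkerProperties.putIfAbsent(workerId, "properties");
    return missingBetweenWrites + countMissing();
  }

  public static void main(String[] args) {
    long missing = 0;
    for (int id = 0; id < 100; id++) {
      missing += registerPatchedAndSample(id);
    }
    System.out.println("Missing entries observed: " + missing);
  }
}
```

With this ordering, a worker ID only becomes visible to the key iteration after its last call time has been written, so the collector should never find a properties entry without a matching timestamp during registration. The sketch says nothing about what happens when workers are removed.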

@Pavank1992 Pavank1992 added the team-Local-Exec Issues and PRs for the Execution (Local) team label May 31, 2023
@wilwell wilwell self-assigned this Jun 6, 2023
@wilwell wilwell removed the untriaged label Jun 6, 2023
@meisterT meisterT added P3 We're not considering working on this, but happy to review a PR. (No assignee) help wanted Someone outside the Bazel team could own this labels Aug 3, 2023
@wilwell wilwell removed their assignment Aug 28, 2024

5 participants