-
Notifications
You must be signed in to change notification settings - Fork 4.3k
Description
Description of the bug:
We occasionally see this error in CI:
java.lang.NullPointerException: Null lastCallTime
at com.google.devtools.build.lib.worker.AutoValue_WorkerMetric_WorkerStat.<init>(AutoValue_WorkerMetric_WorkerStat.java:21)
at com.google.devtools.build.lib.worker.WorkerMetric$WorkerStat.create(WorkerMetric.java:51)
at com.google.devtools.build.lib.worker.WorkerMetricsCollector.collectMetrics(WorkerMetricsCollector.java:224)
at com.google.devtools.build.lib.profiler.CollectLocalResourceUsage.run(CollectLocalResourceUsage.java:179)
It appears to be caused by a race between this method:
public void registerWorker(WorkerMetric.WorkerProperties properties) {
int workerId = properties.getWorkerId();
workerIdToWorkerProperties.putIfAbsent(workerId, properties);
workerLastCallTime.put(workerId, Instant.ofEpochMilli(clock.currentTimeMillis()));
}
And the iteration over the keys in the workerIdToWorkerProperties
map in collectMetrics
, where it's possible for the workerLastCallTime
map to not yet have an entry for a worker ID that is present in the workerIdToWorkerProperties
map.
Since d31dd09, this should no longer be a problem on master, but we're running an older version of Bazel.
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
No easy repro.
Which operating system are you running Bazel on?
linux, windows, macos
What is the output of bazel info release
?
release 6.1.0-ec97d6a
If bazel info release
returns development version
or (@non-git)
, tell us how you built Bazel.
We apply some local patches, they should not affect this behaviour.
What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD
?
No response
Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.
d233c89 introduced the race.
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
This patch probably fixes the issue by ensuring that the workerLastCallTime
map gets populated before the workerIdToWorkerProperties
map:
diff --git a/src/main/java/com/google/devtools/build/lib/worker/WorkerMetricsCollector.java b/src/main/java/com/google/devtools/build/lib/worker/WorkerMetricsCollector.java
index b301f6db93..ecee7b4965 100644
--- a/src/main/java/com/google/devtools/build/lib/worker/WorkerMetricsCollector.java
+++ b/src/main/java/com/google/devtools/build/lib/worker/WorkerMetricsCollector.java
@@ -263,8 +263,8 @@ public class WorkerMetricsCollector {
public void registerWorker(WorkerMetric.WorkerProperties properties) {
int workerId = properties.getWorkerId();
- workerIdToWorkerProperties.putIfAbsent(workerId, properties);
workerLastCallTime.put(workerId, Instant.ofEpochMilli(clock.currentTimeMillis()));
+ workerIdToWorkerProperties.putIfAbsent(workerId, properties);
}
private synchronized MetricsWithTime updateLastCollectMetrics(
We're going include that in our local patches, and should be able to confirm whether that fixes the issue