"Worker process returned an unparseable WorkResponse!" when worker runs out of memory #5767

Closed
jirkadanek opened this issue Aug 4, 2018 · 6 comments
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Local-Exec Issues and PRs for the Execution (Local) team type: support / not a bug (process)

Comments

@jirkadanek

Description of the problem / feature request:

$ bazel build //... --define NIX=
DEBUG: /home/jdanek/.cache/bazel/_bazel_jdanek/2e6bf0550407faabfb7b338e84352554/external/bazel_tools/tools/cpp/lib_cc_configure.bzl:115:5: 
Auto-Configuration Warning: Cannot find gcov or GCOV; either correct your path or set the GCOV environment variable
INFO: Analysed 996 targets (1 packages loaded).
INFO: Found 996 targets...
ERROR: /home/jdanek/Work/repos/activemq-artemis/tests/integration-tests/BUILD:67:1: Building tests/integration-tests/libtestslib.jar (689 source files) and running annotation processors (LoggingToolsProcessor) failed: Worker process returned an unparseable WorkResponse!

Did you try to print something to stdout? Workers aren't allowed to do this, as it breaks the protocol between Bazel and the worker process.

---8<---8<--- Exception details ---8<---8<---
com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either that the input has been truncated or that an embedded message misreported its own length.
        at com.google.protobuf.InvalidProtocolBufferException.truncatedMessage(InvalidProtocolBufferException.java:86)
        at com.google.protobuf.CodedInputStream$StreamDecoder.refillBuffer(CodedInputStream.java:2729)
        at com.google.protobuf.CodedInputStream$StreamDecoder.skipRawBytesSlowPath(CodedInputStream.java:3015)
        at com.google.protobuf.CodedInputStream$StreamDecoder.skipRawBytes(CodedInputStream.java:2989)
        at com.google.protobuf.CodedInputStream$StreamDecoder.skipField(CodedInputStream.java:2087)
        at com.google.protobuf.GeneratedMessageV3.parseUnknownFieldProto3(GeneratedMessageV3.java:303)
        at com.google.devtools.build.lib.worker.WorkerProtocol$WorkResponse.<init>(WorkerProtocol.java:1866)
        at com.google.devtools.build.lib.worker.WorkerProtocol$WorkResponse.<init>(WorkerProtocol.java:1830)
        at com.google.devtools.build.lib.worker.WorkerProtocol$WorkResponse$1.parsePartialFrom(WorkerProtocol.java:2420)
        at com.google.devtools.build.lib.worker.WorkerProtocol$WorkResponse$1.parsePartialFrom(WorkerProtocol.java:2415)
        at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:221)
        at com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:262)
        at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:275)
        at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:280)
        at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
        at com.google.protobuf.GeneratedMessageV3.parseDelimitedWithIOException(GeneratedMessageV3.java:347)
        at com.google.devtools.build.lib.worker.WorkerProtocol$WorkResponse.parseDelimitedFrom(WorkerProtocol.java:2082)
        at com.google.devtools.build.lib.worker.WorkerSpawnRunner.execInWorker(WorkerSpawnRunner.java:313)
        at com.google.devtools.build.lib.worker.WorkerSpawnRunner.actuallyExec(WorkerSpawnRunner.java:154)
        at com.google.devtools.build.lib.worker.WorkerSpawnRunner.exec(WorkerSpawnRunner.java:112)
        at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:95)
        at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:63)
        at com.google.devtools.build.lib.exec.SpawnActionContextMaps$ProxySpawnActionContext.exec(SpawnActionContextMaps.java:362)
        at com.google.devtools.build.lib.analysis.actions.SpawnAction.internalExecute(SpawnAction.java:287)
        at com.google.devtools.build.lib.analysis.actions.SpawnAction.execute(SpawnAction.java:294)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeActionTask(SkyframeActionExecutor.java:960)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.prepareScheduleExecuteAndCompleteAction(SkyframeActionExecutor.java:891)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.access$900(SkyframeActionExecutor.java:115)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.call(SkyframeActionExecutor.java:746)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.call(SkyframeActionExecutor.java:700)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:442)
        at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:503)
        at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:224)
        at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:382)
        at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:355)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
---8<---8<--- End of exception details ---8<---8<---

---8<---8<--- Start of log ---8<---8<---
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 187170816 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/jdanek/.cache/bazel/_bazel_jdanek/2e6bf0550407faabfb7b338e84352554/execroot/__main__/hs_err_pid18135.log
 the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 187170816 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/jdanek/.cache/bazel/_bazel_jdanek/2e6bf0550407faabfb7b338e84352554/execroot/__main__/hs_err_pid18135.log
---8<---8<--- End of log ---8<---8<---
INFO: Elapsed time: 219.090s, Critical Path: 197.55s
INFO: 67 processes: 27 linux-sandbox, 40 worker.
FAILED: Build did NOT complete successfully
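The hint in the output says that stray stdout output breaks the protocol, and the stack trace shows Bazel reading each WorkResponse with `parseDelimitedFrom`: a varint length prefix followed by that many bytes of protobuf. The Python sketch below is an illustration, not Bazel's code. It assumes the JVM's OOM banner from the log was the first thing written to the worker's stdout, and it decodes that text as if it were such a frame. Under that assumption, `#` (0x23) reads as a length of 35. The next 35 bytes are exactly `\n# There is insufficient memory for`, and the leftover begins with the truncated line ` the Java Runtime Environment to continue.` that appears in the log. Inside the frame, `\n` (0x0A) decodes as field 1 with wire type 2. Its length byte `#` again claims 35 bytes when only 33 remain, which is consistent with the `skipField` and `truncatedMessage` frames in the trace. The names `read_varint`, `read_delimited`, `skip_fields` and `Truncated` are hypothetical.

```python
# Sketch, not Bazel's implementation. Assumption: the JVM's OOM banner was the
# first output on the worker's stdout, where Bazel expected a varint-delimited
# WorkResponse protobuf.


class Truncated(Exception):
    """The input ended in the middle of a field."""


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise Truncated("input ended inside a varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def read_delimited(stream: bytes) -> tuple[bytes, bytes]:
    """Split one length-prefixed frame off the stream: (frame, leftover)."""
    length, pos = read_varint(stream, 0)
    return stream[pos:pos + length], stream[pos + length:]


def skip_fields(msg: bytes) -> list[tuple[int, int]]:
    """Walk a protobuf message skipping every field, as a parser does for
    unknown fields; return the (field_number, wire_type) pairs seen."""
    seen, pos = [], 0
    while pos < len(msg):
        tag, pos = read_varint(msg, pos)
        field, wire_type = tag >> 3, tag & 0x7
        seen.append((field, wire_type))
        if wire_type == 0:  # varint value
            _, pos = read_varint(msg, pos)
            continue
        if wire_type == 2:  # length-delimited: varint size, then the bytes
            size, pos = read_varint(msg, pos)
        elif wire_type in (1, 5):  # fixed 64-bit / fixed 32-bit
            size = 8 if wire_type == 1 else 4
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if pos + size > len(msg):
            raise Truncated(
                f"field {field} claims {size} bytes, only {len(msg) - pos} left")
        pos += size
    return seen


# A well-formed frame: field 1 = varint 0, field 2 = "ok", prefixed by length 6.
body = bytes([0x08, 0x00, 0x12, 0x02]) + b"ok"
frame, rest = read_delimited(bytes([len(body)]) + body)
print(skip_fields(frame), rest)  # [(1, 0), (2, 2)] b''

# The start of the OOM banner, read as if it were a WorkResponse frame.
stdout = (b"#\n# There is insufficient memory for"
          b" the Java Runtime Environment to continue.\n")
frame, leftover = read_delimited(stdout)
print(len(frame), frame)  # 35 b'\n# There is insufficient memory for'
print(leftover)           # b' the Java Runtime Environment to continue.\n'
try:
    skip_fields(frame)
except Truncated as err:
    print("Truncated:", err)  # Truncated: field 1 claims 35 bytes, only 33 left
```

If this reading is right, the "unparseable WorkResponse" is a side effect of the JVM crash report landing where Bazel expected a protobuf, not a separate protocol bug.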

Feature requests: what underlying problem are you trying to solve with this feature?

I'd like to see a better error message that points out the problem straight away, instead of the "unparseable WorkResponse" message, which seems to be targeted at Bazel developers (who built the worker).

A suggestion for solving the memory problem would also be useful. For example, what should be changed about the -Xmx Java parameter?

What operating system are you running Bazel on?

NixOS Linux

What's the output of bazel info release?

release 0.15.2- (@non-git)

@jirkadanek
Author

hs_err_pid18135.log

@eLvErDe

eLvErDe commented Aug 28, 2018

Same issue here; reverting OpenJDK from 10 to 8 makes it build again.

@ittaiz
Member

ittaiz commented Aug 29, 2018 via email

@eLvErDe

eLvErDe commented Aug 29, 2018

Building TF is a non-deterministic challenge you'll have to deal with alone. Don't expect any help; I have dozens of tickets with no response. Good luck!

@philwo
Member

philwo commented Oct 17, 2018

This error message is perhaps one of the most detailed and useful ones in Bazel. It has a helpful hint at the top, targeted at worker developers (arguably, if a worker crashes hard, it's rarely the user's fault: a developer has to fix it, and this mostly happens during development of a worker). It has a stack trace that helps Bazel developers understand what actually happened. And it has the last output of the worker, which gives both the worker's developer and you as the user a hint about what might have gone wrong (OOM in this case).

I think this is about as good as it gets. Bazel can't tell you how to fix the worker that crashed, because there's no structured information about why it crashed. Bazel doesn't know that the worker runs in a JVM, nor that the crash was due to OOM, nor which flags you might have to tweak in which way in order to avoid it.

A hint like "Try to bump -Xmx" would have to come from the OpenJDK that printed the OOM error message; it's nothing that Bazel can infer on its own.
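To illustrate this point, here is a hypothetical Python heuristic (not something Bazel does) that turns HotSpot crash text in a worker log into a hint. Everything it knows comes from hard-coded JVM message strings; the `JVM_MARKERS` table and the `suggest_hint` function are invented for this example, and the hint wording is illustrative only. It also suggests why a blanket "bump -Xmx" could mislead here: the log above reports a failed native allocation (mmap), not an exhausted Java heap, so the appropriate hint is likely different.

```python
# Hypothetical heuristic, not a Bazel feature: map HotSpot crash text found in
# a worker's log to a hint. It works only because it hard-codes JVM-specific,
# English-language messages.
from typing import Optional

# (marker substring, hint) pairs. Markers are HotSpot messages; hints are
# illustrative wording only.
JVM_MARKERS = [
    ("There is insufficient memory for the Java Runtime Environment",
     "The OS refused native memory to the worker JVM: reduce memory load "
     "(e.g. fewer concurrent workers) or try a smaller -Xmx; raising -Xmx "
     "is unlikely to help."),
    ("java.lang.OutOfMemoryError: Java heap space",
     "The worker's Java heap was exhausted: a larger -Xmx may help."),
]


def suggest_hint(worker_log: str) -> Optional[str]:
    """Return the hint for the first known marker in the log, else None."""
    for marker, hint in JVM_MARKERS:
        if marker in worker_log:
            return hint
    return None


oom_log = "# There is insufficient memory for the Java Runtime Environment to continue.\n"
print(suggest_hint(oom_log))
print(suggest_hint("worker exited with signal 11"))  # None
```

A worker that is not a JVM, or a JVM whose messages are localized or reworded, falls through to `None`, which is the structural limitation described in the comment above.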

philwo added P3 We're not considering working on this, but happy to review a PR. (No assignee) type: support / not a bug (process) category: local execution / caching and removed type: bug category: sandboxing labels Oct 17, 2018
jin added team-Local-Exec Issues and PRs for the Execution (Local) team and removed team-Execution labels Jan 14, 2019
@jmmv
Contributor

jmmv commented May 14, 2020

I agree with @philwo's assessment that there is nothing we can do about misbehaving workers, so I will have to close this.

jmmv closed this as completed May 14, 2020