There is insufficient memory for the Java Runtime Environment to continue #159

Closed
ianmilligan1 opened this issue Jan 4, 2018 · 11 comments

@ianmilligan1
Member

Running locally on an Azure machine with 16 cores and 55 GB of RAM, on Ubuntu 16; analyzing a 293 GB collection that contains some large WARCs (e.g. 7 GB). This machine has previously processed ~4 TB collections without trouble. There's plenty of free space on the machine, which is a fairly vanilla VM.

This command is used to start spark-shell:

./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --driver-memory 52G --packages "io.archivesunleashed:aut:0.12.1"

And it failed on this relatively straightforward script to count domains:

import io.archivesunleashed.spark.matchbox.{ExtractDomain, ExtractLinks, RemoveHTML, RecordLoader, WriteGEXF}
import io.archivesunleashed.spark.rdd.RecordRDD._
val r = RecordLoader.loadArchives("/data2/toronto-mayor/*.gz", sc).keepValidPages().map(r => ExtractDomain(r.getUrl)).countItems().saveAsTextFile("/data2/toronto-mayor-data/all-domains")

The error is:

OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f62d3100000, 4703911936, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 4703911936 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/ubuntu/aut/hs_err_pid2492.log

Error Logs

Here are the error logs: the terminal output and hs_err_pid2492.log.

I will continue to tackle tomorrow, but any thoughts or guidance greatly appreciated.

@ruebot
Member

ruebot commented Jan 4, 2018

[screenshot from 2018-01-04 10-44-24]

Dumping this in here for posterity.

@ianmilligan1
Member Author

fyi for me: https://www.digitalocean.com/community/tutorials/how-to-add-swap-space-on-ubuntu-16-04
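For reference, a minimal sketch of what that tutorial boils down to (the 50G size here is just a placeholder, not necessarily what I used):

sudo fallocate -l 50G /swapfile   # create a swap file of the desired size
sudo chmod 600 /swapfile          # restrict access to root
sudo mkswap /swapfile             # format it as swap
sudo swapon /swapfile             # enable it for the current session
free -h                           # confirm the new swap shows up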

@ianmilligan1
Member Author

New error when re-running with swap space (I lowered the driver memory to 45GB).

org.apache.spark.SparkException: Job aborted due to stage failure: Task 13 in stage 0.0 failed 1 times, most recent failure: Lost task 13.0 in stage 0.0 (TID 13, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.StringCoding.safeTrim(StringCoding.java:89)
        at java.lang.StringCoding.access$100(StringCoding.java:50)
        at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:154)
        at java.lang.StringCoding.decode(StringCoding.java:193)
        at java.lang.StringCoding.decode(StringCoding.java:254)
        at java.lang.String.<init>(String.java:546)
        at java.lang.String.<init>(String.java:566)
        at io.archivesunleashed.data.WarcRecordUtils.getWarcResponseMimeType(WarcRecordUtils.java:102)
        at io.archivesunleashed.spark.archive.io.ArchiveRecord.<init>(ArchiveRecord.scala:74)
        at io.archivesunleashed.spark.matchbox.RecordLoader$$anonfun$2.apply(RecordLoader.scala:37)
        at io.archivesunleashed.spark.matchbox.RecordLoader$$anonfun$2.apply(RecordLoader.scala:37)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:

@ruebot
Member

ruebot commented Jan 4, 2018

@ianmilligan1 what happened with the swap? Did it all get used?

@ianmilligan1
Member Author

I'm not sure; it didn't generate a nice error log this time. On the bright side, it failed sooner.

Here is the full error log dump.

@ruebot
Member

ruebot commented Jan 4, 2018

Can you run it again, and watch htop or something similar to see how the swap is utilized?
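If htop isn't handy, something like this gives a rough running view (the 5-second interval is arbitrary):

watch -n 5 free -h   # refresh overall memory and swap usage every 5 seconds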

@ianmilligan1
Member Author

Will do!

@ianmilligan1
Member Author

OK, it failed again after watching htop. Memory never went above the allocated amount, and swap space was essentially unused. It just suddenly and arbitrarily fails, with the same error message as above.

@ianmilligan1
Member Author

ianmilligan1 commented Jan 4, 2018

I've tried the following (and spent most of the afternoon watching logs and trying different things). None of it works:

  • increasing swap space to 100 GB; fails in the same way
  • using --conf spark.memory.fraction=0.4 to increase overhead room; fails in the same way
  • dramatically increasing partitions with --conf spark.default.parallelism=64 (and also 500); fails in the same way
  • trying different --driver-memory configurations; always fails (short of getting an even beefier machine, which I guess we could try)

We've had this issue since September 2016. lintool/warcbase#246

@ianmilligan1
Member Author

ianmilligan1 commented Jan 5, 2018

List of failed attempts:

./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --driver-memory 45G --packages "io.archivesunleashed:aut:0.12.1"
./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --conf spark.memory.fraction=0.4 --driver-memory 45G --packages "io.archivesunleashed:aut:0.12.1"
./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --conf spark.memory.fraction=0.8 --driver-memory 45G --packages "io.archivesunleashed:aut:0.12.1"
./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --conf spark.default.parallelism=64 --driver-memory 45G --packages "io.archivesunleashed:aut:0.12.1"
./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --conf spark.default.parallelism=500 --driver-memory 45G --packages "io.archivesunleashed:aut:0.12.1"
./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --conf spark.memory.offHeap.enabled=true --conf spark.memory.offHeap.size=100G --driver-memory 45G --packages "io.archivesunleashed:aut:0.12.1"
./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --driver-memory 45G --executor-memory 10G --packages "io.archivesunleashed:aut:0.12.1"
./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --driver-memory 45G --executor-memory 45G --packages "io.archivesunleashed:aut:0.12.1"

Here's a full Spark log on the failure.

@ianmilligan1
Member Author

ianmilligan1 commented Jan 6, 2018

I got it to work!

./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --master local[12] --driver-memory 45G --packages "io.archivesunleashed:aut:0.12.1"

Limiting the number of cores did the trick: the default was to use all 16 worker threads, and reducing that to 12 made it work. Going to try running all the derivatives, and if that works, I'll close this.
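For anyone hitting the same wall, here's a rough sketch of how the same settings could be pinned in spark-defaults.conf instead of on the command line (paths assume the Spark install used above; I haven't verified this exact setup):

echo "spark.master        local[12]" >> ./spark-2.1.1-bin-hadoop2.6/conf/spark-defaults.conf
echo "spark.driver.memory 45g"       >> ./spark-2.1.1-bin-hadoop2.6/conf/spark-defaults.conf
./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --packages "io.archivesunleashed:aut:0.12.1"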
