There is insufficient memory for the Java Runtime Environment to continue #159

Closed
ianmilligan1 opened this issue Jan 4, 2018 · 11 comments

@ianmilligan1
Member

Running locally on an Azure machine with 16 cores and 55 GB of RAM, on Ubuntu 16; analyzing a 293 GB collection that contains some large WARCs (e.g. 7 GB). This machine has previously processed ~4 TB collections without trouble. There's plenty of free space on the machine, which is a fairly vanilla VM.

This command is used to start spark-shell:

./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --driver-memory 52G --packages "io.archivesunleashed:aut:0.12.1"

And it failed on this relatively straightforward script to count domains:

import io.archivesunleashed.spark.matchbox.{ExtractDomain, ExtractLinks, RemoveHTML, RecordLoader, WriteGEXF}
import io.archivesunleashed.spark.rdd.RecordRDD._
val r = RecordLoader.loadArchives("/data2/toronto-mayor/*.gz", sc).keepValidPages().map(r => ExtractDomain(r.getUrl)).countItems().saveAsTextFile("/data2/toronto-mayor-data/all-domains")

The error is:

OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f62d3100000, 4703911936, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 4703911936 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/ubuntu/aut/hs_err_pid2492.log

Error Logs

Here are the error logs: the terminal output and hs_err_pid2492.log.

I will continue to tackle tomorrow, but any thoughts or guidance greatly appreciated.

@ruebot
Member

ruebot commented Jan 4, 2018

[screenshot from 2018-01-04 10-44-24]

Dumping this in here for posterity.

@ianmilligan1
Member Author

fyi for me: https://www.digitalocean.com/community/tutorials/how-to-add-swap-space-on-ubuntu-16-04
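For reference, a minimal sketch of what that tutorial boils down to (the 50G size here is just a placeholder, not necessarily what I used):

sudo fallocate -l 50G /swapfile   # create a swap file of the desired size
sudo chmod 600 /swapfile          # restrict access to root
sudo mkswap /swapfile             # format it as swap
sudo swapon /swapfile             # enable it for the current session
free -h                           # confirm the new swap shows up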

@ianmilligan1
Member Author

New error when re-running with swap space (I lowered the driver memory to 45GB).

org.apache.spark.SparkException: Job aborted due to stage failure: Task 13 in stage 0.0 failed 1 times, most recent failure: Lost task 13.0 in stage 0.0 (TID 13, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.StringCoding.safeTrim(StringCoding.java:89)
        at java.lang.StringCoding.access$100(StringCoding.java:50)
        at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:154)
        at java.lang.StringCoding.decode(StringCoding.java:193)
        at java.lang.StringCoding.decode(StringCoding.java:254)
        at java.lang.String.<init>(String.java:546)
        at java.lang.String.<init>(String.java:566)
        at io.archivesunleashed.data.WarcRecordUtils.getWarcResponseMimeType(WarcRecordUtils.java:102)
        at io.archivesunleashed.spark.archive.io.ArchiveRecord.<init>(ArchiveRecord.scala:74)
        at io.archivesunleashed.spark.matchbox.RecordLoader$$anonfun$2.apply(RecordLoader.scala:37)
        at io.archivesunleashed.spark.matchbox.RecordLoader$$anonfun$2.apply(RecordLoader.scala:37)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:

@ruebot
Member

ruebot commented Jan 4, 2018

@ianmilligan1 what happened with the swap? Did it all get used?

@ianmilligan1
Member Author

I'm not sure; it didn't generate a nice error log this time. On the bright side, it failed sooner.

Here is the full error log dump.

@ruebot
Member

ruebot commented Jan 4, 2018

Can you run it again, and watch htop or something similar to see how the swap is utilized?
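If htop isn't handy, something like this gives a rough running view (the 5-second interval is arbitrary):

watch -n 5 free -h   # refresh overall memory and swap usage every 5 seconds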

@ianmilligan1
Member Author

Will do!

@ianmilligan1
Member Author

OK, it failed again after watching htop. Memory never went above the allocated amount, and swap space was essentially unused. It just suddenly and arbitrarily fails, with the same error message as above.

@ianmilligan1
Member Author

ianmilligan1 commented Jan 4, 2018

I've tried the following (and spent most of the afternoon watching logs and trying different things). None of it works:

  • increasing swap space to 100 GB; fails in the same way
  • using --conf spark.memory.fraction=0.4 to increase overhead room; fails in the same way
  • dramatically increasing partitions with --conf spark.default.parallelism=64 (and also 500); fails in the same way
  • trying different --driver-memory configurations; always fails (short of getting an even beefier machine, which I guess we could try)

We've had this issue since September 2016. lintool/warcbase#246

@ianmilligan1
Member Author

ianmilligan1 commented Jan 5, 2018

List of failed attempts:

./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --driver-memory 45G --packages "io.archivesunleashed:aut:0.12.1"
./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --conf spark.memory.fraction=0.4 --driver-memory 45G --packages "io.archivesunleashed:aut:0.12.1"
./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --conf spark.memory.fraction=0.8 --driver-memory 45G --packages "io.archivesunleashed:aut:0.12.1"
./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --conf spark.default.parallelism=64 --driver-memory 45G --packages "io.archivesunleashed:aut:0.12.1"
./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --conf spark.default.parallelism=500 --driver-memory 45G --packages "io.archivesunleashed:aut:0.12.1"
./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --conf spark.memory.offHeap.enabled=true --conf spark.memory.offHeap.size=100G --driver-memory 45G --packages "io.archivesunleashed:aut:0.12.1"
./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --driver-memory 45G --executor-memory 10G --packages "io.archivesunleashed:aut:0.12.1"
./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --driver-memory 45G --executor-memory 45G --packages "io.archivesunleashed:aut:0.12.1"

Here's a full Spark log on the failure.

@ianmilligan1
Member Author

ianmilligan1 commented Jan 6, 2018

I got it to work!

./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --master local[12] --driver-memory 45G --packages "io.archivesunleashed:aut:0.12.1"

Limiting the number of cores did the trick: the default was to use all 16 worker threads, and reducing that to 12 made it work. Going to try running all the derivatives, and if that works, I'll close this.
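For anyone hitting the same wall, here's a rough sketch of how the same settings could be pinned in spark-defaults.conf instead of on the command line (paths assume the Spark install used above; I haven't verified this exact setup):

echo "spark.master        local[12]" >> ./spark-2.1.1-bin-hadoop2.6/conf/spark-defaults.conf
echo "spark.driver.memory 45g"       >> ./spark-2.1.1-bin-hadoop2.6/conf/spark-defaults.conf
./spark-2.1.1-bin-hadoop2.6/bin/spark-shell --packages "io.archivesunleashed:aut:0.12.1"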
